Mastering Output Parsing in LangChain

Transforming Raw Language Model Responses into Structured Insights

In language models, the raw output is often just the beginning. These outputs carry valuable information, but they usually need to be structured, formatted, or parsed before they are useful in real-world applications. Enter LangChain’s output parsers: a toolset for transforming raw text into structured, actionable data.

Whether you want to convert text into JSON, Python objects, or even database rows, LangChain has got you covered. This guide delves deep into the world of output parsing in LangChain, exploring its significance, applications, and the various parsers available. From the List Parser to the DateTime Parser and the StructuredOutputParser, we’ll walk you through the nuances of each, ensuring you have the knowledge and tools to make the most of your language model outputs.

Dive in and discover the art of parsing in LangChain!

What are output parsers?

Raw text from a language model is often not the format that downstream code needs.

Output parsers are classes in LangChain that structure the text responses from language models into more useful formats. They let you convert the text into JSON, Python data classes, database rows, and more.

What are they used for?

Output parsers have two primary uses:

1) Convert unstructured text into structured data. For example, parsing text into a JSON or Python object.

2) Inject instructions into prompts to tell language models how to format their responses. The parser can provide a get_format_instructions() method that returns text for the prompt.
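
To make the two roles concrete, here is a minimal sketch in plain Python. It is not LangChain’s implementation: the `KeyValueParser` class, its instruction text, and the sample reply are hypothetical, and only the method names `get_format_instructions()` and `parse()` mirror the ones used in this article.

```python
class KeyValueParser:
    """Hypothetical parser illustrating the two roles of an output parser."""

    def get_format_instructions(self) -> str:
        # Role 2: text injected into the prompt to tell the model how to respond.
        return "Respond with one `key: value` pair per line."

    def parse(self, text: str) -> dict:
        # Role 1: convert the unstructured reply into structured data (a dict).
        result = {}
        for line in text.strip().splitlines():
            if ":" not in line:
                continue  # skip lines that do not follow the requested format
            key, value = line.split(":", 1)
            result[key.strip()] = value.strip()
        return result


parser = KeyValueParser()
prompt = "Describe Winnipeg.\n" + parser.get_format_instructions()
reply = "province: Manitoba\ncountry: Canada"  # stand-in for a model reply
print(parser.parse(reply))  # {'province': 'Manitoba', 'country': 'Canada'}
```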

When should I use them?

You should use output parsers when:

• You want to convert the text response into structured data like JSON, list, or other custom Python objects.

• You want the language model to respond in a custom format your application defines. The parser can provide formatting instructions.

• You want to validate or clean up the language model’s response before using it.
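
The validate-and-clean-up case can be sketched without any library at all. The example below is an illustration under assumptions: the label set and the `parse_label` helper are hypothetical, standing in for whatever values your application accepts.

```python
ALLOWED_LABELS = {"positive", "negative", "neutral"}  # hypothetical label set


def parse_label(text: str) -> str:
    """Clean up a one-word model reply and reject anything unexpected."""
    # Clean up: drop surrounding whitespace, a trailing period, and casing.
    label = text.strip().rstrip(".").lower()
    # Validate: fail loudly rather than pass a malformed reply downstream.
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unexpected label: {text!r}")
    return label


print(parse_label("  Positive.\n"))  # positive
```

Raising an error on an unexpected reply gives the calling code a clear point at which to retry or fall back.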

Types of output parsers in LangChain

LangChain offers several types of output parsers.

In this guide, we’ll focus on just a few:

  • List parser — Parses a comma-separated list into a Python list.
  • DateTime parser — Parses a datetime string into a Python datetime object.
  • Structured output parser — Parses into a dict based on a provided schema. Useful for text-only custom schemas.

List Parser

This output parser can be used to return a list of comma-separated items.

What do I use it for?

You would use the CommaSeparatedListOutputParser (a concrete ListOutputParser) when you want the LLM to return a simple list of items in its response.

For example: "apples, bananas, oranges" -> ["apples", "bananas", "oranges"]

The parser handles splitting up the comma-separated string into a clean Python list.
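
The splitting step itself is simple. Here is a rough standard-library approximation of it; the `parse_comma_separated` helper is hypothetical and is not the library’s code.

```python
def parse_comma_separated(text: str) -> list:
    """Split a comma-separated reply and trim whitespace around each item."""
    return [item.strip() for item in text.strip().split(",")]


print(parse_comma_separated("apples, bananas, oranges"))
# ['apples', 'bananas', 'oranges']
```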

When would I use it?

Any time you want the LLM to return a list of items, the CommaSeparatedListOutputParser is useful.

Some examples are asking for movie recommendations, retrieving a list of related search terms, or getting a recipe’s ingredients list.

Code example:

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

Example response without parsing output:

llm = OpenAI()

prompt = PromptTemplate(
    template="List 3 {things}",
    input_variables=["things"])

llm.predict(text=prompt.format(things="sports that don't use balls"))
1. Swimming
2. Archery
3. Running

Let’s instantiate the parser and look at the format instructions:

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()

print(format_instructions)
Your response should be a list of comma separated values, eg: `foo, bar, baz`

Now let’s see how to use the parser’s instructions in the prompt:

prompt = PromptTemplate(
    template="List 3 {things}.\n{format_instructions}",
    input_variables=["things"],
    partial_variables={"format_instructions": format_instructions})

output = llm.predict(text=prompt.format(things="sports that don't use balls"))

print(output)
Skiing, Swimming, Archery

The output from the LLM is just a string, as expected:

type(output)
str

And finally, we can parse the output to a list:

output_parser.parse(output)
['Skiing', 'Swimming', 'Archery']
type(output_parser.parse(output))
list
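
If you want to try the same flow without an API key, the three steps (build the prompt, call the model, parse the reply) can be sketched with a stub. Everything here is an assumption for illustration: `fake_llm` is a hypothetical stand-in that returns a canned reply, and the parsing line is a plain-Python approximation rather than the LangChain parser.

```python
FORMAT_INSTRUCTIONS = (
    "Your response should be a list of comma separated values, "
    "eg: `foo, bar, baz`"
)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; always returns a canned reply."""
    return "Skiing, Swimming, Archery"


def list_things(things: str) -> list:
    # 1. Build the prompt, including the format instructions.
    prompt = f"List 3 {things}.\n{FORMAT_INSTRUCTIONS}"
    # 2. Call the model (stubbed here).
    raw = fake_llm(prompt)
    # 3. Parse the raw string into a Python list.
    return [item.strip() for item in raw.split(",")]


print(list_things("sports that don't use balls"))
# ['Skiing', 'Swimming', 'Archery']
```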

DateTime Parser

What is it?

The DatetimeOutputParser is a built-in parser that parses a string containing a date, time, or datetime into a Python datetime object.

What do I use it for?

You would use the DatetimeOutputParser when you want the LLM to return a date, time, or datetime in its response that you can then use for date calculations, formatting, etc. in Python.

When should I use it?

Any time you prompt the LLM to return a date, time, or datetime string, the DatetimeOutputParser is useful to parse that into a proper datetime object.

Code example:

from langchain.prompts import PromptTemplate
from langchain.output_parsers import DatetimeOutputParser
from langchain.llms import OpenAI

Let’s instantiate the parser and look at the format instructions:

llm = OpenAI()
output_parser = DatetimeOutputParser()

print(output_parser.get_format_instructions())
Write a datetime string that matches the following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". 
Examples: 1132-06-09T00:45:21.019257Z, 1187-12-04T11:36:39.086472Z, 302-06-14T05:02:44.486807Z

Next, we add the format instructions to the prompt and call the LLM:

template = """Answer the user's question:

{question}

{format_instructions}"""
prompt = PromptTemplate.from_template(
    template,
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

output = llm.predict(text=prompt.format(question="When was Back to the Future released?"))

print(output)
1985-07-03T00:00:00.000000Z

And finally, we can parse the output into a datetime object:

output_parser.parse(output)
datetime.datetime(1985, 7, 3, 0, 0)
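
Under the hood, this kind of parsing comes down to matching the string against a format pattern. Here is a standard-library sketch using the same pattern as the format instructions; the `parse_datetime` helper is hypothetical and is not the library’s code. Note that the trailing `Z` is matched as a literal character, so the result is a naive datetime with no timezone attached.

```python
from datetime import datetime

# Same pattern as in the format instructions shown in this section.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str) -> datetime:
    """Parse a reply that follows PATTERN; raises ValueError otherwise."""
    return datetime.strptime(text.strip(), PATTERN)


released = parse_datetime("1985-07-03T00:00:00.000000Z")
print(released.year)         # 1985
print(released.isoformat())  # 1985-07-03T00:00:00
```

Once the value is a datetime object, date arithmetic and formatting work as usual.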

StructuredOutputParser

What is it?

The StructuredOutputParser is an output parser that parses raw text from an LLM into a Python dictionary based on a provided schema.

What is it used for?

It is used when you want to parse an LLM’s response into a structured format such as a dict or JSON.

The StructuredOutputParser allows you to define a custom schema that matches the expected structure of the LLM’s response.
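
To see how a schema can turn into prompt text, here is a rough sketch that builds format instructions from a list of (name, description) pairs. It only imitates the style of the instructions printed later in this section; the `build_format_instructions` helper is hypothetical and is not the library’s code.

```python
from typing import List, Tuple

FENCE = "`" * 3  # three backticks, built here to avoid nesting code fences


def build_format_instructions(fields: List[Tuple[str, str]]) -> str:
    """Describe the expected JSON fields as text that can go into a prompt."""
    lines = [f' "{name}": string  // {description}' for name, description in fields]
    return (
        "The output should be a markdown code snippet formatted in the "
        "following schema:\n\n"
        + FENCE + "json\n{\n" + "\n".join(lines) + "\n}\n" + FENCE
    )


schema = [
    ("answer", "answer to the user's question"),
    ("fact", "an interesting fact about the answer"),
]
print(build_format_instructions(schema))
```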

When would I use it?

You would use the StructuredOutputParser when:

  • The LLM’s response contains multiple fields/values you want to extract
  • The fields have predictable names you can define in a schema
  • You want the output parsed into a dict rather than raw text
  • The built-in parsers don’t handle the structure you need

Code example:

from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI

chat_model = ChatOpenAI()

response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="fact", description="an interesting fact about the answer to the user's question")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()

print(format_instructions)
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
 "answer": string  // answer to the user's question
 "fact": string  // an interesting fact about the answer to the user's question
}
```
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)

_input = prompt.format_prompt(question="what's the capital of Manitoba?")
output = chat_model(_input.to_messages())

output_parser.parse(output.content)
{'answer': 'The capital of Manitoba is Winnipeg.',
 'fact': 'Winnipeg is the seventh-largest city in Canada.'}
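
The parsing step can be approximated with the standard library: find the fenced JSON snippet in the reply, load it, and check that the expected fields are present. This is a sketch under assumptions, not the library’s code; the `parse_structured` helper and the sample reply are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built here to avoid nesting code fences


def parse_structured(text: str, expected_keys: list) -> dict:
    """Extract the fenced JSON snippet from a reply and check its keys."""
    # Capture everything between the opening "json" fence and the closing fence.
    match = re.search(FENCE + r"json\s*(.*?)" + FENCE, text, re.DOTALL)
    if match is None:
        raise ValueError("No fenced json snippet found in the reply")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"Reply is missing expected keys: {missing}")
    return data


reply = (
    "Here you go:\n"
    + FENCE + "json\n"
    + '{"answer": "Winnipeg", "fact": "Winnipeg is the capital of Manitoba."}\n'
    + FENCE
)
print(parse_structured(reply, ["answer", "fact"]))
# {'answer': 'Winnipeg', 'fact': 'Winnipeg is the capital of Manitoba.'}
```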

Concluding Thoughts on Parsing with LangChain

The world of language models is vast and intricate, but with tools like LangChain’s output parsers, we can harness their power in more structured and meaningful ways.

As we’ve explored, these parsers enhance the usability of raw outputs and pave the way for more advanced applications and integrations. Whether you aim to convert simple lists, extract precise datetime information, or structure complex responses, LangChain offers a tailored solution. As language models continue to evolve and find their place in diverse sectors, having the ability to parse and structure their outputs will remain invaluable. With LangChain by our side, we’re well-equipped to navigate this journey, ensuring that we extract the maximum value from our models while maintaining clarity and precision.

Happy parsing!


Harpreet Sahota
