October 8, 2024
In language models, the raw output is often just the beginning. While these outputs provide valuable insights, they often need to be structured, formatted, or parsed to be useful in real-world applications. Enter LangChain’s output parsers — a powerful toolset to transform raw text into structured, actionable data.
Whether you want to convert text into JSON, Python objects, or even database rows, LangChain has got you covered. This guide delves deep into the world of output parsing in LangChain, exploring its significance, applications, and the various parsers available. From the List Parser to the DateTime Parser and the StructuredOutputParser, we’ll walk you through the nuances of each, ensuring you have the knowledge and tools to make the most of your language model outputs.
Dive in and discover the art of parsing in LangChain!
Depending on the downstream use, raw text from a language model may not be what you need.
Output parsers are classes in LangChain that help structure the text responses from language models into more useful formats. Output parsers allow you to convert the text into JSON, Python data classes, database rows, and more.
Output parsers have two primary uses:
1) Convert unstructured text into structured data. For example, parsing text into a JSON or Python object.
2) Inject instructions into prompts to tell language models how to format their responses. The parser can provide a get_format_instructions() method that returns text for the prompt.
You should use output parsers when:
• You want to convert the text response into structured data like JSON, list, or other custom Python objects.
• You want the language model to respond in a custom format your application defines. The parser can provide formatting instructions.
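To make the two roles concrete, here is a plain-Python sketch of a single object that does both jobs. The class name `KeyValueParser` and its JSON-based format are hypothetical illustrations, not LangChain's API. LangChain's parsers expose the same two methods, `get_format_instructions()` and `parse()`, but their implementations differ.

```python
import json


class KeyValueParser:
    """Hypothetical parser illustrating both roles of an output parser.

    A plain-Python sketch of the idea, not LangChain's implementation.
    """

    def __init__(self, keys: list[str]):
        self.keys = keys

    def get_format_instructions(self) -> str:
        # Role 2: text to inject into the prompt so the model knows the format.
        return (
            "Respond with a JSON object containing exactly these keys: "
            + ", ".join(self.keys)
        )

    def parse(self, text: str) -> dict:
        # Role 1: turn the model's raw text into structured data.
        data = json.loads(text)
        missing = [key for key in self.keys if key not in data]
        if missing:
            raise ValueError(f"missing keys: {missing}")
        return data


parser = KeyValueParser(["title", "year"])
prompt_text = "Name a 1980s sci-fi film.\n" + parser.get_format_instructions()
raw = '{"title": "Back to the Future", "year": 1985}'  # stand-in for an LLM reply
print(parser.parse(raw))  # {'title': 'Back to the Future', 'year': 1985}
```

The same object both shapes the request and validates the response, which is why LangChain bundles the two responsibilities into one class.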
LangChain offers several types of output parsers.
In this guide, we’ll focus on just three:
• CommaSeparatedListOutputParser
• DatetimeOutputParser
• StructuredOutputParser
The CommaSeparatedListOutputParser returns a list of comma-separated items.
You would use it when you want the LLM to return a simple list of items in its response.
For example: "apples, bananas, oranges" -> ["apples", "bananas", "oranges"]
The parser handles splitting up the comma-separated string into a clean Python list.
Any time you want the LLM to return a list of items, the CommaSeparatedListOutputParser is useful.
Some examples are asking for movie recommendations, retrieving a list of related search terms, or getting a recipe’s ingredients list.
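The core splitting step is simple enough to sketch in a few lines. `parse_comma_separated` is a hypothetical helper that approximates the idea behind CommaSeparatedListOutputParser; it is not the library's source, and the real parser may treat whitespace or quoting differently.

```python
def parse_comma_separated(text: str) -> list[str]:
    """Split a comma-separated LLM reply into a clean list of items.

    A sketch of the idea, not CommaSeparatedListOutputParser's actual code.
    """
    # Split on commas, trim surrounding whitespace, and drop empty pieces
    # (for example, the one left behind by a trailing comma).
    return [item.strip() for item in text.split(",") if item.strip()]


print(parse_comma_separated("apples, bananas, oranges"))
# ['apples', 'bananas', 'oranges']
print(parse_comma_separated(" flour,sugar , eggs,\n"))
# ['flour', 'sugar', 'eggs']
```

Because it splits on every comma, this naive approach cannot represent items that themselves contain commas.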
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
Here is an example response without an output parser:
llm = OpenAI()
prompt = PromptTemplate(
    template="List 3 {things}",
    input_variables=["things"],
)
llm.predict(text=prompt.format(things="sports that don't use balls"))
1. Swimming
2. Archery
3. Running
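Without format instructions, the model answered with a numbered list rather than comma-separated values, so splitting on commas would leave the whole reply as a single item. Handling that shape by hand takes separate logic. A minimal sketch, assuming each item sits on its own line after a number followed by a period or parenthesis (`parse_numbered_list` is a hypothetical helper):

```python
import re

raw = """1. Swimming
2. Archery
3. Running"""


def parse_numbered_list(text: str) -> list[str]:
    # Capture the text after "<number>." or "<number>)" at the start of each line.
    return re.findall(r"^\s*\d+[.)]\s*(.+?)\s*$", text, flags=re.MULTILINE)


print(parse_numbered_list(raw))  # ['Swimming', 'Archery', 'Running']
```

Asking the model for one agreed-upon format, as the next step does, avoids writing a parser for every shape the model might choose.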
Let’s instantiate the parser and look at the format instructions:
output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
Your response should be a list of comma separated values, eg: `foo, bar, baz`
Now let’s see how to use the parser’s instructions in the prompt:
prompt = PromptTemplate(
    template="List 3 {things}.\n{format_instructions}",
    input_variables=["things"],
    partial_variables={"format_instructions": format_instructions},
)
output = llm.predict(text=prompt.format(things="sports that don't use balls"))
print(output)
Skiing, Swimming, Archery
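The `partial_variables` argument pre-fills `{format_instructions}` once, so each call only has to supply `{things}`. A plain-Python analogy uses `functools.partial` over `str.format`; it reproduces the resulting prompt text but is not how PromptTemplate is implemented internally.

```python
from functools import partial

template = "List 3 {things}.\n{format_instructions}"
format_instructions = (
    "Your response should be a list of comma separated values, "
    "eg: `foo, bar, baz`"
)

# Bind the format instructions now; {things} stays open for later calls.
fill_prompt = partial(template.format, format_instructions=format_instructions)

print(fill_prompt(things="sports that don't use balls"))
```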
The output from the LLM is just a string, as expected:
type(output)
str
And finally, we can parse the output to a list:
output_parser.parse(output)
['Skiing', 'Swimming', 'Archery']
type(output_parser.parse(output))
list
The DatetimeOutputParser is a built-in parser that parses a string containing a date, time, or datetime into a Python datetime object.
You would use the DatetimeOutputParser when you want the LLM to return a date, time, or datetime that you can then use for date calculations, formatting, etc., in Python.
Any time you prompt the LLM to return a date, time, or datetime string, the DatetimeOutputParser is useful to parse that into a proper datetime object.
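Turning such a string into a datetime amounts to matching it against a known pattern. A minimal standard-library sketch, assuming the pattern `%Y-%m-%dT%H:%M:%S.%fZ` that the parser's format instructions request by default; `parse_datetime` is a hypothetical helper, not the parser's actual code:

```python
from datetime import datetime

# Assumed default pattern requested by DatetimeOutputParser's instructions.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str) -> datetime:
    # Trim stray whitespace from the LLM reply, then let strptime
    # validate the string against the expected pattern.
    return datetime.strptime(text.strip(), DATETIME_FORMAT)


print(repr(parse_datetime("1985-07-03T00:00:00.000000Z\n")))
# datetime.datetime(1985, 7, 3, 0, 0)
```

A reply that does not follow the pattern, such as `July 3, 1985`, makes `strptime` raise a `ValueError`, which is why the format instructions matter here.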
from langchain.prompts import PromptTemplate
from langchain.output_parsers import DatetimeOutputParser
from langchain.llms import OpenAI
llm = OpenAI()
output_parser = DatetimeOutputParser()
print(output_parser.get_format_instructions())
Write a datetime string that matches the following pattern: "%Y-%m-%dT%H:%M:%S.%fZ".
Examples: 1132-06-09T00:45:21.019257Z, 1187-12-04T11:36:39.086472Z, 302-06-14T05:02:44.486807Z
template = """Answer the users question:
{question}
{format_instructions}"""
prompt = PromptTemplate.from_template(
    template,
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)
output = llm.predict(text=prompt.format(question="When was Back to the Future released?"))
print(output)
1985-07-03T00:00:00.000000Z
output_parser.parse(output)
datetime.datetime(1985, 7, 3, 0, 0)
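Because the parsed value is a real `datetime` object rather than a string, ordinary date arithmetic and formatting work on it directly. A short sketch using only the standard library and the value parsed above:

```python
from datetime import datetime, timedelta

# The value returned by output_parser.parse(output) in the example above.
release = datetime(1985, 7, 3, 0, 0)

print(release.strftime("%A, %B %d, %Y"))      # Wednesday, July 03, 1985
print((release + timedelta(weeks=1)).date())  # 1985-07-10
print(release.replace(year=2015).date())      # 2015-07-03
```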
The StructuredOutputParser is an output parser that parses raw text from an LLM into a Python dictionary or other object based on a provided schema.
It is used when you want to parse an LLM’s response into a structured format like a dict or JSON.
The StructuredOutputParser allows you to define a custom schema that matches the expected structure of the LLM’s response.
You would use the StructuredOutputParser when you need several named fields (for example, an answer plus a supporting fact) back from a single response, each of which you can then access by key.
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
chat_model = ChatOpenAI()
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="fact", description="an interesting fact about the answer to the user's question"),
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":
```json
{
"answer": string // answer to the user's question
"fact": string // an interesting fact about the answer to the user's question
}
```
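The printed instructions are generated from the schemas: each ResponseSchema contributes one key, type, and description line inside a JSON code fence. A rough plain-Python sketch of that rendering step; `FieldSpec` and `build_format_instructions` are hypothetical names, and LangChain's exact wording and whitespace may differ:

```python
from dataclasses import dataclass

FENCE = "`" * 3  # three backticks, built here to keep the listing readable


@dataclass
class FieldSpec:
    """Hypothetical stand-in for ResponseSchema: a key and its description."""
    name: str
    description: str
    value_type: str = "string"


def build_format_instructions(fields: list[FieldSpec]) -> str:
    # One line per field, in the form: "name": type  // description
    lines = [f'\t"{f.name}": {f.value_type}  // {f.description}' for f in fields]
    schema = "{\n" + "\n".join(lines) + "\n}"
    return (
        "The output should be a markdown code snippet formatted in the "
        "following schema:\n" + FENCE + "json\n" + schema + "\n" + FENCE
    )


print(build_format_instructions([
    FieldSpec("answer", "answer to the user's question"),
    FieldSpec("fact", "an interesting fact about the answer to the user's question"),
]))
```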
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)
_input = prompt.format_prompt(question="what's the capital of Manitoba?")
output = chat_model(_input.to_messages())
output_parser.parse(output.content)
{'answer': 'The capital of Manitoba is Winnipeg.',
'fact': 'Winnipeg is the seventh-largest city in Canada.'}
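Parsing the reply is roughly the reverse: find the JSON inside the markdown fence, decode it, and check that every expected key is present. A minimal standard-library sketch, assuming the reply follows the requested fenced format; `parse_json_snippet` is a hypothetical helper that approximates, rather than reproduces, what `StructuredOutputParser.parse` does:

```python
import json
import re

FENCE = "`" * 3  # three backticks


def parse_json_snippet(text: str, expected_keys: list[str]) -> dict:
    # Look for a fenced block, optionally tagged "json"; fall back to the raw text.
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    data = json.loads(payload)
    # Fail loudly if the model left out a field the schema asked for.
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


# A sample reply in the requested format (illustrative text, not real model output).
reply = (
    FENCE + "json\n"
    '{\n\t"answer": "The capital of Manitoba is Winnipeg.",\n'
    '\t"fact": "Winnipeg sits where the Red and Assiniboine rivers meet."\n}\n'
    + FENCE
)
print(parse_json_snippet(reply, ["answer", "fact"]))
```

Checking the keys up front turns a silently incomplete reply into an explicit error that your application can catch and retry.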
The world of language models is vast and intricate, but with tools like LangChain’s output parsers, we can harness their power in more structured and meaningful ways.
As we’ve explored, these parsers enhance the usability of raw outputs and pave the way for more advanced applications and integrations. Whether you aim to convert simple lists, extract precise datetime information, or structure complex responses, LangChain offers a tailored solution. As language models continue to evolve and find their place in diverse sectors, having the ability to parse and structure their outputs will remain invaluable. With LangChain by our side, we’re well-equipped to navigate this journey, ensuring that we extract the maximum value from our models while maintaining clarity and precision.
Happy parsing!