October 8, 2024
Traditional “Action Agents” followed a framework where user input was received, the agent decided on a tool to use, and this process was repeated until the agent responded to the user.
However, the need for a more advanced system arose as user objectives became more intricate and developers began to rely more on agents.
This led to developing “Plan-and-Execute” agents that separate planning from execution, allowing for more focused and reliable operations.
Plan-and-execute agents are a new breed of agents designed to address the limitations of their traditional counterparts.
They accomplish objectives by planning what to do and executing the sub-tasks.
Drawing inspiration from the BabyAGI concept and the “Plan-and-Solve” paper, these agents represent a significant leap in agent technology.
The core of this agent framework consists of a planner and an executor.
The planner, typically a language model, plans the steps and deals with ambiguities.
Conversely, the executor is an Action Agent that takes the high-level objective from the planner and determines the tools to achieve it.
Planning: Typically done by a Language Model (LLM), this phase involves mapping out the steps required to achieve the objective. The LLM’s reasoning ability is harnessed to plan and deal with ambiguities or edge cases.
Execution: A separate agent, equipped with the necessary tools, takes over in this phase. It receives the high-level objectives from the planner and determines the tools and methods to achieve them.
This separation allows for more specialized attention to planning and execution, potentially leading to better results.
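To make the split concrete, here is a minimal sketch of the control flow in plain Python. It is not LangChain's implementation: `toy_planner`, `toy_executor`, and `plan_and_execute` are hypothetical stand-ins, and neither calls an LLM or a real tool. The point is only that planning happens once, up front, and execution then works through the steps one at a time.

```python
from typing import Callable, List


def toy_planner(objective: str) -> List[str]:
    """Hypothetical stand-in for the LLM planner: returns a fixed list of steps."""
    return [
        f"Look up the facts needed for: {objective}",
        "Do any required calculation",
        "Given the above steps taken, respond to the original question",
    ]


def toy_executor(step: str, previous: List[str]) -> str:
    """Hypothetical stand-in for the executor agent, which would pick and run tools."""
    return f"done({step!r}) with {len(previous)} prior result(s)"


def plan_and_execute(
    objective: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str, List[str]], str],
) -> str:
    steps = planner(objective)      # planning happens once, up front
    results: List[str] = []
    for step in steps:              # execution handles one step at a time
        results.append(executor(step, results))
    return results[-1]              # the last step's response is the answer


final = plan_and_execute("toy objective", toy_planner, toy_executor)
print(final)
```

Passing earlier results into each executor call is what lets a later step build on an earlier one, such as using a looked-up age in a calculation.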
Plan-and-execute agents are particularly effective when the objective requires complex planning and coordination of multiple sub-tasks.
These agents can handle tasks that involve multiple steps and dependencies between them.
To determine if you should use plan-and-execute agents, consider the complexity of your objective and the need for planning and coordination.
If your objective can be achieved through a simple sequence of steps without much planning, other types of agents may be more suitable.
Let’s get some preliminaries out of the way…
%%capture
!pip install langchain langchain_experimental openai duckduckgo-search youtube_search wikipedia
import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter Your OpenAI API Key:")
from langchain.chat_models import ChatOpenAI
from langchain_experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner
from langchain.tools import DuckDuckGoSearchRun
from langchain.agents.tools import Tool
from langchain.chains import LLMMathChain
Start by defining the tools the agent will use to execute its sub-tasks.
These tools can be functions or APIs that perform specific actions.
For example, you can define a tool for searching the internet or performing calculations.
llm = ChatOpenAI(model="gpt-4-1106-preview",temperature=0)
llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)
search = DuckDuckGoSearchRun()
search_tool = Tool(
name="Search",
func=search.run,
description="Useful for answering questions about up to the minute and current events"
)
calculator_tool = Tool(
name="Calculator",
func=llm_math_chain.run,
description="Useful for when you need to do math"
)
agent_tools = [search_tool, calculator_tool]
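If you want to wrap your own function as a tool, the shape is the same: a name, a callable that takes a string and returns a string, and a description the model reads when choosing. The sketch below uses a hypothetical `SimpleTool` dataclass as a stand-in so it runs without LangChain installed. It only mirrors the three fields used above and is not the library's `Tool` class. The `word_count` function is likewise made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical stand-in mirroring the name/func/description fields used above."""
    name: str
    func: Callable[[str], str]
    description: str


def word_count(text: str) -> str:
    """Tools take a string and return a string the agent can read back."""
    return str(len(text.split()))


word_count_tool = SimpleTool(
    name="WordCount",
    func=word_count,
    description="Useful for counting the words in a piece of text",
)

# The executor selects a tool by name, so keep a name -> tool lookup.
registry: Dict[str, SimpleTool] = {word_count_tool.name: word_count_tool}
result = registry["WordCount"].func("plan then execute")
print(result)
```

The description matters as much as the function: it is the only information the model has when deciding which tool fits a step.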
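Next, create the planner. In LangChain, an agent's plan method takes in the intermediate steps taken by the agent and the user inputs as arguments.
It should analyze the current state and decide what action or tool to use next.
The plan method should return a list of AgentAction objects specifying the tools to use. Here, load_chat_planner builds an LLM-backed planner for us, and we can inspect the system prompt it uses.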
model = ChatOpenAI(model="gpt-4",temperature=0)
planner = load_chat_planner(model)
planner.llm_chain.prompt.messages[0].content
Let's first understand the problem and devise a plan to solve the problem. Please output the plan starting with the header 'Plan:' and then followed by a numbered list of steps. Please make the plan the minimum number of steps required to accurately complete the task. If the task is a question, the final step should almost always be 'Given the above steps taken, please respond to the users original question'. At the end of your plan, say '<END_OF_PLAN>'
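Because the prompt asks for a 'Plan:' header, a numbered list, and an '<END_OF_PLAN>' marker, the planner's reply is easy to split into steps. Here is a rough sketch of that parsing. `parse_plan` is a hypothetical helper written for this post, not LangChain's own output parser, and the sample reply is made up for illustration rather than captured from a real run. It also assumes each step fits on one line.

```python
import re
from typing import List

END_MARKER = "<END_OF_PLAN>"


def parse_plan(text: str) -> List[str]:
    """Split a 'Plan:' reply into its numbered steps (one step per line)."""
    body = text.split(END_MARKER)[0]
    # Drop anything before the 'Plan:' header, if present.
    if "Plan:" in body:
        body = body.split("Plan:", 1)[1]
    steps = []
    for line in body.splitlines():
        match = re.match(r"\s*\d+\.\s+(.*\S)", line)
        if match:
            steps.append(match.group(1))
    return steps


# Illustrative reply only; a real planner's wording will differ.
sample = """Plan:
1. Find out who the current Premier of Manitoba is.
2. Find the Premier's current age.
3. Calculate the age raised to the 0.43 power.
4. Given the above steps taken, please respond to the users original question.
<END_OF_PLAN>"""

steps = parse_plan(sample)
for number, step in enumerate(steps, start=1):
    print(number, step)
```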
Finally, create the executor with load_agent_executor, passing in the model and tools as arguments, and combine it with the planner in a PlanAndExecute agent.
The agent executor handles the execution of the agent’s actions and tools.
executor = load_agent_executor(model, agent_tools, verbose=True)
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)
print(executor.chain.agent.llm_chain.prompt.messages[0].prompt.template)
Respond to the human as helpfully and accurately as possible. You have access to the following tools:
Search: Useful for answering questions about current events, args: {{'tool_input': {{'type': 'string'}}}}
Calculator: Useful for when you need to do math, args: {{'tool_input': {{'type': 'string'}}}}
Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).
Valid "action" values: "Final Answer" or Search, Calculator
Provide only ONE action per $JSON_BLOB, as shown:
```
{{
"action": $TOOL_NAME,
"action_input": $INPUT
}}
```
Follow this format:
Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{{
"action": "Final Answer",
"action_input": "Final response to human"
}}
```
Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation:.
Thought:
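The prompt above asks the model to reply with a single fenced JSON blob holding an action key and an action_input key. As a sketch of what reading such a reply involves, the snippet below pulls the blob out and decodes it. `parse_action` is a hypothetical helper, not LangChain's output parser, and it assumes the reply contains exactly one fenced blob.

```python
import json
import re
from typing import Any, Tuple

FENCE = "`" * 3  # three backticks, built this way to keep the example fence-free


def parse_action(reply: str) -> Tuple[str, Any]:
    """Find the first fenced JSON blob and return (action, action_input)."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(\{.*?\})\s*" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON blob found in reply")
    blob = json.loads(match.group(1))
    return blob["action"], blob["action_input"]


# A reply shaped like the format the prompt requests.
reply = (
    "Thought: I can use the Calculator tool.\n"
    "Action:\n"
    f"{FENCE}\n"
    '{\n  "action": "Calculator",\n  "action_input": "41^0.43"\n}\n'
    f"{FENCE}"
)

action, action_input = parse_action(reply)
print(action, action_input)
```

An action of "Final Answer" would end the loop. Any other value names the tool to call with action_input.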
agent.run("Who is the current Premier of Manitoba? What is their current age raised to the 0.43 power?")
Response: The current Premier of Manitoba, Wab Kinew, is 41 years old as of 2023.
> Entering new AgentExecutor chain...
Thought: The current age of the Premier of Manitoba is 41 years old. The task is to calculate the age of the Premier raised to the 0.43 power. I can use the Calculator tool to perform this calculation.
Action:
```
{
"action": "Calculator",
"action_input": "41^0.43"
}
```
> Entering new LLMMathChain chain...
41^0.43```text
41**0.43
```
...numexpr.evaluate("41**0.43")...
Answer: 4.9373857399466665
> Finished chain.
Observation: Answer: 4.9373857399466665
Thought:The calculator tool has provided the result of 41 raised to the power of 0.43, which is approximately 4.94. I can now provide this as the final answer.
Action:
```
{
"action": "Final Answer",
"action_input": "The age of the Premier raised to the 0.43 power is approximately 4.94."
}
```
> Finished chain.
*****
Step: Calculate the age of the Premier raised to the 0.43 power.
Response: The age of the Premier raised to the 0.43 power is approximately 4.94.
> Entering new AgentExecutor chain...
Thought: The user's original question isn't provided in the prompt, but based on the steps taken, it seems like they were asking for the age of the Premier of Manitoba raised to the 0.43 power. The answer to that has been calculated as approximately 4.94.
Action:
```
{
"action": "Final Answer",
"action_input": "The age of the Premier of Manitoba, Wab Kinew, raised to the 0.43 power is approximately 4.94."
}
```
> Finished chain.
*****
Step: Given the above steps taken, please respond to the users original question.
Response: The age of the Premier of Manitoba, Wab Kinew, raised to the 0.43 power is approximately 4.94.
> Finished chain.
The age of the Premier of Manitoba, Wab Kinew, raised to the 0.43 power is approximately 4.94.
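It is worth sanity-checking the arithmetic yourself. This short snippet recomputes the value in plain Python, using the age of 41 reported in the run above, and cross-checks it with the identity a^b = exp(b · ln a).

```python
import math

age = 41          # age reported in the run above
exponent = 0.43

result = age ** exponent
# Same value via logarithms: a^b = exp(b * ln a)
via_logs = math.exp(exponent * math.log(age))

print(round(result, 2))
```

This prints 4.94, matching the agent's final answer.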
To conclude, in this blog I’ve taken you from traditional “Action Agents” to the more advanced “Plan-and-Execute” agents. By separating planning from execution, this newer kind of agent can manage complex tasks more efficiently and reliably.
I’ve broken down the components that make up these agents: a planner, usually a language model, which outlines the steps and deals with uncertainties, and an executor, an Action Agent that implements the high-level plan using various tools. This division of labor between planning and execution is key to their enhanced performance.
Plan-and-Execute agents’ major advantage over traditional agents is their ability to handle intricate tasks involving multiple steps and dependencies. Beyond handling more complexity, the separation can also make them more reliable, which matters as we increasingly rely on AI agents in various domains.
In the blog, I’ve also walked you through the preliminary steps for setting these agents up, from defining tools to creating planner and executor agents. The real-world example I provided illustrates how effective these agents can be in analyzing, planning, and executing complex queries.
This evolution from simple action-oriented agents to sophisticated Plan-and-Execute agents marks a significant advancement in AI capabilities, offering us smarter, more reliable tools for tackling increasingly complex challenges.