Agents
The core idea of agents is to use an LLM to choose a sequence of actions to take. In chains, a sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
Some important terminology (and schema) to know:
AgentAction
: This is a dataclass that represents the action an agent should take. It has atool
property (which is the name of the tool that should be invoked) and atool_input
property (the input to that tool)AgentFinish
: This is a dataclass that signifies that the agent has finished and should return to the user. It has areturn_values
parameter, which is a dictionary to return. It often only has one key -output
- that is a string, and so often it is just this key that is returned.intermediate_steps
: These represent previous agent actions and corresponding outputs that are passed around. These are important to pass to future iteration so the agent knows what work it has already done. This is typed as aList[Tuple[AgentAction, Any]]
. Note that observation is currently left as typeAny
to be maximally flexible. In practice, this is often a string.
There are several key components here:
Agent
This is the chain responsible for deciding what step to take next. This is powered by a language model and a prompt. The inputs to this chain are:
- List of available tools
- User input
- Any previously executed steps (
intermediate_steps
)
This chain then returns either the next action to take or the final response to send to the user (AgentAction
or AgentFinish
).
Different agents have different prompting styles for reasoning, different ways of encoding input, and different ways of parsing the output. For a full list of agent types see agent types
Tools
Tools are functions that an agent calls. There are two important considerations here:
- Giving the agent access to the right tools
- Describing the tools in a way that is most helpful to the agent
Without both, the agent you are trying to build will not work. If you don't give the agent access to a correct set of tools, it will never be able to accomplish the objective. If you don't describe the tools properly, the agent won't know how to properly use them.
LangChain provides a wide set of tools to get started, but also makes it easy to define your own (including custom descriptions). For a full list of tools, see here
Toolkits
Often the set of tools an agent has access to is more important than a single tool. For this LangChain provides the concept of toolkits - groups of tools needed to accomplish specific objectives. There are generally around 3-5 tools in a toolkit.
LangChain provides a wide set of toolkits to get started. For a full list of toolkits, see here
AgentExecutor
The agent executor is the runtime for an agent. This is what actually calls the agent and executes the actions it chooses. Pseudocode for this runtime is below:
next_action = agent.get_action(...)
while next_action != AgentFinish:
observation = run(next_action)
next_action = agent.get_action(..., next_action, observation)
return next_action
While this may seem simple, there are several complexities this runtime handles for you, including:
- Handling cases where the agent selects a non-existent tool
- Handling cases where the tool errors
- Handling cases where the agent produces output that cannot be parsed into a tool invocation
- Logging and observability at all levels (agent decisions, tool calls) either to stdout or LangSmith.
Other types of agent runtimes
The AgentExecutor
class is the main agent runtime supported by LangChain.
However, there are other, more experimental runtimes we also support.
These include:
Get started
This will go over how to get started building an agent.
We will create this agent from scratch, using LangChain Expression Language.
We will then define custom tools, and then run it in a custom loop (we will also show how to use the standard LangChain AgentExecutor
).
Set up the agent
We first need to create our agent. This is the chain responsible for determining what action to take next.
In this example, we will use OpenAI Function Calling to create this agent. This is generally the most reliable way create agents. In this example we will show what it is like to construct this agent from scratch, using LangChain Expression Language.
For this guide, we will construct a custom agent that has access to a custom tool. We are choosing this example because we think for most use cases you will NEED to customize either the agent or the tools. The tool we will give the agent is a tool to calculate the length of a word. This is useful because this is actually something LLMs can mess up due to tokenization. We will first create it WITHOUT memory, but we will then show how to add memory in. Memory is needed to enable conversation.
First, let's load the language model we're going to use to control the agent.
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0)
Next, let's define some tools to use. Let's write a really simple Python function to calculate the length of a word that is passed in.
from langchain.agents import tool
@tool
def get_word_length(word: str) -> int:
"""Returns the length of a word."""
return len(word)
tools = [get_word_length]
Now let us create the prompt.
Because OpenAI Function Calling is finetuned for tool usage, we hardly need any instructions on how to reason, or how to output format.
We will just have two input variables: input
(for the user question) and agent_scratchpad
(for any previous steps taken)
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages([
("system", "You are very powerful assistant, but bad at calculating lengths of words."),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
How does the agent know what tools it can use? Those are passed in as a separate argument, so we can bind those as key word arguments to the LLM.
from langchain.tools.render import format_tool_to_openai_function
llm_with_tools = llm.bind(
functions=[format_tool_to_openai_function(t) for t in tools]
)
Putting those pieces together, we can now create the agent. We will import two last utility functions: a component for formatting intermediate steps to messages, and a component for converting the output message into an agent action/agent finish.
from langchain.agents.format_scratchpad import format_to_openai_functions
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
agent = {
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_functions(x['intermediate_steps'])
} | prompt | llm_with_tools | OpenAIFunctionsAgentOutputParser()
Now that we have our agent, let's play around with it! Let's pass in a simple question and empty intermediate steps and see what it returns:
agent.invoke({
"input": "how many letters in the word educa?",
"intermediate_steps": []
})
We can see that it responds with an AgentAction
to take (it's actually an AgentActionMessageLog
- a subclass of AgentAction
which also tracks the full message log).
So this is just the first step - now we need to write a runtime for this.
The simplest one is just one that continuously loops, calling the agent, then taking the action, and repeating until an AgentFinish
is returned.
Let's code that up below:
from langchain.schema.agent import AgentFinish
intermediate_steps = []
while True:
output = agent.invoke({
"input": "how many letters in the word educa?",
"intermediate_steps": intermediate_steps
})
if isinstance(output, AgentFinish):
final_result = output.return_values["output"]
break
else:
print(output.tool, output.tool_input)
tool = {
"get_word_length": get_word_length
}[output.tool]
observation = tool.run(output.tool_input)
intermediate_steps.append((output, observation))
print(final_result)
We can see this prints out the following:
get_word_length {'word': 'educa'}
There are 5 letters in the word "educa".
Woo! It's working.
To simplify this a bit, we can import and use the AgentExecutor
class.
This bundles up all of the above and adds in error handling, early stopping, tracing, and other quality-of-life improvements that reduce safeguards you need to write.
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
Now let's test it out!
agent_executor.invoke({"input": "how many letters in the word educa?"})
> Entering new AgentExecutor chain...
Invoking: `get_word_length` with `{'word': 'educa'}`
5
There are 5 letters in the word "educa".
> Finished chain.
'There are 5 letters in the word "educa".'
This is great - we have an agent! However, this agent is stateless - it doesn't remember anything about previous interactions. This means you can't ask follow up questions easily. Let's fix that by adding in memory.
In order to do this, we need to do two things:
- Add a place for memory variables to go in the prompt
- Keep track of the chat history
First, let's add a place for memory in the prompt.
We do this by adding a placeholder for messages with the key "chat_history"
.
Notice that we put this ABOVE the new user input (to follow the conversation flow).
from langchain.prompts import MessagesPlaceholder
MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages([
("system", "You are very powerful assistant, but bad at calculating lengths of words."),
MessagesPlaceholder(variable_name=MEMORY_KEY),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
We can then set up a list to track the chat history
from langchain.schema.messages import HumanMessage, AIMessage
chat_history = []
We can then put it all together!
agent = {
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_functions(x['intermediate_steps']),
"chat_history": lambda x: x["chat_history"]
} | prompt | llm_with_tools | OpenAIFunctionsAgentOutputParser()
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
When running, we now need to track the inputs and outputs as chat history
input1 = "how many letters in the word educa?"
result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.append(HumanMessage(content=input1))
chat_history.append(AIMessage(content=result['output']))
agent_executor.invoke({"input": "is that a real word?", "chat_history": chat_history})
Next Steps
Awesome! You've now run your first end-to-end agent. To dive deeper, you can:
- Check out all the different agent types supported
- Learn all the controls for AgentExecutor
- See a full list of all the off-the-shelf toolkits we provide
- Explore all the individual tools supported