Introduction
LangGraph is a powerful library built by LangChain Inc. for building stateful, multi-actor applications with Large Language Models. It models your LLM application as a graph — specifically a state machine — where computation is organized as nodes connected by edges, with a shared state object flowing through the graph.
Unlike simple chain-based approaches, LangGraph introduces graph-based workflows that allow for cycles, conditional routing, and persistent state management. This makes it ideal for building AI agents that need to reason, use tools, and maintain context across complex multi-step interactions.
In this guide, we will walk through the fundamentals of LangGraph, build a practical AI agent from scratch, and cover best practices for production deployments.
LangGraph vs. LangChain: Understanding the Relationship
LangGraph is part of the broader LangChain ecosystem but is a separate package with its own release cycle. Here is how they relate:
- LangChain provides abstractions for LLM calls, prompt templates, chains, retrievers, and tool integrations. It is designed primarily for linear, sequential pipelines.
- LangGraph extends this with cyclic graph execution, which is essential for agent-like behavior where an LLM may need to loop — call a tool, observe the result, and decide whether to call another tool or finish.
- LangGraph replaced the older AgentExecutor from LangChain as the recommended way to build agents.
- Use LangChain (LCEL) for linear pipelines: prompt → LLM → parser, simple RAG, summarization, translation
- Use LangGraph for agentic behavior, tool-use loops, multi-step reasoning, conversation memory, human-in-the-loop, and multi-agent systems
Core Concepts
StateGraph
StateGraph is the primary class for building a LangGraph application. You define a state schema (typically a TypedDict or Pydantic model), and the graph manages how that state evolves as it flows through nodes.
from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated
from operator import add
class AgentState(TypedDict):
    messages: Annotated[list, add]  # 'add' reducer appends to the list
    next_step: str
graph = StateGraph(AgentState)
The Annotated[list, add] pattern uses a reducer function. When a node returns {"messages": [new_message]}, the reducer determines how the returned value merges with existing state. The add reducer appends rather than replacing. Without a reducer, values are overwritten.
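To make the reducer semantics concrete, here is a plain-Python sketch of what the graph does with that key when a node returns a partial update (the graph machinery itself is elided; only the merge step is shown):

```python
from operator import add

# Conceptually, when a node returns {"messages": [new_message]} and the key
# is annotated with a reducer, the graph computes reducer(existing, update).
existing_messages = ["hello", "hi there"]
node_update = ["what's the weather?"]

# With the 'add' reducer: list concatenation, i.e. append instead of replace
merged = add(existing_messages, node_update)
print(merged)  # ['hello', 'hi there', "what's the weather?"]

# Without a reducer, the update simply overwrites the old value:
replaced = node_update
print(replaced)  # ["what's the weather?"]
```

This is why forgetting the reducer on a messages key silently drops conversation history: every node return becomes an overwrite.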
Nodes
Nodes are Python functions that receive the current state and return a partial state update:
def call_model(state: AgentState):
    messages = state["messages"]
    response = model.invoke(messages)
    return {"messages": [response]}  # Only return what changed

def call_tool(state: AgentState):
    last_message = state["messages"][-1]
    tool_call = last_message.tool_calls[0]
    result = tool_executor.invoke(tool_call)  # tool_executor: a tool-running helper defined elsewhere
    return {"messages": [result]}
graph.add_node("agent", call_model)
graph.add_node("tools", call_tool)
Edges and Conditional Routing
Edges define transitions between nodes. Normal edges are unconditional, while conditional edges route dynamically based on state:
# Normal edges
graph.add_edge(START, "agent")
graph.add_edge("tools", "agent")
# Conditional edge - routes based on whether the LLM called tools
def should_continue(state: AgentState):
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END

graph.add_conditional_edges(
    "agent",
    should_continue,
    {"tools": "tools", END: END},
)
Installation and Setup
# Basic installation
pip install langgraph
# With an LLM provider
pip install langgraph langchain-openai # For OpenAI
pip install langgraph langchain-anthropic # For Anthropic
# With persistence backends
pip install langgraph-checkpoint-sqlite # SQLite (local)
pip install langgraph-checkpoint-postgres # PostgreSQL (production)
# Environment setup
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
# Optional: Enable LangSmith tracing for debugging
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
Building a ReAct Agent: Practical Example
The ReAct (Reasoning + Acting) pattern is the most fundamental LangGraph pattern. The agent reasons about the current state, selects and calls a tool, observes the result, and loops back until it decides to respond. Let's build one:
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.tools import tool
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver
# --- Define Tools ---
@tool
def search(query: str) -> str:
    """Search the web for current information."""
    if "weather" in query.lower():
        return "Current weather in San Francisco: 65F, partly cloudy."
    return f"Search results for: {query}"

@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # Demo only: eval on untrusted input is unsafe in production
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"
tools = [search, calculator]
# --- Initialize Model with Tools ---
model = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)
model_with_tools = model.bind_tools(tools)
# --- Define the Graph ---
def call_agent(state: MessagesState):
    system = SystemMessage(content=(
        "You are a helpful research assistant. "
        "Use the search tool for current info, calculator for math."
    ))
    messages = [system] + state["messages"]
    response = model_with_tools.invoke(messages)
    return {"messages": [response]}

def should_continue(state: MessagesState):
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END
# Build the graph
workflow = StateGraph(MessagesState)
workflow.add_node("agent", call_agent)
workflow.add_node("tools", ToolNode(tools))
workflow.add_edge(START, "agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
workflow.add_edge("tools", "agent")
# Compile with memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
# --- Run the Agent ---
config = {"configurable": {"thread_id": "user-session-1"}}
response = app.invoke(
    {"messages": [HumanMessage(content="What's the weather in San Francisco?")]},
    config=config,
)
print(response["messages"][-1].content)
For simpler use cases, LangGraph provides a prebuilt shortcut:
from langgraph.prebuilt import create_react_agent
agent = create_react_agent(model, tools, checkpointer=memory)
result = agent.invoke(
    {"messages": [("user", "Search for the latest news about AI")]},
    config={"configurable": {"thread_id": "session-1"}},
)
Key Patterns
Multi-Agent Orchestration
LangGraph supports multi-agent systems where specialized agents collaborate. In the supervisor pattern, one agent routes tasks to specialized sub-agents:
workflow = StateGraph(MessagesState)
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("coder", coder_node)
workflow.add_edge(START, "supervisor")
workflow.add_conditional_edges("supervisor", route_to_agent, {
    "researcher": "researcher",
    "coder": "coder",
    END: END,
})
workflow.add_edge("researcher", "supervisor")
workflow.add_edge("coder", "supervisor")
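The supervisor_node and route_to_agent above are left abstract. As a minimal sketch, the router could parse a directive out of the supervisor's last message; the "ROUTE: <name>" convention here is purely illustrative, not a LangGraph API, and END is a stand-in for langgraph.graph.END:

```python
# Hypothetical routing helper for the supervisor pattern. Assumes the
# supervisor's reply embeds a directive like "ROUTE: researcher" -- an
# illustrative convention, not something LangGraph provides.
END = "__end__"  # stand-in for langgraph.graph.END, which resolves to this string

def route_to_agent(state: dict) -> str:
    last = state["messages"][-1]
    # Works for both plain strings and message objects with a .content field
    content = getattr(last, "content", None) or str(last)
    for agent in ("researcher", "coder"):
        if f"ROUTE: {agent}" in content:
            return agent
    return END  # no directive means the supervisor is done
```

In practice you would have the supervisor emit structured output (for example via bind_tools or with_structured_output) rather than parse free text, which makes the routing decision far more robust.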
Human-in-the-Loop
LangGraph provides first-class support for human intervention via the interrupt mechanism:
from langgraph.types import interrupt, Command
from langchain_core.messages import AIMessage

def sensitive_action(state: MessagesState):
    approval = interrupt({
        "question": "Do you approve this action?",
        "proposed_action": state["messages"][-1].content,
    })
    if approval == "yes":
        return {"messages": [AIMessage(content="Action approved and executed.")]}
    return {"messages": [AIMessage(content="Action cancelled by user.")]}
# Resume after interrupt
result = app.invoke(Command(resume="yes"), config)
Checkpointing and Persistence
Every time the graph executes a step, the state is saved to a checkpoint. This enables conversation memory, time-travel debugging, fault tolerance, and branching.
- MemorySaver — In-memory, development only. State lost on restart.
- SqliteSaver — File-based SQLite for local persistence.
- PostgresSaver — Production-grade with connection pooling and async support.
# Production: PostgreSQL
from langgraph.checkpoint.postgres import PostgresSaver
DB_URI = "postgresql://user:pass@localhost:5432/langgraph"
with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # Creates tables (run once)
    app = workflow.compile(checkpointer=checkpointer)

    # Time-travel: inspect state history (kept inside the block so the
    # database connection is still open)
    config = {"configurable": {"thread_id": "user-123"}}
    all_states = list(app.get_state_history(config))
    for state in all_states:
        print(f"Step: {state.next}, Checkpoint: {state.config['configurable']['checkpoint_id']}")
Streaming
LangGraph offers multiple streaming modes for real-time output:
# Stream full state after each node
for state_snapshot in app.stream(inputs, config, stream_mode="values"):
    print(state_snapshot["messages"][-1].content)

# Stream only incremental updates (each chunk is a {node_name: update} dict)
for update in app.stream(inputs, config, stream_mode="updates"):
    for node_name, node_update in update.items():
        print(f"Node '{node_name}': {node_update}")

# Stream individual LLM tokens
for chunk, metadata in app.stream(inputs, config, stream_mode="messages"):
    if chunk.content:
        print(chunk.content, end="", flush=True)
Production Deployment
LangGraph Platform provides the official deployment infrastructure with a REST API server, task queue, and built-in PostgreSQL persistence.
# langgraph.json - deployment configuration
{
  "dependencies": ["."],
  "graphs": {
    "my_agent": "./my_agent/graph.py:app"
  },
  "env": ".env"
}
# Build and run locally
pip install langgraph-cli
langgraph dev # Development server with auto-reload
langgraph build -t my-agent-image # Docker image for production
- Use PostgreSQL for checkpointing, not SQLite or MemorySaver
- Enable LangSmith tracing for observability of every graph execution
- Set recursion limits to prevent infinite loops: config={"recursion_limit": 25}
- Use async (ainvoke, astream) for better concurrency
- Keep nodes focused — each node should do one thing
- Test individual nodes independently before assembling the graph
Common Pitfalls
- Forgetting the reducer: Without Annotated[list, add_messages], each node's return value replaces the entire messages list, losing conversation history.
- Infinite loops: If the LLM always calls tools, the agent-tool loop runs forever. Always set a recursion limit and consider a max-iterations counter in your state.
- Missing thread_id: Every invoke without a thread_id starts a fresh conversation, even with a checkpointer.
- Returning full state: Nodes should return only the keys they want to update, not the entire state object.
- Unhandled tool errors: Wrap tool execution in try/except to prevent graph crashes.
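The max-iterations advice can live directly in the routing function. Here is a sketch, assuming the state carries an "iterations" counter that a node increments on each agent step; that field is an illustrative addition, not part of MessagesState, and END stands in for langgraph.graph.END:

```python
# Loop guard for the agent-tool cycle. The "iterations" key is a hypothetical
# state field you would add to your schema and increment in the agent node.
MAX_ITERATIONS = 10
END = "__end__"  # stand-in for langgraph.graph.END, which resolves to this string

def should_continue(state: dict) -> str:
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return END  # bail out even if the model requested another tool call
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):
        return "tools"
    return END
```

Unlike the global recursion_limit, which raises an error when exceeded, this guard lets the graph end gracefully so you can return a best-effort answer to the user.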
Conclusion
LangGraph represents a significant evolution in LLM application development, moving from simple chains to sophisticated, stateful agent architectures. Its graph-based approach provides the control and flexibility needed for production AI systems, while built-in checkpointing and streaming make it practical for real-world deployment.
Whether you are building a simple chatbot with tool use or a complex multi-agent orchestration system, LangGraph provides the primitives needed to create reliable, maintainable AI applications. Start with the prebuilt create_react_agent for quick prototyping, then graduate to the full StateGraph API as your requirements grow.