Comparisons / DSPy vs LangChain
DSPy vs LangChain: Which Agent Framework to Use?
DSPy dspy replaces hand-written prompts with compiled modules. LangChain langchain is the most popular agent framework. Here is how they compare — and what the same patterns look like in plain Python.
By the numbers
DSPy
33.4k
2.8k
Python
MIT
2023-01-09
Stanford NLP (Omar Khattab)
LangChain
132.3k
21.8k
Python
MIT
2022-10-17
Harrison Chase
Sequoia Capital, Benchmark
$25M Series A (2023), $25M Series B (2024)
3.5M
LangSmith (observability), LangServe (deployment)
Yes
Used by: Notion, Elastic, Instacart
github.com/langchain-ai/langchain →GitHub stats as of April 2026. Stars indicate community interest, not necessarily quality or fit for your use case.
| Concept | DSPy | LangChain | Plain Python |
|---|---|---|---|
| Agent | dspy.ReAct module with signature and tools | AgentExecutor with LLMChain, PromptTemplate, OutputParser | A function that POSTs to /chat/completions with a system prompt |
| Prompts | dspy.Signature defines input/output fields, compiled to optimized prompts | — | An f-string template: prompt = f"Given {input}, return {output}" |
| Optimization | dspy.BootstrapFewShot, MIPROv2 auto-tune prompts against a metric | — | Manual iteration: try different prompts, measure accuracy, pick the best one |
| Tools | Tools passed to ReAct module as callable list | @tool decorator, StructuredTool, BaseTool class hierarchy | A dict of callables: tools = {"search": search, "calc": calculate} |
| Chaining | dspy.ChainOfThought, dspy.Module with forward() composition | — | Function calls in sequence: step1 = summarize(text); step2 = classify(step1) |
| Evaluation | dspy.Evaluate with metric functions and dev sets | — | A for loop over test cases: scores = [metric(predict(x), y) for x, y in test_set] |
| Agent Loop | — | AgentExecutor.invoke() with internal iteration | A while loop: call LLM, check for tool_calls, execute, repeat |
| Conversation | — | ConversationBufferMemory, ConversationSummaryMemory | A messages list that persists outside the function |
| State | — | LangGraph state channels with typed reducers | A dict updated inside the loop: state["turns"] += 1 |
| Memory | — | VectorStoreRetrieverMemory, ConversationEntityMemory | A dict injected into the system prompt, saved via a remember() tool |
| Guardrails | — | OutputParser, PydanticOutputParser, custom validators | Two lists of lambda rules checked before and after the LLM call |
What both do in plain Python
Every concept in the table above — agent, tools, loop, memory, state — maps to a handful of Python primitives: a function, a dict, a list, and a while loop. Both DSPy and LangChain wrap these primitives in their own class hierarchies and APIs. The underlying pattern is the same ~60 lines of code. The difference is how much ceremony each framework adds on top.
When to use DSPy
DSPy's real innovation is automated prompt optimization — replacing manual prompt engineering with algorithmic tuning. This is genuinely novel and valuable for production systems where prompt quality matters at scale. For simple agents or learning, hand-written prompts are easier to understand and modify.
What DSPy does
DSPy takes a fundamentally different approach from other agent frameworks. Instead of providing agent orchestration abstractions, it replaces the prompt engineering process itself. You define a Signature — a typed declaration of inputs and outputs like "question -> answer" — and DSPy compiles it into an optimized prompt. The framework provides modules like ChainOfThought (adds reasoning steps), ReAct (adds tool use), and ProgramOfThought (generates code). The key innovation is Optimizers: algorithms like BootstrapFewShot and MIPROv2 that automatically find the best instructions and few-shot examples by evaluating against a metric you define. This means prompts improve systematically rather than through trial-and-error. DSPy treats prompts as a compilation target, not a hand-authored artifact.
The plain Python equivalent
A Signature is an f-string template with named placeholders. ChainOfThought adds "Let's think step by step" to your prompt — literally one line. ReAct is the standard agent loop: call the LLM, parse tool calls, execute them, repeat. The real difference is optimization. In plain Python, you manually write prompts, test them against examples, adjust wording, and repeat. DSPy automates this cycle with search algorithms. The plain equivalent is a script that tries N prompt variants, scores each against a test set, and picks the winner. This is tedious but conceptually simple — a for loop over prompt templates with an accuracy check. The agent pattern itself (function + dict + loop) is identical to every other framework.
When to use LangChain
LangChain adds value when you need production integrations (vector stores, specific LLM providers, deployment tooling). But if you want to understand what's happening — or your use case is straightforward — the plain Python version is easier to debug, modify, and reason about.
What LangChain does
LangChain provides a unifying interface across LLM providers, a class hierarchy for tools and memory, and orchestration via AgentExecutor and LangGraph. The core value proposition is interchangeable components: swap OpenAI for Anthropic by changing one class, plug in a vector store for retrieval, add memory without rewriting your loop. It also ships with dozens of integrations — document loaders, text splitters, embedding models, vector stores — that save you from writing boilerplate HTTP calls. For teams that need to compose many integrations quickly, this catalog is genuinely useful. The tradeoff is that you inherit a large dependency tree and a set of abstractions that sit between you and the actual API calls.
The plain Python equivalent
Every LangChain abstraction maps to a small piece of plain Python. AgentExecutor is a while loop that calls the LLM, checks for tool_calls in the response, executes the matching function from a tools dict, appends the result to a messages array, and repeats. Memory is a dict you inject into the system prompt. Output parsing is a function that validates the LLM's response before returning it. The entire agent — tool dispatch, conversation history, state tracking, guardrails — fits in about 60 lines of Python. No base classes, no decorators, no chain composition. Just a function, a dict, a list, and a loop. When something breaks, you read your 60 lines instead of navigating a class hierarchy.
Or build your own in 60 lines
Both DSPy and LangChain implement the same 8 patterns. An agent is a function. Tools are a dict. The loop is a while loop. The whole thing composes in ~60 lines of Python.
No framework. No dependencies. No opinions. Just the code.
Build it from scratch →