What is CrewAI and how does it work?

CrewAI organizes AI agents into Crews with defined Agents (role, goal, tools) and Tasks (work items). The Crew orchestrates execution either sequentially or hierarchically. Under the hood, each Agent runs the same while loop pattern: call LLM, dispatch tools, repeat.

Do I need CrewAI for multi-agent systems?

Not necessarily. Most multi-agent systems are one agent function called with different system prompts. CrewAI adds value when you need complex orchestration, role-based delegation, or hierarchical task management. For simpler cases, plain Python functions with a task queue work fine.

What is the difference between CrewAI and LangChain?

LangChain focuses on single-agent tool use with broad integrations (vector stores, LLM providers). CrewAI focuses on multi-agent orchestration with named roles. LangChain is better for RAG pipelines; CrewAI is better for workflows where agents with different specialties collaborate.

What is DSPy and how is it different from LangChain?

DSPy is a Stanford NLP framework that replaces hand-written prompts with compiled modules. You define input/output Signatures and let Optimizers auto-tune prompts against a metric. LangChain focuses on agent orchestration and integrations. DSPy focuses on making prompts better algorithmically — they solve different problems.

Can DSPy build AI agents?

Yes. DSPy provides a ReAct module that implements the standard agent loop (reason, act, observe) with tool calling. However, DSPy's primary value is prompt optimization, not agent orchestration. The agent capabilities are a module within the broader framework, not the core focus.

Do I need DSPy for prompt engineering?

No. Most prompts work well enough with manual iteration. DSPy adds value when you have evaluation datasets, clear quality metrics, and need systematic prompt improvement at scale — especially when switching between LLM providers. For simple or prototype use cases, f-string templates are faster.

Comparisons / CrewAI vs DSPy

CrewAI vs DSPy: Which Agent Framework to Use?

CrewAI crewai organizes work into agents, tasks, and crews. DSPy dspy replaces hand-written prompts with compiled modules. Here is how they compare — and what the same patterns look like in plain Python.

By the numbers

CrewAI

GitHub Stars

48.0k

Forks

6.5k

Language

Python

License

MIT

Created

2023-10-27

Created by

João Moura

github.com/crewAIInc/crewAI →

DSPy

GitHub Stars

33.4k

Forks

2.8k

Language

Python

License

MIT

Created

2023-01-09

Created by

Stanford NLP (Omar Khattab)

github.com/stanfordnlp/dspy →

GitHub stats as of April 2026. Stars indicate community interest, not necessarily quality or fit for your use case.

Concept	CrewAI	DSPy	Plain Python
Agent	Agent(role, goal, backstory, tools, llm)	dspy.ReAct module with signature and tools	A function with a system prompt and a tools dict
Tools	Tool registration with @tool decorator, custom Tool classes	Tools passed to ReAct module as callable list	A dict: tools[name](**args)
Agent Loop	Internal to Agent execution, hidden from user	—	A while loop over messages with tool_calls check
Task Delegation	Crew(agents, tasks, process=sequential/hierarchical)	—	A task queue processed in a while loop with a budget cap
Memory	ShortTermMemory, LongTermMemory, EntityMemory	—	A dict injected into the system prompt
State	Task output passed between agents via Crew orchestration	—	A dict tracking tool calls and results
Prompts	—	dspy.Signature defines input/output fields, compiled to optimized prompts	An f-string template: prompt = f"Given {input}, return {output}"
Optimization	—	dspy.BootstrapFewShot, MIPROv2 auto-tune prompts against a metric	Manual iteration: try different prompts, measure accuracy, pick the best one
Chaining	—	dspy.ChainOfThought, dspy.Module with forward() composition	Function calls in sequence: step1 = summarize(text); step2 = classify(step1)
Evaluation	—	dspy.Evaluate with metric functions and dev sets	A for loop over test cases: scores = [metric(predict(x), y) for x, y in test_set]

What both do in plain Python

Every concept in the table above — agent, tools, loop, memory, state — maps to a handful of Python primitives: a function, a dict, a list, and a while loop. Both CrewAI and DSPy wrap these primitives in their own class hierarchies and APIs. The underlying pattern is the same ~60 lines of code. The difference is how much ceremony each framework adds on top.

When to use CrewAI

CrewAI shines for multi-agent setups where you want named roles ("researcher", "writer"). But the core mechanics — tool dispatch, the agent loop, task scheduling — are the same patterns you can build in plain Python.

What CrewAI does

CrewAI models multi-agent systems as a crew of specialists. Each Agent has a role ("Senior Researcher"), a goal ("Find the best data sources"), a backstory that shapes its behavior, and a set of tools it can use. Tasks define discrete units of work with expected outputs. The Crew orchestrates execution — sequentially, hierarchically, or with a custom process. CrewAI also provides memory systems (short-term, long-term, entity) and delegation, where one agent can hand off subtasks to another. The mental model is a team of people collaborating on a project. For prototyping multi-agent workflows where you want to reason about roles and responsibilities, it provides a clean vocabulary.

The plain Python equivalent

An Agent in CrewAI is a function with a system prompt that includes the role, goal, and backstory. The tools dict maps names to callables. Task delegation is a list of tasks processed in order — each task calls the assigned agent function with the task description appended to the messages. Hierarchical execution is a manager agent that decides which sub-agent to call next (just another tool choice). Memory is a dict injected into the system prompt. The entire crew pattern — multiple agents, task queue, delegation — is a for-loop over tasks, where each iteration calls the right agent function. No Crew class, no process kwarg. Just functions calling functions with a shared state dict passed between them.

Full CrewAI comparison →

When to use DSPy

DSPy's real innovation is automated prompt optimization — replacing manual prompt engineering with algorithmic tuning. This is genuinely novel and valuable for production systems where prompt quality matters at scale. For simple agents or learning, hand-written prompts are easier to understand and modify.

What DSPy does

DSPy takes a fundamentally different approach from other agent frameworks. Instead of providing agent orchestration abstractions, it replaces the prompt engineering process itself. You define a Signature — a typed declaration of inputs and outputs like "question -> answer" — and DSPy compiles it into an optimized prompt. The framework provides modules like ChainOfThought (adds reasoning steps), ReAct (adds tool use), and ProgramOfThought (generates code). The key innovation is Optimizers: algorithms like BootstrapFewShot and MIPROv2 that automatically find the best instructions and few-shot examples by evaluating against a metric you define. This means prompts improve systematically rather than through trial-and-error. DSPy treats prompts as a compilation target, not a hand-authored artifact.

The plain Python equivalent

A Signature is an f-string template with named placeholders. ChainOfThought adds "Let's think step by step" to your prompt — literally one line. ReAct is the standard agent loop: call the LLM, parse tool calls, execute them, repeat. The real difference is optimization. In plain Python, you manually write prompts, test them against examples, adjust wording, and repeat. DSPy automates this cycle with search algorithms. The plain equivalent is a script that tries N prompt variants, scores each against a test set, and picks the winner. This is tedious but conceptually simple — a for loop over prompt templates with an accuracy check. The agent pattern itself (function + dict + loop) is identical to every other framework.

Full DSPy comparison →

Or build your own in 60 lines

Both CrewAI and DSPy implement the same 8 patterns. An agent is a function. Tools are a dict. The loop is a while loop. The whole thing composes in ~60 lines of Python.

No framework. No dependencies. No opinions. Just the code.

Build it from scratch →