AutoGen vs LlamaIndex: Which Agent Framework to Use?
AutoGen, from Microsoft, models agents as ConversableAgents that chat with each other. LlamaIndex started as a RAG framework: connect your data, then query it with an LLM. Here is how they compare, and what the same patterns look like in plain Python.
By the numbers

| | AutoGen | LlamaIndex |
|---|---|---|
| Stars | 56.7k | 48.3k |
| Forks | 8.5k | 7.2k |
| Language | Python | Python |
| License | CC-BY-4.0 | MIT |
| Created | 2023-08-18 | 2022-11-02 |
| Creator | Microsoft Research | Jerry Liu |
GitHub stats as of April 2026. Stars indicate community interest, not necessarily quality or fit for your use case.
| Concept | AutoGen | LlamaIndex | Plain Python |
|---|---|---|---|
| Agent | ConversableAgent with system_message, llm_config | AgentRunner with AgentWorker, or ReActAgent for tool-calling agents | A function with a system prompt that POSTs to the LLM API |
| Tools | register_for_llm() and register_for_execution() | FunctionTool for custom tools, QueryEngineTool to query an index as a tool | A dict of callables + JSON schema descriptions |
| Conversation | Two-agent chat with initiate_chat(), message history | — | A messages array that grows with each turn |
| Multi-Agent | GroupChat with GroupChatManager, speaker selection | — | Multiple agent functions called in sequence on shared messages |
| Nested Chats | register_nested_chats() for sub-task handling | — | A task queue (BFS) — agent schedules follow-ups via a tool |
| Termination | is_termination_msg callback, max_consecutive_auto_reply | — | The while loop exits when no tool_calls or max_turns reached |
| Agent Loop | — | AgentRunner.chat() manages step-by-step execution via AgentWorker tasks | A while loop: call LLM, check for tool_calls, execute, repeat |
| RAG Integration | — | VectorStoreIndex + QueryEngineTool — the agent can query your data as a tool call | A tool function that embeds the query, searches a vector store, and returns top-k results |
| Memory | — | ChatMemoryBuffer with token limit, or custom memory modules | A messages list with optional truncation: messages = messages[-max_turns:] |
| Orchestration | — | AgentRunner step API for custom control flow, or multi-agent pipelines | Sequential function calls with results passed between them |
What both do in plain Python
Every concept in the table above — agent, tools, loop, memory, state — maps to a handful of Python primitives: a function, a dict, a list, and a while loop. Both AutoGen and LlamaIndex wrap these primitives in their own class hierarchies and APIs. The underlying pattern is the same ~60 lines of code. The difference is how much ceremony each framework adds on top.
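Those primitives compose into a complete agent loop. Here is a minimal sketch: `call_llm` is a stub standing in for a real chat-completions request (so the example runs offline), and the `add` tool is an illustrative stand-in for any callable you might register.

```python
import json

def call_llm(messages, tools=None):
    """Stub standing in for a real chat-completions API call.
    Returns a tool call on the first turn, a final answer after that."""
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": None,
                "tool_calls": [{"id": "1", "name": "add",
                                "arguments": json.dumps({"a": 2, "b": 3})}]}
    return {"role": "assistant", "content": "The sum is 5."}

# Tools are a dict of callables; JSON schemas would accompany them in a real request.
TOOLS = {"add": lambda a, b: a + b}

def run_agent(user_input, max_turns=5):
    # The agent: a system prompt plus a growing messages list.
    messages = [{"role": "system", "content": "You are a helpful agent."},
                {"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = call_llm(messages, tools=TOOLS)
        messages.append(reply)
        if not reply.get("tool_calls"):   # termination: no more tool calls
            return reply["content"]
        for call in reply["tool_calls"]:  # dispatch each tool call from the dict
            args = json.loads(call["arguments"])
            result = TOOLS[call["name"]](**args)
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": str(result)})
    return messages[-1].get("content")

print(run_agent("What is 2 + 3?"))  # The sum is 5.
```

Swap the stub for a real API client and this is the whole loop both frameworks wrap.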
When to use AutoGen
AutoGen excels at complex multi-agent workflows where agents need to debate or collaborate. For single-agent use cases or simple tool-calling agents, the plain Python version is significantly simpler.
What AutoGen does
AutoGen's core abstraction is the ConversableAgent — an agent that can send and receive messages. Two agents chat by alternating turns on a shared message history. GroupChat extends this to N agents, with a GroupChatManager that selects the next speaker (round-robin, random, or LLM-based selection). Nested chats allow an agent to spin up a sub-conversation to handle a complex subtask before returning to the main thread. AutoGen also provides code execution sandboxes, letting agents write and run code as part of their conversation. The framework thinks in terms of conversations, not chains or graphs. This makes it natural for workflows where agents need to debate, critique, or iteratively refine outputs together.
The plain Python equivalent
A ConversableAgent is a function that takes a messages array, calls the LLM with a system prompt, and returns the assistant message. Two-agent chat is a while loop where you alternate between calling agent_a(messages) and agent_b(messages), appending each response. GroupChat is the same loop but with a speaker selection step — either rotate through a list or ask the LLM "who should speak next?" and call that agent function. Nested chats are a function call within the loop: pause the main conversation, run a sub-loop with different agents, and inject the result back. Tool registration is adding functions to a tools dict with their JSON schemas. The conversation-as-primitive model is just messages arrays passed between functions.
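A minimal sketch of that two-agent round-robin loop. The agent names, prompts, and the stubbed `call_llm` are illustrative; a real version would POST the messages to a chat API.

```python
def call_llm(messages):
    # Stub: summarize what was seen so the example runs offline.
    return f"({len(messages)} msgs seen) re: {messages[-1]['content']}"

def make_agent(name, system_prompt):
    """An 'agent' is a closure over a system prompt."""
    def agent(messages):
        reply = call_llm([{"role": "system", "content": system_prompt}] + messages)
        return {"role": "assistant", "name": name, "content": reply}
    return agent

writer = make_agent("writer", "Draft answers.")
critic = make_agent("critic", "Critique the draft.")

def two_agent_chat(task, max_turns=4):
    messages = [{"role": "user", "content": task}]
    agents = [writer, critic]
    for turn in range(max_turns):  # round-robin speaker selection
        speaker = agents[turn % len(agents)]
        messages.append(speaker(messages))
    return messages

transcript = two_agent_chat("Explain recursion in one line.")
print(len(transcript))  # 5: the task plus four alternating turns
```

GroupChat with N agents is the same loop over a longer agent list; LLM-based speaker selection replaces the modulo with one extra `call_llm` asking "who should speak next?".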
When to use LlamaIndex
LlamaIndex adds genuine value when your agent needs to query structured or unstructured data as part of its reasoning — that's the index-as-tool pattern, and it's well-executed. But if you're building a general-purpose agent that doesn't need RAG, the agent framework is overhead. The plain Python version of the agent loop is the same 60 lines either way.
What LlamaIndex agents do
LlamaIndex's agent system builds on its core strength: data indexing. You create a VectorStoreIndex over your documents, wrap it in a QueryEngineTool, and hand it to a ReActAgent. The agent can then query your data as a tool call — the same way it might call a calculator or web search. AgentRunner manages the execution loop: it sends messages to the LLM, parses tool calls, dispatches them (including index queries), and accumulates results. FunctionTool lets you wrap any Python function as a tool. The unique value over other frameworks is the tight integration between retrieval and agent reasoning — your data becomes a first-class tool, not an afterthought bolted onto a generic agent loop.
The plain Python equivalent
The agent loop is the same pattern as every other framework: a while loop that calls the LLM, checks for tool_calls, dispatches from a dict, and repeats. What LlamaIndex adds is the retrieval tool. In plain Python, that's a function: embed the query with an API call, search your vector store (Pinecone, pgvector, FAISS — all have simple clients), return the top-k chunks as a string. You put that function in your tools dict alongside everything else. The agent doesn't know or care that one tool queries an index — it's just another callable. The total code is about 60 lines for the agent loop plus 15-20 lines for the retrieval function. No AgentRunner, no AgentWorker, no QueryEngineTool.
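A sketch of that retrieval tool as a plain function. The `embed` stub here is a toy character-frequency vector standing in for a real embeddings API call, and the in-memory list stands in for a vector store client; the documents are made up.

```python
import math

def embed(text):
    """Stub embedding: a character-frequency vector. A real version
    would call an embeddings API and return its vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# The "vector store": (embedding, chunk) pairs built once at index time.
DOCS = ["The refund policy allows returns within 30 days.",
        "Shipping takes 5 to 7 business days.",
        "Support is available by email around the clock."]
STORE = [(embed(d), d) for d in DOCS]

def search_docs(query, k=2):
    """The retrieval 'tool': embed the query, rank chunks by similarity,
    return the top-k joined as a string for the model to read."""
    q = embed(query)
    ranked = sorted(STORE, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return "\n".join(chunk for _, chunk in ranked[:k])

# It goes in the tools dict alongside every other callable.
TOOLS = {"search_docs": search_docs}
print(search_docs("What is the refund policy?", k=1))
```

Replace the stub with a real embeddings call and the list with a Pinecone, pgvector, or FAISS query, and the agent loop never changes: retrieval is just another entry in the dict.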
Or build your own in 60 lines
Both AutoGen and LlamaIndex implement the same 8 patterns. An agent is a function. Tools are a dict. The loop is a while loop. The whole thing composes in ~60 lines of Python.
No framework. No dependencies. No opinions. Just the code.
Build it from scratch →