
AutoGen vs LlamaIndex: Which Agent Framework to Use?

AutoGen, created by Microsoft, models agents as ConversableAgents that chat with each other. LlamaIndex started as a RAG framework: connect your data, then query it with an LLM. Here is how they compare, and what the same patterns look like in plain Python.

By the numbers

|             | AutoGen                      | LlamaIndex                       |
|-------------|------------------------------|----------------------------------|
| GitHub Stars | 56.7k                       | 48.3k                            |
| Forks       | 8.5k                         | 7.2k                             |
| Language    | Python                       | Python                           |
| License     | CC-BY-4.0                    | MIT                              |
| Created     | 2023-08-18                   | 2022-11-02                       |
| Created by  | Microsoft Research           | Jerry Liu                        |
| Repo        | github.com/microsoft/autogen | github.com/run-llama/llama_index |

GitHub stats as of April 2026. Stars indicate community interest, not necessarily quality or fit for your use case.

| Concept | AutoGen | LlamaIndex | Plain Python |
|---------|---------|------------|--------------|
| Agent | ConversableAgent with system_message, llm_config | AgentRunner with AgentWorker, or ReActAgent for tool-calling agents | A function with a system prompt that POSTs to the LLM API |
| Tools | register_for_llm() and register_for_execution() | FunctionTool for custom tools, QueryEngineTool to query an index as a tool | A dict of callables + JSON schema descriptions |
| Conversation | Two-agent chat with initiate_chat(), message history | — | A messages array that grows with each turn |
| Multi-Agent | GroupChat with GroupChatManager, speaker selection | — | Multiple agent functions called in sequence on shared messages |
| Nested Chats | register_nested_chats() for sub-task handling | — | A task queue (BFS): agent schedules follow-ups via a tool |
| Termination | is_termination_msg callback, max_consecutive_auto_reply | — | The while loop exits when no tool_calls or max_turns reached |
| Agent Loop | — | AgentRunner.chat() manages step-by-step execution via AgentWorker tasks | A while loop: call LLM, check for tool_calls, execute, repeat |
| RAG Integration | — | VectorStoreIndex + QueryEngineTool: the agent can query your data as a tool call | A tool function that embeds the query, searches a vector store, and returns top-k results |
| Memory | — | ChatMemoryBuffer with token limit, or custom memory modules | A messages list with optional truncation: messages = messages[-max_turns:] |
| Orchestration | — | AgentRunner step API for custom control flow, or multi-agent pipelines | Sequential function calls with results passed between them |

What both do in plain Python

Every concept in the table above — agent, tools, loop, memory, state — maps to a handful of Python primitives: a function, a dict, a list, and a while loop. Both AutoGen and LlamaIndex wrap these primitives in their own class hierarchies and APIs. The underlying pattern is the same ~60 lines of code. The difference is how much ceremony each framework adds on top.
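A minimal sketch of that core loop, assuming the standard tool-calling message shape. The `call_llm` function here is a hypothetical stub standing in for a real chat-completions request; everything else (the tools dict, the messages list, the while loop) is the actual pattern:

```python
import json

# Hypothetical stand-in for a real LLM call. A real version would POST the
# messages and tool schemas to a chat-completions API and return the
# assistant message; this stub fakes one tool call, then a final answer.
def call_llm(messages, tools):
    if messages[-1].get("role") == "user":
        return {"role": "assistant", "content": None,
                "tool_calls": [{"id": "1", "name": "add",
                                "arguments": {"a": 2, "b": 3}}]}
    return {"role": "assistant", "content": "The sum is 5."}

# Tools: a dict of callables plus JSON-schema-style descriptions.
TOOLS = {"add": lambda a, b: a + b}
TOOL_SCHEMAS = [{"name": "add", "description": "Add two numbers",
                 "parameters": {"a": "number", "b": "number"}}]

def run_agent(user_input, system_prompt="You are a helpful assistant."):
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": user_input}]
    while True:  # the agent loop: call LLM, dispatch tools, repeat
        reply = call_llm(messages, TOOL_SCHEMAS)
        messages.append(reply)
        if not reply.get("tool_calls"):       # no tools requested: done
            return reply["content"]
        for call in reply["tool_calls"]:      # execute each requested tool
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
```

Swap the stub for a real API call and this is the whole machine: the agent is `run_agent`, memory is `messages`, tools are a dict lookup.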

When to use AutoGen

AutoGen excels at complex multi-agent workflows where agents need to debate or collaborate. For single-agent use cases or simple tool-calling agents, the plain Python version is significantly simpler.

What AutoGen does

AutoGen's core abstraction is the ConversableAgent — an agent that can send and receive messages. Two agents chat by alternating turns on a shared message history. GroupChat extends this to N agents, with a GroupChatManager that selects the next speaker (round-robin, random, or LLM-based selection). Nested chats allow an agent to spin up a sub-conversation to handle a complex subtask before returning to the main thread. AutoGen also provides code execution sandboxes, letting agents write and run code as part of their conversation. The framework thinks in terms of conversations, not chains or graphs. This makes it natural for workflows where agents need to debate, critique, or iteratively refine outputs together.

The plain Python equivalent

A ConversableAgent is a function that takes a messages array, calls the LLM with a system prompt, and returns the assistant message. Two-agent chat is a while loop where you alternate between calling agent_a(messages) and agent_b(messages), appending each response. GroupChat is the same loop but with a speaker selection step — either rotate through a list or ask the LLM "who should speak next?" and call that agent function. Nested chats are a function call within the loop: pause the main conversation, run a sub-loop with different agents, and inject the result back. Tool registration is adding functions to a tools dict with their JSON schemas. The conversation-as-primitive model is just messages arrays passed between functions.
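The two-agent pattern above can be sketched in a few lines. Again `call_llm` is a hypothetical stub (a real version would send the system prompt and history to an LLM API); the structure — agents as closures, a shared messages list, a loop that alternates speakers — is the pattern itself:

```python
# Hypothetical stub: a real version would call an LLM with the system
# prompt and the shared message history.
def call_llm(system_prompt, messages):
    return f"({system_prompt}) reply after {len(messages)} messages"

def make_agent(name, system_prompt):
    """A ConversableAgent reduced to a closure over a system prompt."""
    def agent(messages):
        return {"role": "assistant", "name": name,
                "content": call_llm(system_prompt, messages)}
    return agent

writer = make_agent("writer", "You draft text.")
critic = make_agent("critic", "You critique drafts.")

def two_agent_chat(task, max_turns=4):
    messages = [{"role": "user", "content": task}]
    agents = [writer, critic]
    for turn in range(max_turns):            # alternate speakers each turn
        speaker = agents[turn % len(agents)]
        messages.append(speaker(messages))
    return messages
```

GroupChat is the same loop with the speaker picked from a longer list, by rotation or by asking the LLM; a nested chat is a second loop like this one, run inside a turn, whose final message is appended to the outer history.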

Full AutoGen comparison →

When to use LlamaIndex

LlamaIndex adds genuine value when your agent needs to query structured or unstructured data as part of its reasoning — that's the index-as-tool pattern, and it's well-executed. But if you're building a general-purpose agent that doesn't need RAG, the agent framework is overhead. The plain Python version of the agent loop is the same 60 lines either way.

What LlamaIndex agents do

LlamaIndex's agent system builds on its core strength: data indexing. You create a VectorStoreIndex over your documents, wrap it in a QueryEngineTool, and hand it to a ReActAgent. The agent can then query your data as a tool call — the same way it might call a calculator or web search. AgentRunner manages the execution loop: it sends messages to the LLM, parses tool calls, dispatches them (including index queries), and accumulates results. FunctionTool lets you wrap any Python function as a tool. The unique value over other frameworks is the tight integration between retrieval and agent reasoning — your data becomes a first-class tool, not an afterthought bolted onto a generic agent loop.

The plain Python equivalent

The agent loop is the same pattern as every other framework: a while loop that calls the LLM, checks for tool_calls, dispatches from a dict, and repeats. What LlamaIndex adds is the retrieval tool. In plain Python, that's a function: embed the query with an API call, search your vector store (Pinecone, pgvector, FAISS — all have simple clients), return the top-k chunks as a string. You put that function in your tools dict alongside everything else. The agent doesn't know or care that one tool queries an index — it's just another callable. The total code is about 60 lines for the agent loop plus 15-20 lines for the retrieval function. No AgentRunner, no AgentWorker, no QueryEngineTool.
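A sketch of that retrieval function. To keep it self-contained, `embed` here is a toy bag-of-words embedding; a real version would call an embeddings API, and the in-memory list would be Pinecone, pgvector, or FAISS — the tool's interface is unchanged either way:

```python
import math
from collections import Counter

# Toy embedding for illustration only: bag-of-words token counts.
# A real version would call an embeddings API or a local model.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# "Vector store": a list of (embedding, chunk) pairs kept in memory.
DOCS = ["The agent loop is a while loop.",
        "Tools are stored in a dict of callables.",
        "Memory is a truncated messages list."]
INDEX = [(embed(d), d) for d in DOCS]

def search_docs(query, k=2):
    """Retrieval tool: embed the query, rank chunks, return top-k as text."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[0]), reverse=True)
    return "\n".join(chunk for _, chunk in ranked[:k])
```

Drop `search_docs` into the tools dict next to the calculator and the agent treats it like any other callable — that is the entire index-as-tool pattern.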

Full LlamaIndex comparison →

Or build your own in 60 lines

Both AutoGen and LlamaIndex implement the same 8 patterns. An agent is a function. Tools are a dict. The loop is a while loop. The whole thing composes in ~60 lines of Python.

No framework. No dependencies. No opinions. Just the code.

Build it from scratch →