May 19, 2026

Why Multi-Agent AI Beat a Single Agent by 90% at Anthropic

When an enterprise AI agent fails, the instinct is to upgrade the model. Here, three case studies show what happens when teams prioritize multi-agent architecture instead.


Most AI agent failures come from poor coordination, not weak technology.
Anthropic's multi-agent setup beat a single agent by 90.2 percent on the same task.
Vendasta recovered $1 million in pipeline revenue by replacing manual handoffs with agents.

Staff writer


When Aaron Sneed launched his defense technology startup, he could not afford to hire lawyers, accountants, or HR staff. So he built them.

Working from Florida, the 40-year-old solopreneur spent months training a group of 15 AI agents he calls "The Council." Each agent handles a different job. There are agents for HR, finance, legal work, supply chain, manufacturing, security, and field operations. At the head of the metaphorical table sits a chief-of-staff agent, which Sneed describes as "the voice that sets priority based on parameters like risks, issues, and opportunities."

The agents have a hierarchy of their own. The chief of staff enforces the rules, while recommendations from legal, compliance, and security agents carry more weight. When Sneed faces a conundrum, he posts a document in a shared chat and watches all 15 agents weigh in at once. He calls it a roundtable. The setup, he told Business Insider, saves him roughly 20 hours a week.

Figure: Single-agent versus multi-agent workflows. On the left, “Single Agent” flows Task → Agent → Solution, with an arrow looping from Agent back to Task. On the right, “Multi Agent” flows Task → Supervisor Agent → six different agents → Solution. Multi-agent AI workflows allow many agents to orchestrate workflows to arrive at better solutions faster. *Source: Towards AI.*

It’s an unusual business operation, but what Sneed learned along the way is invaluable for organizations with ten or ten thousand employees alike. When he first started, he tried to assign everything to one agent. It did not work. The agent kept getting confused, missing details, and giving answers that sounded right but fell apart under pressure from his actual lawyer. The fix was structural, not a smarter model. Sneed split the work, gave each agent a defined role, set parameters for weighing recommendations from each agent, and built an infrastructure that let them cross-check each other.
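One way to picture "parameters for weighing recommendations" is a weighted vote. The sketch below is illustrative only: the weights, agent names, and the `roundtable` function are hypothetical and are not taken from Sneed's actual setup, which the source does not describe in detail.

```python
# Hypothetical weights: legal, compliance, and security count for more.
WEIGHTS = {"legal": 3.0, "compliance": 3.0, "security": 3.0}
DEFAULT_WEIGHT = 1.0


def roundtable(votes: dict[str, str]) -> str:
    """Return the option with the highest weighted support."""
    totals = {}
    for agent, option in votes.items():
        totals[option] = totals.get(option, 0.0) + WEIGHTS.get(agent, DEFAULT_WEIGHT)
    return max(totals, key=totals.get)


decision = roundtable({
    "legal": "revise contract",
    "security": "revise contract",
    "finance": "sign now",
    "hr": "sign now",
    "supply_chain": "sign now",
})
# Two heavily weighted agents (3.0 + 3.0) outvote three ordinary ones (1.0 each).
```

With these made-up weights, a minority of high-stakes agents can override a simple majority, which is the effect the weighting is meant to have.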

That insight, at scale, is one of the most important shifts happening in enterprise AI right now. Most enterprise AI agent failures are not caused by weak models, but by asking one agent to do work that should be split across several agents that hand off to each other cleanly.

Butting Up Against the Context Window

An AI agent is a program built on top of a large language model, such as those that power ChatGPT or Claude. The model gives the agent the ability to read, write, and reason. The agent layer adds memory, tools, and a job description. You can tell an agent to search the web, pull data from a database, write an email, or take an action in another system.

The trouble starts when companies try to make one agent do everything. A single agent, given a complex job, has to hold all the work in a single “mental workspace,” known as a context window. It has to remember the work it's already done, decide which tool to use next, and keep track of dozens of partial answers.

And the context window can only hold so much. When the workspace fills up, the agent starts dropping threads: forgetting past work, repeating tasks, or ceasing to respond altogether.
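A toy sketch of that failure mode, assuming a workspace with a fixed word budget that evicts its oldest notes first. Real models count tokens rather than words, and real agent frameworks manage context in more sophisticated ways; the `Workspace` class and its budget are hypothetical.

```python
from collections import deque


class Workspace:
    """Toy context window: keeps the most recent notes within a word budget."""

    def __init__(self, max_words: int) -> None:
        self.max_words = max_words
        self.notes = deque()

    def _used(self) -> int:
        return sum(len(note.split()) for note in self.notes)

    def add(self, note: str) -> list[str]:
        """Store a note and return whatever had to be evicted to make room."""
        self.notes.append(note)
        dropped = []
        # Evict oldest notes until the budget is met (always keep the newest).
        while self._used() > self.max_words and len(self.notes) > 1:
            dropped.append(self.notes.popleft())
        return dropped


workspace = Workspace(max_words=12)
workspace.add("Company A board has nine members")
workspace.add("Company B board has seven members")
# The third note pushes the total past the budget, so the oldest note is lost.
forgotten = workspace.add("Company C board has eleven members")
```

After the third note, the finding about Company A is gone, which is the "dropped thread" described above.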

A multi-agent AI system splits a single large job among several specialist agents. Each agent gets its own workspace. A lead agent, sometimes called a supervisor, breaks the job into pieces, hands the pieces to the specialists, and combines the results when they come back.
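The split, delegate, and combine steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: the specialists are plain functions standing in for model-backed agents, the job arrives already divided by role (a real lead agent would do that decomposition itself), and all names are hypothetical.

```python
from typing import Callable

# Each "specialist" is a plain function standing in for an LLM-backed agent.
Specialist = Callable[[str], str]


def legal_agent(piece: str) -> str:
    return f"legal review of '{piece}'"


def finance_agent(piece: str) -> str:
    return f"cost estimate for '{piece}'"


def supervisor(job: dict[str, str], specialists: dict[str, Specialist]) -> str:
    """Hand each piece of the job to its specialist, then combine the results."""
    results = []
    for role, piece in job.items():
        worker = specialists[role]     # delegate: pick the specialist for this piece
        results.append(worker(piece))  # each worker sees only its own piece
    return "; ".join(results)          # combine


job = {"legal": "supplier contract", "finance": "Q3 tooling budget"}
report = supervisor(job, {"legal": legal_agent, "finance": finance_agent})
```

The point of the structure is that each worker receives only its own piece, so no single workspace has to hold the whole job.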

Structure is critical to agentic and multi-agent AI systems: the most common pattern is “supervisor and worker,” where a lead agent directs specialists. Another is a sequential pipeline, where agents pass work down a line, each one finishing a step before the next begins. There are also hierarchical setups in which teams of agents are managed by other agents. All of them share a common idea: separating the work so that no single agent gets buried.

Problems Arise When Agents “Go It Alone”

Most enterprise leaders reach for a better model when their AI agent underperforms. The instinct makes sense: if the output is bad, upgrade the brain. But the data tells a different story. According to Anthropic's engineering team, token usage alone (how many tokens the agent spends on inputs, outputs, and context while it works) explained 80 percent of the performance variance in one of its browsing evaluations. The model does matter, but the architecture, which determines how much token capacity can be applied to a task, matters more.

Three specific problems keep showing up in enterprise deployments. Each one has a multi-agent fix.

The workspace fills up. A single agent working on a long, multi-source task runs out of room. It cannot hold the full picture.
Handoffs break down. When one step of a workflow ends and another begins, information must be transferred. In human-only workflows, each handoff carries a risk of data loss, misrepresentation, or delay.
One agent is assigned too many tasks. Some enterprise teams build a single agent and give it access to many tools across many departments. The agent then has to decide which tool to use for each request, and it often guesses wrong.

The three case studies that follow show how real enterprise teams use multi-agent AI to solve each of these problems.

Anthropic's Multi-Agent AI System Beat Single Agents by 90%

Anthropic, the AI company that builds Claude, ran a direct test. The task: identify every board member at every IT company in the S&P 500. For a single AI agent, this is brutal. The agent has to research hundreds of companies, decide which sources to trust, handle different website structures, and hold the partial answers from every search in its memory at the same time.

When Anthropic ran this with a single Claude Opus 4 agent working through the list one company at a time, the agent struggled to keep the full picture in view and could not reliably finish the job.

The team then built a multi-agent AI system using a supervisor and worker pattern. The lead agent, running on Claude Opus 4, broke the task into smaller pieces. It spawned subagents running on Claude Sonnet 4 to handle each piece in parallel. Each subagent worked in its own fresh workspace, focused on one part of the list, used a tight set of search tools, and returned its findings. The lead agent then combined everything into a final answer.
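The parallel fan-out can be sketched with Python's standard thread pool. This is an assumption-laden illustration, not Anthropic's implementation: `subagent` and `lead_agent` are hypothetical functions, the "research" is a placeholder string, and the company names are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def subagent(companies: list[str]) -> dict[str, str]:
    """Research one slice of the list in a fresh, private context."""
    context = []  # starts empty for every subagent; nothing is shared
    findings = {}
    for company in companies:
        context.append(f"searched {company}")
        findings[company] = f"board of {company} (placeholder)"
    return findings


def lead_agent(companies: list[str], slice_size: int) -> dict[str, str]:
    """Split the list, run subagents in parallel, and merge their findings."""
    slices = [companies[i:i + slice_size] for i in range(0, len(companies), slice_size)]
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        # pool.map returns results in the same order as the input slices.
        for findings in pool.map(subagent, slices):
            merged.update(findings)
    return merged


results = lead_agent(["Acme", "Globex", "Initech", "Hooli", "Umbrella"], slice_size=2)
```

Each call to `subagent` builds its own `context` list, which mirrors the idea of a separate workspace per worker.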

The multi-agent system outperformed the single-agent setup by 90.2 percent on Anthropic's internal research evaluation. The engineering team explained the result this way: the architecture distributes work across agents with separate context windows, which adds capacity for parallel reasoning. The same problem that broke a single agent became solvable when the workload was split.

When a single agent is asked to handle too many sources, too much volume, or too many partial results, performance falls apart. Splitting the work across specialist subagents, each with a narrow focus and its own workspace, removes the overload.

Figure: The complete workflow of Anthropic’s multi-agent AI research system. Blocks across the top read, left to right: User, Systems, Lead Researcher, Subagent1, Subagent2, Memory, CitationAgent. A block in the center reads “Iterative Research Process,” and arrows and loops show how work moves between the components. *Source: Anthropic.*

The trade-off, of course, is cost. Multi-agent systems use roughly 15 times as many tokens as single-chat interactions, so they make sense for high-value tasks but are overkill for simple ones.
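To make the trade-off concrete, here is a back-of-the-envelope calculation. Only the 15x multiplier comes from the figure cited above; the baseline token count and the price per million tokens are hypothetical placeholders, so substitute your own.

```python
CHAT_TOKENS = 2_000          # hypothetical tokens for one ordinary chat exchange
MULTI_AGENT_MULTIPLIER = 15  # "roughly 15 times", per the figure cited above
PRICE_PER_MILLION = 10.00    # hypothetical blended price in dollars


def run_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Dollar cost of a run that consumes the given number of tokens."""
    return tokens / 1_000_000 * price_per_million


chat_cost = run_cost(CHAT_TOKENS)
multi_agent_cost = run_cost(CHAT_TOKENS * MULTI_AGENT_MULTIPLIER)
print(f"chat: ${chat_cost:.2f}  multi-agent: ${multi_agent_cost:.2f}")
```

Whatever the real prices are, the ratio stays the same: a task has to be worth about 15 times as much as a chat exchange before the multi-agent run pays for itself.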

How Vendasta Recovered $1 Million in Sales Pipeline

Vendasta, a software company that serves local businesses, had a problem with its sales team. Sales development reps (SDRs) research new prospects, set up first meetings, and pass qualified leads to closers. Vendasta's SDRs were losing huge amounts of time to manual administrative work. By the company's calculation, that work consumed 282 working days a year and cost more than $1 million in missed pipeline.

The handoffs were the issue. Every step required a human to gather information, clean it up, and pass it to the next step. Each handoff slowed things down, and sometimes data got lost along the way.

Using Zapier, Vendasta built a sequential pipeline that works like an assembly line. When a new prospect comes in through a form or event, the system pulls in contact details from Apollo and Clay, two outside data tools that fill in company information. An AI agent summarizes what is known about the prospect. The record is logged in Vendasta's customer relationship management system (CRM). A routing agent then sends the prospect to the right sales rep instantly.
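The assembly-line shape of that intake flow can be sketched as a chain of functions. This is not Vendasta's Zapier configuration: every stage below is a hypothetical stub, the enrichment values are made up, and no real Apollo, Clay, or CRM API is called.

```python
from functools import reduce
from typing import Callable

Lead = dict
Stage = Callable[[Lead], Lead]


def enrich(lead: Lead) -> Lead:
    # Stand-in for a data-enrichment lookup; the values here are made up.
    return {**lead, "company_size": 40, "region": "west"}


def summarize(lead: Lead) -> Lead:
    # Stand-in for an AI agent that writes a prospect summary.
    return {**lead, "summary": f"{lead['name']}: {lead['company_size']} staff, {lead['region']}"}


def log_to_crm(lead: Lead) -> Lead:
    # Stand-in for writing the record to a CRM.
    return {**lead, "logged": True}


def route(lead: Lead) -> Lead:
    # Hypothetical routing rule: assign by region.
    owners = {"west": "rep_a", "east": "rep_b"}
    return {**lead, "owner": owners.get(lead["region"], "unassigned")}


def run_pipeline(lead: Lead, stages: list[Stage]) -> Lead:
    """Pass the record down the line; each stage receives the previous stage's output."""
    return reduce(lambda record, stage: stage(record), stages, lead)


record = run_pipeline({"name": "Example Bakery"}, [enrich, summarize, log_to_crm, route])
```

Because each stage returns the full record, nothing has to be re-keyed or re-gathered between steps, which is where the manual version lost time and data.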

A second pipeline handles work after a sales call. The call transcript gets summarized automatically, key takeaways are logged in the CRM, and a personalized follow-up email is drafted and queued up for the rep to review before sending.

"Our sales reps are able to focus on deals and not have to worry about doing all these tedious tasks before they're able to get to the next deal."

– Jacob Sirrs, Marketing Operations Specialist, Vendasta

The results came from removing the handoff problem. Information now moves from one step to the next without a human in the middle. Vendasta saves 15 minutes per call, frees up 1,200 minutes (20 hours) of work each day across 20 reps, and credits about $1 million in reclaimed pipeline revenue to the change.
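The time figures are easy to sanity-check. The three inputs below come from the paragraph above; the implied call volume is derived from them and is not a number Vendasta reported.

```python
MINUTES_SAVED_PER_CALL = 15
MINUTES_SAVED_PER_DAY = 1_200
REPS = 20

hours_saved_per_day = MINUTES_SAVED_PER_DAY / 60
implied_calls_per_day = MINUTES_SAVED_PER_DAY / MINUTES_SAVED_PER_CALL  # inferred
implied_calls_per_rep = implied_calls_per_day / REPS                     # inferred
minutes_saved_per_rep = MINUTES_SAVED_PER_DAY / REPS

print(hours_saved_per_day, implied_calls_per_day, implied_calls_per_rep, minutes_saved_per_rep)
```

The numbers are internally consistent: 1,200 minutes is 20 hours, which works out to about 80 calls a day, or 4 calls and one hour saved per rep.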

BASF Coatings Used Multi-Agent AI to Replace Five Tools With One

BASF is one of the largest chemical companies in the world. Its Coatings division, which has more than 11,000 employees across over 70 sites, makes automotive and industrial coatings, and its sales team has a research problem common to large enterprises: too many tools, too many data sources, and no unified way to ask questions.

Before multi-agent AI, a BASF Coatings sales rep preparing for a customer meeting had to pull together information from very different places. Customer visit reports lived in Salesforce, market consumption insights were stored in internal data tables, and external market news lived as PDF files and free-text reports.

This is a tool and domain overload problem. A single AI agent given access to all those tools would struggle to pick the right one for each question. Ask it for last quarter's sales numbers, and it might search the PDFs. Ask it about industry news, and it might query the database. There were just too many ways to fail.

Figure: The user chat window for Marketmind, BASF Coatings’ multi-agent AI system. The user has asked, “What has happened in my market this week?” and the Marketmind Control Center has responded with information categorized as “DEFEND” and “OPPORTUNITY.” *Source: Databricks.*

In partnership with Databricks, BASF Coatings built Marketmind, a multi-agent AI system delivered to sales reps through Microsoft Teams. Marketmind is the assistant that the reps actually talk to. Behind it sits a supervisor agent that routes each question to the right specialist. One specialist, called a Genie agent, handles structured data, including the Salesforce visit reports and sales tables. Another specialist handles unstructured data, scanning PDFs and market news through vector search. The supervisor reads each question, decides which specialist (or both) should answer, and combines the results.
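The routing decision can be sketched as follows. This is a deliberately crude stand-in: a production supervisor would most likely use a language model to classify each question, whereas this sketch uses keyword matching. The keyword sets, function names, and canned answers are all hypothetical and do not describe how Marketmind is built.

```python
# Hypothetical keyword hints; a real supervisor would use a model to classify.
STRUCTURED_HINTS = {"sales", "revenue", "visit", "quarter", "volume"}
UNSTRUCTURED_HINTS = {"news", "report", "trend", "announcement", "market"}


def structured_agent(question: str) -> str:
    return f"[tables] answer to: {question}"


def unstructured_agent(question: str) -> str:
    return f"[documents] answer to: {question}"


def choose_specialists(question: str) -> list[str]:
    """Decide which specialist (or both) should handle the question."""
    words = {w.strip("?.,!'\"").lower() for w in question.split()}
    chosen = []
    if words & STRUCTURED_HINTS:
        chosen.append("structured")
    if words & UNSTRUCTURED_HINTS:
        chosen.append("unstructured")
    return chosen or ["structured", "unstructured"]  # unsure: ask both


def route_and_answer(question: str) -> str:
    """Send the question to the chosen specialists and combine their answers."""
    agents = {"structured": structured_agent, "unstructured": unstructured_agent}
    answers = [agents[name](question) for name in choose_specialists(question)]
    return "\n".join(answers)
```

For example, `choose_specialists("Any industry news this week?")` selects only the unstructured specialist, while a question that mentions both sales and market news goes to both.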

"Marketmind turns our field interactions into timely, AI-driven actions, nudging smart follow-ups, surfacing relevant opportunities, and connecting peers facing similar challenges. The result: faster prep, sharper customer conversations, and more time selling where it counts."

– Adrian Fierro, Head of Global Market Intelligence, BASF Coatings

BASF Coatings began scoping in April 2025, ran a five-to-six-week proof of concept, piloted with 25 users, and launched in North America that October. The rollout target is more than 1,000 sales representatives worldwide.

What This Means for Enterprise Leaders

All of the case studies cited here point to the same lesson. A team had a complex job, a single agent or a chain of manual handoffs fell short, and the fix turned out to be a better structure.

Before approving the next model upgrade, enterprise leaders can ask three diagnostic questions about any underperforming AI deployment.

Is the agent running out of workspace? If the task involves long documents, many sources, or extended sessions, the answer is probably yes. A multi-agent AI system with separate workspaces for each specialist will help more than a smarter model.
Are handoffs between steps losing data? If a workflow has multiple stages and humans are passing information between them, the answer is yes. A sequential pipeline of agents will close the gap.
Is one agent being asked to route across too many tools and domains? If the agent has access to a wide range of systems and frequently picks the wrong one, the answer is yes. A supervisor and worker pattern will reduce the misrouting.
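The three questions above amount to a simple lookup from symptom to pattern. The sketch below encodes that mapping; the function name and the wording of the recommendations are hypothetical, and a real assessment would weigh cost and task value as well.

```python
def recommend_patterns(
    workspace_overflow: bool,
    lossy_handoffs: bool,
    tool_misrouting: bool,
) -> list[str]:
    """Map the three diagnostic answers to the structural fixes discussed above."""
    recommendations = []
    if workspace_overflow:
        recommendations.append("separate workspaces for specialist agents")
    if lossy_handoffs:
        recommendations.append("sequential pipeline of agents")
    if tool_misrouting:
        recommendations.append("supervisor and worker routing")
    return recommendations or ["no structural change indicated by these three questions"]


plan = recommend_patterns(workspace_overflow=True, lossy_handoffs=False, tool_misrouting=True)
```

A deployment can show more than one symptom at once, so the function returns a list rather than a single answer.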

Aaron Sneed figured this out on his own, by trial and error, with 15 agents at a roundtable. Enterprise teams have the benefit of his lesson and of the production examples that have followed it. The best next AI investment is probably not a new model, but a better structure for combining the tools already in hand.

Frequently Asked Questions

Is multi-agent AI more expensive than a single AI agent?

Multi-agent AI systems use roughly 15 times more tokens than single chat interactions, according to Anthropic. The trade-off makes sense for high-value, complex tasks where coordination problems would otherwise cause failures. For simple, short tasks that one agent can handle in a single context window, multi-agent setups are overkill.

What are the most common multi-agent AI architecture patterns?

The three most common patterns in production are supervisor and worker, sequential pipeline, and hierarchical multi-agent. Supervisor and worker models use a lead agent to coordinate specialists. Sequential pipeline structures pass work from one agent to the next like an assembly line. Hierarchical setups have teams of agents managed by other agents.

What is the supervisor and worker pattern in multi-agent AI?

A supervisor and worker pattern uses a lead agent to direct several specialist agents. The supervisor reads each request, decides which specialist should handle it, sends the work, and combines the results. BASF Coatings uses this pattern in Marketmind to route between structured data and unstructured data agents.

When does multi-agent AI outperform a single AI agent?

Multi-agent AI beats a single agent when the task is large, multi-source, or requires routing across different tools. Anthropic found that multi-agent setups outperformed a single agent by 90.2 percent on a complex research task. But single agents remain a better fit for short, narrow jobs that fit comfortably in one context window.

What is a multi-agent AI system?

A multi-agent AI system splits one job across several specialist agents that each work in their own context window. A lead agent breaks the job into pieces, hands the pieces to the specialists, and combines the results.
