The handoff problem: When agent A finishes and hands off to agent B, what exactly does B receive? A text summary loses specificity. The full context window is too large. Every handoff loses information — this is the telephone game problem for agents.
Error cascades: In a 5-agent pipeline, if agent 3 has a subtle misunderstanding, agents 4 and 5 build on that error. By the time you detect it, you need to roll back all 5. Single-agent approaches fail in one place. Multi-agent approaches fail in cascading ways.
Debugging multi-agent systems is hell: Which agent was wrong? The one that made the decision, or the one that executed it? When a pipeline fails, you need to inspect each agent's reasoning independently — requiring tooling that doesn't exist yet.
Token costs multiply: A 3-step pipeline where each agent uses 50K tokens costs 150K tokens minimum, even for simple tasks. For tasks a single agent could handle, multi-agent is 3x the cost for 1.1x the quality.
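The multiplication above can be made concrete with a back-of-envelope sketch (the per-agent token count comes from the text; the quality ratio is the assumed 1.1x):

```python
# Back-of-envelope token cost for a sequential multi-agent pipeline.
def pipeline_tokens(num_agents: int, tokens_per_agent: int = 50_000) -> int:
    """Minimum total tokens: each agent re-reasons over its own context."""
    return num_agents * tokens_per_agent

single = pipeline_tokens(1)   # 50000
multi = pipeline_tokens(3)    # 150000

cost_ratio = multi / single   # 3.0 — triple the tokens
quality_ratio = 1.1           # assumed marginal quality gain from the text
# Cost per unit of quality nearly triples (~2.7x):
print(cost_ratio / quality_ratio)
```

The point the arithmetic makes: unless decomposition buys a big quality jump, the pipeline is paying sequential-context tax on every hop.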
Coordination is skilled labor: The senior developer who can orchestrate multiple agents effectively is doing the same work as a tech lead — they're just using a different interface. You haven't replaced the tech lead.
The supervisor pattern works: A coordinator agent receives the task, decomposes it into bounded subtasks, spawns specialist agents with explicit context and explicit output schemas, then synthesizes the results. This is how CrewAI and LangGraph are designed.
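A minimal sketch of the supervisor pattern (hypothetical names, not the actual CrewAI or LangGraph API — the decomposition is hardcoded where a real supervisor would use an LLM):

```python
# Supervisor pattern: decompose -> dispatch to specialists -> synthesize.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    name: str
    context: str          # explicit, bounded context — not the full history
    output_schema: dict   # what the specialist must return

def supervisor(task: str, specialists: dict[str, Callable[[Subtask], dict]]) -> dict:
    # 1. Decompose into bounded subtasks (hardcoded for the sketch)
    subtasks = [
        Subtask("research", f"Find prior art for: {task}", {"notes": "string"}),
        Subtask("implement", f"Write code for: {task}", {"files_changed": "array"}),
    ]
    # 2. Dispatch each subtask to its specialist
    results = {st.name: specialists[st.name](st) for st in subtasks}
    # 3. Synthesize the specialist outputs into one report
    return {"task": task, "results": results}

# Stub specialists standing in for real agent processes
specialists = {
    "research": lambda st: {"notes": f"prior art for '{st.context}'"},
    "implement": lambda st: {"files_changed": ["main.py"]},
}
report = supervisor("add rate limiting", specialists)
print(report["results"]["implement"])  # {'files_changed': ['main.py']}
```

The key design choice is that specialists never see each other's context — only the supervisor holds the full picture.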
Structured output eliminates handoff ambiguity: Use JSON Schema to define exactly what each specialist returns. If the schema says 'returns { files_changed: string[], summary: string, test_coverage: number }', there's no ambiguity about what was handed off.
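A sketch of validating that handoff contract before the next agent consumes it, using the field names from the schema above (hand-rolled checker for self-containment; in practice you'd use a JSON Schema library):

```python
# Reject a malformed handoff at the boundary instead of letting the
# next agent build on it.
HANDOFF_SCHEMA = {
    "files_changed": list,
    "summary": str,
    "test_coverage": float,
}

def validate_handoff(payload: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the handoff is clean."""
    errors = [f"missing field: {k}" for k in schema if k not in payload]
    errors += [
        f"wrong type for {k}: expected {t.__name__}"
        for k, t in schema.items()
        if k in payload and not isinstance(payload[k], t)
    ]
    return errors

good = {"files_changed": ["api.py"], "summary": "added auth", "test_coverage": 0.91}
bad = {"summary": "added auth"}
print(validate_handoff(good, HANDOFF_SCHEMA))  # []
print(validate_handoff(bad, HANDOFF_SCHEMA))   # two "missing field" errors
```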
Error recovery with supervisor: When a specialist fails, the supervisor catches it, marks that branch as failed, and tries an alternative approach — different specialist, different decomposition. This is the 'try again with more context' loop that humans do naturally.
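The recovery loop can be sketched as an ordered list of fallback approaches (illustrative names, not a real framework API):

```python
# Supervisor-side recovery: try each approach in order, mark failed
# branches, and surface the failure history with the final result.
def run_with_fallback(subtask, approaches):
    """approaches: ordered list of (name, fn) specialist attempts."""
    failures = []
    for name, fn in approaches:
        try:
            return {"by": name, "result": fn(subtask), "failed": failures}
        except Exception as exc:
            failures.append({"by": name, "error": str(exc)})  # branch marked failed
    raise RuntimeError(f"all approaches failed: {failures}")

def flaky(task):
    raise TimeoutError("specialist timed out")

def steady(task):
    return f"done: {task}"

out = run_with_fallback("migrate schema", [("fast", flaky), ("careful", steady)])
print(out["by"], out["result"])  # careful done: migrate schema
```

Keeping the failure history attached to the result is what lets a human (or the supervisor) answer "which branch failed and why" later.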
Parallelism where it counts: Embarrassingly parallel tasks (lint all files, run all tests, check all endpoints) run simultaneously in specialist agents. Sequential dependency only exists where it must. This is how you get 10x throughput on the right problems.
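A minimal fan-out sketch for the embarrassingly parallel case, with a stub standing in for a real linting specialist:

```python
# Fan out independent checks across workers; sequence only where a
# real dependency exists (there is none here).
from concurrent.futures import ThreadPoolExecutor

def lint(path: str) -> tuple[str, bool]:
    # Stub for a specialist agent linting one file
    return path, not path.endswith(".tmp")

files = ["api.py", "models.py", "cache.tmp", "cli.py"]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(lint, files))

print(results["cache.tmp"])  # False
```

With real agent processes the workers would be subprocesses or API calls, but the shape is the same: independent inputs, independent outputs, no shared state.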
Mac Mini viable: Run the supervisor in the cloud (Claude API), specialists as Claude Code processes on the Mac Mini. The Mac Mini handles execution (terminal, git, file ops) while cloud handles reasoning. Best of both worlds — fast local execution, powerful cloud reasoning.
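The split described above can be written down as a topology sketch (backend names and specialist count are illustrative assumptions):

```python
# Hypothetical deployment split: reasoning in the cloud, execution local.
TOPOLOGY = {
    "supervisor": {
        "where": "cloud",
        "role": "decompose tasks, synthesize results",
        "backend": "Claude API",            # reasoning only, no file access
    },
    "specialists": {
        "where": "mac-mini",
        "role": "terminal, git, file ops",
        "backend": "Claude Code processes",  # local execution
        "count": 3,                          # assumed pool size
    },
}

def placement(component: str) -> str:
    return TOPOLOGY[component]["where"]

print(placement("supervisor"), placement("specialists"))  # cloud mac-mini
```

The design choice the split encodes: latency-sensitive side effects (file writes, git) stay on the machine that owns the repo, while the expensive reasoning runs where the strongest model lives.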