The human bottleneck: Someone still has to write the task decomposition. Asking an agent to 'build a SaaS app' produces a flat todo list that ignores technical debt, migration sequences, and cross-cutting concerns like auth that touch everything.
Subtask explosion: A feature like 'add real-time notifications' touches backend websockets, database schema changes, frontend state management, push infrastructure, and permission checks. Agents decompose this into 30 subtasks that each require knowing decisions made in the other 29.
No rollback intuition: When subtask C depends on subtask B, which depends on subtask A, a failure partway through the chain usually calls for rolling forward (retrying the failed step and continuing downstream) rather than reverting completed upstream work. Agents don't have robust strategies for partial failure recovery in multi-step plans.
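To make "roll forward, not backward" concrete, here is a minimal sketch of the recovery policy described above. Everything in it (the `Task` dataclass, `roll_forward`, the retry budget) is a hypothetical illustration, not an API from any real framework: on failure, the failed step is retried in place and completed upstream work is never reverted.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    name: str
    run: Callable[[], bool]               # returns True on success
    deps: list[str] = field(default_factory=list)

def roll_forward(tasks: dict[str, Task], order: list[str], max_retries: int = 2) -> bool:
    """Execute tasks in dependency order. On failure, retry the failed
    task and continue forward; never revert completed upstream work."""
    done: set[str] = set()
    for name in order:
        task = tasks[name]
        if not all(d in done for d in task.deps):
            return False                  # an upstream failure already broke this chain
        for _attempt in range(1 + max_retries):
            if task.run():
                done.add(name)
                break
        else:
            return False                  # retries exhausted: surface to the planner
    return True
```

A transiently flaky middle task (say, B fails once, then succeeds) gets retried and the chain completes, with A's work left untouched.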
Test-driven decomposition is circular: If agents write tests first, those tests encode the agent's misunderstanding of the system. If humans write tests, the human is doing the hard decomposition work that the agent was supposed to handle.
Empirical data: Recent studies report success rates around 30% for agentic task decomposition on realistic software engineering tasks with more than five subtask dependencies. That's not production-ready.
SOTA approach: Use a dedicated planner agent that generates a task tree in JSON/Mermaid format. Human reviews and approves. Executor agents handle leaf tasks. This human-in-the-loop planning is reliable today and doesn't require the planner to be perfect.
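A minimal sketch of that pattern, under assumed conventions: the planner emits a JSON task tree (the `PLAN_JSON` shape, `leaf_tasks`, and `approve_and_dispatch` names here are all hypothetical), leaf nodes are what executor agents run, and nothing is dispatched until the human approval gate passes.

```python
import json

# Hypothetical plan format: each node has an id, a description, and children.
# Leaf nodes (no children) are the units handed to executor agents.
PLAN_JSON = """
{
  "id": "root", "desc": "add real-time notifications",
  "children": [
    {"id": "schema", "desc": "add notifications table", "children": []},
    {"id": "ws", "desc": "websocket endpoint", "children": [
      {"id": "auth", "desc": "permission checks on subscribe", "children": []}
    ]}
  ]
}
"""

def leaf_tasks(node: dict) -> list[dict]:
    """Collect leaves depth-first; these are the executable tasks."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaf_tasks(child)]

def approve_and_dispatch(plan: dict, approved: bool) -> list[str]:
    """The human-in-the-loop gate: nothing executes until approval."""
    if not approved:
        return []
    return [leaf["id"] for leaf in leaf_tasks(plan)]
```

Because the plan is plain JSON, the review step can be a diff in a pull request rather than a chat transcript.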
Dependency graphs are tractable: Modern agentic frameworks (LangGraph, AutoGen, CrewAI) implement state machines where task dependencies are explicit edges in a graph. Failures propagate correctly — if task C fails, the system knows exactly which downstream tasks are affected.
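The frameworks named above each have their own graph APIs; this framework-neutral sketch (the `affected_by_failure` name and edge format are assumptions for illustration) shows the underlying idea: with explicit dependency edges, the set of tasks poisoned by a failure is just a graph traversal.

```python
from collections import defaultdict, deque

def affected_by_failure(edges: list[tuple[str, str]], failed: str) -> set[str]:
    """Given dependency edges (upstream, downstream), return every task
    transitively downstream of the failed one -- the tasks the scheduler
    must skip or re-plan."""
    downstream = defaultdict(list)
    for up, down in edges:
        downstream[up].append(down)
    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for nxt in downstream[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

With edges A→B, B→C, A→D, a failure in B affects only {C}, while a failure in A affects {B, C, D}; that precision is what a flat todo list cannot give you.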
Iterative refinement works: Start with a coarse 5-task plan, execute, observe failures, refine the plan. This is how human engineers work too — nobody writes a perfect spec upfront. Agents are actually better at this loop than humans because they never get tired of re-planning.
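The refinement loop itself is small enough to sketch. This is a schematic, not any framework's API: `execute` and `refine` stand in for agent calls (run the plan and report failures; split or rewrite failed steps), and the loop simply alternates them until the plan runs clean or the round budget runs out.

```python
from typing import Callable

def refine_loop(plan: list[str],
                execute: Callable[[list[str]], list[str]],
                refine: Callable[[list[str], list[str]], list[str]],
                max_rounds: int = 3) -> list[str]:
    """Coarse-to-fine planning: run the plan, feed failures back into a
    refine step, repeat. `execute` returns the names of failed tasks;
    `refine` maps (plan, failures) -> a revised plan. Both are
    stand-ins for model/agent calls."""
    for _ in range(max_rounds):
        failures = execute(plan)
        if not failures:
            return plan           # plan runs clean
        plan = refine(plan, failures)
    return plan                   # best plan after the round budget
```

A typical refinement is splitting a too-coarse step into smaller ones, e.g. replacing a failed "b" with "b1" and "b2" on the next round.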
The Mac Mini advantage: For local workflows, you can run a small planner model (like a fine-tuned 7B) that outputs structured plans, then use Claude Code for execution. Cheap, fast, and the plan is auditable.
Real pattern: The most successful agentic setups use 'intention → plan → execute → verify' where the plan is a first-class artifact. This maps to how senior developers actually work — think, then type.
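The 'intention → plan → execute → verify' pipeline can be sketched in a few lines. All four stage names and callables here are illustrative assumptions: the point is that the plan is produced once, kept as an inspectable artifact, and each step is verified before the next runs.

```python
from typing import Callable

def pipeline(intention: str,
             plan_fn: Callable[[str], list[str]],
             execute_fn: Callable[[str], str],
             verify_fn: Callable[[str, str], bool]) -> dict:
    """intention -> plan -> execute -> verify, with the plan as a
    first-class artifact. The three callables are stand-ins for
    planner/executor/verifier agent calls."""
    plan = plan_fn(intention)                     # think...
    results: dict[str, str] = {}
    for step in plan:                             # ...then type
        output = execute_fn(step)
        if not verify_fn(step, output):
            return {"plan": plan, "failed_at": step, "results": results}
        results[step] = output
    return {"plan": plan, "failed_at": None, "results": results}
```

Because the returned dict carries the plan alongside the results, a failed run shows exactly which step of which plan broke, which is what makes the artifact auditable.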