The Dev Blog

Modern AI Agent Architectures: How Multi-Agent Systems Like OpenHands and Claude Flow Work

Mohan & Sai Rasmi · 18 min read

Modern AI systems use multi-agent architectures where a planner decomposes tasks and specialized agents (research, coding, testing, review) execute them in parallel. This approach improves scalability, modularity, and output quality compared to single-LLM workflows. We compare two real approaches: OpenHands (dynamic agent spawning) and Claude Flow (role-based orchestration).


The way AI systems solve complex tasks is changing.

Instead of a single model grinding through a long prompt, modern frameworks break work across multiple coordinated agents - a planner that delegates, workers that execute, and an integrator that merges results.

If you’ve built microservices or worked with task queues, the pattern will feel familiar. These are distributed systems, except the workers are LLMs.

I’ve been studying two approaches that represent different points on this spectrum: OpenHands (dynamic agent spawning) and Claude Flow (role-based agent orchestration). Here’s what I learned.


The Architecture at a Glance

Before diving into specifics, here’s the high-level pattern every multi-agent system follows:

Multi-Agent Architecture Overview

User: "Build an e-commerce page with auth, catalog, and checkout"
  → Planner Agent: decomposes the task into subtasks
  → Worker A (auth), Worker B (catalog), Worker C (checkout) - in parallel
  → Integrator: merges results & runs verification
  → Final Output

This is the scatter-gather pattern: decompose, fan out to parallel workers, fan in to merge. The two frameworks below implement this differently - but the skeleton is the same.
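The scatter-gather skeleton fits in a few lines of Python. This is a minimal sketch, not either framework's API: `plan`, `work`, and `integrate` are hypothetical stand-ins for the LLM calls a real system would make.

```python
import asyncio

async def plan(request: str) -> list[str]:
    # Planner: decompose the request into independent subtasks.
    return [part.strip() for part in request.split(",")]

async def work(subtask: str) -> str:
    # Worker: in a real system this would be an LLM call.
    return f"done: {subtask}"

async def integrate(results: list[str]) -> str:
    # Integrator: merge worker outputs and verify.
    return "; ".join(results)

async def scatter_gather(request: str) -> str:
    subtasks = await plan(request)                        # decompose
    results = await asyncio.gather(*map(work, subtasks))  # fan out
    return await integrate(results)                       # fan in

print(asyncio.run(scatter_gather("auth, catalog, checkout")))
# → done: auth; done: catalog; done: checkout
```

Everything that follows in this article is elaboration on these four steps.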

System-Level Component Map

Every production multi-agent system has these layers. The diagram below shows how data and control flow between them:

System Architecture: Components + Data/Control Flow

User Input
  → Orchestration Layer: Planner (task graph + deps), Router (model selection), Scheduler (parallel dispatch)
  → (control) Agent Execution Layer: parallel LLM agents, each doing reasoning + generation
  ↔ (data: tools, RAG, eval) Infrastructure Layer: Tools & APIs, Memory / Vector Store, RAG Pipeline, Guardrails, Observability

Control flows top-down from orchestrator to agents. Data flows bidirectionally between agents and infrastructure. The orchestrator never touches tools directly — agents do.


OpenHands: Dynamic Sub-Agent Delegation

OpenHands uses a main planning agent that dynamically spawns short-lived sub-agents to handle pieces of a larger task. Think of it as a supervisor process forking workers.

The Components

Main Agent (Supervisor) - The orchestrator at the top. It receives the user request, holds the full context of the codebase, breaks the task into steps, and decides what to handle directly versus what to delegate.

Sub-Agents (Workers) - Ephemeral executors. Created dynamically via SDK calls like create_agent() or delegate(). Each one receives a focused prompt, does one thing, returns results, and terminates.

They can run on the same model or a cheaper one - cost optimization is built into the architecture.

Key Insight: Sub-agents don’t need full context. They receive only the minimum information required for their task - reducing token cost and improving output quality.

How It Flows

Say a user asks: “Build a simple e-commerce page.”

The main agent analyzes the request and breaks it down:

  1. Create UI components
  2. Build API endpoints
  3. Generate product seed data
  4. Write tests

Then it delegates. Each sub-agent gets minimal context - just enough to complete its task. They run in parallel, return their outputs, and terminate.

The main agent collects everything, merges the changes, runs tests, and produces the final result.
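In SDK terms, the delegation step looks roughly like the following. This is an illustrative sketch, not the actual OpenHands API: the `delegate` helper, its signature, and the fake outputs are all assumptions.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class SubAgentResult:
    task: str
    output: str

async def delegate(task: str, context: str, model: str) -> SubAgentResult:
    # Ephemeral worker: receives only the context it needs, runs, terminates.
    # A real system would call an LLM here; we fake the output.
    return SubAgentResult(task, f"[{model}] {task} built from: {context}")

async def supervise() -> list[SubAgentResult]:
    # Each sub-agent gets minimal context - not the whole codebase.
    plan = [
        ("UI components", "design specs", "sonnet"),
        ("API endpoints", "data schema", "haiku"),
        ("seed data", "product model", "haiku"),
        ("tests", "file paths", "sonnet"),
    ]
    return await asyncio.gather(*(delegate(t, c, m) for t, c, m in plan))

results = asyncio.run(supervise())
```

The point the sketch makes: each worker's prompt is assembled from a small, task-specific slice of state, and the supervisor is the only component that sees everything.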

OpenHands: Dynamic Sub-Agent Delegation

User Request
  → Main Agent (Supervisor): analyzes & decomposes task
  → Sub-Agent 1 (UI components) · Sub-Agent 2 (API endpoints) · Sub-Agent 3 (seed data) · Sub-Agent 4 (tests) - in parallel
  → Main Agent: merge & verify
  → Final Result

Real Example: E-Commerce Page Build

Let’s trace a concrete request through this system. A developer asks: “Build an e-commerce product page with a shopping cart.”

Here’s exactly what each agent receives and produces:

OpenHands: Real Delegation Example

Main Agent decomposition - Input: "Build an e-commerce product page with shopping cart"
  Task 1: ProductCard component · Task 2: Cart API + state · Task 3: Product seed data · Task 4: Integration tests

UI Agent (Claude Sonnet) - Input: design specs
  Output: ProductCard.tsx, CartDrawer.tsx, ProductGrid.tsx

API Agent (Claude Haiku) - Input: data schema
  Output: cart.ts (store), api/products.ts, api/cart.ts

Data Agent (Claude Haiku) - Input: product model
  Output: seed/products.json (50 products)

Test Agent (Claude Sonnet) - Input: all file paths
  Output: __tests__/cart.test.ts, __tests__/products.test.ts

Main Agent: merge & verify - resolves import conflicts, runs npm test, verifies the build passes, returns a working project

Notice each sub-agent uses the minimum model needed. Data generation and simple API routes go to Haiku (cheap, fast). UI components and tests go to Sonnet (better reasoning). The planner uses the most capable model. This is how you optimize cost without sacrificing quality.

Why This Works

The power here is parallelism with isolation. Each sub-agent operates in a narrow scope, which means:

  • Less token waste - small context windows per worker
  • Fewer hallucinations - focused prompts produce more reliable outputs
  • Faster execution - parallel, not sequential
  • Cost control - workers can use cheaper models for straightforward tasks

It’s the same reason we don’t run monoliths in production anymore. Decomposition works.

Sequence: How a Request Flows Through OpenHands

Here’s the time-ordered sequence showing how the supervisor spawns, manages, and collects from sub-agents:

OpenHands: Request Lifecycle (Time →)

t=0  Supervisor: parse & plan
t=1  Supervisor: spawn() → Worker A (UI components, Sonnet), Worker B (API endpoints, Haiku), Worker C (seed data, Haiku)
t=2  Worker D (tests, Sonnet) starts once A and B finish
t=3  Supervisor: collect & merge
t=4  Supervisor: verify

Workers A-C run in parallel immediately after spawn. Worker D starts later (depends on A+B outputs). The supervisor blocks on collect(), then merges and verifies.

Key details in this sequence: the supervisor is not idle during worker execution - it monitors for early failures and can abort/respawn workers. Worker D (tests) has a dependency on A and B, so it starts later. This is a DAG, not a flat fan-out.
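The "DAG, not a flat fan-out" distinction shows up directly in how workers are launched: the test worker awaits its dependencies rather than being spawned with the rest. A minimal sketch, with worker names standing in for the real agents:

```python
import asyncio

async def worker(name: str, deps=()) -> str:
    # Block until every dependency's output is available.
    inputs = [await d for d in deps]
    return f"{name}({','.join(inputs)})" if inputs else name

async def run_dag() -> str:
    a = asyncio.create_task(worker("ui"))
    b = asyncio.create_task(worker("api"))
    c = asyncio.create_task(worker("seed"))
    # Worker D starts only after A and B have produced output.
    d = asyncio.create_task(worker("tests", deps=[a, b]))
    await asyncio.gather(a, b, c, d)
    return d.result()

print(asyncio.run(run_dag()))  # → tests(ui,api)
```

A, B, and C run concurrently; D's first instruction is to wait, which is exactly the dependency edge in the diagram.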


Claude Flow: Role-Based Agent Orchestration

Claude Flow takes a different approach. Instead of spawning temporary agents on the fly, it orchestrates persistent specialized roles - each responsible for a distinct phase of reasoning.

The Roles

Role              Responsibility
Planner Agent     Understands the problem, breaks it into subtasks, assigns work
Research Agent    Gathers external information, summarizes docs, retrieves knowledge
Builder Agent     Implements code, modifies files, generates architecture
Critic Agent      Reviews output for correctness, identifies errors, suggests improvements
Integrator Agent  Merges outputs, resolves conflicts, produces the final deliverable

Agent Roles in Detail

Each role has distinct capabilities, tools, and model configurations:

Claude Flow: Agent Role Responsibilities

Planner - Understands the full request. Breaks it into ordered subtasks with dependencies. Decides which agent handles each piece. Holds the execution plan.
  Tools: task planner, dependency graph · Model: Opus (highest reasoning)

Research - Searches documentation, APIs, and knowledge bases. Gathers context the Builder needs. Summarizes findings into structured briefs.
  Tools: web search, docs reader, RAG · Model: Sonnet (balanced)

Builder - Writes production code based on research and specs. Creates files, modifies existing code, generates architecture. Focuses purely on implementation.
  Tools: file editor, terminal, git · Model: Sonnet (fast coding)

Critic - Reviews Builder output for bugs, security issues, and logic errors. Suggests improvements. Can reject and send work back for revision.
  Tools: code analyzer, linter, test runner · Model: Opus (deep analysis)

Integrator - Merges all agent outputs into a cohesive deliverable. Resolves conflicts between agents. Ensures consistency across the final result.
  Tools: merge resolver, build system · Model: Opus (coordination)

How It Flows

A request enters the system and hits the Planner. The Planner routes subtasks to the appropriate role. Each agent processes its piece and passes results forward.

The key difference from OpenHands: these agents aren’t spawned and killed - they’re predefined reasoning stages. The architecture emphasizes role specialization over dynamic delegation. Each agent is tuned (via system prompts, tool access, or temperature settings) for its specific job.
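Tuning each role "via system prompts, tool access, or temperature settings" amounts to a small configuration object per role. A sketch of what that could look like - the role names and tools come from the table above, but the prompt text and temperature values are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RoleConfig:
    system_prompt: str
    model: str
    temperature: float
    tools: tuple[str, ...] = ()

# Persistent roles: defined once, reused across every request.
ROLES = {
    "planner":    RoleConfig("Decompose the request into ordered subtasks.", "opus",   0.2, ("task_planner", "dep_graph")),
    "research":   RoleConfig("Gather and summarize relevant documentation.", "sonnet", 0.3, ("web_search", "docs_reader", "rag")),
    "builder":    RoleConfig("Implement the specified code. No commentary.", "sonnet", 0.2, ("file_editor", "terminal", "git")),
    "critic":     RoleConfig("Review for bugs and security issues.",         "opus",   0.0, ("code_analyzer", "linter", "test_runner")),
    "integrator": RoleConfig("Merge outputs into a consistent deliverable.", "opus",   0.1, ("merge_resolver", "build_system")),
}
```

Note the Critic runs at temperature 0 in this sketch: review is the one stage where you want determinism, not creativity.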

Claude Flow: Role-Based Agent Orchestration

User Request
  → Planner Agent: decomposes problem & routes subtasks (context, specs, criteria)
  → Research (gathers info & retrieves knowledge) → findings
  → Builder (implements code & generates architecture) → code
  → Critic (reviews for correctness) → review
  → Integrator Agent: merges outputs & resolves conflicts
  → Final Deliverable

Artifact Pipeline: What Each Stage Produces

The key to understanding Claude Flow is tracking what artifacts each role produces and consumes. Each stage transforms its input into a specific deliverable:

Claude Flow: Artifact Pipeline (Input → Stage → Output)

Planner (Opus)      in: user_request.txt                    out: task_graph.json, dependency_map.json
Research (Sonnet)   in: task_graph.json                     out: research_brief.md, api_references.json
Builder (Sonnet)    in: research_brief.md, task_graph.json  out: src/**/*.ts (code), schema.prisma
Critic (Opus)       in: src/**/*.ts, test_results.log       out: review.md, fix_requests[]
                    if fix_requests.length > 0 → loop back to Builder
Integrator (Opus)   in: approved code + review              out: merged_repo/, build_log.txt ✓
Each stage produces typed artifacts that flow to the next. The Critic→Builder loop is the key quality gate - it runs until review passes or max retries are hit.

Notice the feedback loop between Critic and Builder. This is what makes Claude Flow’s sequential pipeline more rigorous than a flat fan-out: work is iteratively refined before it reaches the integrator.
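The Critic→Builder gate reduces to a bounded revision loop. Here is a sketch with stub build/review functions - illustrative stand-ins, not the Claude Flow API:

```python
def build(spec: str, feedback: list[str]) -> str:
    # Stub Builder: appends a "fix" for each feedback item.
    return spec + "".join(f" +fix:{f}" for f in feedback)

def review(code: str) -> list[str]:
    # Stub Critic: demands a null-check until one is present.
    return [] if "null-check" in code else ["null-check"]

def critic_loop(spec: str, max_retries: int = 3) -> tuple[str, int]:
    code = build(spec, [])
    for attempt in range(max_retries):
        fix_requests = review(code)
        if not fix_requests:              # review passed - release to Integrator
            return code, attempt
        code = build(code, fix_requests)  # loop back to Builder
    raise RuntimeError("max retries exceeded; escalate")

code, attempts = critic_loop("checkout module")
```

The loop terminates on a clean review or on `max_retries`, at which point the system escalates rather than shipping unreviewed work.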


Comparing the Two Approaches

Feature          OpenHands                 Claude Flow
Agent creation   Dynamic, on-demand        Predefined roles
Agent lifespan   Short-lived (ephemeral)   Persistent across the workflow
Execution model  Parallel tasks            Sequential reasoning stages
Strengths        Speed, cost efficiency    Depth, structured review
Best for         Coding automation         Complex reasoning workflows

Neither approach is strictly better. OpenHands optimizes for throughput - get many things done fast. Claude Flow optimizes for quality - route work through specialized stages that each add rigor.

Side-by-Side: How Each Handles the Same Task

OpenHands: Task In → Supervisor → W1 / W2 / W3 (parallel) → Merge → Done
  Parallel execution · workers are ephemeral · optimized for speed

Claude Flow: Task In → Planner → Research → Build → Review → Integrate
  Sequential pipeline · roles are persistent · optimized for quality

In practice, production systems will likely blend both.


Real-World Example: Building an E-Commerce App

Let’s walk through a complete example to make this concrete. A developer asks an AI system: “Build an e-commerce app with product listings, user auth, and a checkout flow.”

First, here’s the task dependency graph (DAG) the planner would produce. Not all tasks can run in parallel - some have hard dependencies:

E-Commerce Task DAG: Dependencies Between Tasks

Tier 0 - no dependencies (run immediately):
  T1 DB Schema (Haiku, ~2 min) · T2 Auth Module (Sonnet, ~4 min) · T3 Seed Data (Haiku, ~1 min) · T4 UI Shell (Sonnet, ~3 min)

Tier 1 - blocked until dependencies resolve:
  T5 Product API (needs T1, T3) · T6 Cart + Checkout (needs T2, sessions) · T7 Product Pages (needs T4, shell)

Tier 2 - final integration (all prior tasks complete):
  T8 Integration Tests (needs T5, T6, T7) · T9 Final Merge (needs T8, all pass)

The planner builds this DAG before any agent starts. Tier 0 tasks run in parallel. Tier 1 tasks start as soon as their specific dependencies resolve - not when all of Tier 0 finishes. This is the critical optimization over naive sequential execution.
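"Start as soon as dependencies resolve" is a small scheduling loop: track each task's unmet dependencies and launch it the moment they are all complete, instead of waiting for a whole tier. A sketch over the T1-T9 graph above, with `asyncio.sleep(0)` standing in for real agent work:

```python
import asyncio

DEPS = {
    "T1": [], "T2": [], "T3": [], "T4": [],
    "T5": ["T1", "T3"], "T6": ["T2"], "T7": ["T4"],
    "T8": ["T5", "T6", "T7"], "T9": ["T8"],
}

async def run_task(name: str) -> str:
    await asyncio.sleep(0)  # stand-in for real agent work
    return name

async def schedule(deps: dict[str, list[str]]) -> list[str]:
    done: list[str] = []
    pending = dict(deps)
    running: dict[str, asyncio.Task] = {}
    while pending or running:
        # Launch every task whose dependencies are all complete -
        # no tier barrier, just per-task readiness.
        for name in [n for n, d in pending.items() if all(x in done for x in d)]:
            running[name] = asyncio.create_task(run_task(name))
            del pending[name]
        finished, _ = await asyncio.wait(running.values(), return_when=asyncio.FIRST_COMPLETED)
        for t in finished:
            done.append(t.result())
            del running[t.result()]
    return done  # completion order respects the DAG

order = asyncio.run(schedule(DEPS))
```

With real task durations, T6 (which only needs T2's ~4 min) would start long before T5 (waiting on T1 and T3), which is exactly the optimization over naive sequential tiers.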

Here’s how each framework would then distribute this work:

E-Commerce Build: Task Distribution Across Agents

1. Planning Phase
   Planner identifies 3 independent workstreams: UI/Frontend, Backend/API, and Testing/QA. Creates a dependency graph: Auth must complete before Checkout can reference user sessions.

2. Parallel Execution
   UI Agent:
     • ProductCard component
     • ProductGrid with filters
     • CartDrawer with quantities
     • CheckoutForm with validation
     • LoginModal + SignupFlow
   Backend Agent:
     • POST /api/auth/login
     • POST /api/auth/register
     • GET /api/products
     • POST /api/cart/add
     • POST /api/checkout
   Test Agent:
     • Auth flow e2e tests
     • Cart CRUD unit tests
     • Checkout integration tests
     • Product API load tests
     • UI component snapshots

3. Review Phase
   Critic agent reviews all outputs: catches SQL injection in the auth endpoint, identifies missing CSRF protection on checkout, flags inconsistent error handling between API routes. Sends backend work back for revision.

4. Integration Phase
   Integrator merges all files, resolves import paths, wires API calls to frontend components, runs the full test suite. Final output: a working repo with npm run dev ready to go.

The key insight: a single LLM attempting this would lose context by the checkout phase. Multi-agent systems avoid this by keeping each worker’s context window focused on one piece of the problem.
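Keeping "each worker's context window focused on one piece" comes down to how context is assembled: each task's prompt includes only its own spec plus the outputs of its direct dependencies, never the whole transcript. A sketch - the task names and spec strings are illustrative:

```python
def gather_context(task: str, deps: dict[str, list[str]], outputs: dict[str, str]) -> str:
    # Include only the outputs of tasks this one depends on.
    dep_outputs = [f"{d}: {outputs[d]}" for d in deps.get(task, [])]
    return "\n".join([f"TASK: {task}", *dep_outputs])

outputs = {"auth": "session middleware in auth.ts", "schema": "User, Product, Order models"}
ctx = gather_context("checkout", {"checkout": ["auth", "schema"]}, outputs)
```

The checkout worker sees the auth and schema results it needs - and nothing about product cards, seed data, or the planning discussion.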


The Bigger Pattern

Strip away the AI-specific details and what you have is a well-known distributed systems pattern:

  1. Plan - decompose work
  2. Delegate - assign to workers
  3. Execute - run independently
  4. Integrate - merge and verify

This is MapReduce. This is scatter-gather. This is a workflow engine.

The Universal Pattern

Plan (decompose work) → Delegate (assign to workers) → Execute (run independently) → Integrate (merge & verify)

The difference is that the workers are language models instead of deterministic functions. That introduces new failure modes - hallucination, drift, inconsistency - but the architectural response is the same: isolate, specialize, verify, integrate.

“Modern AI agent frameworks aren’t inventing new patterns - they’re applying battle-tested distributed systems thinking to LLM orchestration.”


When Things Go Wrong: Partial Failure and Recovery

The diagrams above show the happy path. In practice, agents fail. Code doesn’t compile. Tests don’t pass. An LLM hallucinates an import that doesn’t exist. What matters is how the system recovers.

Here’s a concrete failure scenario:

Failure Recovery: Builder Agent Produces Broken Code

Builder → checkout.ts (generated)
Test run → FAIL: 3 tests failed - TypeError: Cannot read 'session'
Retry 1 → Builder gets: error log + original prompt + "fix the session import"
Test run → FAIL: 1 test failed - progress, but an edge case remains
Retry 2 → PASS: all 12 tests passed

Escalation policy (if Retry 2 also fails):

1. Upgrade model: re-run with Opus instead of Sonnet
2. Expand context: include related files the agent didn't originally see
3. Human-in-the-loop: pause execution, surface the error to the developer with a diff, ask for guidance
4. Abort & report: mark task as failed, log diagnostics, continue with remaining tasks

The retry loop is bounded (typically max_retries=3). Each retry includes the previous error in the prompt so the agent doesn't repeat the same mistake. Exponential backoff prevents rate limit issues.

The recovery pattern follows a standard escalation ladder: retry with context → upgrade model → expand scope → human intervention → abort. Production systems typically set max_retries=3 per agent with exponential backoff (1s, 4s, 16s) between attempts.
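The bounded retry with error feedback looks roughly like this. The `run_agent` stub is illustrative; the backoff schedule (1s, 4s, 16s, i.e. 4^attempt) comes from the text above:

```python
import time

def run_agent(prompt: str) -> tuple[bool, str]:
    # Stub agent: succeeds only once the prompt contains the prior error log,
    # mimicking a model that fixes a bug it can actually see.
    ok = "TypeError" in prompt
    return ok, "PASS" if ok else "FAIL: TypeError: Cannot read 'session'"

def retry_with_feedback(prompt: str, max_retries: int = 3) -> str:
    error = ""
    for attempt in range(max_retries):
        ok, log = run_agent(prompt + error)
        if ok:
            return log
        # Feed the previous error back so the agent doesn't repeat it.
        error = f"\nPrevious error:\n{log}\nFix it."
        time.sleep(0)  # real systems: time.sleep(4 ** attempt) -> 1s, 4s, 16s
    raise RuntimeError("escalate: upgrade model, expand context, or ask a human")

result = retry_with_feedback("fix checkout.ts")  # → "PASS" on retry 1
```

The `RuntimeError` at the bottom is where the escalation ladder takes over: model upgrade, expanded scope, then human intervention.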


Smart Model Routing: Cost vs. Capability

Not every subtask needs your most expensive model. A well-designed orchestrator routes tasks based on complexity, risk, and cost — the same way you’d choose between a senior engineer and a junior dev.

Here’s a routing policy table used in practice:

Task Type               Model    Cost/1K tokens   Why
Seed data generation    Haiku    ~$0.00025        Structured output, low reasoning needed
CRUD API endpoints      Haiku    ~$0.00025        Template-based, well-defined patterns
UI component creation   Sonnet   ~$0.003          Needs design sense, moderate reasoning
Auth / security logic   Sonnet   ~$0.003          Higher stakes, needs careful implementation
Architecture planning   Opus     ~$0.015          Complex decomposition, dependency analysis
Code review / critique  Opus     ~$0.015          Deep analysis, needs to catch subtle bugs
Conflict resolution     Opus     ~$0.015          Cross-module reasoning, integration logic

The router makes this decision per-task, not per-session. A single workflow might use all three tiers:

Cost Optimization: Model Routing for E-Commerce Build

Opus:   planning, code review, integration     - 3 tasks, ~30K tokens, ~$0.45
Sonnet: UI components, auth module, cart logic - 3 tasks, ~40K tokens, ~$0.12
Haiku:  seed data, CRUD routes, schema gen     - 3 tasks, ~40K tokens, ~$0.01

Total: ~$0.58 vs ~$1.65 (all Opus) - a 65% cost reduction with no quality loss on routine tasks

The routing decision can be simple — a mapping of task category to model tier — or learned from historical performance data. The key insight: most tokens in a multi-agent workflow are spent on routine work that doesn’t need your best model.
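The simple version really is just a dictionary. A sketch of a category→tier router, using the categories and prices from the table above (category keys are illustrative):

```python
ROUTING = {
    "seed_data": "haiku",
    "crud_api":  "haiku",
    "ui":        "sonnet",
    "auth":      "sonnet",
    "planning":  "opus",
    "review":    "opus",
    "merge":     "opus",
}
PRICE_PER_1K = {"haiku": 0.00025, "sonnet": 0.003, "opus": 0.015}

def select_model(task_category: str) -> str:
    # Default to the strongest model for anything unrecognized -
    # better to overspend on an unknown task than to mis-handle it.
    return ROUTING.get(task_category, "opus")

def estimated_cost(task_category: str, tokens: int) -> float:
    return PRICE_PER_1K[select_model(task_category)] * tokens / 1000
```

Plugging in the numbers from the cost breakdown above, 30K tokens of planning at the Opus rate comes out to roughly $0.45. The learned version replaces the static dictionary with a policy trained on historical pass/fail rates per (category, model) pair.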


Orchestration in Practice: Pseudo-Code

Here’s the orchestration loop that ties everything together. This is simplified, but it captures the real control flow of a production multi-agent system:

async def orchestrate(user_request: str) -> Result:
    # 1. Plan: decompose into a task DAG
    task_graph = await planner.decompose(
        request=user_request,
        model="opus"  # planning needs the strongest model
    )

    # 2. Execute: run tasks respecting dependencies
    results = {}
    for tier in task_graph.tiers():
        # Tasks within a tier run in parallel
        tier_tasks = [
            execute_agent(
                task=task,
                model=router.select_model(task),  # Haiku/Sonnet/Opus
                context=gather_context(task, results),
                max_retries=3
            )
            for task in tier.tasks
        ]
        tier_results = await asyncio.gather(*tier_tasks)

        # Check for failures
        for task, result in zip(tier.tasks, tier_results):
            if result.failed:
                result = await escalate(task, result)  # retry → upgrade → human
            results[task.id] = result

    # 3. Review: critic checks all outputs
    review = await critic.review(
        results=results,
        model="opus"  # review needs deep analysis
    )

    if review.has_fixes:
        # Loop: send fix requests back to builder
        for fix in review.fixes:
            results[fix.task_id] = await execute_agent(
                task=fix.revised_task,
                model="sonnet",
                context=fix.error_context + results[fix.task_id]
            )

    # 4. Integrate: merge everything into final output
    return await integrator.merge(
        results=results,
        model="opus"
    )

The key patterns in this code:

  • Tiered execution — tasks run in parallel within tiers, sequentially across tiers
  • Smart routing - router.select_model() picks the cheapest model that can handle the task
  • Bounded retry — failures escalate through retry → model upgrade → human → abort
  • Critic loop — review happens after all tasks complete, not inline with each task

This is ~50 lines but it captures the architecture of systems processing millions of agent tasks per day.


Core Components of Modern AI Agent Systems

Regardless of which framework you use, production AI agent systems share the same foundational layers:

Layer                        Purpose                                                           Examples
Planner / Orchestrator       Decomposes tasks, manages execution order, handles dependencies   Task graphs, DAG schedulers
LLM Reasoning Engine         Core intelligence - understands instructions, generates outputs   Claude, GPT-4, Gemini
Tools & APIs                 External capabilities agents can invoke                           File system, terminal, web browser, databases
Memory Systems               Short-term (conversation) and long-term (persistent) context      Context windows, vector stores, session state
Retrieval (RAG)              Grounds agent responses in real data                              Embeddings search, document retrieval, knowledge bases
Guardrails & Safety          Prevents harmful outputs, enforces constraints                    Content filters, output validation, permission scoping
Observability & Evaluation   Monitors agent behavior, measures quality                         Logging, tracing, automated evals, cost tracking

These layers are common across frameworks like LangGraph, CrewAI, AutoGen, and OpenHands. The difference between frameworks is primarily in how they wire these layers together - not which layers they include.

Key insight: If you’re evaluating agent frameworks, don’t just compare features. Compare how they handle the hard problems: error recovery, context management, cost optimization, and human-in-the-loop checkpoints.


What This Means for Developers

If you’re building AI-powered tools, the single-agent-in-a-loop pattern will hit a ceiling fast. The path forward looks a lot like the path backend engineering already took:

  • Decompose tasks instead of stuffing everything into one prompt
  • Specialize agents instead of asking one model to do everything
  • Run in parallel where tasks are independent
  • Add review stages where correctness matters
  • Use cheaper models for routine work, expensive ones for planning and judgment

The tooling is still early, but the architecture is clear. The best AI systems will be the ones that look most like well-designed distributed systems.


Further Reading & Frameworks

If you want to start building with multi-agent architectures, here are the frameworks worth exploring:

  • OpenHands - Open-source platform for AI-powered software development agents
  • LangGraph - Framework for building stateful, multi-agent applications with LLMs
  • CrewAI - Role-based multi-agent orchestration framework
  • AutoGen - Microsoft’s framework for building multi-agent conversational systems

Each takes a slightly different approach to the patterns described in this article - but they all converge on the same core idea: specialized agents, coordinated by a planner, are more capable than a single model working alone.


Originally published at aiagentlab.dev

Mohan - Software engineer writing about AI, distributed systems, and the craft of building great software.

Sai Rasmi - Co-author