Modern AI Agent Architectures: How Multi-Agent Systems Like OpenHands and Claude Flow Work
Modern AI systems use multi-agent architectures where a planner decomposes tasks and specialized agents (research, coding, testing, review) execute them in parallel. This approach improves scalability, modularity, and output quality compared to single-LLM workflows. We compare two real approaches: OpenHands (dynamic agent spawning) and Claude Flow (role-based orchestration).
The way AI systems solve complex tasks is changing.
Instead of a single model grinding through a long prompt, modern frameworks break work across multiple coordinated agents - a planner that delegates, workers that execute, and an integrator that merges results.
If you’ve built microservices or worked with task queues, the pattern will feel familiar. These are distributed systems, except the workers are LLMs.
I’ve been studying two approaches that represent different points on this spectrum: OpenHands (dynamic agent spawning) and Claude Flow (role-based agent orchestration). Here’s what I learned.
The Architecture at a Glance
Before diving into specifics, here’s the high-level pattern every multi-agent system follows:
[Diagram: Multi-Agent Architecture Overview - a planner decomposes the task into subtasks (e.g. Auth, Catalog, Checkout), parallel workers execute them, and an integrator merges results and runs verification]
This is the scatter-gather pattern: decompose, fan out to parallel workers, fan in to merge. The two frameworks below implement this differently - but the skeleton is the same.
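That skeleton can be sketched in a few lines of Python with asyncio. Here `plan` and `worker` are placeholders standing in for the planner LLM and worker agents, not any framework's real API:

```python
import asyncio

def plan(request: str) -> list[str]:
    # Decompose the request into independent subtasks (the fan-out set).
    return [f"{request}: {part}" for part in ("auth", "catalog", "checkout")]

async def worker(subtask: str) -> str:
    # In a real system this would be an LLM agent call.
    await asyncio.sleep(0)  # simulate async I/O
    return f"done({subtask})"

async def scatter_gather(request: str) -> list[str]:
    subtasks = plan(request)                 # decompose
    results = await asyncio.gather(          # fan out to parallel workers
        *(worker(t) for t in subtasks)
    )
    return results                           # fan in to merge

results = asyncio.run(scatter_gather("build e-commerce page"))
print(results)
```

The whole pattern is three steps: decompose, `asyncio.gather`, merge. Everything else in this article is elaboration on that loop.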
System-Level Component Map
Every production multi-agent system has these layers. The diagram below shows how data and control flow between them:
[Diagram: System Architecture - the Orchestration Layer (Planner: task graph + deps; Router: model selection; Scheduler: parallel dispatch) sits above the Agent Execution Layer (LLM agents handling reasoning + generation), which sits above the Infrastructure Layer]
Control flows top-down from orchestrator to agents. Data flows bidirectionally between agents and infrastructure. The orchestrator never touches tools directly — agents do.
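That layering rule can be made concrete. In this sketch (illustrative class names, not any real framework's API), only agents hold a toolbox; the orchestrator can dispatch work but has no tool access of its own:

```python
class Toolbox:
    """Holds the concrete tool permissions for one agent."""
    def __init__(self, allowed):
        self.allowed = set(allowed)

    def invoke(self, tool, *args):
        if tool not in self.allowed:
            raise PermissionError(f"tool not permitted: {tool}")
        return f"{tool} invoked"

class Agent:
    def __init__(self, name, toolbox):
        self.name, self.toolbox = name, toolbox

    def run(self, task):
        # Agents touch infrastructure through their toolbox...
        return self.toolbox.invoke("write_file", task)

class Orchestrator:
    # ...while the orchestrator holds no toolbox at all, by design.
    def __init__(self, agents):
        self.agents = agents

    def dispatch(self, task):
        return self.agents[0].run(task)

builder = Agent("builder", Toolbox({"write_file", "run_shell"}))
output = Orchestrator([builder]).dispatch("create ProductCard.tsx")
print(output)
```

Scoping tool permissions per agent is also a safety boundary: a research agent that can only read docs cannot accidentally run shell commands.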
OpenHands: Dynamic Sub-Agent Delegation
OpenHands uses a main planning agent that dynamically spawns short-lived sub-agents to handle pieces of a larger task. Think of it as a supervisor process forking workers.
The Components
Main Agent (Supervisor) - The orchestrator at the top. It receives the user request, holds the full context of the codebase, breaks the task into steps, and decides what to handle directly versus what to delegate.
Sub-Agents (Workers) - Ephemeral executors. Created dynamically via SDK calls like create_agent() or delegate(). Each one receives a focused prompt, does one thing, returns results, and terminates.
They can run on the same model or a cheaper one - cost optimization is built into the architecture.
Key Insight: Sub-agents don’t need full context. They receive only the minimum information required for their task - reducing token cost and improving output quality.
How It Flows
Say a user asks: “Build a simple e-commerce page.”
The main agent analyzes the request and breaks it down:
- Create UI components
- Build API endpoints
- Generate product seed data
- Write tests
Then it delegates. Each sub-agent gets minimal context - just enough to complete its task. They run in parallel, return their outputs, and terminate.
The main agent collects everything, merges the changes, runs tests, and produces the final result.
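The flow above can be sketched as follows. `run_sub_agent` stands in for an SDK call like create_agent() or delegate(), and the context keys are hypothetical - the point is that each worker receives only its slice of the context, never the whole repo:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class SubTask:
    name: str
    context: dict  # only the slice of context this worker needs

async def run_sub_agent(task: SubTask) -> str:
    # Stand-in for create_agent()/delegate(): a real call would send
    # task.context to an LLM and return its generated output.
    await asyncio.sleep(0)
    return f"{task.name}: ok ({len(task.context)} context key(s))"

async def delegate(full_context: dict) -> list[str]:
    # Each sub-agent gets a minimal slice of context.
    subtasks = [
        SubTask("ui", {"design_specs": full_context["design_specs"]}),
        SubTask("api", {"data_schema": full_context["data_schema"]}),
        SubTask("data", {"product_model": full_context["product_model"]}),
    ]
    results = await asyncio.gather(*(run_sub_agent(t) for t in subtasks))
    return list(results)  # the supervisor merges and tests these

ctx = {"design_specs": "...", "data_schema": "...",
       "product_model": "...", "entire_repo": "..."}
outputs = asyncio.run(delegate(ctx))
print(outputs)
```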
[Diagram: OpenHands dynamic sub-agent delegation - the main agent analyzes and decomposes the task, then fans out to four parallel sub-agents]
Real Example: E-Commerce Page Build
Let’s trace a concrete request through this system. A developer asks: “Build an e-commerce product page with a shopping cart.”
Here’s exactly what each agent receives and produces:
The main agent receives "Build an e-commerce product page with shopping cart" and decomposes it like this:

| Agent | Input | Model | Output |
|---|---|---|---|
| UI Agent | Design specs | Claude Sonnet | ProductCard.tsx, CartDrawer.tsx, ProductGrid.tsx |
| API Agent | Data schema | Claude Haiku | cart.ts (store), api/products.ts, api/cart.ts |
| Data Agent | Product model | Claude Haiku | seed/products.json (50 products) |
| Test Agent | All file paths | Claude Sonnet | __tests__/cart.test.ts, __tests__/products.test.ts |

The main agent then merges and verifies: it resolves import conflicts, runs npm test, checks that the build passes, and returns a working project.
Notice each sub-agent uses the minimum model needed. Data generation and simple API routes go to Haiku (cheap, fast). UI components and tests go to Sonnet (better reasoning). The planner uses the most capable model. This is how you optimize cost without sacrificing quality.
Why This Works
The power here is parallelism with isolation. Each sub-agent operates in a narrow scope, which means:
- Less token waste - small context windows per worker
- Fewer hallucinations - focused prompts produce more reliable outputs
- Faster execution - parallel, not sequential
- Cost control - workers can use cheaper models for straightforward tasks
It’s the same reason monoliths get broken into services once they grow past a certain size. Decomposition works.
Sequence: How a Request Flows Through OpenHands
Here’s the time-ordered sequence showing how the supervisor spawns, manages, and collects from sub-agents:
[Diagram: OpenHands request lifecycle (time →) - workers A-C run in parallel immediately after spawn; worker D starts later because it depends on A+B outputs; the supervisor blocks on collect(), then merges and verifies]
Key details in this sequence: the supervisor is not idle during worker execution - it monitors for early failures and can abort/respawn workers. Worker D (tests) has a dependency on A and B, so it starts later. This is a DAG, not a flat fan-out.
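A minimal sketch of that DAG-style dispatch with asyncio: worker D awaits only its own dependencies (A and B), so it starts the moment they finish rather than waiting for C. The worker names and dependency map are illustrative:

```python
import asyncio

# Dependencies: D needs A and B; A, B, and C have none.
DEPS = {"A": [], "B": [], "C": [], "D": ["A", "B"]}

async def run_worker(name, dep_tasks):
    await asyncio.gather(*dep_tasks)  # block only on this worker's deps
    await asyncio.sleep(0)            # stand-in for the agent's actual work
    return f"{name} done"

async def run_dag():
    tasks = {}
    for name in ("A", "B", "C", "D"):  # iterate in topological order
        tasks[name] = asyncio.create_task(
            run_worker(name, [tasks[d] for d in DEPS[name]])
        )
    return {name: await task for name, task in tasks.items()}

dag_results = asyncio.run(run_dag())
print(dag_results)
```

Because each worker blocks on its own dependency tasks rather than on a global barrier, the schedule falls out of the dependency map for free.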
Claude Flow: Role-Based Agent Orchestration
Claude Flow takes a different approach. Instead of spawning temporary agents on the fly, it orchestrates persistent specialized roles - each responsible for a distinct phase of reasoning.
The Roles
| Role | Responsibility |
|---|---|
| Planner Agent | Understands the problem, breaks it into subtasks, assigns work |
| Research Agent | Gathers external information, summarizes docs, retrieves knowledge |
| Builder Agent | Implements code, modifies files, generates architecture |
| Critic Agent | Reviews output for correctness, identifies errors, suggests improvements |
| Integrator Agent | Merges outputs, resolves conflicts, produces the final deliverable |
Agent Roles in Detail
Each role has distinct capabilities, tools, and model configurations:

- Planner - Understands the full request. Breaks it into ordered subtasks with dependencies. Decides which agent handles each piece. Holds the execution plan.
- Research - Searches documentation, APIs, and knowledge bases. Gathers context the Builder needs. Summarizes findings into structured briefs.
- Builder - Writes production code based on research and specs. Creates files, modifies existing code, generates architecture. Focuses purely on implementation.
- Critic - Reviews Builder output for bugs, security issues, and logic errors. Suggests improvements. Can reject and send work back for revision.
- Integrator - Merges all agent outputs into a cohesive deliverable. Resolves conflicts between agents. Ensures consistency across the final result.
How It Flows
A request enters the system and hits the Planner. The Planner routes subtasks to the appropriate role. Each agent processes its piece and passes results forward.
The key difference from OpenHands: these agents aren’t spawned and killed - they’re predefined reasoning stages. The architecture emphasizes role specialization over dynamic delegation. Each agent is tuned (via system prompts, tool access, or temperature settings) for its specific job.
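One way that role tuning might look in code - a sketch with illustrative field names and values, not Claude Flow's actual configuration API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentRole:
    name: str
    model: str
    temperature: float
    system_prompt: str
    tools: tuple = ()

ROLES = {
    "planner": AgentRole("planner", "opus", 0.2,
        "Decompose the request into ordered subtasks with dependencies."),
    "builder": AgentRole("builder", "sonnet", 0.3,
        "Implement code exactly to spec.",
        tools=("write_file", "run_shell")),
    "critic": AgentRole("critic", "opus", 0.0,
        "Review for bugs and security issues; reject work that fails."),
}

print(ROLES["builder"].tools)
```

Note the temperature choices: the critic runs cold so reviews stay deterministic, while the builder gets slightly more freedom for implementation choices.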
[Diagram: Claude Flow role pipeline - the Planner decomposes the problem and routes subtasks; Research gathers info and retrieves knowledge; the Builder implements code and generates architecture; the Critic reviews for correctness; the Integrator merges outputs and resolves conflicts]
Artifact Pipeline: What Each Stage Produces
The key to understanding Claude Flow is tracking what artifacts each role produces and consumes. Each stage transforms its input into a specific deliverable:
Claude Flow: Artifact Pipeline (Input → Stage → Output)

| Stage | Model | Input | Output |
|---|---|---|---|
| Planner | Opus | user_request.txt | task_graph.json, dependency_map.json |
| Research | Sonnet | task_graph.json | research_brief.md, api_references.json |
| Builder | Sonnet | research_brief.md, task_graph.json | src/**/*.ts (code), schema.prisma |
| Critic | Opus | src/**/*.ts, test_results.log | review.md, fix_requests[] |
| Integrator | Opus | approved code + review | merged_repo/, build_log.txt ✓ |
Each stage produces typed artifacts that flow to the next. The Critic→Builder loop is the key quality gate - it runs until review passes or max retries are hit.
Notice the feedback loop between Critic and Builder. This is what makes Claude Flow’s sequential pipeline more rigorous than a flat fan-out: work is iteratively refined before it reaches the integrator.
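The Critic→Builder gate reduces to a bounded loop. In this sketch, build() and review() are trivial stand-ins for the real agents - a real implementation would prompt an LLM at each step:

```python
MAX_RETRIES = 3

def build(spec, feedback=None):
    # Stand-in Builder: a real call would prompt an LLM with the spec
    # plus the critic's feedback from the previous round.
    return spec if feedback is None else f"{spec} (fixed: {feedback})"

def review(code):
    # Stand-in Critic: returns None when the code passes review.
    return None if "fixed" in code else "missing input validation"

def build_with_review(spec):
    code = build(spec)
    for _ in range(MAX_RETRIES):
        feedback = review(code)
        if feedback is None:
            return code                # quality gate passed
        code = build(spec, feedback)   # revise with the critic's notes
    raise RuntimeError("review failed after max retries")

final = build_with_review("checkout handler")
print(final)
```

The loop is bounded on purpose: without a retry cap, a builder and critic that disagree can ping-pong forever.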
Comparing the Two Approaches
| Feature | OpenHands | Claude Flow |
|---|---|---|
| Agent creation | Dynamic, on-demand | Predefined roles |
| Agent lifespan | Short-lived (ephemeral) | Persistent across the workflow |
| Execution model | Parallel tasks | Sequential reasoning stages |
| Strengths | Speed, cost efficiency | Depth, structured review |
| Best for | Coding automation | Complex reasoning workflows |
Neither approach is strictly better. OpenHands optimizes for throughput - get many things done fast. Claude Flow optimizes for quality - route work through specialized stages that each add rigor.
Side-by-Side: How Each Handles the Same Task
| | OpenHands | Claude Flow |
|---|---|---|
| Execution | Parallel | Sequential pipeline |
| Workers | Ephemeral | Persistent roles |
| Optimized for | Speed | Quality |
In practice, production systems will likely blend both.
Real-World Example: Building an E-Commerce App
Let’s walk through a complete example to make this concrete. A developer asks an AI system: “Build an e-commerce app with product listings, user auth, and a checkout flow.”
First, here’s the task dependency graph (DAG) the planner would produce. Not all tasks can run in parallel - some have hard dependencies:
[Diagram: E-Commerce Task DAG - Tier 0: tasks T1-T4 run in parallel (Haiku ~2min, Sonnet ~4min, Haiku ~1min, Sonnet ~3min); Tier 1: T5 needs T1+T3, T6 needs T2 (sessions), T7 needs T4 (shell); Tier 2: T8 needs T5+T6+T7; Tier 3: T9 needs T8 (all tests pass)]
The planner builds this DAG before any agent starts. Tier 0 tasks run in parallel. Tier 1 tasks start as soon as their specific dependencies resolve - not when all of Tier 0 finishes. This is the critical optimization over naive sequential execution.
Here’s how each framework would then distribute this work:
E-Commerce Build: Task Distribution Across Agents
Planning Phase
Planner identifies 3 independent workstreams: UI/Frontend, Backend/API, and Testing/QA. Creates a dependency graph: Auth must complete before Checkout can reference user sessions.
Parallel Execution
UI Agent
- ProductCard component
- ProductGrid with filters
- CartDrawer with quantities
- CheckoutForm with validation
- LoginModal + SignupFlow
Backend Agent
- POST /api/auth/login
- POST /api/auth/register
- GET /api/products
- POST /api/cart/add
- POST /api/checkout
Test Agent
- Auth flow e2e tests
- Cart CRUD unit tests
- Checkout integration tests
- Product API load tests
- UI component snapshots
Review Phase
Critic agent reviews all outputs: catches SQL injection in auth endpoint, identifies missing CSRF protection on checkout, flags inconsistent error handling between API routes. Sends backend work back for revision.
Integration Phase
Integrator merges all files, resolves import paths, wires API calls to frontend components, runs full test suite. Final output: a working repo with npm run dev ready to go.
The key insight: a single LLM attempting this would lose context by the checkout phase. Multi-agent systems avoid this by keeping each worker’s context window focused on one piece of the problem.
The Bigger Pattern
Strip away the AI-specific details and what you have is a well-known distributed systems pattern:
- Plan - decompose work
- Delegate - assign to workers
- Execute - run independently
- Integrate - merge and verify
This is MapReduce. This is scatter-gather. This is a workflow engine.
[Diagram: The Universal Pattern - plan → delegate → execute → integrate]
The difference is that the workers are language models instead of deterministic functions. That introduces new failure modes - hallucination, drift, inconsistency - but the architectural response is the same: isolate, specialize, verify, integrate.
“Modern AI agent frameworks aren’t inventing new patterns - they’re applying battle-tested distributed systems thinking to LLM orchestration.”
When Things Go Wrong: Partial Failure and Recovery
The diagrams above show the happy path. In practice, agents fail. Code doesn’t compile. Tests don’t pass. An LLM hallucinates an import that doesn’t exist. What matters is how the system recovers.
Here’s a concrete failure scenario:
[Diagram: Failure recovery - the Builder agent produces broken code, triggering bounded retries and escalation]
Escalation policy (if Retry 2 also fails):
1. Upgrade model: re-run with Opus instead of Sonnet
2. Expand context: include related files the agent didn't originally see
3. Human-in-the-loop: pause execution, surface the error to the developer with a diff, ask for guidance
4. Abort & report: mark task as failed, log diagnostics, continue with remaining tasks
The retry loop is bounded, and each retry includes the previous error in the prompt so the agent doesn't repeat the same mistake. The recovery pattern follows a standard escalation ladder: retry with context → upgrade model → expand scope → human intervention → abort. Production systems typically cap this at max_retries=3 per agent, with exponential backoff (1s, 4s, 16s) between attempts to avoid rate limit issues.
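That escalation ladder can be sketched as a pair of nested loops. Here run_task is a stand-in that pretends the cheaper model always fails, purely to exercise the upgrade path; the model names and error text are illustrative:

```python
import time

MODELS = ["sonnet", "opus"]  # escalation path: cheaper model first

def run_task(task, model, error=None):
    # Stand-in agent call. A real call would include `error` in the
    # prompt so the agent doesn't repeat the same mistake.
    if model == "sonnet":
        raise RuntimeError("build failed: missing import")
    return f"{task} done on {model}"

def execute_with_escalation(task, max_retries=3, base_delay=1.0,
                            sleep=time.sleep):
    error = None
    for model in MODELS:                    # escalation ladder
        for attempt in range(max_retries):  # bounded retries per model
            try:
                return run_task(task, model, error)
            except RuntimeError as exc:
                error = str(exc)
                sleep(base_delay * 4 ** attempt)  # 1s, 4s, 16s backoff

    # Out of models: surface to a human, or abort and report.
    raise RuntimeError(f"task failed after escalation: {error}")

# sleep is injected so the demo skips the real backoff delays.
result = execute_with_escalation("auth module", sleep=lambda s: None)
print(result)
```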
Smart Model Routing: Cost vs. Capability
Not every subtask needs your most expensive model. A well-designed orchestrator routes tasks based on complexity, risk, and cost — the same way you’d choose between a senior engineer and a junior dev.
Here’s a routing policy table used in practice:
| Task Type | Model | Cost/1K tokens | Why |
|---|---|---|---|
| Seed data generation | Haiku | ~$0.00025 | Structured output, low reasoning needed |
| CRUD API endpoints | Haiku | ~$0.00025 | Template-based, well-defined patterns |
| UI component creation | Sonnet | ~$0.003 | Needs design sense, moderate reasoning |
| Auth / security logic | Sonnet | ~$0.003 | Higher stakes, needs careful implementation |
| Architecture planning | Opus | ~$0.015 | Complex decomposition, dependency analysis |
| Code review / critique | Opus | ~$0.015 | Deep analysis, needs to catch subtle bugs |
| Conflict resolution | Opus | ~$0.015 | Cross-module reasoning, integration logic |
The router makes this decision per-task, not per-session. A single workflow might use all three tiers:
Cost Optimization: Model Routing for E-Commerce Build

| Tier | Tasks | Tokens | Cost |
|---|---|---|---|
| Opus | Planning, code review, integration | ~30K | ~$0.45 |
| Sonnet | UI components, auth module, cart logic | ~40K | ~$0.12 |
| Haiku | Seed data, CRUD routes, schema gen | ~40K | ~$0.01 |

Total: ~$0.58 versus ~$1.65 if everything ran on Opus - a 65% cost reduction with no quality loss on routine tasks.
The routing decision can be simple — a mapping of task category to model tier — or learned from historical performance data. The key insight: most tokens in a multi-agent workflow are spent on routine work that doesn’t need your best model.
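A category-to-tier mapping of that kind fits in a dozen lines. The categories below mirror the routing policy table, and the prices are the same approximate per-1K-token figures; both are illustrative rather than official pricing:

```python
# Category -> tier mapping mirroring the routing policy table.
ROUTING = {
    "seed_data": "haiku",
    "crud_endpoint": "haiku",
    "ui_component": "sonnet",
    "auth_logic": "sonnet",
    "planning": "opus",
    "code_review": "opus",
    "conflict_resolution": "opus",
}

# Approximate cost per 1K tokens, from the table above.
COST_PER_1K = {"haiku": 0.00025, "sonnet": 0.003, "opus": 0.015}

def select_model(task_category):
    # Unknown categories fall back to the strongest model: overpaying
    # on a routine task is cheaper than failing a hard one.
    return ROUTING.get(task_category, "opus")

def estimate_cost(tasks):
    # tasks: list of (category, token_count) pairs
    return sum(COST_PER_1K[select_model(cat)] * toks / 1000
               for cat, toks in tasks)

workflow = [("planning", 10_000), ("ui_component", 15_000),
            ("seed_data", 20_000)]
cost = estimate_cost(workflow)
print(f"${cost:.3f}")
```

The fallback choice in select_model is the key design decision: when the router is unsure, it should err toward capability, not cost.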
Orchestration in Practice: Pseudo-Code
Here’s the orchestration loop that ties everything together. This is simplified, but it captures the real control flow of a production multi-agent system:
```python
async def orchestrate(user_request: str) -> Result:
    # 1. Plan: decompose into a task DAG
    task_graph = await planner.decompose(
        request=user_request,
        model="opus",  # planning needs the strongest model
    )

    # 2. Execute: run tasks respecting dependencies
    results = {}
    for tier in task_graph.tiers():
        # Tasks within a tier run in parallel
        tier_tasks = [
            execute_agent(
                task=task,
                model=router.select_model(task),  # Haiku/Sonnet/Opus
                context=gather_context(task, results),
                max_retries=3,
            )
            for task in tier.tasks
        ]
        tier_results = await asyncio.gather(*tier_tasks)

        # Check for failures
        for task, result in zip(tier.tasks, tier_results):
            if result.failed:
                result = await escalate(task, result)  # retry → upgrade → human
            results[task.id] = result

    # 3. Review: critic checks all outputs
    review = await critic.review(
        results=results,
        model="opus",  # review needs deep analysis
    )
    if review.has_fixes:
        # Loop: send fix requests back to builder
        for fix in review.fixes:
            results[fix.task_id] = await execute_agent(
                task=fix.revised_task,
                model="sonnet",
                context=fix.error_context + results[fix.task_id],
            )

    # 4. Integrate: merge everything into final output
    return await integrator.merge(
        results=results,
        model="opus",
    )
```
The key patterns in this code:
- Tiered execution - tasks run in parallel within tiers, sequentially across tiers
- Smart routing - router.select_model() picks the cheapest model that can handle the task
- Bounded retry - failures escalate through retry → model upgrade → human → abort
- Critic loop - review happens after all tasks complete, not inline with each task
This is ~50 lines but it captures the architecture of systems processing millions of agent tasks per day.
Core Components of Modern AI Agent Systems
Regardless of which framework you use, production AI agent systems share the same foundational layers:
| Layer | Purpose | Examples |
|---|---|---|
| Planner / Orchestrator | Decomposes tasks, manages execution order, handles dependencies | Task graphs, DAG schedulers |
| LLM Reasoning Engine | Core intelligence - understands instructions, generates outputs | Claude, GPT-4, Gemini |
| Tools & APIs | External capabilities agents can invoke | File system, terminal, web browser, databases |
| Memory Systems | Short-term (conversation) and long-term (persistent) context | Context windows, vector stores, session state |
| Retrieval (RAG) | Grounds agent responses in real data | Embeddings search, document retrieval, knowledge bases |
| Guardrails & Safety | Prevents harmful outputs, enforces constraints | Content filters, output validation, permission scoping |
| Observability & Evaluation | Monitors agent behavior, measures quality | Logging, tracing, automated evals, cost tracking |
These layers are common across frameworks like LangGraph, CrewAI, AutoGen, and OpenHands. The difference between frameworks is primarily in how they wire these layers together - not which layers they include.
Key insight: If you’re evaluating agent frameworks, don’t just compare features. Compare how they handle the hard problems: error recovery, context management, cost optimization, and human-in-the-loop checkpoints.
What This Means for Developers
If you’re building AI-powered tools, the single-agent-in-a-loop pattern will hit a ceiling fast. The path forward looks a lot like the path backend engineering already took:
- Decompose tasks instead of stuffing everything into one prompt
- Specialize agents instead of asking one model to do everything
- Run in parallel where tasks are independent
- Add review stages where correctness matters
- Use cheaper models for routine work, expensive ones for planning and judgment
The tooling is still early, but the architecture is clear. The best AI systems will be the ones that look most like well-designed distributed systems.
Further Reading & Frameworks
If you want to start building with multi-agent architectures, here are the frameworks worth exploring:
- OpenHands - Open-source platform for AI-powered software development agents
- LangGraph - Framework for building stateful, multi-agent applications with LLMs
- CrewAI - Role-based multi-agent orchestration framework
- AutoGen - Microsoft’s framework for building multi-agent conversational systems
Each takes a slightly different approach to the patterns described in this article - but they all converge on the same core idea: specialized agents, coordinated by a planner, are more capable than a single model working alone.
Originally published at aiagentlab.dev
Mohan
Software engineer writing about AI, distributed systems, and the craft of building great software.
Sai Rasmi
Co-author