Delegate Routine Work to Cheaper Models
Goal: Add a task-delegation pattern where a large orchestrator model fans out routine subtasks to a smaller, cheaper worker model via the openrouter:subagent server tool.
Outcome: Your app sends complex tasks to an orchestrator that automatically delegates focused work (summarization, extraction, reformatting, drafting) to a cheap worker, cutting token cost on bulk generation while keeping planning quality high.
Subagent is a beta server tool. Each delegated task runs a separate model call, so it adds cost and latency per delegation. This recipe keeps the worker model cheap and caps output tokens. Use the returned usage.cost when present, or estimate spend from the worker model’s current pricing before widening the delegation scope.
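One way to apply that rule in code, as a minimal sketch: prefer the billed usage.cost, and fall back to an estimate from per-million-token prices you copy from the model page. The ModelPricing shape and the spendForRequest name are hypothetical. The usage object does not split tokens between orchestrator and worker, so the fallback prices every token at one set of rates. Passing the orchestrator’s rates gives a conservative upper bound when the worker is cheaper on both input and output.

```typescript
// Token counts as reported in a Chat Completions usage object.
// `cost` is optional: only trust it when the response actually includes it.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens?: number;
  cost?: number;
}

// Hypothetical pricing shape: USD per million tokens, copied from the model page.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Prefer the billed cost; otherwise estimate from token counts and prices.
// The estimate applies one set of rates to all tokens, so it is approximate
// whenever orchestrator and worker are priced differently.
function spendForRequest(usage: Usage, pricing: ModelPricing): number {
  if (typeof usage.cost === "number") {
    return usage.cost;
  }
  return (
    (usage.prompt_tokens * pricing.inputPerMillion +
      usage.completion_tokens * pricing.outputPerMillion) /
    1_000_000
  );
}
```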
Before you start
You need:
- Node.js 20 or newer
- An OpenRouter API key in the OPENROUTER_API_KEY environment variable
- A workflow that already calls OpenRouter (Chat Completions, Responses, or Agent SDK)
- An orchestrator model that supports tool calling (e.g. ~anthropic/claude-opus-latest). Check the model’s capabilities on the model page before choosing.
- A cheaper worker model for subtasks (e.g. ~anthropic/claude-haiku-latest). Browse model pricing to find the cheapest model that meets your quality bar.
Tilde-latest aliases like ~anthropic/claude-haiku-latest auto-resolve to the newest version in that model family. Find available aliases on each model’s page at /models. You can also use exact slugs (e.g. anthropic/claude-haiku-4.5) when you need to pin a specific version.
If you’re starting a new TypeScript agent, use the Agent SDK callModel API for the orchestrator loop. The samples below use Chat Completions so the server-tool request shape is visible, but the delegation pattern works the same way inside an Agent SDK workflow.
Use these references for exact schemas:
- Subagent server tool
- Agent SDK callModel overview
- Create a chat completion
- Create a response
- TypeScript SDK Chat reference
What you’re building
This recipe adds a task-delegation layer to a multi-step workflow.
The orchestrator model receives a complex request and decides how to break it apart. For each piece that doesn’t need its full capability, it calls openrouter:subagent with a task_name and a task_description. The worker executes the task and returns its outcome. The orchestrator integrates all outcomes into the final response.
The cost split: the orchestrator handles planning and integration (small token budget), while the worker handles bulk generation (cheap per token). A request that delegates 3 subtasks to a worker model priced 50x lower can cut the cost of the generation-heavy portions by 80% or more; the saving on the whole request depends on how much of the token budget those portions account for.
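A worked version of that arithmetic, using hypothetical numbers: 2,000 orchestrator tokens for planning and integration, three subtasks of 3,000 generated tokens each, and a worker priced at 1/50 of the orchestrator per token. The sketch uses one price per model and ignores the extra input tokens the orchestrator reads when outcomes come back, so real savings will be somewhat lower.

```typescript
// Hypothetical workload; real counts come from your own usage logs.
interface Workload {
  orchestratorTokens: number; // planning + integration
  delegatedTokens: number; // bulk generation moved to the worker
  priceRatio: number; // orchestrator price per token / worker price per token
}

// Costs are in "orchestrator token units": one unit = one token at the orchestrator's price.
function costWithoutDelegation(w: Workload): number {
  return w.orchestratorTokens + w.delegatedTokens;
}

function costWithDelegation(w: Workload): number {
  return w.orchestratorTokens + w.delegatedTokens / w.priceRatio;
}

function savings(w: Workload): number {
  return 1 - costWithDelegation(w) / costWithoutDelegation(w);
}

const example: Workload = {
  orchestratorTokens: 2_000,
  delegatedTokens: 3 * 3_000,
  priceRatio: 50,
};

console.log(costWithoutDelegation(example)); // 11000
console.log(costWithDelegation(example)); // 2180
console.log(savings(example).toFixed(3)); // 0.802
```

On the delegated tokens alone the saving is 1 - 1/50 = 98%. The overall figure shrinks as planning and integration tokens grow relative to the delegated ones.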
Why server-side delegation instead of two separate API calls? You could orchestrate client-side: call the big model, parse its plan, call the small model yourself, feed results back. Server-side subagent collapses that into a single request. The orchestrator invokes workers mid-generation without a round-trip to your server, keeps intermediate prompts private within OpenRouter’s agentic loop, and can run multiple delegations in one generation pass. Your app makes one call and gets the integrated answer back.
1. Add the subagent tool to your request
The minimal setup: one openrouter:subagent entry in the tools array with the worker model pinned in parameters.
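A minimal request body for that setup, as a sketch. The entry’s shape (type "openrouter:subagent" with the worker under parameters.model) is an assumption drawn from the description above, and buildDelegationRequest is a hypothetical helper. Confirm the exact field names on the Subagent server tool reference.

```typescript
const ORCHESTRATOR_MODEL = "~anthropic/claude-opus-latest";
const WORKER_MODEL = "~anthropic/claude-haiku-latest";

// Assumed shape of the subagent server-tool entry; verify against the reference.
interface SubagentTool {
  type: "openrouter:subagent";
  parameters: { model: string };
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface DelegationRequest {
  model: string;
  messages: ChatMessage[];
  tools: SubagentTool[];
}

// Hypothetical helper: one orchestrator call with a single subagent tool entry.
function buildDelegationRequest(userPrompt: string): DelegationRequest {
  return {
    model: ORCHESTRATOR_MODEL,
    messages: [{ role: "user", content: userPrompt }],
    tools: [
      {
        type: "openrouter:subagent",
        // Pin the cheap worker model in the tool's parameters.
        parameters: { model: WORKER_MODEL },
      },
    ],
  };
}
```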
Wire the request body into your app’s existing request path. Here’s the shape of the call and response parsing:
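A sketch of the call and the response parsing, assuming Node.js 20’s global fetch, the https://openrouter.ai/api/v1/chat/completions endpoint, and standard Chat Completions response fields. delegateTask and parseCompletion are hypothetical names. The API key is passed in (for example from process.env.OPENROUTER_API_KEY), and the body can come from whatever builder your app already uses.

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cost?: number; // present only when the response includes it
}

interface ParsedCompletion {
  content: string;
  finishReason: string | null;
  usage?: Usage;
}

// Pull the integrated answer out of a standard Chat Completions response.
// Subagent calls already ran server-side, so only the final message matters.
function parseCompletion(json: any): ParsedCompletion {
  const choice = json?.choices?.[0];
  if (!choice) {
    throw new Error("Response contained no choices");
  }
  return {
    content: choice.message?.content ?? "",
    finishReason: choice.finish_reason ?? null,
    usage: json.usage,
  };
}

async function delegateTask(
  apiKey: string,
  body: Record<string, unknown>,
): Promise<ParsedCompletion> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // Surface status and body, e.g. a 400 for an invalid worker tool list.
    throw new Error(`OpenRouter error ${res.status}: ${await res.text()}`);
  }
  return parseCompletion(await res.json());
}
```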
The response follows the standard Chat Completions format. Server tools resolve server-side: the orchestrator’s subagent calls happen inside OpenRouter’s agentic loop, so the client response contains only the final integrated answer in message.content. The usage object reflects the combined token spend, as described in Server tools: Usage Tracking.
The orchestrator decides whether and when to delegate. Each delegation passes two arguments:
- task_name: a short label (e.g. summarize-breaking-changes)
- task_description: everything the worker needs, including all context, inputs, and the expected output format
The worker sees only task_description. It has no access to the parent conversation, so the orchestrator must be explicit about what it wants back.
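To make that concrete, here are two hypothetical argument payloads the orchestrator might produce for the same subtask. Only the second gives an isolated worker enough to act on. The release-note content is invented for illustration. Since your app does not write these arguments itself, the sketch also shows one assumed way to nudge the orchestrator: a hypothetical system message (DELEGATION_GUIDANCE) prepended by a hypothetical withDelegationGuidance helper.

```typescript
// Arguments the orchestrator passes on each delegation (field names from the text above).
interface SubagentArgs {
  task_name: string;
  task_description: string;
}

// Weak: points at context the worker never sees.
const vague: SubagentArgs = {
  task_name: "summarize-breaking-changes",
  task_description: "Summarize the breaking changes from the notes above.",
};

// Better: inputs, scope, and output format are all inline.
const explicit: SubagentArgs = {
  task_name: "summarize-breaking-changes",
  task_description: [
    "Summarize the breaking changes in the release notes below.",
    "Return a bulleted list, one line per change, at most 5 bullets.",
    "Release notes:",
    "- v3.0 removes the legacy /v1/export endpoint.",
    "- v3.0 renames the config key timeout to request_timeout.",
  ].join("\n"),
};

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical system guidance steering the orchestrator toward explicit descriptions.
const DELEGATION_GUIDANCE =
  "When you delegate with openrouter:subagent, the worker cannot see this conversation. " +
  "Put every input, constraint, and the exact output format in task_description.";

function withDelegationGuidance(messages: ChatMessage[]): ChatMessage[] {
  return [{ role: "system", content: DELEGATION_GUIDANCE }, ...messages];
}
```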
2. Read the tool result
On success, the subagent returns a JSON result with the worker’s output:
On failure:
The orchestrator receives the result as a tool response and continues generating. It can delegate more tasks, integrate outcomes it already has, or finish the response. Subagent calls are capped per request (see the reference page for current limits).
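Since each delegation is a separate worker call and calls per request are capped, you can bound worker output spend before widening the scope. A minimal sketch: maxWorkerOutputCost is a hypothetical helper, and the cap, per-task output limit, and price are inputs you take from the reference page, your own tool configuration, and the model page. The bound covers worker output tokens only; worker input tokens and the orchestrator’s own tokens come on top.

```typescript
// Upper bound on worker output spend for one request, in USD.
function maxWorkerOutputCost(
  maxDelegations: number, // per-request invocation cap (see the reference page)
  maxTokensPerTask: number, // output cap you set for the worker
  workerOutputPerMillion: number, // USD per million output tokens (model page)
): number {
  return (maxDelegations * maxTokensPerTask * workerOutputPerMillion) / 1_000_000;
}

// Hypothetical inputs: 10 delegations, 1,000 output tokens each, $5 per million output tokens.
console.log(maxWorkerOutputCost(10, 1_000, 5)); // 0.05
```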
3. Give the worker its own tools
When a subtask needs external data, pass server tools to the worker. The worker runs as a mini agent over those tools before producing its outcome.
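A sketch of a subagent entry that gives the worker web search and the date tool. As with the minimal request, the shape is an assumption. In particular, parameters.tools is a guessed key name, so check the Subagent reference for the exact field.

```typescript
// Assumed shapes; confirm field names on the Subagent server tool page.
interface ServerTool {
  type: `openrouter:${string}`;
}

interface SubagentToolWithTools {
  type: "openrouter:subagent";
  parameters: {
    model: string;
    tools: ServerTool[]; // server tools only; the worker has no client-side executor
  };
}

const researchWorker: SubagentToolWithTools = {
  type: "openrouter:subagent",
  parameters: {
    model: "~anthropic/claude-haiku-latest",
    tools: [{ type: "openrouter:web_search" }, { type: "openrouter:datetime" }],
  },
};

// The orchestrator request carries the worker config; the worker's searches stay inside the subagent call.
const requestBody = {
  model: "~anthropic/claude-opus-latest",
  messages: [
    { role: "user", content: "Compare the latest stable releases of Node.js and Deno." },
  ],
  tools: [researchWorker],
};
```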
The worker’s tool use happens inside the subagent call. Only its final text is returned to the orchestrator. The orchestrator never sees the worker’s intermediate tool calls or search results, just the finished outcome.
Two constraints on nested tools:
- Only OpenRouter server tools work (e.g. openrouter:web_search, openrouter:web_fetch, openrouter:datetime). Function tools are rejected with a 400 because the worker has no client-side executor.
- The worker’s tool list can’t include openrouter:subagent itself. Recursion guards prevent the worker from re-entering the subagent.
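Because a bad worker tool list only surfaces as a 400 at request time, you might check it before sending. A hypothetical pre-flight guard that mirrors the two constraints above. It treats any openrouter:-prefixed type other than the subagent as allowed, which is a simplification; the API remains the source of truth.

```typescript
// Loose description of a tool entry: server tools use "openrouter:*" types,
// function tools use type "function".
interface ToolEntry {
  type: string;
}

// Returns a list of problems; an empty list means the worker tools look valid.
function checkWorkerTools(tools: ToolEntry[]): string[] {
  const problems: string[] = [];
  for (const tool of tools) {
    if (tool.type === "openrouter:subagent") {
      problems.push("the worker cannot list the subagent tool itself");
    } else if (!tool.type.startsWith("openrouter:")) {
      problems.push(`"${tool.type}" is not an OpenRouter server tool`);
    }
  }
  return problems;
}
```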
4. Tune the worker for cost and quality
The subagent’s parameters let you control how the worker generates. Use them to keep cost predictable.
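A sketch of per-task worker settings. The text confirms only that parameters shape the worker’s generation and that this recipe caps output tokens. The keys max_tokens and temperature are assumptions borrowed from Chat Completions naming, the workerConfig helper is hypothetical, and the values are illustrative, so check the Subagent server tool reference for what it accepts.

```typescript
// Assumed parameter names; verify them on the Subagent server tool reference.
interface WorkerParameters {
  model: string;
  max_tokens?: number; // cap output so each delegation has a bounded cost
  temperature?: number; // low values suit extraction and reformatting
}

// Hypothetical presets keyed by the kind of subtask being delegated.
function workerConfig(kind: "extract" | "draft"): WorkerParameters {
  return kind === "extract"
    ? { model: "~anthropic/claude-haiku-latest", max_tokens: 800, temperature: 0 }
    : { model: "~anthropic/claude-haiku-latest", max_tokens: 2_000, temperature: 0.7 };
}

const subagentTool = {
  type: "openrouter:subagent",
  parameters: workerConfig("extract"),
};
```

Splitting configs by task kind keeps extraction short and deterministic while letting drafts run longer; each preset still bounds the cost of a single delegation.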
The full parameter reference is at Subagent server tool.
Subagent works with both non-streaming and streaming requests. With streaming (stream: true), the server sends SSE comment lines reading ": OPENROUTER PROCESSING" as heartbeats while workers execute. Content chunks resume once the orchestrator continues generating. The final chunk includes the aggregated usage object. See Server tools overview for how server tool usage appears in the response.
With stream: true, expect this pattern in the SSE stream:
Most SSE client libraries ignore comment lines (lines starting with :) automatically. Here’s a minimal consumer:
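A minimal consumer sketch, assuming Node.js 20’s fetch and the OpenAI-compatible streaming format: data: lines carrying JSON chunks, ending with data: [DONE]. SseAccumulator and streamAnswer are hypothetical names. The accumulator buffers partial lines, because network chunks can split an event anywhere, and it skips comment lines such as the processing heartbeats.

```typescript
interface StreamUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cost?: number;
}

// Accepts raw SSE text in arbitrary chunks; collects content and the final usage.
class SseAccumulator {
  private buffer = "";
  content = "";
  usage?: StreamUsage;
  done = false;

  push(chunk: string): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line for later
    for (const raw of lines) {
      const line = raw.trim();
      // Skip blank separators and comments like ": OPENROUTER PROCESSING".
      if (line === "" || line.startsWith(":")) continue;
      if (!line.startsWith("data:")) continue;
      const data = line.slice(5).trim();
      if (data === "[DONE]") {
        this.done = true;
        continue;
      }
      const json = JSON.parse(data);
      const delta = json.choices?.[0]?.delta?.content;
      if (typeof delta === "string") this.content += delta;
      if (json.usage) this.usage = json.usage; // arrives on the final chunk
    }
  }
}

async function streamAnswer(apiKey: string, body: Record<string, unknown>): Promise<SseAccumulator> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ ...body, stream: true }),
  });
  if (!res.ok || !res.body) throw new Error(`OpenRouter error ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const acc = new SseAccumulator();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    acc.push(decoder.decode(value, { stream: true }));
  }
  acc.push(decoder.decode() + "\n"); // flush a final line that lacks a trailing newline
  return acc;
}
```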
5. Log delegation routing, not task content
Add telemetry where your app already records model calls. Log the routing decision and cost, not the content.
Log:
- orchestrator_model
- worker_model
- did_enable_delegation (whether you configured the subagent tool on this request)
- finish_reason
- usage.prompt_tokens (or usage.input_tokens), usage.completion_tokens (or usage.output_tokens), usage.total_tokens, and usage.cost when returned
- route or feature name, such as delegated_analysis
Do not log:
- API keys
- cookies
- full task descriptions
- full worker outcomes
- user content (unless your product already has an explicit retention policy)
The usage object in the response reflects the combined token spend of the orchestrator plus all worker calls, as described in Server tools: Usage Tracking. You don’t need to track inner costs separately.
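A sketch of a log record that follows those rules: routing and usage only, with no keys or content. The record shape, the toLogRecord name, and deriving did_enable_delegation from whether a worker model was configured are hypothetical choices. The usage fallbacks cover both the Chat Completions (prompt_tokens) and Responses (input_tokens) spellings listed above.

```typescript
interface DelegationLogRecord {
  feature: string;
  orchestrator_model: string;
  worker_model: string | null;
  did_enable_delegation: boolean;
  finish_reason: string | null;
  input_tokens: number | null;
  output_tokens: number | null;
  total_tokens: number | null;
  cost: number | null;
}

// Builds a routing/cost record from the response; never reads message content.
function toLogRecord(
  feature: string,
  orchestratorModel: string,
  workerModel: string | null, // null when the subagent tool was not configured
  response: any,
): DelegationLogRecord {
  const usage = response?.usage ?? {};
  return {
    feature,
    orchestrator_model: orchestratorModel,
    worker_model: workerModel,
    did_enable_delegation: workerModel !== null,
    finish_reason: response?.choices?.[0]?.finish_reason ?? null,
    input_tokens: usage.prompt_tokens ?? usage.input_tokens ?? null,
    output_tokens: usage.completion_tokens ?? usage.output_tokens ?? null,
    total_tokens: usage.total_tokens ?? null,
    cost: typeof usage.cost === "number" ? usage.cost : null,
  };
}
```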
Next steps
- Read the Subagent reference for exact parameters, recursion guards, worker tool constraints, and invocation caps.
- Pair subagent with Advisor for a two-tier pattern: cheap worker for routine tasks, strong advisor for uncertain decisions.
- Give the worker Web Search when subtasks need current data.
- Add Response Caching for repeated orchestrator prefixes across similar tasks.
- Use Fusion when subtasks need multi-model deliberation instead of single-worker execution.
- Browse the Model list to compare worker model pricing and find the cheapest model that meets your subtask quality bar.
- Add Structured Outputs to the orchestrator request when you need the final answer in a specific JSON schema.