Primary model

Calls per day

A typical small fleet runs 20-200/day. Subagent runs ~600/day.

Days per month

Input tokens / call

System prompt + retrieval + user message. Typical: 1k-10k.

Output tokens / call

Generated response. Typical: 200-2000.

Enable prompt caching (90% off cached input tokens)

Caching pays off when system prompt ≥ ~4,096 tokens AND repeated. For Claude API this is automatic with cache_control.

% of input tokens cached

System prompts are usually fully cacheable. User messages aren't. 80% is realistic for fleet-style agents.

Image gens / day (optional)

OpenAI gpt-image · $0.05 / standard 1024×1024 image.

Pricing reference 2026 rates

Confirmed against Anthropic and OpenAI pricing pages, April 2026. Update as providers change rates. Cache write costs the same as a normal input token; cache reads are 90% off (Anthropic) or 50% off (OpenAI Batch API).

Model

Input ($/1M)

Output ($/1M)

Cached input ($/1M)

Best fit

Claude Haiku

$0.25

$1.25

$0.025

Routing, classification, short rewrites

Claude Sonnet

$3.00

$15.00

$0.30

Drafting, editing, code generation

Claude Opus

$15.00

$75.00

$1.50

Multi-source synthesis, vision verification, hard reasoning

GPT-4o mini

$0.15

$0.60

$0.075

OpenAI-side classification

GPT-4o

$2.50

$10.00

$1.25

OpenAI-side reasoning

What does running an AI agent fleet actually cost?

Pricing reference 2026 rates

How does Subagent run an agent fleet on $0.70/day?