
Agent Cost Optimization: A Practical Guide to Reducing API Spend


Your agent works great. It handles 200 requests per day, and users are happy. Then you check the API bill: $3,400 this month. You dig into the numbers and realize the agent makes an average of 12 API calls per request, each with a 4,000-token system prompt. That’s 9.6 million input tokens per day just for system prompts. At $3 per million tokens, that’s $864/month on repeated content alone.

Cost is the number one reason production agent deployments get scaled back or killed entirely. Optimization isn’t premature — it’s survival. The good news: most agent deployments have 50–80% cost reduction available through straightforward changes that don’t require rearchitecting your entire system.

In this guide, you’ll learn seven cost reduction strategies, ordered by ROI (Return on Investment). Start at the top and work your way down until you hit your budget target. Each section includes concrete numbers so you can estimate savings before writing a single line of code.


1. Token Accounting: Know Where Your Money Goes

You can’t optimize what you don’t measure. Before changing anything, build a complete picture of where every dollar goes in your agent workflow.

Input Token Breakdown

Every API call to Claude includes several token categories on the input side:

  • System prompt — Instructions, persona, constraints. Often 1,000–5,000 tokens and repeated on every call.
  • Tool definitions — JSON schemas for each tool the agent can use. 10 tools can easily consume 2,000–3,000 tokens.
  • Conversation history — All prior messages in the conversation. Grows with each step.
  • Tool results — Outputs from previous tool calls injected back into context. Can be massive (full web pages, database results).

Output Token Breakdown

Output tokens are 3–5x more expensive than input tokens, making them a critical optimization target:

  • Agent reasoning — Internal chain-of-thought (especially with extended thinking).
  • Tool call generation — The JSON for tool invocations.
  • Final response — The user-facing answer.

Per-Task Cost Calculation

The total cost for a single agent task is:

Total Cost = Σ (input_tokens × input_price + output_tokens × output_price), summed over each API call in the task
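The formula translates directly to a few lines of code. This is a minimal sketch: the `task_cost` helper and the per-call token counts are illustrative, and the prices are Sonnet's ($3/M input, $15/M output).

```python
INPUT_PRICE = 3.00 / 1_000_000   # $ per input token (Sonnet)
OUTPUT_PRICE = 15.00 / 1_000_000  # $ per output token (Sonnet)

def task_cost(calls: list[tuple[int, int]]) -> float:
    """Sum cost over every (input_tokens, output_tokens) pair in a task."""
    return sum(i * INPUT_PRICE + o * OUTPUT_PRICE for i, o in calls)

# Two illustrative calls: 4,200 in / 800 out, then 4,800 in / 200 out.
print(f"${task_cost([(4200, 800), (4800, 200)]):.4f}")  # → $0.0420
```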

Cost Breakdown Example

Here’s a realistic breakdown for a 10-step research agent task using Claude Sonnet ($3/M input, $15/M output):

| Step | Component | Input Tokens | Output Tokens | Input Cost | Output Cost | Total |
|------|-----------|--------------|---------------|------------|-------------|-------|
| 1 | Initial planning | 4,200 | 800 | $0.0126 | $0.0120 | $0.025 |
| 2 | Web search call | 4,800 | 200 | $0.0144 | $0.0030 | $0.017 |
| 3 | Process search results | 8,500 | 600 | $0.0255 | $0.0090 | $0.035 |
| 4 | Deep reading (page 1) | 12,000 | 500 | $0.0360 | $0.0075 | $0.044 |
| 5 | Deep reading (page 2) | 15,200 | 500 | $0.0456 | $0.0075 | $0.053 |
| 6 | Follow-up search | 16,800 | 200 | $0.0504 | $0.0030 | $0.053 |
| 7 | Process results | 20,100 | 600 | $0.0603 | $0.0090 | $0.069 |
| 8 | Synthesis | 22,500 | 1,200 | $0.0675 | $0.0180 | $0.086 |
| 9 | Verification | 24,000 | 400 | $0.0720 | $0.0060 | $0.078 |
| 10 | Final response | 25,500 | 1,500 | $0.0765 | $0.0225 | $0.099 |
| **Total** | | 153,600 | 6,500 | $0.461 | $0.098 | $0.558 |

Notice how input costs dominate, and they grow with each step as conversation history accumulates. Steps 8–10 account for 47% of total cost despite being only 30% of the steps.
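The growth pattern is easy to model: when every call resends the base context plus all prior history, and each step adds a roughly constant number of tokens to history, cumulative input tokens grow quadratically with step count. A rough sketch (the per-step increment of 2,400 tokens is an assumption chosen to roughly match the table above):

```python
INPUT_PRICE = 3.00 / 1_000_000  # Sonnet, $ per input token

def history_input_cost(steps: int, base_tokens: int, growth_per_step: int) -> float:
    """Total input cost when call i resends base_tokens + i * growth_per_step."""
    return sum((base_tokens + i * growth_per_step) * INPUT_PRICE for i in range(steps))

# 10 steps, 4,000-token starting context, ~2,400 tokens of history added per step.
print(f"${history_input_cost(10, 4_000, 2_400):.2f}")  # → $0.44
```

That lands within a few cents of the $0.461 input total in the table, which is why trimming history (or summarizing it) matters more in the late steps than the early ones.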

Cost Tracking Code

Implement a tracking wrapper from day one:

from dataclasses import dataclass, field
from typing import Optional

# Pricing per million tokens (as of early 2026)
MODEL_PRICING = {
    "claude-haiku": {"input": 0.25, "output": 1.25, "cache_read": 0.025, "cache_write": 0.30},
    "claude-sonnet": {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75},
    "claude-opus": {"input": 15.00, "output": 75.00, "cache_read": 1.50, "cache_write": 18.75},
}

class BudgetExceededError(Exception):
    pass

@dataclass
class CostRecord:
    step: int
    model: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0
    input_cost: float = 0.0
    output_cost: float = 0.0
    total_cost: float = 0.0
    duration_ms: float = 0.0

@dataclass
class TaskCostTracker:
    task_id: str
    budget_limit: Optional[float] = None
    records: list = field(default_factory=list)
    total_cost: float = 0.0

    def record_call(self, step: int, model: str, usage) -> CostRecord:
        # `usage` is the usage object from an API response; cache fields may be absent.
        pricing = MODEL_PRICING.get(model, MODEL_PRICING["claude-sonnet"])
        input_cost = (usage.input_tokens / 1_000_000) * pricing["input"]
        output_cost = (usage.output_tokens / 1_000_000) * pricing["output"]
        cache_read_cost = (getattr(usage, "cache_read_input_tokens", 0) / 1_000_000) * pricing["cache_read"]
        cache_write_cost = (getattr(usage, "cache_creation_input_tokens", 0) / 1_000_000) * pricing["cache_write"]
        total = input_cost + output_cost + cache_read_cost + cache_write_cost

        record = CostRecord(
            step=step,
            model=model,
            input_tokens=usage.input_tokens,
            output_tokens=usage.output_tokens,
            cache_read_tokens=getattr(usage, "cache_read_input_tokens", 0),
            cache_creation_tokens=getattr(usage, "cache_creation_input_tokens", 0),
            input_cost=input_cost + cache_read_cost + cache_write_cost,
            output_cost=output_cost,
            total_cost=total,
        )
        self.records.append(record)
        self.total_cost += total
        print(
            f"  Step {step} [{model}]: {usage.input_tokens} in / {usage.output_tokens} out "
            f"= ${total:.4f} (cumulative: ${self.total_cost:.4f})"
        )
        if self.budget_limit and self.total_cost > self.budget_limit:
            raise BudgetExceededError(
                f"Task {self.task_id} exceeded budget: ${self.total_cost:.4f} > ${self.budget_limit:.4f}"
            )
        return record

    def summary(self) -> dict:
        return {
            "task_id": self.task_id,
            "total_steps": len(self.records),
            "total_input_tokens": sum(r.input_tokens for r in self.records),
            "total_output_tokens": sum(r.output_tokens for r in self.records),
            "total_cost": self.total_cost,
            "cost_by_model": self._cost_by_model(),
        }

    def _cost_by_model(self) -> dict:
        by_model = {}
        for r in self.records:
            if r.model not in by_model:
                by_model[r.model] = {"calls": 0, "cost": 0.0}
            by_model[r.model]["calls"] += 1
            by_model[r.model]["cost"] += r.total_cost
        return by_model

Start tracking today, even before optimizing. You need baseline numbers to measure improvement.


2. Model Selection by Task

This is the single biggest cost lever available to you. Not every agent step requires your most powerful model.

Model Tiers and Pricing

Using the Claude family as reference:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|-------|----------------------|------------------------|----------|
| Haiku | $0.25 | $1.25 | Classification, routing, extraction, simple formatting |
| Sonnet | $3.00 | $15.00 | Complex tool use, research, synthesis, general-purpose |
| Opus | $15.00 | $75.00 | Complex planning, nuanced reasoning, high-stakes decisions |

Haiku is 12x cheaper than Sonnet on both input and output. Sonnet is 5x cheaper than Opus across the board.

Per-Role Recommendations

Match each agent role to the cheapest model that meets quality requirements:

  • Router/classifier agents → Haiku. “Is this a billing question or a technical question?” doesn’t need Sonnet. Haiku handles classification with >95% accuracy for well-defined categories.
  • Data extraction agents → Haiku. Pulling structured fields from text, parsing dates, extracting entities — Haiku excels here.
  • Worker agents (tool use) → Sonnet. Complex multi-step tool orchestration, research synthesis, and nuanced responses benefit from Sonnet’s capabilities.
  • Orchestrator/planner agents → Sonnet or Opus. Planning quality directly impacts total cost (fewer steps = fewer API calls), so investing in a better planner can pay for itself.
  • Verification agents → Sonnet for most cases; Opus for high-stakes verification where errors are expensive.
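These recommendations can be wired in with a simple lookup at dispatch time. A minimal sketch, assuming your agent framework tags each call with a role string (the role names, model aliases, and `model_for` helper are illustrative, not a standard API):

```python
# Map each agent role to the cheapest model that meets its quality bar.
MODEL_BY_ROLE = {
    "router": "claude-haiku",
    "extractor": "claude-haiku",
    "worker": "claude-sonnet",
    "planner": "claude-sonnet",   # upgrade to claude-opus for complex planning
    "verifier": "claude-sonnet",  # upgrade to claude-opus when errors are expensive
}

def model_for(role: str) -> str:
    """Resolve a role to a model, defaulting to the general-purpose tier."""
    return MODEL_BY_ROLE.get(role, "claude-sonnet")

print(model_for("router"))   # → claude-haiku
print(model_for("unknown"))  # → claude-sonnet
```

Keeping the mapping in one place also makes it trivial to A/B a tier downgrade for a single role and watch quality metrics before committing.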

Savings Calculation

Consider an agent handling 200 requests/day with this call pattern per request:

| Step | Role | Calls | Input Tokens (per call) | Output Tokens (per call) |
|------|------|-------|-------------------------|--------------------------|
| Route | Classification | 1 | 1,000 | 50 |
| Extract | Data extraction | 2 | 2,000 | 300 |
| Process | Complex reasoning | 6 | 8,000 | 2,000 |
| Verify | Quality check | 1 | 3,000 | 200 |

All Sonnet (baseline):

  • Input: (1,000 + 4,000 + 48,000 + 3,000) × 200 = 11.2M tokens/day → $33.60/day
  • Output: (50 + 600 + 12,000 + 200) × 200 = 2.57M tokens/day → $38.55/day
  • Total: $72.15/day = $2,165/month

Mixed model approach:

  • Route (Haiku): 200K input → $0.05, 10K output → $0.0125
  • Extract (Haiku): 800K input → $0.20, 120K output → $0.15
  • Process (Sonnet): 9.6M input → $28.80, 2.4M output → $36.00
  • Verify (Sonnet): 600K input → $1.80, 40K output → $0.60
  • Total: $67.61/day = $2,028/month
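The arithmetic above can be double-checked in a few lines. The prices and call pattern are copied from the tables; the `daily_cost` helper and the 200 requests/day default are just for this worked example.

```python
PRICES = {"haiku": (0.25, 1.25), "sonnet": (3.00, 15.00)}  # $ per 1M in / out

# (model, calls_per_request, input_tokens_per_call, output_tokens_per_call)
mixed = [
    ("haiku", 1, 1_000, 50),      # route
    ("haiku", 2, 2_000, 300),     # extract
    ("sonnet", 6, 8_000, 2_000),  # process
    ("sonnet", 1, 3_000, 200),    # verify
]

def daily_cost(pattern, requests_per_day=200):
    total = 0.0
    for model, calls, inp, out in pattern:
        price_in, price_out = PRICES[model]
        total += calls * requests_per_day * (inp * price_in + out * price_out) / 1_000_000
    return total

baseline = daily_cost([("sonnet", c, i, o) for _, c, i, o in mixed])
print(f"baseline ${baseline:.2f}/day, mixed ${daily_cost(mixed):.2f}/day")
# → baseline $72.15/day, mixed $67.61/day
```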

That’s only a 6% savings — but in practice, once you’ve verified Ha