Agent Cost Optimization: A Practical Guide to Reducing API Spend
Your agent works great. It handles 200 requests per day, and users are happy. Then you check the API bill: $3,400 this month. You dig into the numbers and realize the agent makes an average of 12 API calls per request, each with a 4,000-token system prompt. That’s 9.6 million input tokens per day just for system prompts. At $3 per million tokens, that’s $864/month on repeated content alone.
Cost is the number one reason production agent deployments get scaled back or killed entirely. Optimization isn’t premature — it’s survival. The good news: most agent deployments have 50–80% cost reduction available through straightforward changes that don’t require rearchitecting your entire system.
In this guide, you’ll learn seven cost reduction strategies, ordered by ROI (Return on Investment). Start at the top and work your way down until you hit your budget target. Each section includes concrete numbers so you can estimate savings before writing a single line of code.
1. Token Accounting: Know Where Your Money Goes
You can’t optimize what you don’t measure. Before changing anything, build a complete picture of where every dollar goes in your agent workflow.
Input Token Breakdown
Every API call to Claude includes several token categories on the input side:
- System prompt — Instructions, persona, constraints. Often 1,000–5,000 tokens and repeated on every call.
- Tool definitions — JSON schemas for each tool the agent can use. 10 tools can easily consume 2,000–3,000 tokens.
- Conversation history — All prior messages in the conversation. Grows with each step.
- Tool results — Outputs from previous tool calls injected back into context. Can be massive (full web pages, database results).
Output Token Breakdown
Output tokens are 5x more expensive than input tokens across the Claude tiers, making them a critical optimization target:
- Agent reasoning — Internal chain-of-thought (especially with extended thinking).
- Tool call generation — The JSON for tool invocations.
- Final response — The user-facing answer.
Per-Task Cost Calculation
The total cost for a single agent task is:
Total Cost = Σ (input_tokens × input_price + output_tokens × output_price) for each API call in the taskCost Breakdown Example
Here’s a realistic breakdown for a 10-step research agent task using Claude Sonnet ($3/M input, $15/M output):
| Step | Component | Input Tokens | Output Tokens | Input Cost | Output Cost | Total |
|---|---|---|---|---|---|---|
| 1 | Initial planning | 4,200 | 800 | $0.0126 | $0.0120 | $0.025 |
| 2 | Web search call | 4,800 | 200 | $0.0144 | $0.0030 | $0.017 |
| 3 | Process search results | 8,500 | 600 | $0.0255 | $0.0090 | $0.035 |
| 4 | Deep reading (page 1) | 12,000 | 500 | $0.0360 | $0.0075 | $0.044 |
| 5 | Deep reading (page 2) | 15,200 | 500 | $0.0456 | $0.0075 | $0.053 |
| 6 | Follow-up search | 16,800 | 200 | $0.0504 | $0.0030 | $0.053 |
| 7 | Process results | 20,100 | 600 | $0.0603 | $0.0090 | $0.069 |
| 8 | Synthesis | 22,500 | 1,200 | $0.0675 | $0.0180 | $0.086 |
| 9 | Verification | 24,000 | 400 | $0.0720 | $0.0060 | $0.078 |
| 10 | Final response | 25,500 | 1,500 | $0.0765 | $0.0225 | $0.099 |
| Total | | 153,600 | 6,500 | $0.461 | $0.098 | $0.558 |
Notice how input costs dominate, and they grow with each step as conversation history accumulates. Steps 8–10 account for 47% of total cost despite being only 30% of the steps.
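The table's totals can be reproduced in a few lines of Python, using the per-step token counts from the table and Sonnet's $3/$15-per-million pricing:

```python
# Per-step (input_tokens, output_tokens) pairs from the breakdown table above
steps = [
    (4_200, 800), (4_800, 200), (8_500, 600), (12_000, 500), (15_200, 500),
    (16_800, 200), (20_100, 600), (22_500, 1_200), (24_000, 400), (25_500, 1_500),
]
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00  # Sonnet, dollars per million tokens

total_in = sum(i for i, _ in steps)
total_out = sum(o for _, o in steps)
cost = total_in / 1e6 * INPUT_PRICE + total_out / 1e6 * OUTPUT_PRICE
print(total_in, total_out, round(cost, 3))  # 153600 6500 0.558
```

Running this kind of sanity check against your own traces is a quick way to catch pricing or accounting mistakes before they compound.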
Cost Tracking Code
Implement a tracking wrapper from day one:
```python
import anthropic
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CostRecord:
    step: int
    model: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0
    input_cost: float = 0.0
    output_cost: float = 0.0
    total_cost: float = 0.0
    duration_ms: float = 0.0


# Pricing per million tokens (as of early 2026)
MODEL_PRICING = {
    "claude-haiku": {"input": 0.25, "output": 1.25, "cache_read": 0.025, "cache_write": 0.30},
    "claude-sonnet": {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75},
    "claude-opus": {"input": 15.00, "output": 75.00, "cache_read": 1.50, "cache_write": 18.75},
}


class BudgetExceededError(Exception):
    pass


@dataclass
class TaskCostTracker:
    task_id: str
    budget_limit: Optional[float] = None
    records: list = field(default_factory=list)
    total_cost: float = 0.0

    def record_call(self, step: int, model: str, usage) -> CostRecord:
        pricing = MODEL_PRICING.get(model, MODEL_PRICING["claude-sonnet"])

        input_cost = (usage.input_tokens / 1_000_000) * pricing["input"]
        output_cost = (usage.output_tokens / 1_000_000) * pricing["output"]
        cache_read_cost = (getattr(usage, "cache_read_input_tokens", 0) / 1_000_000) * pricing["cache_read"]
        cache_write_cost = (getattr(usage, "cache_creation_input_tokens", 0) / 1_000_000) * pricing["cache_write"]

        total = input_cost + output_cost + cache_read_cost + cache_write_cost

        record = CostRecord(
            step=step,
            model=model,
            input_tokens=usage.input_tokens,
            output_tokens=usage.output_tokens,
            cache_read_tokens=getattr(usage, "cache_read_input_tokens", 0),
            cache_creation_tokens=getattr(usage, "cache_creation_input_tokens", 0),
            input_cost=input_cost + cache_read_cost + cache_write_cost,
            output_cost=output_cost,
            total_cost=total,
        )

        self.records.append(record)
        self.total_cost += total

        print(f"  Step {step} [{model}]: {usage.input_tokens} in / {usage.output_tokens} out "
              f"= ${total:.4f} (cumulative: ${self.total_cost:.4f})")

        if self.budget_limit and self.total_cost > self.budget_limit:
            raise BudgetExceededError(
                f"Task {self.task_id} exceeded budget: ${self.total_cost:.4f} > ${self.budget_limit:.4f}"
            )

        return record

    def summary(self) -> dict:
        return {
            "task_id": self.task_id,
            "total_steps": len(self.records),
            "total_input_tokens": sum(r.input_tokens for r in self.records),
            "total_output_tokens": sum(r.output_tokens for r in self.records),
            "total_cost": self.total_cost,
            "cost_by_model": self._cost_by_model(),
        }

    def _cost_by_model(self) -> dict:
        by_model = {}
        for r in self.records:
            if r.model not in by_model:
                by_model[r.model] = {"calls": 0, "cost": 0.0}
            by_model[r.model]["calls"] += 1
            by_model[r.model]["cost"] += r.total_cost
        return by_model
```

Start tracking today, even before optimizing. You need baseline numbers to measure improvement.
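As a quick sanity check on the cache prices in `MODEL_PRICING`, here is the cost of a single Sonnet call where most of the prompt is served from cache versus the same call with no caching. The `call_cost` helper and the token counts are illustrative, not part of any SDK:

```python
# Sonnet prices, dollars per million tokens (matching MODEL_PRICING above)
PRICING = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}

def call_cost(input_tokens, output_tokens, cache_read=0, cache_write=0, p=PRICING):
    """Dollar cost of one API call, given per-million token prices."""
    return (input_tokens * p["input"] + output_tokens * p["output"]
            + cache_read * p["cache_read"] + cache_write * p["cache_write"]) / 1_000_000

# 500 fresh input tokens, 4,000 tokens read from cache, 300 output tokens
cached = call_cost(500, 300, cache_read=4_000)
# Same call with no caching: all 4,500 input tokens at the full input price
uncached = call_cost(4_500, 300)
print(round(cached, 4), round(uncached, 4))  # 0.0072 0.018
```

Cache reads at $0.30/M versus $3.00/M for fresh input are why the tracker records cache tokens separately: a large, repeated system prompt served from cache costs a tenth of its uncached price.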
2. Model Selection by Task
This is the single biggest cost lever available to you. Not every agent step requires your most powerful model.
Model Tiers and Pricing
Using the Claude family as reference:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| Haiku | $0.25 | $1.25 | Classification, routing, extraction, simple formatting |
| Sonnet | $3.00 | $15.00 | Complex tool use, research, synthesis, general-purpose |
| Opus | $15.00 | $75.00 | Complex planning, nuanced reasoning, high-stakes decisions |
Haiku is 12x cheaper than Sonnet on input and 12x cheaper on output. Sonnet is 5x cheaper than Opus across the board.
Per-Role Recommendations
Match each agent role to the cheapest model that meets quality requirements:
- Router/classifier agents → Haiku. “Is this a billing question or a technical question?” doesn’t need Sonnet. Haiku handles classification with >95% accuracy for well-defined categories.
- Data extraction agents → Haiku. Pulling structured fields from text, parsing dates, extracting entities — Haiku excels here.
- Worker agents (tool use) → Sonnet. Complex multi-step tool orchestration, research synthesis, and nuanced responses benefit from Sonnet’s capabilities.
- Orchestrator/planner agents → Sonnet or Opus. Planning quality directly impacts total cost (fewer steps = fewer API calls), so investing in a better planner can pay for itself.
- Verification agents → Sonnet for most cases; Opus for high-stakes verification where errors are expensive.
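These recommendations can be captured as a simple routing table. The role names and the fallback choice below are illustrative; adapt them to your own agent architecture:

```python
# Map each agent role to the cheapest model that meets its quality bar.
# Role names are illustrative, not from any SDK.
MODEL_BY_ROLE = {
    "router": "claude-haiku",
    "extractor": "claude-haiku",
    "worker": "claude-sonnet",
    "planner": "claude-opus",      # better planning means fewer steps overall
    "verifier": "claude-sonnet",
}

def model_for(role: str) -> str:
    """Return the model tier for a role, defaulting to Sonnet for unknown roles."""
    return MODEL_BY_ROLE.get(role, "claude-sonnet")

print(model_for("router"), model_for("unknown-role"))
```

Centralizing the mapping in one place makes it easy to run A/B tests per role: swap one entry, compare quality metrics, and keep the cheaper model if quality holds.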
Savings Calculation
Consider an agent handling 200 requests/day with this call pattern per request:
| Step | Role | Calls | Input Tokens | Output Tokens |
|---|---|---|---|---|
| Route | Classification | 1 | 1,000 | 50 |
| Extract | Data extraction | 2 | 2,000 | 300 |
| Process | Complex reasoning | 6 | 8,000 | 2,000 |
| Verify | Quality check | 1 | 3,000 | 200 |
All Sonnet (baseline):
- Input: (1,000 + 4,000 + 48,000 + 3,000) × 200 = 11.2M tokens/day → $33.60/day
- Output: (50 + 600 + 12,000 + 200) × 200 = 2.57M tokens/day → $38.55/day
- Total: $72.15/day = $2,165/month
Mixed model approach:
- Route (Haiku): 200K input → $0.05, 10K output → $0.0125
- Extract (Haiku): 800K input → $0.20, 120K output → $0.15
- Process (Sonnet): 9.6M input → $28.80, 2.4M output → $36.00
- Verify (Sonnet): 600K input → $1.80, 40K output → $0.60
- Total: $67.61/day = $2,028/month
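The daily totals above can be reproduced programmatically. The `plan` structure and `daily_cost` helper are illustrative; the token counts, call counts, and prices come from the tables in this section (200 requests/day):

```python
# (input_price, output_price) in dollars per million tokens
PRICES = {"haiku": (0.25, 1.25), "sonnet": (3.00, 15.00)}
REQUESTS_PER_DAY = 200

# (step, calls, input_tokens_per_call, output_tokens_per_call, model)
plan = [
    ("route",   1, 1_000,    50, "haiku"),
    ("extract", 2, 2_000,   300, "haiku"),
    ("process", 6, 8_000, 2_000, "sonnet"),
    ("verify",  1, 3_000,   200, "sonnet"),
]

def daily_cost(steps, force_model=None):
    """Total daily cost; force_model overrides per-step model choices."""
    total = 0.0
    for _, calls, inp, out, model in steps:
        pi, po = PRICES[force_model or model]
        total += calls * REQUESTS_PER_DAY * (inp * pi + out * po) / 1_000_000
    return total

baseline = daily_cost(plan, force_model="sonnet")
mixed = daily_cost(plan)
print(round(baseline, 2), round(mixed, 2))  # 72.15 67.61
```

Scripting the comparison this way lets you test "what if" scenarios, such as moving the extraction step to Sonnet or part of the processing step to Haiku, before touching production.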
That’s only a 6% savings, because the Sonnet-powered Process step dominates the bill. But in practice, once you’ve verified Haiku’s accuracy on the routing and extraction steps, you can start experimenting with moving the simpler parts of the Process step down-tier as well, and that is where the large reductions live.