
Streaming Agent Responses: Real-Time Output for Multi-Step Workflows



Your agent takes 20 seconds to research a question, call three tools, and synthesize an answer. The user clicks “Ask” and stares at a spinner for 20 seconds. They’re not sure if it’s working. They consider refreshing the page. They wonder if they should start over.

Now imagine this instead: the user clicks “Ask” and within 500 milliseconds, they see the agent’s first words appear. They watch it search for information—“🔍 Searching order database…”—and see results arrive in real time. They read the answer as it’s being written, sentence by sentence. Same 20 seconds. Completely different experience.

Streaming is not a nice-to-have for user-facing agents—it is a UX requirement. Time-to-first-token is the single most important latency metric in agent interfaces. Users who see progress are patient; users who see nothing abandon. Web performance research consistently shows that perceived latency matters more than actual latency, and streaming is the most powerful tool you have for closing the gap between the two.

In this article, you’ll learn how to implement real-time streaming for multi-step AI agents—from basic token-by-token delivery to tool call transparency, status updates, progressive disclosure, error recovery, and transport layer choices.


Section 1: Claude Streaming API Basics

Before you can stream an agent, you need to stream a single Claude response. Let’s start with the fundamentals.

Stream vs. Non-Stream

A non-streaming API call blocks until the entire response is generated, then returns it all at once. A streaming call returns a sequence of events as the response is produced, starting with the first token.

The difference in code is minimal. The difference in user experience is enormous.
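The contrast is easy to see with a toy simulation (plain Python, not the Anthropic API) that fakes token generation with a small delay and measures time to first output:

```python
import time

CHUNKS = ["The ", "answer ", "arrives ", "in ", "pieces."]

def blocking_call(chunks, delay=0.05):
    """Simulate a non-streaming call: nothing is visible until everything is done."""
    time.sleep(delay * len(chunks))
    return "".join(chunks)

def streaming_call(chunks, delay=0.05):
    """Simulate a streaming call: each chunk is yielded as soon as it is 'generated'."""
    for chunk in chunks:
        time.sleep(delay)
        yield chunk

start = time.monotonic()
first_token_at = None
text = ""
for chunk in streaming_call(CHUNKS):
    if first_token_at is None:
        first_token_at = time.monotonic() - start  # time to first token
    text += chunk
total = time.monotonic() - start
print(f"First output after {first_token_at:.2f}s; full text after {total:.2f}s")
```

Both paths produce the same final text in the same total time; only the streaming path gives the user something to read almost immediately.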

Event Types

The Claude streaming API emits a structured sequence of server-sent events:

  1. message_start — Contains the initial Message object with metadata (model, role, usage).
  2. content_block_start — Signals the beginning of a content block (text or tool_use).
  3. content_block_delta — Contains incremental content: text fragments or partial tool input JSON.
  4. content_block_stop — Signals the end of a content block.
  5. message_delta — Final updates to the message (stop reason, final usage).
  6. message_stop — The stream is complete.

For text responses, you’ll receive many content_block_delta events, each carrying a small chunk of text (often a few tokens).
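As a sketch of how a client consumes this sequence, here is a tiny handler over a simulated stream—plain dicts standing in for the SDK’s typed event objects:

```python
def collect_text(events):
    """Accumulate text deltas from a streamed event sequence."""
    parts = []
    for event in events:
        if event["type"] == "content_block_delta":
            delta = event.get("delta", {})
            if "text" in delta:
                parts.append(delta["text"])
    return "".join(parts)

# A simulated stream for a short text response
events = [
    {"type": "message_start"},
    {"type": "content_block_start"},
    {"type": "content_block_delta", "delta": {"text": "Hello, "}},
    {"type": "content_block_delta", "delta": {"text": "world!"}},
    {"type": "content_block_stop"},
    {"type": "message_delta"},
    {"type": "message_stop"},
]
print(collect_text(events))  # Hello, world!
```

In a real client you would render each delta as it arrives rather than collecting them, but the event shape is the same.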

Basic Streaming Implementation

Here’s a synchronous streaming example using the Anthropic Python SDK:

```python
import anthropic

client = anthropic.Anthropic()

def stream_basic_response(user_message: str):
    """Stream a basic Claude response token by token."""
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
    print()  # Newline after stream completes

stream_basic_response("Explain quantum entanglement in simple terms.")
```

And the async version, which is what you’ll use in production web servers:

```python
import anthropic
import asyncio

async_client = anthropic.AsyncAnthropic()

async def stream_basic_response_async(user_message: str):
    """Async streaming with the Anthropic SDK."""
    async with async_client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}],
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)
    print()

asyncio.run(stream_basic_response_async("Explain quantum entanglement in simple terms."))
```

Handling Partial Tokens and UTF-8 Boundaries

The SDK handles UTF-8 decoding for you, but if you’re working with the raw HTTP stream, be aware that multi-byte characters can be split across chunks. Always buffer raw bytes and decode only when you have complete UTF-8 sequences. The anthropic SDK’s text_stream iterator handles this automatically—another reason to use it rather than parsing the raw SSE stream yourself.
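If you do work with raw bytes, Python’s standard-library incremental decoder handles the buffering for you. A minimal sketch:

```python
import codecs

def decode_utf8_chunks(byte_chunks):
    """Decode a byte stream chunk by chunk, buffering split multi-byte characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush: raises if the stream ended mid-character
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

# "é" is 0xC3 0xA9 in UTF-8; here it is split across two chunks
print("".join(decode_utf8_chunks([b"caf", b"\xc3", b"\xa9"])))  # café
```

A naive `chunk.decode("utf-8")` on each chunk would raise a `UnicodeDecodeError` on the lone `\xc3` byte; the incremental decoder simply holds it until the continuation byte arrives.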


Section 2: Streaming Tool Calls

Basic text streaming is table stakes. The real challenge—and the real value—comes when your agent uses tools. A multi-step agent that calls three tools in sequence can feel dead during tool execution unless you surface what’s happening.

How Tool Calls Appear in the Stream

When Claude decides to use a tool, the stream emits a content_block_start event with type: "tool_use", followed by content_block_delta events containing fragments of the tool’s input JSON. Once the full tool call is assembled, you execute the tool, inject the result, and continue the conversation.

The key insight: you know the tool name as soon as content_block_start arrives, even before the input JSON is complete. This means you can immediately show the user something like “🔍 Searching order database…” without waiting for the full tool call.
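One way to use this early signal: map tool names to user-facing status lines and emit the message the moment the block starts. The tool names and copy below are illustrative:

```python
# Hypothetical mapping from internal tool names to user-facing status lines
TOOL_STATUS_MESSAGES = {
    "search_orders": "🔍 Searching order database…",
    "get_shipping_status": "🚚 Checking shipping status…",
}

def status_for_tool(tool_name: str) -> str:
    """Return a user-facing status line for a tool, with a generic fallback."""
    return TOOL_STATUS_MESSAGES.get(tool_name, f"⚙️ Running {tool_name}…")

print(status_for_tool("search_orders"))   # 🔍 Searching order database…
print(status_for_tool("refund_payment"))  # ⚙️ Running refund_payment…
```

The fallback matters: agents gain tools over time, and a generic “Running X…” line degrades gracefully when the mapping lags behind.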

The Complete Streaming Agent Loop

Here’s a full streaming agent loop that displays tool invocations in real time:

```python
import anthropic
import json

client = anthropic.Anthropic()

# Define tools
tools = [
    {
        "name": "search_orders",
        "description": "Search customer orders by order ID or customer email.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Order ID or email"},
            },
            "required": ["query"],
        },
    },
    {
        "name": "get_shipping_status",
        "description": "Get real-time shipping status for an order.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The order ID"},
            },
            "required": ["order_id"],
        },
    },
]

def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Execute a tool and return the result as a string."""
    if tool_name == "search_orders":
        # Simulated database lookup
        return json.dumps({
            "order_id": "ORD-12345",
            "customer": "jane@example.com",
            "items": ["Blue Widget x2", "Red Gadget x1"],
            "total": "$47.99",
            "status": "shipped",
        })
    elif tool_name == "get_shipping_status":
        return json.dumps({
            "order_id": tool_input["order_id"],
            "carrier": "FedEx",
            "tracking": "7891011",
            "estimated_delivery": "2026-03-10",
            "current_location": "Memphis, TN",
        })
    return json.dumps({"error": f"Unknown tool: {tool_name}"})

def stream_agent_response(user_message: str):
    """Complete streaming agent loop with real-time tool call display."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        with client.messages.stream(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        ) as stream:
            # Stream text to the user as it arrives
            for text in stream.text_stream:
                print(text, end="", flush=True)
            # Collect the assembled message (including any tool_use blocks)
            response = stream.get_final_message()

        # Execute any tool calls in the completed response
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # Show the user what's happening
                print(f"\n⚙️ Calling tool: {block.name}({json.dumps(block.input)})")
                result = execute_tool(block.name, block.input)
                print(f"✅ Result received from {block.name}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        # If the model stopped because it wants to use tools, continue the loop
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
            print("\n--- Continuing agent loop ---")
        else:
            # Model is done (stop_reason == "end_turn")
            print()
            break

# Run the agent
stream_agent_response("Where is my order #ORD-12345? When will it arrive?")
```

For true token-by-token streaming with tool call detection, you can iterate over the raw events:

```python
def stream_agent_with_live_tokens(user_message: str):
    """Stream text tokens live while also detecting tool calls."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        current_tool_input_json = ""
        with client.messages.stream(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        ) as stream:
            for event in stream:
                if event.type == "content_block_start":
                    if event.content_block.type == "tool_use":
                        # The tool name is known before its input is complete
                        current_tool_input_json = ""
                        print(f"\n🔧 Agent is calling: {event.content_block.name}...")
                elif event.type == "content_block_delta":
                    if hasattr(event.delta, "text"):
                        # Text tokens: print them as they arrive
                        print(event.delta.text, end="", flush=True)
                    elif hasattr(event.delta, "partial_json"):
                        # Tool input arrives as partial JSON fragments
                        current_tool_input_json += event.delta.partial_json
            response = stream.get_final_message()

        # Handle tool execution and loop continuation
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    print(f"\n✅ {block.name} returned results")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            messages.append({"role": "user", "content": tool_results})
        else:
            print()
            break
```
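A related detail: partial_json fragments are not valid JSON on their own, so buffer them and parse only once the block is complete (the final message's `block.input` gives you the same parsed object). A minimal illustration:

```python
import json

def assemble_tool_input(partial_json_fragments):
    """Concatenate partial_json deltas and parse the completed object."""
    return json.loads("".join(partial_json_fragments))

# Fragments as they might arrive across content_block_delta events
fragments = ['{"que', 'ry": "ORD', '-12345"}']
print(assemble_tool_input(fragments))  # {'query': 'ORD-12345'}
```

Calling `json.loads` on any single fragment would fail; only the concatenation of all fragments for a block is guaranteed to parse.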

Parallel Tool Calls

When Claude invokes multiple tools in a single response, you’ll receive multiple tool_use content blocks in the stream. You can execute them in parallel and return all results at once:

```python
import asyncio

async def execute_tools_parallel(tool_blocks):
    """Execute multiple tool calls concurrently and return all results."""
    # In production, these would be actual async I/O operations; here we
    # push the synchronous executor onto worker threads.
    tasks = [asyncio.to_thread(execute_tool, b.name, b.input) for b in tool_blocks]
    results = await asyncio.gather(*tasks)
    return [
        {"type": "tool_result", "tool_use_id": b.id, "content": r}
        for b, r in zip(tool_blocks, results)
    ]
```
---
## Related Articles
- [Tool Use Patterns: Building Reliable Agent-Tool Interfaces](/blog/agent-tool-use-patterns/)
- [Multi-Agent Patterns: Orchestrators, Workers, and Pipelines](/blog/multi-agent-patterns/)
- [Agent Error Recovery: 5 Patterns for Production Reliability](/blog/agent-error-recovery-patterns/)
- [Web Automation Agents: Browser Control with Claude and Computer Use](/blog/web-automation-agents-browser-control-with-claude-and-computer-use/)