Claude Managed Agents¶

★★★★★ Intermediate

Hosted agent platform from Anthropic. You define agent config (model, prompt, tools), Anthropic runs it in managed cloud containers with built-in code execution, web access, file I/O, and event streaming. No Docker, no orchestration code, no tool execution layer.

Key difference from Messages API: Messages API gives direct model access (you build everything). Managed Agents gives a complete agent platform (containers, tools, persistence, fault recovery included).

Four Core Concepts¶

Concept	What it is	Lifecycle
Agent	Versioned config: model + system prompt + tools + MCP servers	Create once, reference by ID
Environment	Container template: OS packages, network rules, language runtimes	Reusable across sessions
Session	Running agent instance inside an environment	Holds conversation history, filesystem, status
Events	SSE stream between your app and the agent	User messages in, agent responses + tool calls out

Quick Start¶

# Install CLI
brew install anthropics/tap/ant          # macOS
# or curl from GitHub releases for Linux

# Install SDK
pip install anthropic                    # Python
npm install @anthropic-ai/sdk           # TypeScript

export ANTHROPIC_API_KEY="your-key"

Create Agent + Environment + Session¶

from anthropic import Anthropic

client = Anthropic()

# 1. Agent config
agent = client.beta.agents.create(
    name="Coding Assistant",
    model="claude-sonnet-4-6",
    system="You are a helpful coding assistant. Write clean, well-documented code.",
    tools=[{"type": "agent_toolset_20260401"}],
)

# 2. Container template
environment = client.beta.environments.create(
    name="dev-env",
    config={
        "type": "cloud",
        "networking": {"type": "unrestricted"},
        "packages": {"pip": ["pandas", "numpy"]},  # optional
    },
)

# 3. Start session
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    title="My first session",
)

# 4. Send task and stream results
with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(
        session.id,
        events=[{
            "type": "user.message",
            "content": [{"type": "text", "text": "Create a Fibonacci script"}],
        }],
    )
    for event in stream:
        match event.type:
            case "agent.message":
                for block in event.content:
                    print(block.text, end="")
            case "agent.tool_use":
                print(f"\n[Tool: {event.name}]")
            case "session.status_idle":
                print("\n\nDone.")
                break

Built-in Tools¶

agent_toolset_20260401 includes all tools:

Tool	Description
`bash`	Shell commands in container
`read`	Read files
`write`	Write files
`edit`	String replacement in files
`glob`	File pattern matching
`grep`	Regex search
`web_fetch`	Download URL content
`web_search`	Internet search

Selective Tool Configuration¶

# Disable specific tools
{"type": "agent_toolset_20260401",
 "configs": [
     {"name": "web_fetch", "enabled": False},
     {"name": "web_search", "enabled": False},
 ]}

# Enable only specific tools (everything else disabled)
{"type": "agent_toolset_20260401",
 "default_config": {"enabled": False},
 "configs": [
     {"name": "bash", "enabled": True},
     {"name": "read", "enabled": True},
 ]}

Custom Tools¶

agent = client.beta.agents.create(
    tools=[
        {"type": "agent_toolset_20260401"},
        {
            "type": "custom",
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    ],
)

Custom tool best practices: 3-4 sentence descriptions (what, when, limitations). Group related operations under one tool with action parameter. Use namespace prefixes (db_query, storage_read). Return only essential data.

Permission System¶

Two modes, combinable per-tool:

Mode	Behavior	Use case
`always_allow`	Auto-execute	Trusted internal agents
`always_ask`	Pause for approval	User-facing agents

MCP tools default to always_ask. This is more production-ready than LangGraph/CrewAI/AutoGen - none offer per-tool permissions out of the box.

Usage Patterns¶

Pattern	Description	Example
Event-triggered	External system fires agent	Sentry: bug detected -> agent writes patch -> opens PR
Scheduled	Cron-style	Daily GitHub digest, team task summary
Fire-and-forget	Human submits task, gets result	Asana AI Teammates
Long-horizon	Hours of work, persistent state	Research projects, large codebase migrations

CLI + SDK pattern: agent templates as YAML in git (model, prompt, tools, MCP). CLI deploys via CI pipeline. SDK manages sessions at runtime.

Outcomes (Research Preview)¶

Turns sessions from conversations into goal-oriented work. Define what "done" looks like with a rubric. A separate grader process evaluates quality independently.

client.beta.sessions.events.send(
    session_id=session.id,
    events=[{
        "type": "user.define_outcome",
        "description": "Build a DCF model for Costco in .xlsx",
        "rubric": {"type": "text", "content": RUBRIC_TEXT},
        "max_iterations": 5,  # default 3, max 20
    }],
)

Rubric tips: concrete, verifiable criteria ("CSV contains price column with numeric values"), not vague ("data looks good").

Result	What happens
`satisfied`	Session goes idle
`needs_revision`	Agent starts new iteration
`max_iterations_reached`	Final attempt, then idle
`failed`	Rubric doesn't match task

Retrieve output files via Files API from /mnt/session/outputs/.

Multi-Agent (Research Preview)¶

One orchestrator delegates to sub-agents. Each runs in its own thread with isolated context, but all share the container filesystem.

orchestrator = client.beta.agents.create(
    name="Engineering Lead",
    model="claude-sonnet-4-6",
    system="Coordinate engineering work. Delegate code review to reviewer, tests to test agent.",
    tools=[{"type": "agent_toolset_20260401"}],
    callable_agents=[
        {"type": "agent", "id": reviewer.id, "version": reviewer.version},
        {"type": "agent", "id": test_writer.id, "version": test_writer.version},
    ],
)

Limitation: single delegation level only. Orchestrator calls agents, agents cannot call other agents.

Stream sub-agent threads independently:

for thread in client.beta.sessions.threads.list(session.id):
    print(f"[{thread.agent_name}] {thread.status}")

Architecture¶

Three independent components with minimal assumptions about each other:

Brain - Claude + agent loop (tool selection, reasoning)
Hands - sandboxes and tools (execution)
Session - event journal (persistence)

Each can fail or be replaced independently. Built-in: prompt caching, context compaction, automatic infrastructure recovery.

Pricing¶

Standard Claude API token rates + $0.08/hour active session time. A 10-minute coding session costs a few cents for compute.

Operation	Rate Limit
Create (agents, sessions, environments)	60 req/min
Read (get, list, stream)	600 req/min

Gotchas¶

Outcomes grader runs in separate context - it cannot see the agent's reasoning, only the output. This is by design (prevents self-evaluation bias) but means your rubric must be evaluable from artifacts alone, not from the conversation
Single delegation level - orchestrator -> agents, but agents cannot call sub-agents. Design flat hierarchies. For deeper nesting, use sequential sessions where one agent's output feeds another's input
Container state is per-session - files created in one session don't carry to another unless you explicitly download and re-upload. Use the Files API for persistence across sessions
always_ask pauses the entire session - if your app doesn't handle the approval event promptly, the agent sits idle burning session time. Implement approval webhooks or polling with timeouts