Notes and Practical Guide Based on Barry’s Anthropic Talk
This guide breaks down the practical lessons shared by Barry from Anthropic on how they think about AI agents, workflows, orchestration systems, and production deployment.
The core message was simple:
Most teams are building agents too early.
Anthropic’s approach is far more practical.
The talk focused on three ideas:
- Don’t build agents for everything
- Keep agent systems simple
- Think like your agent
1. The Evolution of AI Systems
Barry explained how AI systems evolved in stages.
Stage 1: Simple AI Features
Most teams started with:
→ summarization
→ classification
→ extraction
A single LLM call handled one task.
At the time, these systems felt advanced.
Today they are standard product features.
Stage 2: Workflows
Teams then moved toward orchestrated systems.
Instead of one LLM call:
→ multiple models worked together
→ predefined logic controlled execution
→ outputs from one step became inputs to another
Example workflow:
- Extract information
- Classify result
- Generate response
- Validate output
These systems are predictable and easier to control.
Anthropic considers workflows the beginning of agentic systems.
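As a rough illustration, the four-step workflow above can be sketched in a few lines of Python. Everything here is a hypothetical placeholder: `call_llm` stands in for any LLM client, and the prompts are illustrative, not Anthropic's.

```python
# Minimal workflow sketch: each step is one predefined LLM call, and the
# control flow is fixed in code rather than decided by the model.

def call_llm(prompt: str) -> str:
    """Placeholder for a single LLM call; swap in a real client here."""
    return "PASS"  # canned output so the sketch runs end to end

def run_workflow(document: str) -> str:
    extracted = call_llm(f"Extract the key facts from:\n{document}")
    label = call_llm(f"Classify this request as billing/tech/other:\n{extracted}")
    draft = call_llm(f"Write a response for a '{label}' request using:\n{extracted}")
    verdict = call_llm(f"Answer PASS or FAIL: is this response grounded?\n{draft}")
    if "PASS" not in verdict:
        raise ValueError("validation failed; route to a human reviewer")
    return draft
```

Because every transition is hard-coded, each node can be tested, logged, and optimized on its own.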
Stage 3: Agents
Agents differ from workflows because they decide their own trajectory.
A workflow follows predefined logic.
An agent:
→ observes environment feedback
→ decides next actions dynamically
→ explores different paths independently
This autonomy increases:
→ usefulness
→ flexibility
→ capability
But also increases:
→ token cost
→ latency
→ unpredictability
→ consequences of errors
2. Don’t Build Agents for Everything
This was the strongest point in the talk.
Anthropic does not treat agents as universal upgrades.
Agents are for:
→ complex tasks
→ ambiguous environments
→ high-value operations
Not every use case needs autonomy.
The Agent Decision Checklist
Before building an agent, Anthropic evaluates four things.
A. Task Complexity
Agents work best when:
→ the problem space is ambiguous
→ decision trees are difficult to map
→ exploration matters
If the decision process is predictable:
→ use workflows instead
→ explicitly define logic
→ optimize each node manually
Why?
Because workflows provide:
→ lower cost
→ faster execution
→ better reliability
→ easier debugging
B. Task Value
Agents are expensive.
Exploration requires:
→ more tokens
→ more tool calls
→ more iterations
So the task must justify the cost.
Example:
A high-volume customer support system with thin margins should probably use workflows.
Why?
→ most cases are repetitive
→ predictable paths solve most requests
→ autonomous exploration adds unnecessary cost
Agents make more sense when:
→ the task is high leverage
→ accuracy matters more than token efficiency
→ the outcome has significant value
C. Critical Capabilities
Anthropic evaluates whether the model can reliably perform the critical actions required for the task.
For coding agents:
→ writing code is required
→ debugging is required
→ recovering from mistakes is required
If one capability becomes a bottleneck:
→ costs increase
→ retries increase
→ latency increases
Their solution:
→ reduce scope
→ simplify the task
→ retry with narrower objectives
D. Cost of Error
This is one of the most important deployment questions.
Ask:
How dangerous are mistakes?
And:
How hard are mistakes to detect?
If errors are:
→ expensive
→ irreversible
→ difficult to discover
Then autonomy becomes risky.
In these cases Anthropic recommends:
→ read-only access
→ limited permissions
→ human approval loops
→ constrained execution environments
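One way to make those recommendations concrete is a permission gate in front of every tool call. This is a hypothetical sketch, not Anthropic's implementation; `READ_ONLY_TOOLS` and the stdin prompt are illustrative.

```python
# Hypothetical permission gate: read-only tools run freely; anything that
# mutates state requires explicit human approval before execution.

READ_ONLY_TOOLS = {"search", "read_file", "list_directory"}

def require_human_approval(tool: str, args: dict) -> bool:
    """Ask an operator to confirm a risky action (stubbed via stdin)."""
    answer = input(f"Approve {tool} with {args}? [y/N] ")
    return answer.strip().lower() == "y"

def guarded_call(tool: str, args: dict, execute):
    if tool not in READ_ONLY_TOOLS and not require_human_approval(tool, args):
        return {"error": f"action '{tool}' denied by operator"}
    return execute(tool, args)
```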
3. Why Coding Agents Work Well
Barry used coding agents as an example of a strong agent use case.
Why?
Coding Is Ambiguous
Going from:
→ design
→ implementation
→ pull request
requires dynamic decision making.
Coding Has High Value
A working implementation creates significant leverage.
Outputs Are Verifiable
This is critical.
Code has:
→ unit tests
→ CI pipelines
→ runtime validation
The system can verify success automatically.
This makes coding ideal for agentic systems.
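That verification loop is easy to automate. A minimal sketch, assuming the agent's changes live in a local repository with a pytest suite:

```python
# Verify an agent's code change by running the test suite and checking
# the exit code: 0 means the change passes, anything else means retry.
import subprocess

def change_passes_tests(repo_dir: str) -> bool:
    result = subprocess.run(
        ["python", "-m", "pytest", "--quiet"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```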
4. Keep Agent Systems Simple
Barry repeatedly emphasized this.
Most teams overengineer agents too early.
Anthropic’s Core Agent Structure
According to Barry, agents consist mainly of four components:
1. Environment
The world the agent operates inside.
Examples:
→ browser
→ IDE
→ operating system
→ database
→ API environment
2. Tools
Interfaces the agent uses to take actions.
Examples:
→ search tools
→ terminal access
→ browser actions
→ file editing
→ API requests
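Tools are usually exposed to the model as named schemas. The shape below follows the Anthropic Messages API tool format; the `run_command` tool itself is an illustrative example, not one from the talk.

```python
# A tool definition in the Anthropic Messages API format: the model sees
# the name, description, and input schema, and emits structured tool calls.
run_command_tool = {
    "name": "run_command",
    "description": (
        "Run a shell command in the project workspace and return its "
        "stdout, stderr, and exit code. Use this for builds and tests."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "command": {
                "type": "string",
                "description": "The shell command to run.",
            },
        },
        "required": ["command"],
    },
}
```

Improving a tool often just means iterating on this description and schema.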
3. System Prompt
Defines:
→ goals
→ constraints
→ behavior rules
→ execution style
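In practice this is a short, operational system prompt. An illustrative example (not from the talk):

```python
# Illustrative system prompt covering goals, constraints, behavior rules,
# and execution style for a hypothetical coding agent.
SYSTEM_PROMPT = """\
You are a coding agent working inside a sandboxed repository.

Goal: make the failing tests pass with the smallest possible change.
Constraints: never modify files under tests/; never run network commands.
Behavior: state a one-line plan before each tool call; stop after 20 steps.
Style: prefer minimal diffs over rewrites.
"""
```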
4. Model Loop
The model repeatedly:
→ observes
→ reasons
→ acts
→ receives feedback
→ continues execution
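Put together, the loop can be sketched with the Anthropic Python SDK. The `messages.create` call and the `tool_use`/`tool_result` handling follow the SDK's documented tool-use flow; `execute_tool` is a hypothetical dispatcher, and the model ID is just an example.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def execute_tool(name: str, args: dict) -> str:
    """Hypothetical dispatcher mapping tool names to real implementations."""
    raise NotImplementedError

def run_agent(system_prompt: str, task: str, tools: list, max_steps: int = 20):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # any tool-capable Claude model
            max_tokens=2048,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response.content  # the model decided it is done
        # Feed every requested tool call's result back into the context.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": execute_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError("step budget exhausted")
```

Everything else — memory, planning, reflection — is optional on top of this loop.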
Why Simplicity Matters
Complexity destroys iteration speed.
Many teams prematurely add:
→ memory systems
→ multi-agent architectures
→ orchestration graphs
→ planning systems
→ reflection loops
before validating basic behavior.
Anthropic instead focuses on:
→ improving tools
→ improving prompts
→ improving environment feedback
Only after behavior works reliably do they optimize.
5. Think Like Your Agent
This was the most important operational insight in the talk.
Humans understand the full system.
Agents do not.
Agents only see:
→ limited context
→ tool outputs
→ screenshots
→ truncated history
→ delayed feedback
The Context Window Problem
Barry explained:
Even highly advanced agent behavior still comes from inference over a limited context window.
The model only knows what exists inside:
→ prompt
→ memory
→ visible observations
Nothing outside the context exists to the model.
Example: Computer Use Agents
Barry described what operating a computer feels like from the agent’s perspective.
The agent:
→ receives a screenshot
→ reads tool descriptions
→ performs an action
→ waits several seconds blindly
→ receives another screenshot
This creates uncertainty.
The model does not truly “see” continuous state changes like humans do.
What Agents Need
After thinking from the agent’s perspective, Anthropic realized agents need better context.
Examples:
→ screen resolution
→ UI structure
→ action constraints
→ recommended actions
→ environment limitations
These reduce:
→ confusion
→ random exploration
→ failed actions
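Concretely, that means attaching this metadata to every observation rather than sending a bare screenshot. A hypothetical sketch of an enriched observation payload:

```python
# Hypothetical enriched observation: alongside the raw screenshot, the
# agent gets explicit facts it cannot reliably infer from pixels alone.
def build_observation(screenshot_b64: str, ui_tree: dict) -> dict:
    return {
        "screenshot": screenshot_b64,  # base64-encoded PNG
        "screen_resolution": {"width": 1280, "height": 800},
        "ui_structure": ui_tree,  # e.g. an accessibility tree
        "allowed_actions": ["click", "type", "scroll", "wait"],
        "constraints": "Coordinates must fall inside the screen resolution.",
        "note": "The page may take several seconds to update after an action.",
    }
```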
6. Using Models to Improve Agents
Anthropic also uses models to improve agent systems themselves.
Examples:
→ reviewing prompts
→ validating tool descriptions
→ analyzing trajectories
→ identifying confusion points
Barry mentioned they sometimes provide the entire agent trajectory to Claude and ask:
→ Why did the agent make this decision?
→ Which part of the context caused confusion?
→ What information was missing?
This helps improve:
→ prompts
→ tools
→ context quality
→ action reliability
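A minimal sketch of that debugging pattern using the Anthropic SDK; serializing the trajectory as JSON and the exact questions are illustrative choices:

```python
import json
import anthropic

client = anthropic.Anthropic()

def review_trajectory(trajectory: list[dict]) -> str:
    """Ask Claude to diagnose a full agent run (messages plus tool I/O)."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # any capable Claude model
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Here is a complete agent trajectory:\n"
                f"{json.dumps(trajectory, indent=2)}\n\n"
                "Why did the agent make its final decision? "
                "Which part of the context caused confusion? "
                "What information was missing?"
            ),
        }],
    )
    return response.content[0].text
```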
7. Future of Agent Systems
Barry ended with several open questions.
A. Budget-Aware Agents
Today agents lack strict control over:
→ token usage
→ latency
→ operational cost
Future systems need:
→ execution budgets
→ spending constraints
→ latency limits
→ adaptive reasoning depth
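None of this exists natively yet, but a crude version can be layered onto the agent loop by metering usage. A hypothetical sketch:

```python
# Hypothetical budget guard: stop the loop once the token or wall-clock
# budget is exhausted instead of letting the agent run open-ended.
import time

class Budget:
    def __init__(self, max_tokens: int, max_seconds: float):
        self.tokens_left = max_tokens
        self.deadline = time.monotonic() + max_seconds

    def charge(self, usage) -> None:
        # `usage` mirrors the per-call token counts an API response reports.
        self.tokens_left -= usage.input_tokens + usage.output_tokens

    def exhausted(self) -> bool:
        return self.tokens_left <= 0 or time.monotonic() > self.deadline
```

Inside the loop: call `budget.charge(response.usage)` after each model call and stop when `budget.exhausted()`. Adaptive reasoning depth is the harder, still-open part.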
B. Self-Improving Tools
Anthropic already uses models to improve tool descriptions.
Future agents might:
→ redesign tools
→ optimize interfaces
→ generate new abstractions
→ improve their own ergonomics
C. Multi-Agent Systems
Barry expects multi agent systems to grow significantly.
Advantages:
→ parallel execution
→ separation of concerns
→ smaller context windows
→ specialized responsibilities
But inter-agent communication remains unsolved.
Current systems rely heavily on:
→ synchronous interactions
→ linear conversation patterns
Future systems likely need:
→ asynchronous coordination
→ persistent communication layers
→ inter agent protocols
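For the parallel-execution piece, here is a rough fan-out sketch with asyncio; `run_subagent` is a hypothetical stand-in for a full agent loop scoped to one responsibility:

```python
# Hypothetical fan-out: run specialized subagents concurrently, each with
# its own small context, then merge their results in a coordinator step.
import asyncio

async def run_subagent(role: str, task: str) -> str:
    """Stand-in for one agent loop scoped to a single responsibility."""
    await asyncio.sleep(0)  # placeholder for real model and tool calls
    return f"[{role}] result for: {task}"

async def coordinate(task: str) -> list[str]:
    roles = ["researcher", "coder", "reviewer"]
    return await asyncio.gather(*(run_subagent(r, task) for r in roles))

results = asyncio.run(coordinate("add pagination to the API"))
```

The asynchronous coordination and persistent communication layers Barry described would sit between these subagents; that part remains open.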
Final Takeaways
Barry summarized the talk into three principles:
1. Don’t Build Agents for Everything
Use agents only when:
→ complexity exists
→ exploration matters
→ value justifies cost
2. Keep Systems Simple
Start with:
→ environment
→ tools
→ prompts
Optimize later.
3. Think Like the Agent
Understand:
→ what the model sees
→ what context is missing
→ where confusion happens
→ how feedback loops affect decisions
Practical Rules for Builders
If you are building AI agents today:
→ Start with workflows
→ Avoid unnecessary orchestration
→ Validate behavior before scaling complexity
→ Improve tools before adding more agents
→ Keep prompts operational and constrained
→ Add verification systems wherever possible
→ Treat context quality as infrastructure
→ Reduce exploration when reliability matters
→ Use autonomy selectively
→ Measure token cost and latency from day one
