1. Understanding the Core Concept
Before writing any code, you need to understand how AI agents actually work.
A normal LLM works like this:
Input → Model → Output
Example:
You ask:
“Create a folder”
The LLM replies with text explaining how to create it.
But it cannot execute the command itself.
The Problem With LLMs
LLMs cannot:
- access your computer
- run commands
- read emails
- control browsers
- interact with external tools
They only generate predicted text tokens.
So they act like a brain without a body.
2. The Agent Architecture
To make AI perform real actions, you need three components.
1. Brain (LLM)
This is the reasoning layer.
Examples:
- OpenAI GPT
- Claude
- Gemini
Responsibilities:
- understand user request
- plan steps
- choose tools
2. Tools (Body)
Tools perform real actions.
Examples of tools:
- executeCommand
- readFiles
- writeFiles
- browserAutomation
- dockerControl
These are functions written by developers.
The AI calls them when needed.
Example:
AI decides:
Execute command → mkdir project
The tool runs the command on the system.
3. Gateway (Communication Layer)
This is how users communicate with the agent.
Examples:
- HTTP API
- Telegram bot
- WhatsApp bot
- Web interface
The gateway forwards user requests to the AI agent.
3. High Level Workflow
A complete AI agent system works like this.
- User sends request
- Gateway receives request
- Request goes to the AI model
- AI analyzes the task
- AI selects the correct tool
- Tool executes action
- Result returns to AI
- AI decides next step
- Process repeats until task completes
This loop is called the agent execution loop.
4. Tools Used in the Demo
The system in the transcript uses several tools.
1. OpenAI API
Used for the LLM reasoning layer.
Purpose:
- understand instructions
- decide which tool to call
- generate commands
Example models:
- GPT-4.1
- GPT-5
2. Express.js
Used to create the API server.
Purpose:
- receive requests
- send requests to the agent
- return results
3. Node.js
Used as the runtime environment.
Purpose:
- run the server
- execute commands
- manage tools
4. Requestly
Used for API debugging.
Purpose:
- inspect network calls
- debug API requests
- analyze LLM responses
5. Playwright
Used for browser automation.
Purpose:
- open browsers
- navigate websites
- perform actions
Example:
AI opens Chrome
Searches a query
Clicks buttons
6. Docker
Used to run containers.
Example tasks:
- start Nginx server
- run Apache server
- manage containers
5. Project Setup
Create a new project.
Example:
mkdir openclaude-agent
cd openclaude-agent
Initialize project.
npm init
Install dependencies.
- openai
- express
Optional development dependencies:
- @types/node
- @types/express
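The setup steps above can be sketched as shell commands (the project name follows the transcript; package versions are left to npm's defaults):

```shell
# Create and enter the project directory
mkdir openclaude-agent
cd openclaude-agent

# Initialize a package.json, accepting defaults
npm init -y

# Runtime dependencies: OpenAI SDK and Express
npm install openai express

# Optional TypeScript type definitions for development
npm install --save-dev @types/node @types/express
```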
6. Building the Agent Layer
The agent layer is responsible for:
- sending prompts to the LLM
- receiving responses
- calling tools
The system prompt instructs the AI how to behave.
Example instructions:
“You are an AI assistant capable of controlling the user’s machine.”
The prompt also describes available tools.
Example:
Tool: executeCommand
Description:
Executes a system command and returns output.
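A minimal system prompt along these lines might look like the following. The exact wording is an assumption, not the transcript's verbatim prompt:

```javascript
// Hypothetical system prompt: tells the model to act as an agent
// and documents the one tool it may call.
const SYSTEM_PROMPT = `
You are an AI assistant capable of controlling the user's machine.

You can respond in one of two ways:
1. A tool call, when an action is needed.
2. Plain text, when the task is complete.

Available tools:
- executeCommand: executes a system command and returns its output.
`;

module.exports = { SYSTEM_PROMPT };
```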
7. Creating the First Tool
Example tool:
executeCommand
Purpose:
Run system commands.
Implementation idea:
Use Node’s child_process module.
The tool receives:
- a command string
Example commands:
- mkdir project
- ls
- docker run nginx
The tool executes the command and returns output.
8. Tool Calling Logic
The LLM response must follow a structured format.
Two possible responses:
- Text output
- Tool call
Example structure:
type: tool_call
tool_name: executeCommand
params: ["mkdir test"]
Or
type: text
content: "Folder created successfully"
This structure allows the system to decide what to do next.
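Dispatching on that structure can be sketched like this. The field names mirror the example above; handleResponse and the stubbed tool registry are hypothetical helpers:

```javascript
// Registry of available tools; executeCommand is stubbed here so
// the sketch is self-contained.
const tools = {
  executeCommand: async (command) => `ran: ${command}`,
};

// Decide what to do with one structured LLM response.
async function handleResponse(response) {
  if (response.type === "tool_call") {
    const tool = tools[response.tool_name];
    if (!tool) {
      return { done: false, result: `Unknown tool: ${response.tool_name}` };
    }
    const result = await tool(...response.params);
    // A tool result goes back into the conversation; the loop continues.
    return { done: false, result };
  }
  // Plain text means the agent is finished.
  return { done: true, result: response.content };
}
```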
9. Agent Execution Loop
Agents work using a loop.
Steps inside the loop:
- Send conversation history to LLM
- Receive structured response
- If tool call → execute tool
- Add tool result to conversation
- Repeat
Loop ends when the AI returns a final message.
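The loop can be sketched with a stubbed LLM. Here callLLM is a stand-in for a real OpenAI API call, scripted to ask for one tool call and then finish:

```javascript
// Stubbed LLM: first requests a tool call, then returns final text.
// A real implementation would call the OpenAI API here.
const scripted = [
  { type: "tool_call", tool_name: "executeCommand", params: ["mkdir test"] },
  { type: "text", content: "Folder created successfully" },
];
async function callLLM(messages) {
  return scripted.shift();
}

// Stubbed tool registry.
const tools = { executeCommand: async (cmd) => `ran: ${cmd}` };

// The agent execution loop: call the LLM, execute any requested
// tool, feed the result back, and repeat until plain text arrives.
async function runAgent(userMessage) {
  const messages = [{ role: "user", content: userMessage }];
  while (true) {
    const response = await callLLM(messages);
    if (response.type === "tool_call") {
      const result = await tools[response.tool_name](...response.params);
      messages.push({ role: "tool", content: result });
    } else {
      return response.content;
    }
  }
}
```

With this script, `runAgent("Create a folder called test")` executes one tool call and resolves with the final text.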
10. Connecting the Agent to an API
The next step is exposing the agent through an API using Express.
Example route:
POST /message
Input:
user message
Example request:
Create a folder named project
The server sends this message to the agent.
The agent processes the request and returns results.
11. Example Agent Tasks
Once connected, the agent can perform real tasks.
Example 1
Create folder
User request:
Create a folder called test
Agent steps:
- AI decides command
- Tool executes mkdir test
Result:
Folder created.
Example 2
Create project files.
User request:
Create a To-Do app with HTML, CSS and JS.
Agent steps:
- Create folder
- Generate files
- Write code
- Return result
Example 3
Run Docker container.
User request:
Run Nginx server on port 8080
Agent steps:
- Pull Nginx image
- Run container
- Expose port
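For this request the agent would ultimately emit a command along these lines (the exact flags are an assumption; docker run pulls the image automatically if it is missing):

```shell
# Run Nginx detached, mapping host port 8080 to container port 80
docker run -d -p 8080:80 nginx
```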
Example 4
Browser automation.
Using Playwright.
User request:
Open Google and search “AI agents”.
Agent steps:
- Launch browser
- Navigate to Google
- Perform search
12. Error Handling
Agents must handle errors.
Example problems:
- tool fails
- command fails
- missing dependency
Solution:
Wrap tool execution in try/catch.
If error occurs:
Return error to agent.
The AI can then:
- fix command
- retry action
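Wrapping tool execution can be sketched like this (safeExecute is a hypothetical wrapper name):

```javascript
// Hypothetical wrapper: run a tool and convert any thrown error
// into a text result the LLM can read and react to.
async function safeExecute(tool, params) {
  try {
    const result = await tool(...params);
    return { ok: true, result };
  } catch (error) {
    // Returning the error as data lets the AI fix the command and
    // retry, instead of crashing the execution loop.
    return { ok: false, result: `Tool failed: ${error.message}` };
  }
}
```

A failing tool then yields `{ ok: false, ... }` and the error text flows back into the conversation like any other tool result.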
13. Expanding Agent Capabilities
The system can be extended by adding more tools.
Examples:
- Email automation
- File management
- Browser scraping
- Database queries
- API integrations
Each new tool expands the agent’s abilities.
14. Adding Messaging Channels
Instead of HTTP API, you can connect the agent to messaging apps.
Examples:
- Telegram bot
- WhatsApp bot
- Slack bot
Users send commands through chat.
The gateway forwards messages to the agent.
15. Security Considerations
Giving AI system access is risky.
Important protections:
- command restrictions
- sandbox environments
- limited permissions
- approval systems
Never allow unrestricted command execution in production.
16. Final Architecture
The complete system contains four layers.
User Layer
User interacts through:
- API
- Telegram
Gateway Layer
Handles communication and routing.
Agent Layer
Contains:
- LLM
- tools
- execution loop
Tools Layer
Performs real-world actions.
Final Takeaway
AI agents are not complicated.
They are built from three components:
- LLM reasoning
- tools for execution
- loop for decision making
LLM thinks.
Tools act.
Agents combine both.
This is the same principle behind modern systems built with:
- AI agents
- automation workflows
- platforms like n8n
