Building an AI agent in 2026 is no longer reserved for ML researchers. With the right frameworks and a clear mental model, any developer can build a production-ready AI agent in days. This tutorial walks you through the complete process — from picking an LLM to shipping a working agent — with real, runnable code at every step.
This tutorial assumes you know JavaScript/TypeScript or Python. No machine learning background required. We build a real customer support agent end-to-end.
What Exactly Is an AI Agent?
An AI agent is a program that uses a large language model (LLM) as its reasoning engine, can call external tools (APIs, databases, functions), maintains memory across steps, and autonomously plans and executes multi-step tasks to reach a goal. The key word is autonomous — unlike a chatbot that responds to one prompt at a time, an agent decides what to do next based on context, results, and its objective.
- LLM (Brain) — Processes instructions, reasons about the next action, generates responses
- Tools (Hands) — Functions the agent can call: search, database query, send email, write file
- Memory (Context) — Short-term (conversation) + long-term (vector store) to remember past interactions
- Planner (Orchestrator) — Breaks a high-level goal into a sequence of tool calls and LLM responses
- Guardrails (Safety) — Validates outputs, prevents irreversible actions, escalates to humans when needed
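Before reaching for a framework, it helps to see these five components as code. The sketch below is a dependency-free caricature: `mockLLM` stands in for a real model call, and every name in it is illustrative rather than a real API:

```typescript
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;
type LLMStep =
  | { type: 'tool_call'; tool: string; args: Record<string, unknown> }
  | { type: 'final'; text: string };

// The brain: in production this is an LLM call. This stub asks for one
// tool result before producing a final answer.
async function mockLLM(history: string[]): Promise<LLMStep> {
  if (!history.some((m) => m.startsWith('tool:'))) {
    return { type: 'tool_call', tool: 'getTime', args: {} };
  }
  return { type: 'final', text: `Answered using ${history.length} context items` };
}

// The planner/loop: ask the LLM what to do, run tools, feed results back, repeat.
async function runAgent(
  goal: string,
  tools: Record<string, ToolFn>,
  maxSteps = 5, // guardrail: hard cap on iterations
): Promise<string> {
  const memory: string[] = [`user:${goal}`]; // short-term memory
  for (let step = 0; step < maxSteps; step++) {
    const decision = await mockLLM(memory);
    if (decision.type === 'final') return decision.text;
    const result = await tools[decision.tool](decision.args);
    memory.push(`tool:${decision.tool}=${JSON.stringify(result)}`);
  }
  return 'Step limit reached, escalating to a human.';
}
```

The rest of the tutorial replaces each stub with the real thing: the LLM via a provider SDK, the tool map via typed tool definitions, and memory via a database.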
Step 1 – Choose Your LLM and Framework
The first decision is which LLM powers your agent. In 2026, the leading options are:
- Claude Sonnet / Opus (Anthropic) — Best for complex reasoning, long context (200k tokens), and following nuanced instructions. Excellent tool-use reliability.
- GPT-4o / GPT-4o-mini (OpenAI) — Fast, widely supported, great ecosystem. GPT-4o-mini ideal for high-volume, cost-sensitive agents.
- Gemini 1.5 Pro (Google) — Strongest for multimodal tasks (documents, images, audio) and Google Workspace integrations.
- Llama 3 70B (Meta, self-hosted) — Best for on-premise / air-gapped deployments where data cannot leave your infrastructure.
For this tutorial, we use the Vercel AI SDK (TypeScript), which supports all major providers through a unified interface, so you can swap models without rewriting your agent logic.
npm install ai @ai-sdk/openai @ai-sdk/anthropic zod
# or for Python:
pip install anthropic openai langchain langgraph langchain-anthropic

Step 2 – Define Your Agent's Purpose and Tools
The biggest mistake developers make is building a generic agent. Start with a single, concrete use case. Our example: a customer support agent for a SaaS product that can look up user accounts, check subscription status, and escalate to a human when needed.
Every action the agent needs to perform becomes a tool. Define each tool with a clear description, input schema (Zod), and an execute function:
import { tool } from 'ai';
import { z } from 'zod';
// Tool 1: Look up a user account
const getUserAccount = tool({
description: 'Look up a user account by email address. Returns plan, status, and usage data.',
parameters: z.object({
email: z.string().email().describe('The user\'s email address'),
}),
execute: async ({ email }) => {
const user = await db.users.findByEmail(email);
if (!user) return { found: false };
return {
found: true,
plan: user.plan, // 'free' | 'pro' | 'enterprise'
status: user.status, // 'active' | 'suspended' | 'cancelled'
usage: user.currentUsage, // { apiCalls: number, storage: string }
};
},
});
// Tool 2: Escalate to human agent
const escalateToHuman = tool({
description: 'Escalate this conversation to a human support agent. Use when the issue is complex, involves billing disputes, or the user is frustrated.',
parameters: z.object({
reason: z.string().describe('Why escalation is needed'),
priority: z.enum(['low', 'medium', 'high', 'urgent']),
}),
execute: async ({ reason, priority }) => {
// Use the ticket returned by create(): a separate getLastId() call could race with other writers
const ticket = await ticketService.create({ reason, priority, conversation: getConversationHistory() });
return { escalated: true, ticketId: ticket.id };
},
});

Step 3 – Build the Agent Loop
The agent loop is the core execution cycle: take user input → ask the LLM what to do → if the LLM calls a tool, execute it and feed the result back → repeat until the LLM produces a final answer.
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
async function runSupportAgent(userMessage: string, conversationHistory: Message[]) {
const { text, steps } = await generateText({
model: anthropic('claude-sonnet-4-5'),
system: `You are a helpful customer support agent for BitPixelCoders.
Your goal is to resolve the user's issue accurately and efficiently.
- Always look up the user account before giving account-specific answers.
- If you cannot resolve the issue in 3 tool calls, escalate to a human.
- Never guess subscription details — always verify with the getUserAccount tool.`,
messages: [
...conversationHistory,
{ role: 'user', content: userMessage },
],
tools: {
getUserAccount,
escalateToHuman,
},
maxSteps: 5, // prevent infinite loops
});
return { response: text, steps };
}

maxSteps is critical. Always set a hard limit on the number of tool-call iterations to prevent runaway loops and uncontrolled API costs.
Step 4 – Add Memory
Without memory, your agent forgets everything between sessions. There are two types to implement:
Short-Term Memory (Conversation History)
Store the message array in your database and pass it with every request. For long conversations, summarise older messages to stay within the LLM's context window:
// Load history from DB
const history = await db.conversations.getMessages(sessionId);
// If history is long, summarise old messages to save tokens
if (history.length > 20) {
const summary = await summariseOldMessages(history.slice(0, -10));
history.splice(0, history.length - 10, {
role: 'system',
content: `Previous conversation summary: ${summary}`,
});
}
// Pass to agent
const result = await runSupportAgent(userMessage, history);
// Save updated history
await db.conversations.appendMessage(sessionId, { role: 'user', content: userMessage });
await db.conversations.appendMessage(sessionId, { role: 'assistant', content: result.response });

Long-Term Memory (Vector Store)
For knowledge retrieval — FAQs, product documentation, past resolved tickets — embed content into a vector store (Pinecone, Supabase pgvector, or Weaviate) and add a searchKnowledge tool. This is the foundation of RAG (Retrieval-Augmented Generation) and sharply reduces hallucination by grounding answers in your own content.
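To make the retrieval step concrete, here is a toy in-memory version: a bag-of-words "embedding" and cosine similarity stand in for a real embedding model and vector database. Everything here (the vocabulary, the sample documents, the `searchKnowledge` name) is illustrative:

```typescript
// Toy embedding: bag-of-words over a fixed vocabulary. A real system would
// call an embedding model and store vectors in Pinecone / pgvector / Weaviate.
const VOCAB = ['refund', 'billing', 'password', 'reset', 'upgrade', 'plan'];
function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((v) => words.filter((w) => w === v).length);
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const na = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const nb = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return na && nb ? dot / (na * nb) : 0;
}

// In-memory "vector store" of support docs
const docs = [
  'How to reset your password from the login page',
  'Refund and billing dispute policy',
  'How to upgrade your plan to pro',
].map((text) => ({ text, vector: embed(text) }));

// Rank documents by similarity to the query and return the top matches
function searchKnowledge(query: string, topK = 1): string[] {
  const q = embed(query);
  return [...docs]
    .sort((a, b) => cosine(b.vector, q) - cosine(a.vector, q))
    .slice(0, topK)
    .map((d) => d.text);
}
```

Wrapping this in a tool definition (like getUserAccount above) lets the agent decide when to consult the knowledge base rather than retrieving on every turn.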
Step 5 – Handle Errors and Edge Cases
Production agents fail in creative ways. Handle these patterns explicitly:
- LLM timeout/rate limit — Retry with exponential backoff (max 3 attempts), then return a graceful fallback message
- Tool execution error — Catch the error, return a structured error object to the LLM so it can decide how to proceed
- Hallucinated tool arguments — Validate all tool inputs with Zod before executing; reject invalid calls
- Infinite loop detection — Track tool call counts; escalate to human if agent exceeds maxSteps
- Sensitive data in outputs — Run a PII filter on the final response before returning to the user
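The backoff pattern from the first bullet can be sketched as a small generic helper; the function name and defaults here are our own, not a library API:

```typescript
// Retry an async call with exponential backoff: 3 attempts by default,
// with delays of base, 2*base, 4*base milliseconds between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // All attempts failed: let the caller return a graceful fallback message
  throw lastError;
}
```

Wrap your LLM calls in this helper, and catch the final error at the top level to return the fallback message to the user.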
// Wrap tool execute with error handling
const safeGetUserAccount = tool({
...getUserAccount,
execute: async (params, options) => {
try {
// Forward the SDK's execution options so the inner tool sees the same context
return await getUserAccount.execute!(params, options);
} catch (err) {
// Return structured error — agent can decide to retry or escalate
return { error: true, message: 'Database unavailable. Please try again shortly.' };
}
},
});

Step 6 – Test Your Agent
Testing agents is harder than testing regular functions because behaviour is probabilistic. Use these strategies:
- Unit test each tool independently — mock inputs, verify outputs
- Golden dataset testing — create 20–30 representative user queries with expected outcomes; run your agent against them and score accuracy
- Adversarial testing — try prompt injection, off-topic requests, and edge cases; verify the agent does not take unintended actions
- Regression testing — run the golden dataset on every model update or system prompt change
- Cost tracking — log token usage per session during testing to estimate production costs
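A golden-dataset harness does not need a framework. A minimal sketch, assuming your agent is exposed as an async query-to-string function; the type and function names are our own:

```typescript
// A golden case pairs a query with a behavioural check, not an exact string,
// because LLM wording varies between runs.
type GoldenCase = { query: string; check: (response: string) => boolean };

// Run the agent over the dataset and return accuracy in [0, 1]
async function scoreAgent(
  agent: (query: string) => Promise<string>,
  dataset: GoldenCase[],
): Promise<number> {
  let passed = 0;
  for (const c of dataset) {
    const response = await agent(c.query);
    if (c.check(response)) passed++;
    else console.log(`FAIL: "${c.query}" -> "${response}"`);
  }
  return passed / dataset.length;
}
```

Run this on every model or system-prompt change and fail the build if accuracy drops below your threshold; that is the regression testing from the list above.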
Step 7 – Deploy to Production
Wrap your agent in an API endpoint and deploy. A minimal Next.js App Router setup:
// app/api/agent/route.ts
import { NextRequest, NextResponse } from 'next/server';
export async function POST(req: NextRequest) {
const { message, sessionId } = await req.json();
if (!message || !sessionId) {
return NextResponse.json({ error: 'Missing message or sessionId' }, { status: 400 });
}
const history = await db.conversations.getMessages(sessionId);
const result = await runSupportAgent(message, history);
// Persist messages
await db.conversations.appendMessage(sessionId, { role: 'user', content: message });
await db.conversations.appendMessage(sessionId, { role: 'assistant', content: result.response });
return NextResponse.json({ response: result.response });
}

For streaming responses (so the UI updates token-by-token), use streamText instead of generateText and return the result with result.toTextStreamResponse(). This dramatically improves perceived performance for end users.
Python Alternative: LangChain / LangGraph
Prefer Python? LangGraph is the most production-ready option in 2026 for complex agentic workflows. It models your agent as a stateful graph, making branching logic, loops, and multi-agent handoffs explicit and debuggable:
from langgraph.graph import StateGraph, MessagesState
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

llm = ChatAnthropic(model='claude-sonnet-4-5')

@tool
def get_user_account(email: str) -> dict:
    """Look up a user account by email. Returns plan and status."""
    user = db.users.find_by_email(email)
    if not user:
        return {'found': False}
    return {'found': True, 'plan': user.plan, 'status': user.status}

llm_with_tools = llm.bind_tools([get_user_account])

def agent_node(state: MessagesState):
    response = llm_with_tools.invoke(state['messages'])
    return {'messages': [response]}

graph = StateGraph(MessagesState)
graph.add_node('agent', agent_node)
graph.add_node('tools', ToolNode([get_user_account]))
graph.set_entry_point('agent')
# Route to the tool node when the model requests a tool call, otherwise finish
graph.add_conditional_edges('agent', tools_condition)
graph.add_edge('tools', 'agent')  # feed tool results back to the model
agent = graph.compile()

Key Metrics to Monitor in Production
- Resolution rate — % of queries resolved without human escalation (target: >80% for a mature support agent)
- Average steps per query — higher = more expensive; optimise your system prompt if this rises above 3–4
- Token cost per session — track P50/P95; set alerts if cost per conversation spikes
- Tool error rate — if a specific tool fails >5% of the time, investigate the integration
- User satisfaction score — CSAT or thumbs up/down after each agent interaction
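Token cost tracking reduces to simple arithmetic once you log usage per step. A sketch, with placeholder prices (substitute your provider's actual per-million-token rates):

```typescript
// Illustrative prices in USD per million tokens. These are NOT real provider
// rates; look up current pricing for the model you deploy.
const PRICE_PER_M = { input: 3.0, output: 15.0 };

type Usage = { inputTokens: number; outputTokens: number };

// Sum token usage across all steps of a session and convert to dollars
function sessionCostUSD(steps: Usage[]): number {
  const input = steps.reduce((sum, u) => sum + u.inputTokens, 0);
  const output = steps.reduce((sum, u) => sum + u.outputTokens, 0);
  return (input * PRICE_PER_M.input + output * PRICE_PER_M.output) / 1_000_000;
}
```

Log this value per session, then compute P50/P95 over a rolling window and alert when the P95 spikes.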
Need help building a production AI agent for your business? BitPixel Coders specialises in designing and deploying AI agents — from customer support bots to autonomous workflow systems. Get in touch for a free consultation.
Related Guides
- Building AI Agents That Actually Work: A Practical Guide for 2026
- Multi-Agent Orchestration: Best Practices for 2026
- Best AI Tools for Building AI Agents in 2026
- AI Automation Trends 2026 – What Every Business Needs to Know
Written by
Anju Batta
Senior Full Stack Developer & AI Automation Architect
15+ years experience building web applications, AI automation systems, and cloud infrastructure. Delivered 500+ projects for clients worldwide at BitPixel Coders.
LinkedIn Profile