The phrase "AI agent" has been doing a lot of work in 2026. It covers everything from a chatbot with a clever prompt to autonomous research systems that run for hours. That ambiguity makes it hard for teams to figure out what to actually build.
We design and ship AI agents for a living. Here's our working definition — what an agent is, what it isn't, and what shapes of work it actually fits.
The working definition
An AI agent is a software system that uses a large language model (LLM) to:
- Reason about a goal or task,
- Take actions via tools (APIs, databases, browsers, command-line),
- Observe the results of those actions,
- Adapt its plan and continue until the goal is met or the agent escalates.
The shorthand: an agent uses tools to do things, not just produce text.
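The four steps above can be sketched as a loop. This is a minimal illustration, not a framework: `Step`, `run_agent`, and `scripted_decide` are names we made up for the example, and the `decide` callable stands in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One decision from the model: call a tool, finish, or escalate."""
    kind: str                                  # "tool", "done", or "escalate"
    tool: str = ""
    args: dict = field(default_factory=dict)
    message: str = ""


def run_agent(goal: str,
              decide: Callable[[str, list], Step],
              tools: dict,
              max_steps: int = 8) -> tuple:
    """Reason, act, observe, adapt; stop when done, escalated, or out of steps."""
    history: list = []                         # (tool, args, result) observations
    for _ in range(max_steps):
        step = decide(goal, history)           # reason about the goal so far
        if step.kind == "done":
            return "done: " + step.message, history
        if step.kind == "escalate":
            return "escalated: " + step.message, history
        if step.tool not in tools:
            # Feed the mistake back as an observation so the next decision can adapt.
            history.append((step.tool, step.args, "error: unknown tool"))
            continue
        result = tools[step.tool](**step.args)             # act
        history.append((step.tool, step.args, result))     # observe
    return "escalated: step budget exhausted", history


# Hypothetical stand-in for the model: a scripted policy over the history.
def scripted_decide(goal: str, history: list) -> Step:
    if not history:
        return Step("tool", tool="lookup_order", args={"order_id": "A1"})
    if history[-1][2] == "delivered":
        return Step("done", message="order A1 was delivered")
    return Step("escalate", message="unexpected order status")


tools = {"lookup_order": lambda order_id: "delivered"}
outcome, trace = run_agent("Where is order A1?", scripted_decide, tools)
```

The step cap matters as much as the loop itself: an agent that cannot finish should hand off rather than keep spending.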
Compare that to:
- A chatbot: takes input, produces text, takes input, produces text. No actions, no tool use, and usually no memory of what it did previously.
- A copilot: suggests things to a human; the human takes actions. The copilot is reactive and human-driven.
- A workflow automation: deterministic pipeline. Input flows through known steps to known output. Maybe with AI for a classification or extraction step, but the workflow itself is fixed.
- An agent: decides what to do, takes actions, sees results, decides again. The control flow is dynamic.
Most production "agents" in 2026 are actually somewhere between workflow automation and the full agent definition. That's fine — agency is a spectrum, not a binary.
A concrete example: customer support
A chatbot version: customer types question, model generates response, conversation ends.
A copilot version: customer types question, model suggests a draft response to a support rep, rep edits and sends.
A workflow automation version: ticket comes in, AI classifies it (urgency, category), routes to the right queue. Pre-defined steps.
An agent version: ticket comes in. The agent reads it, decides to look up the customer's order in your CRM (tool call), reads the order history, decides the customer needs a refund, checks refund policy (tool call), calls Stripe to process the refund (tool call), updates the ticket with what happened (tool call), and replies to the customer with a confirmation. The agent decided each step based on what it learned.
Each step up the ladder is more capable but also harder to build, debug, and operate. Most teams overestimate where they should start.
What makes an agent production-ready
A production agent needs:
Tool access
The agent calls APIs, databases, and other tools to actually do things. Reading data from your CRM. Writing back. Sending email. Triggering workflows. Without tools, the agent is just a chatbot.
The emerging standard for tool access is MCP (Model Context Protocol) — an open spec from Anthropic that lets you wire tools to models in a portable way.
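To make "tool access" concrete, here is a small sketch of a tool registry in plain Python. It does not use MCP or any MCP SDK. The registry, the `tool` decorator, `call_tool`, and the in-memory order data are all assumptions for illustration. The point is that a model-requested call is checked before anything runs.

```python
import json
from typing import Callable

TOOLS: dict = {}


def tool(name: str, description: str, parameters: dict):
    """Register a function along with a description and expected argument types."""
    def decorator(fn: Callable):
        TOOLS[name] = {"description": description, "parameters": parameters, "fn": fn}
        return fn
    return decorator


@tool("get_order", "Look up an order by id.", {"order_id": str})
def get_order(order_id: str) -> dict:
    # Hypothetical in-memory data standing in for a real CRM call.
    orders = {"A1": {"status": "delivered", "total_cents": 4200}}
    return orders.get(order_id, {"error": "not found"})


def call_tool(name: str, raw_args: str) -> dict:
    """Dispatch a model-requested call after checking the name and arguments."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    spec = TOOLS[name]
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return {"error": "arguments are not valid JSON"}
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    extra = set(args) - set(spec["parameters"])
    if extra:
        return {"error": f"unexpected arguments: {sorted(extra)}"}
    for key, expected in spec["parameters"].items():
        if not isinstance(args.get(key), expected):
            return {"error": f"argument {key!r} must be {expected.__name__}"}
    return spec["fn"](**args)
```

Errors come back as data rather than exceptions so the agent can read them as an observation and try again.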
Memory
Most agents need to remember something across turns or sessions. Conversation history, user preferences, prior decisions, facts about the world. Memory is layered:
- Short-term: the current conversation.
- Episodic: prior conversations and decisions.
- Semantic: facts the agent has learned (the customer's name, their preferences, prior issues).
Bad memory architecture is one of the most common reasons agents feel dumb in week three.
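One way to picture the three layers is as a single object with three stores. This is a sketch under our own assumptions: the `AgentMemory` class, its method names, and the 20-turn window are illustrative, and a real system would likely back the episodic and semantic layers with a database or search index.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    # Short-term: the current conversation, capped so the prompt stays bounded.
    short_term: deque = field(default_factory=lambda: deque(maxlen=20))
    # Episodic: summaries of prior conversations and decisions.
    episodic: list = field(default_factory=list)
    # Semantic: durable facts the agent has learned, keyed by name.
    semantic: dict = field(default_factory=dict)

    def add_turn(self, role: str, text: str) -> None:
        self.short_term.append((role, text))

    def end_conversation(self, summary: str) -> None:
        """Fold the finished conversation into episodic memory and reset short-term."""
        self.episodic.append(summary)
        self.short_term.clear()

    def remember_fact(self, key: str, value: str) -> None:
        self.semantic[key] = value

    def build_context(self, max_episodes: int = 3) -> str:
        """Assemble prompt context: facts first, then recent episodes, then live turns."""
        lines = [f"fact: {k} = {v}" for k, v in sorted(self.semantic.items())]
        lines += [f"earlier: {s}" for s in self.episodic[-max_episodes:]]
        lines += [f"{role}: {text}" for role, text in self.short_term]
        return "\n".join(lines)
```

Keeping the layers separate makes it possible to cap, expire, or correct each one independently, which is usually where week-three problems get fixed.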
Structured outputs
When the agent's output drives downstream code, that output needs to be predictable. JSON schemas, type-safe outputs, validation. The agent's "creative" mode is fine for replies to users; it's a disaster for inputs to other systems.
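As a rough illustration of validating output before it reaches downstream code, the sketch below parses a model response into a typed object using only the standard library. The `Decision` fields and the allowed action names are hypothetical; in practice you might use a JSON Schema validator or a typed-model library instead.

```python
import json
from dataclasses import dataclass

ALLOWED_ACTIONS = {"refund", "reply", "escalate"}   # illustrative action names


@dataclass(frozen=True)
class Decision:
    action: str
    order_id: str
    amount_cents: int


def parse_decision(raw: str) -> Decision:
    """Parse model output; raise ValueError rather than pass bad data downstream."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != {"action", "order_id", "amount_cents"}:
        raise ValueError(f"unexpected or missing fields: {sorted(data)}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data['action']!r}")
    if not isinstance(data["order_id"], str):
        raise ValueError("order_id must be a string")
    amount = data["amount_cents"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount < 0:
        raise ValueError("amount_cents must be a non-negative integer")
    return Decision(data["action"], data["order_id"], amount)
```

A failed parse is a signal to retry the model call or escalate, not something to patch up silently.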
Human-in-the-loop
Agents should escalate to humans when:
- Confidence is low.
- The action is destructive or expensive (refunds, sending mass email, modifying critical data).
- The case is unusual.
The escalation path is part of the design, not a fallback.
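The three escalation triggers above can be written as an explicit policy check that runs before any action. This is a sketch with made-up thresholds. `ProposedAction`, `needs_human`, and the way confidence and "unusual" are measured are all assumptions, and real systems estimate both in different ways.

```python
from dataclasses import dataclass

DESTRUCTIVE_ACTIONS = {"refund", "mass_email", "delete_record"}   # illustrative


@dataclass
class ProposedAction:
    name: str
    confidence: float            # 0.0 to 1.0, however your system estimates it
    cost_cents: int = 0
    seen_similar_cases: int = 0  # rough proxy for how unusual the case is


def needs_human(action: ProposedAction,
                min_confidence: float = 0.8,
                max_auto_cost_cents: int = 5000,
                min_similar_cases: int = 5) -> list:
    """Return the reasons to escalate; an empty list means the agent may proceed."""
    reasons = []
    if action.confidence < min_confidence:
        reasons.append("low confidence")
    if action.name in DESTRUCTIVE_ACTIONS or action.cost_cents > max_auto_cost_cents:
        reasons.append("destructive or expensive action")
    if action.seen_similar_cases < min_similar_cases:
        reasons.append("unusual case")
    return reasons
```

Returning reasons instead of a bare boolean gives the human reviewer context and gives you something to count when tuning thresholds.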
Evals
You need a test suite. Cases where you know the right answer; the agent's answer scored against it. Regressions caught before production. Without evals, you're vibe-checking — and the agent's quality will silently drift.
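A test suite of this kind can start very small. The sketch below scores an agent against cases with known answers and compares the pass rate to a baseline. The exact-match scoring, the `keyword_agent` stub, and the three cases are assumptions for illustration; real evals often need fuzzier scoring.

```python
from typing import Callable


def run_evals(agent: Callable[[str], str], cases: list) -> dict:
    """Run each (prompt, expected) case and collect failures for inspection."""
    failures = []
    for prompt, expected in cases:
        actual = agent(prompt)
        if actual != expected:
            failures.append({"prompt": prompt, "expected": expected, "actual": actual})
    passed = len(cases) - len(failures)
    return {
        "passed": passed,
        "total": len(cases),
        "pass_rate": passed / len(cases) if cases else 0.0,
        "failures": failures,
    }


def check_regression(report: dict, baseline_pass_rate: float) -> bool:
    """True if this run is at least as good as the recorded baseline."""
    return report["pass_rate"] >= baseline_pass_rate


# Hypothetical stand-in for the agent under test: a keyword classifier.
def keyword_agent(ticket: str) -> str:
    text = ticket.lower()
    if "refund" in text:
        return "billing"
    if "password" in text:
        return "account"
    return "general"


CASES = [
    ("I want a refund", "billing"),
    ("Reset my password", "account"),
    ("My invoice is wrong", "billing"),
]
report = run_evals(keyword_agent, CASES)
```

Running this in CI against a stored baseline is what turns "it felt fine" into a caught regression.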
Observability
Every agent run logged. What was asked, what tools were called, what the model thought, what the result was. Replayable for debugging. Critical for production.
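A minimal version of this is an append-only log of structured events keyed by run. The sketch below writes one JSON object per line and reads a run back in order. The `RunTrace` class, the event kinds, and the record fields are our own assumptions, not a standard format.

```python
import json
import time
import uuid
from io import StringIO


class RunTrace:
    """Append-only trace of one agent run, written as JSON lines."""

    def __init__(self, sink, user_input: str):
        self.run_id = uuid.uuid4().hex
        self.sink = sink             # any object with a write() method
        self._seq = 0
        self.log("input", {"text": user_input})

    def log(self, kind: str, payload: dict) -> None:
        record = {
            "run_id": self.run_id,
            "seq": self._seq,        # ordering that does not depend on clock precision
            "ts": time.time(),
            "kind": kind,
            "payload": payload,
        }
        self.sink.write(json.dumps(record) + "\n")
        self._seq += 1


def load_run(lines, run_id: str) -> list:
    """Rebuild one run's events in order, for replay or debugging."""
    records = [json.loads(line) for line in lines if line.strip()]
    return sorted((r for r in records if r["run_id"] == run_id),
                  key=lambda r: r["seq"])


sink = StringIO()                    # stands in for a log file or log pipeline
trace = RunTrace(sink, "Where is order A1?")
trace.log("tool_call", {"tool": "get_order", "args": {"order_id": "A1"}})
trace.log("result", {"status": "delivered"})
events = load_run(sink.getvalue().splitlines(), trace.run_id)
```

In production the sink would be a file or log pipeline, and payloads may need redaction before they are stored.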
Cost + latency control
Each model call costs money. Each tool call costs time. Production agents need budget caps, per-user limits, model routing (for example, across Claude tiers: Haiku for cheap routing, Sonnet for reasoning, Opus for hard cases), caching, and batching where possible.
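Here is a small sketch of how a budget cap, tier routing, and caching can fit together. The tier names, the per-call cost units, and the `call_model` placeholder are invented for illustration. They are not real prices or a real client API.

```python
from dataclasses import dataclass

# Hypothetical tiers and per-call costs in abstract units, not real pricing.
TIER_COST = {"small": 1, "medium": 5, "large": 25}


def pick_tier(task_kind: str) -> str:
    """Route cheap work to the small tier and reserve the large tier for hard cases."""
    if task_kind == "routing":
        return "small"
    if task_kind == "reasoning":
        return "medium"
    return "large"


class BudgetExceeded(Exception):
    pass


@dataclass
class Budget:
    limit_units: int
    spent_units: int = 0

    def charge(self, units: int) -> None:
        # Check before spending so a rejected call leaves the budget unchanged.
        if self.spent_units + units > self.limit_units:
            raise BudgetExceeded(f"{units} units would exceed the cap of {self.limit_units}")
        self.spent_units += units


def call_model(task_kind: str, prompt: str, budget: Budget, cache: dict) -> str:
    key = (task_kind, prompt)
    if key in cache:                 # cache hit: no model call, no charge
        return cache[key]
    tier = pick_tier(task_kind)
    budget.charge(TIER_COST[tier])
    response = f"[{tier}] response to: {prompt}"   # placeholder for a real model call
    cache[key] = response
    return response
```

A per-user limit is the same idea with one `Budget` per user; what to do when the cap is hit (queue, degrade, or escalate) is a product decision.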
Skip any of these and your agent might work in a demo but fall over in production.
What agents are good at
- Multi-step tasks where the path varies. Customer support, internal triage, sales research. The agent decides each step based on what it learned.
- Tasks that integrate multiple systems. When the work involves reading from CRM, checking inventory, drafting an email, and updating a ticket — agents glue it together.
- Tasks where the failure mode is acceptable. When "the agent got it wrong on this case, escalate to a human" is a fine outcome.
- Tasks where reasoning helps. Drafting, classification with nuance, summarization, prioritization.
What agents are bad at
- Tasks with no real reasoning step. If the work is "given X, do Y, every time, deterministically" — that's not an agent task. It's workflow automation. Cheaper, faster, more reliable.
- Tasks where the failure mode is severe. Anything legal, medical, safety-critical, or financially destructive where a single wrong action is a real problem.
- Genuinely novel tasks. Agents pattern-match against training data. They're better at well-trodden shapes than unprecedented ones.
- Tasks at huge volume with tight margins. Cost-per-resolution might exceed the value of the resolution. Worth modeling carefully.
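The volume-and-margin point is easy to model before you build. The sketch below estimates the expected cost of handling one ticket, counting the human fallback for cases the agent does not resolve, so it can be compared to what a resolution is worth. The function name and every number in the example are made up.

```python
def expected_cost_per_ticket(model_calls: int,
                             cost_per_model_call: float,
                             tool_calls: int,
                             cost_per_tool_call: float,
                             resolution_rate: float,
                             human_cost_per_escalation: float) -> float:
    """Agent cost on every ticket, plus human cost on the share the agent cannot resolve."""
    if not 0.0 <= resolution_rate <= 1.0:
        raise ValueError("resolution_rate must be between 0 and 1")
    agent_cost = model_calls * cost_per_model_call + tool_calls * cost_per_tool_call
    return agent_cost + (1.0 - resolution_rate) * human_cost_per_escalation


# Illustrative numbers only: 6 model calls, 4 tool calls, 70% resolved by the agent.
cost = expected_cost_per_ticket(
    model_calls=6, cost_per_model_call=0.02,
    tool_calls=4, cost_per_tool_call=0.005,
    resolution_rate=0.7, human_cost_per_escalation=4.00,
)
value_of_resolution = 1.00           # hypothetical value of resolving one ticket
worth_it = cost < value_of_resolution
```

With these made-up inputs the human fallback dominates the total, which is a common pattern: resolution rate often matters more than per-call price.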
Building vs. buying
Off-the-shelf agent platforms (Intercom Fin, Ada, Forethought, Decagon, OpenClaw, Hermes Agent) are good fits for standard use cases. Custom agents win when your workflow is non-standard, your data lives in non-standard places, or the agent is strategic infrastructure.
We wrote a more detailed framework on this in our Build vs Buy AI Agent comparison.
The most common mistake
Teams jump to multi-agent systems before they've shipped a single agent. The literature on multi-agent systems is exciting; the production reality is that getting one agent right is hard, and most "multi-agent" problems are better solved with one agent and clear tool design.
Start with one agent on one workflow. Get it to production. Add a second agent (or expand the first) when you have a specific reason.
Where to go from here
If you want the practical "how to build" guide: read How to build an AI agent.
If you want to think about which framework: OpenClaw vs Hermes Agent covers two leading open-source options. Build vs Buy AI Agent covers the higher-level decision.
If you want to talk to a team that's done this: our AI Agent Development service is exactly that.
