We ship software for a living. We also ship software with agentic coding tools — Claude Code, Cursor, Aider — every day, on real client projects and our own products.
This guide is what we actually do. Not what's in the marketing material, not what works on demo repos. The workflow that gets production code shipped fast without producing the kind of plausible-looking garbage that the agentic coding meme is famous for.
The mental model that fixes most problems
The single biggest mistake teams make with agentic coding: treating the agent like a contractor instead of like a power tool.
A contractor takes a vague prompt, makes a bunch of decisions on their own, and hands back something. You assess the result and either accept it or rework it. You can do that because the contractor is making reasonable judgments about what you meant.
The agent is not a contractor. It will happily implement the literal interpretation of an ambiguous prompt, choose libraries you don't use, and create patterns that don't match the rest of your code. The output looks plausible but is structurally wrong.
The fix: treat the agent like a power tool. You guide it precisely. You read what it's about to do before it does it. You catch mistakes early instead of after a thousand lines have been written. The agent is doing the work; you are doing the judgment.
This is the mindset shift. Everything else flows from it.
The five-step workflow we use
1. Scope the task precisely
Before opening the agent, write down — in plain text — exactly what you want changed. Not "improve the contact form." That's a contractor prompt. Write:
Update
/api/contact/route.tsto add basic rate limiting using Upstash (already installed). Allow 5 submissions per IP per hour. Return 429 with a JSON error message on excess. Add a test for the rate-limit case.
That's the prompt. Specific files, specific library, specific behavior, specific test. Reading that, the agent has almost no room for misinterpretation.
If you can't write a precise prompt, you don't know what you want yet — and the agent definitely doesn't. Stop and figure it out.
2. Ask for the plan first
Most agentic tools default to executing as soon as you prompt. Don't let them. Start with: "Plan only — what files will you edit and what changes will you make? Don't write code yet."
You'll be surprised how often the plan reveals a misunderstanding. The agent says "I'll add the rate limit middleware in /middleware.ts" and you realize you don't have a middleware.ts and your routing works differently. You correct the plan in 20 seconds. Without that step, you'd be reviewing 200 lines of code that does the wrong thing.
3. Execute in small steps
Once the plan is right, let the agent execute — but watch.
If the task is more than a few files, break it into stages. "Step 1: Add the Upstash client and the rate limit helper. Stop." Then: "Step 2: Wire it into the route." Then: "Step 3: Add the test."
Small steps mean you catch mistakes before they cascade. They also mean each step is a clean git commit, which is invaluable when you want to roll back one piece.
4. Read every diff like code review
This is where teams fail. They get into the rhythm of accepting agent output and stop reading. Two weeks later they're debugging a system half written by an AI that nobody fully understands.
Read every diff. Specifically watch for:
- Library inventions. The agent imports something you don't use? Either install it deliberately or reject the change.
- Pattern drift. The new code uses a different state management approach than the rest of the codebase? Either accept the drift consciously or steer the agent back.
- Magic numbers. Hardcoded values that should be config? Push back.
- Skipped error handling. Agent code is often too optimistic about happy paths. Add the cases it skipped.
- Test coverage. Did it actually write the test you asked for, and does the test actually exercise the new behavior, or does it just confirm the existing behavior still works?
A 5-minute careful read of a diff is worth more than 5 hours of debugging the same diff later.
5. Run the verification yourself
The agent will say "tests pass." Maybe they do. Run them yourself.
The agent will say "the build succeeds." Run the build.
The agent will say "this fixes the bug." Reproduce the bug yourself and confirm it's gone.
The agent is right most of the time. The dangerous case is when it's wrong with confidence. Verification is the only protection.
Choosing tools
We use different agentic tools for different shapes of work. The current state of the art (as of mid-2026):
Claude Code
Anthropic's CLI agent. Our default for most agentic work. Particularly strong for:
- Multi-file changes that need understanding of project context
- Refactors and migrations
- New features in established codebases
- Anything involving running tests or commands
The CLI shape is a feature: you stay in the terminal, the agent stays scoped to the repo, and you avoid the IDE distractions that pull you back into typing yourself.
Cursor (agent mode)
Best for in-editor work where you want to see context inline. Strong autocomplete in addition to agent mode. We use it when we need to interleave manual edits with agent edits closely.
Aider
Surgical, terminal-based, great at small precise refactors. We reach for it when we have a tight scope and want minimal magic.
Devin and similar autonomous agents
Larger autonomous agents that take vague prompts and run for longer. We don't use these for production code yet — the cost/quality tradeoff isn't compelling for our work. Worth tracking; not yet our default.
Failure modes we've seen
Confident wrong answers
The agent will sometimes assert things that are not true with the same tone it uses for true things. "I've added the test and it passes" — and the test is checking the wrong thing. Verification is the only defense.
Pattern drift
Over a few weeks, accepting agent changes without enforcing patterns produces a codebase with multiple styles, multiple state management approaches, multiple ways of doing the same thing. Maintainability tanks. Combat this by being explicit about patterns in your prompts and rejecting deviations.
Library sprawl
The agent will install a library to do something your codebase already does. Two weeks later you have four UI libraries, three HTTP clients, and two date utilities. Push back hard. Make the agent use what's already there.
Skipping tests
When the agent's test fails, sometimes the easier fix is to weaken the test instead of fix the code. Watch for tests that get less strict over time. If you see this, the test is the problem and the underlying code is probably broken.
Hidden complexity
Generated code can be technically correct but structurally complex — clever in ways no human would write. The maintenance cost shows up months later when nobody can debug it. Push for simple, boring code. "Rewrite this more simply" is a useful prompt.
When agentic coding is the wrong tool
Not every coding task is well-served by an agent. Manual coding still wins for:
- Genuinely novel architectural work. The agent is pattern-matching; if there's no pattern, it'll guess badly.
- Performance-sensitive code where you need deep understanding. The agent will make plausible changes that often regress performance subtly.
- Security boundaries and auth. The blast radius of a mistake is too high.
- The first version of a new system. Get the data model and shape right by hand, then let agents accelerate the build.
A good rule: use agentic tools for execution within known patterns and manual coding for design of new patterns.
A starter routine
If you're new to this, try this for one week:
- Pick a project you know well.
- Use Claude Code (or your tool of choice) for every task you would have done manually.
- Insist on the plan-first workflow.
- Read every diff.
- Keep a daily note: what worked, what didn't, what surprised you.
By Friday you'll have intuition for when to delegate and when to type. That intuition is the actual skill.
Where this is going
The tools are improving fast. The set of things "well-trodden enough to delegate" is expanding. Two years from now most software engineers will work this way as default.
The skill that stays valuable is judgment. What to build, how to architect it, what to push back on, what to ship. The agent makes execution cheap; that makes judgment more valuable, not less.
Going deeper
If you want to build production AI systems on top of these patterns, our AI Agent Development service covers the architectural side.
If you're hiring someone who's already doing this, hire a vibe coder is the page.
And if you're just trying to wrap your head around the basics: start with what is agentic coding and come back here for the practical workflows.
