How to vibe code — the workflow that actually ships

There's a version of vibe coding that's pure meme — type a vague prompt, watch an AI generate something, ship it, post the screenshot to X. That version produces software that falls apart at first contact with real users.

There's another version, the one we actually do, where senior engineers use AI tools to ship production code much faster than they would otherwise: up to 5–10x on the right kind of work. Same architecture, same tests, same observability. Just compressed time-to-ship.

This post covers how to vibe code the second way: the practical workflow that gets real software out the door.

What "vibe coding" actually means

The phrase came from Andrej Karpathy, who in early 2025 described a workflow where you "fully give in to the vibes." In his version the AI tool writes the code, you accept its changes without reading the diffs closely, and you steer by prompting. He pitched it for throwaway weekend projects. It went viral, the term got generalized, and now it covers everything from "Cursor power user" to "type prompts into Claude and copy the answers."

The version that ships software:

Senior engineer (you, or someone you trust) sets architecture and reviews output
AI tool (Claude Code, Cursor agent mode, Aider) does the actual typing across files
Tests, type checks, and reviews happen the same way they would on hand-written code
Velocity comes from compressed typing time, not from skipping engineering practices

The version that ships garbage:

Someone without senior engineering judgment prompts their way through a project
Everything the AI generates gets accepted without review
Tests are skipped because "it works"
The codebase becomes unmaintainable in weeks

The difference between the two isn't the tools. It's the judgment layered on top.

The workflow we use

Step 1: Have an opinion about the architecture before touching the AI

The single highest-leverage move: know what you're building before you start vibe coding. The agent is execution-fast and design-mediocre. If you let it design, it'll pick whatever pattern is most common in its training data, which is rarely the right call for your context.

So before any prompts:

What's the data model?
What's the deployment target?
What are the major files?
What patterns does your codebase already use?

Write these down. The agent will follow your lead — but only if you give it one.
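One way to make "write these down" concrete is to keep the answers in a small structured brief. You can then paste its rendered text at the top of each agent session. The sketch below is a hypothetical illustration. The `DesignBrief` class, its fields, and the example values are assumptions, not a format any particular tool requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DesignBrief:
    """Hypothetical container for architecture decisions made before prompting."""
    data_model: str
    deployment_target: str
    major_files: List[str] = field(default_factory=list)
    existing_patterns: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the brief as plain text to paste at the top of an agent session."""
        lines = [
            "Architecture constraints (follow these; do not redesign):",
            f"- Data model: {self.data_model}",
            f"- Deployment target: {self.deployment_target}",
        ]
        # Optional sections are only emitted when they have content.
        if self.major_files:
            lines.append("- Major files: " + ", ".join(self.major_files))
        if self.existing_patterns:
            lines.append("- Existing patterns to match: " + "; ".join(self.existing_patterns))
        return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only.
    brief = DesignBrief(
        data_model="users 1..n orders; orders 1..n line_items",
        deployment_target="single container plus a managed Postgres",
        major_files=["api/orders.py", "db/queries.py", "web/OrderForm.tsx"],
        existing_patterns=["validation in the route layer", "raw SQL in db/queries.py"],
    )
    print(brief.render())
```

The point is not the class itself. Its value is that the agent sees the same constraints every session, so it follows your architecture instead of inventing one.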

Step 2: Set up the toolchain

Our current toolchain (mid-2026):

Claude Code as the primary agent. Strong multi-file context, runs commands, reads test output.
Cursor in agent mode for in-editor work when manual edits and AI edits interleave.
MCP (Model Context Protocol) servers that give the agent context on the actual project state (current branch, open issues, recent commits, design docs).
A clean repo. Agents work much better on tidy codebases: lint-free, type-clean, and well named.

The toolchain investment pays back within days. A 30-minute setup makes every future session faster.

Step 3: Plan before executing

Always start a task with: "Plan only. What would you change, in which files, with what approach? Don't write code yet."

The plan reveals misunderstandings cheaply. Maybe the agent wants to install a library you don't want. Maybe it's going to put the new code in the wrong place. Maybe it has a different mental model than you. All catchable in 30 seconds of reading the plan.

We catch ~20% of agent mistakes at the plan stage. That's the highest-ROI step of the workflow.
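If you run many tasks, templating the plan-only request means the "don't write code yet" constraint never gets dropped. This is a minimal sketch. The function name and every line except the quoted plan-only prompt are assumptions. The dependency line targets the "library you don't want" failure mentioned above.

```python
from typing import List, Optional


def plan_only_prompt(task: str, constraints: Optional[List[str]] = None) -> str:
    """Build a prompt that asks the agent for a plan and explicitly forbids code."""
    parts = [
        f"Task: {task}",
        "Plan only. What would you change, in which files, with what approach? "
        "Don't write code yet.",
        # Numbered steps make the plan easy to execute one piece at a time.
        "Number the steps so each can be implemented and reviewed on its own.",
        # Surfaces unwanted libraries at the cheapest possible moment.
        "List any new dependencies you would add and why.",
    ]
    for constraint in constraints or []:
        parts.append(f"Constraint: {constraint}")
    return "\n\n".join(parts)


print(plan_only_prompt("Add a POST /orders endpoint", ["no new libraries"]))
```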

Step 4: Execute in small steps

When the plan looks right, execute — but small. "Implement step 1 of the plan only. Don't move on."

Small steps mean:

Each step is reviewable in 2–3 minutes
Each step is a clean git commit
If something goes wrong, you roll back one thing, not the whole task
You stay in the loop instead of zoning out

The temptation is to dump everything at once and let the agent run. Resist it. Speed comes from compounding small wins, not big leaps.
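Turning a plan into one prompt per step can be mechanized. The sketch below assumes the agent returned its plan as a numbered list (`1.` or `1)` style). The function names and prompt wording are hypothetical, built around the "Implement step 1 of the plan only" phrasing above.

```python
import re
from typing import List

# Matches lines like "1. Add the route" or "2) Write tests" (an assumed plan format).
STEP_PATTERN = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def split_plan(plan_text: str) -> List[str]:
    """Extract numbered steps from an agent's plan, in order, ignoring other lines."""
    steps = []
    for line in plan_text.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            steps.append(match.group(2))
    return steps


def step_prompts(steps: List[str]) -> List[str]:
    """One prompt per step, each telling the agent to stop after that step."""
    return [
        f"Implement step {i} of the plan only ({step}). Don't move on."
        for i, step in enumerate(steps, start=1)
    ]
```

Send one prompt, review the diff, commit, then send the next. That pairing keeps each step a clean, revertible commit.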

Step 5: Review like code review

Every diff goes through a real review. Specifically:

Does the code match patterns elsewhere in the project?
Are there magic numbers that should be config?
Are error paths handled?
Are types tight?
Did the tests actually test what was supposed to be tested?
Is there anything weirdly clever that would be hard to maintain?

The agent isn't writing bad code on purpose. But it's optimizing for "code that probably works," not "code that's clearly correct." Your review converts the first into the second.
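Here is the kind of change the magic-numbers check asks for. Both versions are hypothetical illustrations, not code from any real project. The retry count, timeout, and environment variable names are invented.

```python
import os
from dataclasses import dataclass


# Before review: unexplained inline values that "probably work".
def should_retry_inline(attempt: int, elapsed_seconds: float) -> bool:
    return attempt < 3 and elapsed_seconds < 2.5  # why 3? why 2.5?


# After review: the same values, named in one place and overridable.
@dataclass(frozen=True)
class RetryConfig:
    max_retries: int = 3
    timeout_seconds: float = 2.5

    @classmethod
    def from_env(cls) -> "RetryConfig":
        # Hypothetical environment variable names; defaults match the fields above.
        return cls(
            max_retries=int(os.environ.get("RETRY_MAX", cls.max_retries)),
            timeout_seconds=float(os.environ.get("RETRY_TIMEOUT_SECONDS", cls.timeout_seconds)),
        )


def should_retry(attempt: int, elapsed_seconds: float, config: RetryConfig) -> bool:
    return attempt < config.max_retries and elapsed_seconds < config.timeout_seconds
```

The behavior is identical with the defaults. What changes is that the next reader can see what the numbers mean and adjust them without hunting through the code.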

Step 6: Verify, then commit

Run the tests yourself. Run the build yourself. Hit the changed endpoint yourself. If it's a UI change, look at it in the browser.

The agent's verification is fast and usually right. Yours is the trustworthy one. Always do it.
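Running the checks yourself is easier to do every time if it's a single call. This is a minimal sketch that assumes your test and build steps are ordinary commands. The `pytest` and `npm run build` entries are placeholders for whatever your project actually runs. It does not cover the manual checks, such as hitting the endpoint or looking at the UI.

```python
import subprocess
import sys
from typing import List, Sequence

# Placeholder commands; substitute your project's real test and build steps.
DEFAULT_CHECKS: List[List[str]] = [
    [sys.executable, "-m", "pytest", "-q"],
    ["npm", "run", "build"],
]


def run_checks(checks: Sequence[Sequence[str]]) -> List[str]:
    """Run each command in order; return the ones that failed (empty list means all passed)."""
    failures = []
    for cmd in checks:
        try:
            result = subprocess.run(list(cmd))
        except FileNotFoundError:
            # The executable itself is missing, which is also a failed check.
            failures.append(" ".join(cmd) + " (command not found)")
            continue
        if result.returncode != 0:
            failures.append(" ".join(cmd))
    return failures
```

Call `run_checks(DEFAULT_CHECKS)` before each commit and don't commit if it returns anything.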

Where the time savings come from

A concrete example. Say you're adding a new API endpoint with database access, validation, tests, and a UI form to hit it.

Hand-coded breakdown:

Design: 15 min
Write the route + validation: 30 min
Write the database query + types: 20 min
Write the tests: 30 min
Write the form: 30 min
Wire it up: 15 min
Total: 140 min (about 2 hours 20 min)

Vibe-coded with this workflow:

Design: 15 min (same — judgment, not typing)
Prompt + plan review: 5 min
Execute (route + validation): 5 min, review 3 min
Execute (db + types): 5 min, review 3 min
Execute (tests): 5 min, review 3 min
Execute (form): 5 min, review 3 min
Verify everything: 10 min
Total: ~1 hour

That's a bit over 2x (140 minutes versus 62). On tasks where the AI is a better fit (mechanical work, well-established patterns), the savings are larger: 5x or more. On judgment-heavy tasks, the savings are smaller, sometimes none.
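Here is the arithmetic behind that comparison, using the minute estimates from the two breakdowns above. They are estimates, not measurements.

```python
# Minute estimates copied from the two breakdowns above.
HAND_CODED = {
    "design": 15, "route + validation": 30, "db query + types": 20,
    "tests": 30, "form": 30, "wire up": 15,
}
VIBE_CODED = {
    "design": 15, "prompt + plan review": 5,
    "route + validation": 5 + 3, "db + types": 5 + 3,  # execute + review
    "tests": 5 + 3, "form": 5 + 3, "verify": 10,
}

hand_total = sum(HAND_CODED.values())  # 140 minutes
vibe_total = sum(VIBE_CODED.values())  # 62 minutes
speedup = hand_total / vibe_total      # about 2.26
print(f"{hand_total} min vs {vibe_total} min: {speedup:.1f}x")  # 140 min vs 62 min: 2.3x
```

Design costs the same 15 minutes in both columns. That fixed cost is why judgment-heavy work sees the smallest gain: it's the part the agent doesn't compress.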

The 5–10x velocity claims you'll hear are real on the right shape of work. They're not real across the board.

Common questions

Do you need senior engineering experience to vibe code?

For production code: yes, basically. You need to be able to recognize bad code, bad architecture, and bad tests when the agent produces them. Without that calibration, you'll ship bugs faster.

For prototypes and experiments: no, anyone can vibe code productively. Just don't ship the prototypes as production software without a senior review.

Which tool should you use?

Claude Code for us, day to day. Cursor is excellent. Aider for surgical work. The space moves fast; by the time you read this the answer may differ. Try two or three and pick what works for your workflow.

How do you keep quality up?

Apply normal code review standards. The agent is producing code; code goes through review. If you wouldn't accept it from a junior, don't accept it from the agent. Your taste filter is the bottleneck on quality.

Does it work for large refactors?

Yes: multi-file refactors are one of the strongest use cases. Type-system migrations, schema changes, framework upgrades. Plan in small steps, commit often.

What about greenfield projects?

Great for greenfield, with a caveat: set the architecture by hand first. Get the data model right, set up the basic file structure, install your libraries, configure your tools. Then turn the agent loose on building features within that scaffolding.

A starter assignment

Pick a feature you'd hand-code in 2 hours. Build it with the workflow above. Time yourself honestly. Compare the result to your normal output.

Two outcomes are likely:

It took longer than you expected. (You're learning the workflow.)
It was much faster but the code is worse than your normal output. (You're not reviewing carefully enough.)

After a week of practice, both usually fix themselves. The third outcome, faster and good, is where you're trying to get. It takes practice, and then it's the new baseline.

Where to go from here

If you want the deeper theory, read "What is agentic coding" and "The agentic coding guide."

If you want our team to do this for your project: our Custom Software Development and AI Agent Development services use this workflow on every engagement.

And if you want to hire someone for this specifically: hire a vibe coder.