Vibe coding is a term coined by Andrej Karpathy in 2025 to describe AI-assisted development where the developer describes what they want in natural language, accepts the generated code without fully reading or understanding it, and moves on. It can work for low-stakes projects. It becomes dangerous in production systems where failures have real consequences.

Is vibe coding always dangerous?

No. For personal projects, prototypes, and low-stakes tools, vibe coding is a legitimate productivity approach. The danger emerges when this mindset is applied to systems where failures have real consequences — financial infrastructure, healthcare, aviation, or any system where other people bear the cost of the code being wrong.

Why can't AI write production-quality code for critical systems?

AI can generate production-quality code in narrow contexts, and it can be a useful tool for experienced developers. The problem is that critical systems require more than code that looks correct. They require domain understanding, institutional memory, explicit assumptions, operational knowledge, failure-mode reasoning, tests that cover real risks, and human accountability. AI does not carry that context or responsibility by itself. In critical systems, AI-generated code is only acceptable when the team can explain it, test it, operate it, and own it.

What should developers do differently?

Use AI tools to accelerate work you already understand — not to replace understanding you don't have. Review every line of AI-generated code as if you wrote it yourself, because in production, you own it. In critical systems, never approve code you don't understand.

Has AI-generated code already caused production failures?

Yes. Multiple organisations have reported production incidents attributed to AI-generated code that introduced security vulnerabilities, race conditions, and logic errors that weren't caught in review. The full scale of the problem is not yet documented, partly because post-mortems rarely identify "AI-generated code" as the root cause — they identify the specific failure, not its origin.

Vibe Coding Will Kill Someone

In February 2025, Andrej Karpathy, former director of AI at Tesla and a founding member of OpenAI, posted a description of how he now writes some of his software. He called it vibe coding. The idea: describe what you want in natural language, let the AI generate the code, accept the changes without reading them closely, and move on. "I'm building a project or webapp," he wrote, "but it's not really coding."

For a personal project or a landing page, this is fine. Nobody dies if a side project has a race condition.

The problem is what happens when this mindset leaves the prototype and enters production. When it enters systems that process financial transactions, manage medical records, or control infrastructure. When the person who deployed the code doesn't understand what it does — and nobody reviewed it closely enough to find out.

Vibe coding is not dangerous because AI writes code. It is dangerous when it breaks the chain of human understanding, review, testing, and accountability in systems where failure has real consequences.

We've seen this before. The disasters documented on this blog follow a pattern: someone made a decision about a system they didn't fully understand. Vibe coding doesn't create a new failure mode. It industrialises an existing one.


What Vibe Coding Actually Is

Karpathy's description was honest. He wasn't advocating for vibe coding in production systems — he was describing a personal workflow for experimentation. But the term landed in a community already primed to hear it as permission.

In 2025 and 2026, vibe coding became a movement. Developers began shipping production code they couldn't explain. Many of them were new to the profession — but the failure is not individual laziness. It is a system that rewards output over understanding, and speed over accountability.

The code works in the demo. It passes a basic smoke test. It gets deployed. But a smoke test can only tell you that a path works once. It cannot tell you what happens under load, retry, partial failure, clock drift, stale state, duplicate messages, or concurrent writes.
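To make the smoke-test gap concrete, here is a minimal Java sketch of one item on that list: concurrent writes. `UnsafeBalance`, `SafeBalance` and `LostUpdateDemo` are hypothetical names invented for illustration. The unsafe `deposit` reads the balance and then writes it back, so a single-threaded smoke test passes. Under concurrent load, two threads can read the same value and one deposit silently disappears. The safe version uses `AtomicLong.addAndGet`, which performs the read-modify-write as one atomic step.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical balance holder: the read-then-write below is NOT atomic.
class UnsafeBalance {
    private long cents = 0;

    void deposit(long amount) {
        long current = cents;     // two threads can both read the same value...
        cents = current + amount; // ...and the later write overwrites the earlier one
    }

    long get() { return cents; }
}

// Same interface, but the read-modify-write is a single atomic operation.
class SafeBalance {
    private final AtomicLong cents = new AtomicLong();

    void deposit(long amount) { cents.addAndGet(amount); }

    long get() { return cents.get(); }
}

class LostUpdateDemo {
    // Starts `threads` workers, each making `perThread` one-cent deposits, then reads the total.
    static long hammer(LongConsumer deposit, LongSupplier read, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    deposit.accept(1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() makes each worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return read.getAsLong();
    }

    public static void main(String[] args) {
        UnsafeBalance smoke = new UnsafeBalance();
        smoke.deposit(100);
        System.out.println("smoke test: " + smoke.get()); // 100: the smoke test passes

        UnsafeBalance unsafe = new UnsafeBalance();
        SafeBalance safe = new SafeBalance();
        // The unsafe total can fall below 800000, and the shortfall varies between runs.
        System.out.println("unsafe: " + hammer(unsafe::deposit, unsafe::get, 8, 100_000));
        System.out.println("safe:   " + hammer(safe::deposit, safe::get, 8, 100_000)); // always 800000
    }
}
```

Neither version throws an exception. That is the point: the broken one passes the smoke test and then fails quietly under load.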

Three weeks later, something breaks in a way the developer doesn't understand, because they never understood the system in the first place.

What Critical Systems Actually Require

I spent years working on payment infrastructure at SIBS, the organisation behind MBWay — a system that processes millions of transactions across Portugal. Before you touched a single line of code in that environment, you needed to understand the business domain deeply. Not superficially. Not "I read the documentation." You needed to know where you were operating, what the data represented, what a failure at that point in the pipeline would cost — in money, in trust, in regulatory consequences.

The codebase was complex in a way that commanded respect. Method signatures with generics that took real thought to follow. Classes designed with intention, not convenience. Every Optional, every Stream, every carefully placed lambda existed because someone had reasoned about the failure mode it was preventing. It was genuinely beautiful engineering — not in an academic sense, but in the sense that it had survived contact with reality and held.

You didn't deliver code in that environment without unit tests that actually tested something. Not coverage metrics — tests that would catch the edge cases that would surface at 3am on a Saturday when transaction volume spiked. Exception handling wasn't defensive boilerplate. It was the result of someone thinking carefully about every way the operation could fail and deciding, deliberately, what the system should do in each case.
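A minimal Java sketch of what "deciding, deliberately, what the system should do" can look like for one failure mode: a timeout. `Outcome`, `Gateway` and `PaymentService` are hypothetical names, not a real payment API. The assumption encoded here is that a timeout does not mean the payment failed. The downstream system may already have processed it, so the sketch records `UNKNOWN` for reconciliation rather than `DECLINED`, which could invite a customer retry and a double charge.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical outcomes. UNKNOWN is deliberately distinct from DECLINED.
enum Outcome { APPROVED, DECLINED, UNKNOWN }

// Hypothetical downstream call; returns true if the payment was authorised.
interface Gateway {
    boolean authorise(String paymentId, long amountCents) throws TimeoutException;
}

class PaymentService {
    private final Gateway gateway;

    PaymentService(Gateway gateway) {
        this.gateway = gateway;
    }

    Outcome pay(String paymentId, long amountCents) {
        if (amountCents <= 0) {
            // Invalid input is rejected before anything is sent downstream.
            throw new IllegalArgumentException("amount must be positive: " + amountCents);
        }
        try {
            return gateway.authorise(paymentId, amountCents) ? Outcome.APPROVED : Outcome.DECLINED;
        } catch (TimeoutException e) {
            // The request may have been processed anyway. Reporting DECLINED could
            // trigger a retry and a double charge, so the result is UNKNOWN and
            // must be settled by reconciliation, not by guessing.
            return Outcome.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        PaymentService happy = new PaymentService((id, amount) -> true);
        PaymentService slow = new PaymentService((id, amount) -> {
            throw new TimeoutException("no response");
        });

        System.out.println(happy.pay("p-1", 500)); // APPROVED
        System.out.println(slow.pay("p-2", 500));  // UNKNOWN, not DECLINED
        try {
            happy.pay("p-3", 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: amount must be positive: 0
        }
    }
}
```

A coverage tool is satisfied by the happy path alone. Only a test that forces the timeout checks the decision the catch block encodes.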

AI can generate code that looks correct, and sometimes is correct. What it cannot do by itself is carry the domain understanding, institutional memory, accountability, and failure-mode reasoning that critical systems require. It does not know when the code it produced is sufficient for this specific system, under these specific conditions, with these specific failure modes.

Code Review Is Shared Responsibility

In critical systems, code review is not a formality. When you approve someone's code, you are taking responsibility for it. You are saying: I understand what this does, I believe it is correct, and I am willing to own the consequences if it isn't.

That changes how you review. You don't skim. You ask questions. You push back. You don't approve code you don't understand, because approving code you don't understand means the next incident is partly yours.

Vibe coding breaks this contract at the source. If the developer who wrote the code doesn't understand it, the reviewer is being asked to take responsibility for something nobody understands. The accountability chain doesn't just weaken — it disappears.

If the author cannot explain the code, the reviewer is no longer reviewing an implementation. They are reverse-engineering an unowned decision.

The Knight Capital failure in 2012 is the clearest example of what happens when this chain breaks. During a manual deployment of new trading code, one of eight servers was left running old code, and a repurposed flag woke up long-dormant logic on that server. No second person reviewed the deployment. Forty-five minutes later, $440 million was gone. Nobody set out to cause that failure. Everyone was just moving fast and trusting that the system was understood. It wasn't.

The Pattern We Keep Repeating

These failures were not caused by AI. They matter here because they show the same class of organisational failure: deploying systems whose behaviour was not sufficiently understood under real conditions.

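The Therac-25 delivered massive radiation overdoses to at least six patients between 1985 and 1987, and several of them died. Its software was largely the work of a single developer, with no independent review, and it had never been tested at the edge cases that mattered. The assumption was that the system worked because it had worked before.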

The Ariane 5 destroyed itself less than forty seconds after launch on its first flight in 1996. Code reused from Ariane 4, a rocket with a different flight profile, hit an integer overflow that nobody caught because the assumption was that the old code was safe.
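The Ariane 5 inquiry traced the failure to an unprotected conversion of a 64-bit floating-point value into a 16-bit signed integer, a conversion that had been safe on Ariane 4's trajectory. The flight software was written in Ada, where the out-of-range value raised an operand error. The Java sketch below only illustrates the general hazard and does not reconstruct that code. A plain `(short)` cast in Java raises nothing at all and silently keeps the low 16 bits. The hypothetical `checked` helper states the range assumption and fails loudly when it no longer holds.

```java
// Illustration only: the Ariane flight software was Ada, not Java.
class Narrowing {
    // Java narrows double -> int -> short and keeps the low 16 bits: no error, wrong value.
    static short unchecked(double value) {
        return (short) value;
    }

    // Makes the range assumption explicit and fails loudly when it is violated.
    static short checked(double value) {
        if (Double.isNaN(value) || value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new ArithmeticException("outside 16-bit signed range: " + value);
        }
        return (short) value;
    }

    public static void main(String[] args) {
        System.out.println(unchecked(1234.0));  // 1234
        System.out.println(unchecked(40000.0)); // -25536: silently wrapped
        System.out.println(checked(1234.0));    // 1234
        try {
            checked(40000.0);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // outside 16-bit signed range: 40000.0
        }
    }
}
```

A checked conversion does not make reused code safe by itself. What it does is turn an unstated assumption about value ranges into one that a reviewer can see and a test can exercise.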

The Boeing 737 MAX failures were not a simple coding mistake. They were a sociotechnical failure: design assumptions, organisational pressure, certification gaps, training decisions, and software behaviour interacted in a way the system as a whole did not adequately account for. 346 people died not because of one bad decision, but because a chain of decisions — each individually defensible — combined into something nobody had fully modelled.

The pattern is not malice. It's not incompetence in the simple sense. It's the assumption that the system is understood when it isn't. It's the gap between what the code appears to do and what it actually does under conditions nobody thought to test.

Vibe coding doesn't just risk reproducing this pattern. It makes the pattern the default workflow.

This Is Not Anti-AI

Using AI tools to write software is not the problem. I use them. AI tools are now part of many serious software development workflows. The question is not whether you use AI — it's whether you understand what the AI produced.

There is a difference between using AI to accelerate work you understand and using AI to replace understanding you don't have. The first is a productivity tool. The second is a liability.

The dangerous version of vibe coding is not Karpathy experimenting on a weekend project. It's a developer who has never learned to reason about concurrency using AI to write multithreaded payment processing code, accepting the output because it compiled, and deploying it because nobody stopped them.

Not all production code carries the same risk. A misconfigured landing page costs a conversion. A misconfigured internal dashboard causes confusion. A misconfigured payment pipeline moves real money incorrectly and can trigger regulatory incidents. The argument here is not about all production systems. It is about the ones where the failure mode is measured in money, data integrity, or lives.

In a system that processes 1,400 financial events per second (the kind of system I have worked on), a race condition does not produce an error message. It produces silent data corruption: balances that do not match, duplicate settlements, ledger inconsistencies, lost updates, reconciliation failures. This is the kind of failure that takes weeks to diagnose and can cost more to remediate than the entire team's annual salary.
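Duplicate settlements are the easiest item on that list to sketch. A common defence is idempotency: every settlement carries a unique identifier, and applying the same identifier twice has no further effect, so a redelivered message or a client retry cannot post twice. The Java sketch below assumes a hypothetical in-memory `IdempotentLedger`. In a real system the deduplication record and the balance change would need to commit in the same database transaction, which this sketch does not attempt.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory ledger illustrating idempotent settlement.
class IdempotentLedger {
    private final Set<String> applied = ConcurrentHashMap.newKeySet();    // settlement IDs seen
    private final Map<String, Long> balances = new ConcurrentHashMap<>(); // account -> cents

    /** Applies a settlement once; returns false if this settlementId was already applied. */
    boolean settle(String settlementId, String account, long cents) {
        // add() is atomic: for a given ID, exactly one caller gets true.
        if (!applied.add(settlementId)) {
            return false; // redelivery or retry: ignore instead of posting twice
        }
        balances.merge(account, cents, Long::sum);
        return true;
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    public static void main(String[] args) {
        IdempotentLedger ledger = new IdempotentLedger();
        System.out.println(ledger.settle("s-42", "acc-A", 250)); // true
        System.out.println(ledger.settle("s-42", "acc-A", 250)); // false: duplicate message
        System.out.println(ledger.balance("acc-A"));             // 250, not 500
    }
}
```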

The AI that wrote the code won't be in the post-mortem. The developer who deployed it will be. The engineering manager who approved it will be. And when someone asks who understood what that code was doing before it went to production, nobody will have a good answer.

What Responsible AI-Assisted Development Looks Like

Using AI responsibly in critical systems does not mean refusing the tool. It means refusing to let the tool replace engineering judgement.

AI-generated code should not enter a critical codebase unless:

the developer can explain what it does and why it is correct;
the relevant domain assumptions are explicit;
edge cases and failure modes have been considered;
tests cover the behaviour that matters, not just happy paths;
concurrency, persistence, retries, idempotency, and rollback behaviour are understood where relevant;
code review validates behaviour, not just style;
observability exists for the failure modes the change could introduce;
a human owner is accountable for the code after deployment.
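The observability item is the one most easily left abstract. One concrete form it can take in a ledger system is a reconciliation check, sketched here in Java under stated assumptions. `Entry` and `Reconciliation` are hypothetical types, and a real check would run against the database and feed a metric or alert rather than return a map. It recomputes each account's balance from its ledger entries and reports every account where the stored balance disagrees. That is how a race condition or duplicate settlement that never raised an error can still be detected.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical ledger entry: a signed movement of cents on one account.
final class Entry {
    final String account;
    final long cents;

    Entry(String account, long cents) {
        this.account = account;
        this.cents = cents;
    }
}

class Reconciliation {
    /**
     * Returns account -> (stored balance minus ledger sum) for every account that does
     * not reconcile. An empty map means every stored balance matches its ledger.
     */
    static Map<String, Long> mismatches(List<Entry> ledger, Map<String, Long> storedBalances) {
        Map<String, Long> ledgerSums = ledger.stream()
                .collect(Collectors.groupingBy((Entry e) -> e.account,
                        Collectors.summingLong((Entry e) -> e.cents)));

        // Check accounts known to either side, so a missing balance row is also reported.
        Set<String> accounts = new TreeSet<>(ledgerSums.keySet());
        accounts.addAll(storedBalances.keySet());

        Map<String, Long> diffs = new TreeMap<>();
        for (String account : accounts) {
            long expected = ledgerSums.getOrDefault(account, 0L);
            long stored = storedBalances.getOrDefault(account, 0L);
            if (stored != expected) {
                diffs.put(account, stored - expected);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        List<Entry> ledger = List.of(new Entry("A", 500), new Entry("A", -200), new Entry("B", 100));
        System.out.println(mismatches(ledger, Map.of("A", 300L, "B", 100L))); // {}
        System.out.println(mismatches(ledger, Map.of("A", 600L, "B", 100L))); // {A=300}
    }
}
```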

The useful question is not 'did AI write this?' The useful question is 'can the team explain, test, operate, and own this?'

If the answer is yes, AI was a tool. If the answer is no, the team has shipped risk disguised as productivity.

Who Carries the Risk

Software runs in hospitals. It runs in aircraft. It runs in the financial infrastructure that millions of people depend on without knowing it exists. It runs in systems that keep people alive.

The people whose lives depend on those systems are not in the room when the code is written. They don't know whether the developer who built the system understood it. They don't know whether the reviewer read it carefully. They are trusting, without knowing they're trusting, that someone in that chain took the responsibility seriously.

Vibe coding is a choice to opt out of that responsibility. To build something you don't understand and deploy it into a world where other people bear the consequences of your choices.

For a weekend project, that's a personal decision. For production systems that affect real people — it's a different kind of choice entirely.

The next major software failure involving AI-generated code will not happen because a model wrote a bad function. It will happen because a team accepted code it did not understand, reviewers approved behaviour they did not verify, and an organisation rewarded speed over accountability. The only question is whether anyone in the review chain will stop it.