
Amazon's AI Code Broke Production. Don't Let Yours Be Next

Mar 23, 2026 · by Tim Kamanin

I've been using Claude Code as my daily coding companion for months now. It writes my boilerplate, catches patterns I'd miss, and saves me real time. I'm a fan.

So when Amazon lost 6.3 million orders in a single day because of AI-generated code that was deployed to production, it got my attention.

On March 2, an AI-assisted code change triggered 1.6 million website errors and 120,000 lost orders. Three days later, a separate incident caused a 99% drop in orders across North American marketplaces. Amazon called it "high blast radius" (engineer-speak for "everything is on fire").

The fallout? A mandatory all-hands, a 90-day safety reset across 335 critical systems, and a new rule: even senior engineers now need two reviewers to sign off before deploying.

It's the first time a major company has publicly admitted that AI-generated code caused a production disaster. That's... not great.

So what went wrong? Is AI the problem?

The problem isn't the AI itself. It's the gap between how fast AI writes code and how fast humans can review it.

Imagine your factory tripled its production speed overnight, but you kept the same three quality inspectors on the floor. Stuff flies off the line, nobody can keep up, and eventually something broken ships to a customer. That's what's happening with AI code right now.

You used to write a function, sit with it, test it, submit a PR. Now you generate 15 functions in ten minutes and push them all. The bottleneck moved from writing to reviewing. Most teams never adjusted.

I see this in my own work. Claude Code generates clean-looking code fast. And that's exactly the problem: it looks correct. It passes a quick scan. It passes simple tests. But the bugs hide in the places you wouldn't think to check: edge cases, race conditions, assumptions the model made about your system that simply aren't true.

The kind of bug that sails through a 30-second PR review without anyone blinking.

And that's assuming someone actually reads the code. There's a growing crowd — the "vibe coders" — who skip that part entirely. Prompt, merge, ship. If it runs, it ships.

I get the appeal. AI makes it feel effortless. But every line of code you don't understand is a line you can't debug at 3 AM when it breaks. If you're merging AI output without reading it, you're rolling dice with production. Don't do that!

What I actually do before merging AI code

My setup isn't fancy. I'm a solo dev. Here are four things that keep me out of trouble:

I plan with the AI before it writes anything

I don't just throw a prompt at Claude Code and cross my fingers. I start with a conversation: what are we building, what are the constraints, what could go wrong? We go back and forth for a couple of rounds until I'm confident the approach is solid. Claude Code actually has a dedicated plan mode for exactly this: you discuss the approach, agree on a direction, and only then let it generate the implementation.

This is where most of the quality comes from. By the time the code exists, we've already killed the bad ideas. Don't skip this step because "it's just a small change." Small changes with wrong assumptions cause the biggest fires.

I read every line like it came from a new hire

Not because the AI is bad. It's genuinely good. But it doesn't know about that weird edge case in my production data or the race condition I patched six months ago.

AI code is confident code. It never adds a comment saying "I'm not sure about this." Your brain sees clean formatting, pattern-matches it to "correct," and you scroll right past the bug. You'll be tempted to review AI code faster because it looks cleaner than human code. Fight that instinct. Clean formatting can hide dirty logic.
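To make that concrete, here's the shape of bug I catch most often. This is a hypothetical, simplified snippet (the function and fields are made up), but it's typical: tidy, well-named, and wrong the moment real data shows up.

```python
# Hypothetical example: an AI-generated helper that looks clean and passes a quick scan.
def average_order_value(orders: list[dict]) -> float:
    """Return the average order total in dollars."""
    total = sum(order["total_cents"] for order in orders)
    return total / len(orders) / 100  # ZeroDivisionError the first time orders is empty


# The version that survives production: same logic, plus the edge case
# the model didn't know my data could hit.
def average_order_value_checked(orders: list[dict]) -> float:
    """Same helper, guarded against an empty order list."""
    if not orders:
        return 0.0
    total = sum(order.get("total_cents", 0) for order in orders)
    return total / len(orders) / 100
```

A 30-second review waves the first version through every time. Reading it like a new hire wrote it is what surfaces the question "what happens when this list is empty?"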

I run it locally before it goes anywhere near a PR

Sounds obvious, right? But when AI generates code fast, the temptation is to push it fast too. I force myself to slow down. Click through the UI. Hit the endpoint. Watch the logs. "It compiled and tests pass" is not the same as "it works."
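When clicking through the UI isn't enough, I throw together a disposable script like this. Everything here is made up for illustration (the URL, the route, the payload), and it only uses the standard library so there's nothing to install:

```python
# Throwaway smoke test I might run before opening a PR. The point is to
# exercise the real code path, not just confirm that it compiles.
import json
from urllib import request, error

BASE_URL = "http://localhost:8000"  # assumes the app is running locally


def smoke_test_checkout() -> None:
    payload = json.dumps({"sku": "ABC-123", "qty": 2}).encode()
    req = request.Request(
        f"{BASE_URL}/api/checkout",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with request.urlopen(req, timeout=5) as resp:
            body = json.loads(resp.read())
            print("status:", resp.status, "order_id:", body.get("order_id"))
    except error.HTTPError as e:
        # A 4xx/5xx here is exactly what I want to see before the PR, not after.
        print("request failed:", e.code, e.read().decode()[:200])


if __name__ == "__main__":
    smoke_test_checkout()
```

It takes two minutes to write and gets deleted afterwards. That's the whole point: cheap, real feedback from the running system.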

I make the AI write tests — then I audit them

I hate writing tests. So I let the AI do it, and it's genuinely good at this. It thinks through edge cases methodically, often covering scenarios I wouldn't have bothered with.

I keep a mental checklist of the tests that must exist for a given piece of code. I go through the generated suite and make sure everything I'd expect is there. If the AI missed something from my list, the implementation probably has the same blind spot.

Don't let the AI write tests and decide what to test. It tests what it built. It won't test the assumptions it got wrong. You bring the requirements. The AI brings the implementation.
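Here's roughly what that audit looks like in practice. The first test is the kind the AI writes on its own: it exercises what the code already does. The second comes off my checklist: it encodes a business rule the model was never told about, so neither the implementation nor its generated tests cover it. The module, function, and amounts are hypothetical.

```python
# Hypothetical pytest suite for an apply_discount(price_cents, percent) helper.
import pytest

from pricing import apply_discount  # hypothetical module under review


# The kind of test the AI tends to generate: it confirms the happy path it built.
def test_ten_percent_discount():
    assert apply_discount(1000, 10) == 900


# The kind of test that comes off my checklist: a requirement the model never saw,
# so the implementation probably has the same blind spot.
def test_discount_never_exceeds_100_percent():
    with pytest.raises(ValueError):
        apply_discount(1000, 150)
```

If the second test is missing from the generated suite, that's my cue to go back and check the implementation, not just add the test.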

None of this is rocket science. But doing all four consistently — planning, reading, running, testing — catches the stuff that a quick glance never will.

What's next?

AI-generated code isn't going away. It's getting better every month, and it makes us more productive. But more productive means more code shipping faster, and that means your review process matters more than ever.

AI writes the code and tests it. It can even review it now. But the judgment call on whether it's ready for production? That's still on you.

Amazon learned this with 6.3 million lost orders. You don't have to.

Take care!

