I recently listened to this excellent interview between Andrej Karpathy and Dwarkesh Patel. Andrej's observation that the demo-to-product gap is driven by the impact of errors stuck with me. Karpathy calls it the "march of nines": when something works 90% of the time, you've captured the first nine. But you need the second nine to hit 99%, the third to reach 99.9%, and so on. Each nine takes the same brutal amount of work.

For a product, the real variable isn't capability. It's the cost of errors. Self-driving mistakes kill people. That's why the gap between demo and deployment stretches across decades.

I see this in finance. Equity research is a high-recall task: you want to surface every plausible idea, and a few bad ones are tolerable, so error tolerance is high. Hence the barrier from prototype to production stays low. You can iterate messily, ship things that aren't 100%, and refine as you go; the stakes accommodate speed. But trading systems are a different universe. Errors hit PnL directly, and the demo-to-product gap widens with the stakes.

This changes how you should evaluate AI tools. Don't ask "can it do my job?" Ask "what's my error budget?" If you're writing analytical scripts, mistakes are cheap; ship the 90% solution. If you're writing code that handles money or private data, you're signing up for the march of nines. The demo looks the same either way, but the timeline doesn't.
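The error-budget framing can be made concrete with a toy calculation. The sketch below (all numbers and the `expected_error_cost` helper are hypothetical illustrations, not figures from the interview) shows why two systems with identical demo-level reliability can justify very different engineering timelines, and why each additional nine cuts expected losses by 10x:

```python
# Toy sketch of the "error budget" idea. A 90%-reliable demo implies
# very different expected costs depending on the price of one mistake.
# All numbers here are made up for illustration.

def expected_error_cost(reliability: float, cost_per_error: float, runs: int) -> float:
    """Expected total cost of failures over `runs` attempts."""
    return (1.0 - reliability) * cost_per_error * runs

# Cheap errors: an analytical script where a failure wastes a few minutes.
script_cost = expected_error_cost(reliability=0.90, cost_per_error=5, runs=1_000)

# Expensive errors: a trading system at the same 90% reliability.
trading_cost = expected_error_cost(reliability=0.90, cost_per_error=100_000, runs=1_000)

print(f"analyst script:  ${script_cost:,.0f}")
print(f"trading system:  ${trading_cost:,.0f}")

# The march of nines: each extra nine divides expected losses by 10,
# but (per Karpathy) costs roughly the same amount of work to earn.
for nines in range(1, 5):
    reliability = 1.0 - 10.0 ** -nines
    cost = expected_error_cost(reliability, cost_per_error=100_000, runs=1_000)
    print(f"{nines} nine(s) ({reliability:.4f}): ${cost:,.0f}")
```

The point of the loop is that the cost curve, not the capability curve, decides when a system is shippable: the analyst script is already inside its error budget at one nine, while the trading system isn't until several nines later.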