The Edge Case


Ask an AI model to write a REST endpoint, explain a sorting algorithm, or draft a schema for a standard use case. It does it well. Cleanly, confidently, correctly.

That’s not a coincidence. It’s how these models work. They’re trained to predict the most probable answer given what came before. On common problems, the most probable answer is usually the right one.

The demos look great. The benchmarks look great. The productivity numbers look great.

And they should. On the average case, the models are genuinely impressive.

The problem is that software doesn’t fail on the average case.

The bugs that matter don’t show up in demos. They show up when two events collide in a 200ms window nobody thought to test. When a specific combination of user permissions hits a code path that works fine in isolation. When load patterns on a Tuesday afternoon expose a race condition that’s been sitting quietly in production for eight months.
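To make the shape of one of those bugs concrete, here is a deliberately contrived Python sketch of a check-then-act race. It isn't drawn from any real incident; the sleep exists only to widen the window so the failure reproduces on every run, standing in for the 200ms collision nobody tested.

```python
import threading
import time

# Two "requests" both see enough balance, both withdraw, and the account goes
# negative. The sleep widens the race window so the bug shows up every time;
# in a real system the window is milliseconds wide and opens only under the
# right interleaving.
balance = 100

def withdraw(amount):
    global balance
    if balance >= amount:          # check
        time.sleep(0.2)            # ...the other request runs in this window
        balance -= amount          # act, on a now-stale assumption

threads = [threading.Thread(target=withdraw, args=(100,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"final balance: {balance}")  # prints -100: both withdrawals went through
```

Each line of that function is reasonable in isolation. The bug only exists in the gap between two of them, which is exactly why it survives code review, unit tests, and demos.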

Experienced engineers carry those cases in their heads. Not because they read about them, but because they lived them. The 2am incident that traced back to a schema decision made six months earlier. The cascading failure that turned out to be three individually reasonable choices that were catastrophic in combination. That kind of knowledge doesn't come from documentation. It comes from cost.

The models carry some of it too, because for decades developers wrote those cases down. What got documented, obsessively, was the weird stuff. The edge case that took three days to find. The failure mode nobody had seen before. Nobody posts about the deployment that went smoothly; what got written down were the things that hurt. That pain became training data. The models absorbed the tail not because anyone planned it, but because developers couldn't stop writing about what broke them.

That written record is a snapshot, and the snapshot is not being updated. The hard problems that used to get worked out in public, on forums and issue trackers, increasingly get worked out in a chat window instead, and the answers stay there.

In 2024 a paper published in Nature proved something that should have made more noise than it did. When models are trained on data generated by other models, they don’t just plateau. They degrade. Specifically, they start forgetting the edges. The rare cases, the complex cases, the nuanced situations that sit in the tail of the distribution. The average stays intact. The model remains convincing. But its grip on the hard stuff quietly loosens.

The researchers called it model collapse. It’s not a risk or a warning. It’s a mathematically proven outcome of recursive training. A 2026 follow-up formalized it further: if the amount of fresh human-generated data shrinks toward zero, degradation isn’t a possibility, it’s guaranteed.
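The mechanism is easy to see in miniature. The sketch below is not the paper's experiment, just a toy version of the same feedback loop under simplifying assumptions: each "model" is a Gaussian fitted to the previous generation's samples, and every new generation is trained on nothing but the previous model's output. The sample size, generation count, and 3-sigma threshold are arbitrary illustration values.

```python
import numpy as np
from math import erfc, sqrt

def tail_prob(mu, sigma, threshold=3.0):
    """Probability mass a N(mu, sigma) model puts beyond +/- threshold,
    i.e. on the 'edge cases' of the original standard-normal data."""
    return 0.5 * erfc((threshold - mu) / (sigma * sqrt(2))) + \
           0.5 * erfc((threshold + mu) / (sigma * sqrt(2)))

def run_generations(n_samples=100, generations=200, seed=0):
    """Toy stand-in for recursive training: fit a Gaussian to the previous
    generation's samples, then draw the next generation from the fit
    instead of from fresh real-world data."""
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, n_samples)        # generation 0: real data
    tails = []
    for _ in range(generations + 1):
        mu, sigma = data.mean(), data.std()
        tails.append(tail_prob(mu, sigma))
        data = rng.normal(mu, sigma, n_samples)   # next generation: model output only
    return np.array(tails)

# Individual runs are noisy, so look at the median trajectory across many runs:
# the typical outcome is that the fitted distribution narrows and the
# probability it assigns to the original tail collapses toward zero.
runs = np.array([run_generations(seed=s) for s in range(300)])
median_tail = np.median(runs, axis=0)
for g in (0, 50, 100, 200):
    print(f"generation {g:3d}: median probability beyond 3 sigma = {median_tail[g]:.5f}")
```

Generation after generation the fit still looks like a perfectly plausible bell curve. What quietly disappears is the probability it assigns to the rare events.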

The edge cases stop being documented at exactly the moment the models start forgetting them. Two problems, one collision.

Under the current architecture, the math is unambiguous. The degradation happens at the tail, slowly, in the cases that are rare enough that you don’t notice until you really need them.

Those are the same edge cases that experienced engineers carry in their heads. Built from years of being wrong in production, in ways that were painful enough to remember. They don't show up in a prototype, a benchmark, or a 20-minute demo.

We’re not approaching a singularity. We’re approaching a plateau that looks like progress until the edge cases arrive.

And when they do, the question won’t be whether the model can handle them. It’ll be whether anyone in the room still can.


References

Shumailov, I., et al. (2024). AI models collapse when trained on recursively generated data. Nature, 631, 755–759.

Zenil, H. (2026). On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis.