Mind the AI Gap: 2025 Update
LLMs now look much more like they are reasoning than they did just a year ago. In his 2025 review, Andrej Karpathy explains why:
In 2025, Reinforcement Learning from Verifiable Rewards (RLVR) emerged as the de facto new major stage to add to this mix. By training LLMs against automatically verifiable rewards across a number of environments (e.g. think math/code puzzles), the LLMs spontaneously develop strategies that look like “reasoning” to humans – they learn to break down problem solving into intermediate calculations and they learn a number of problem solving strategies for going back and forth to figure things out (see DeepSeek R1 paper for examples). These strategies would have been very difficult to achieve in the previous paradigms because it’s not clear what the optimal reasoning traces and recoveries look like for the LLM – it has to find what works for it, via the optimization against rewards.
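To make "automatically verifiable rewards" concrete, here is a minimal sketch of what such a reward might look like for a toy arithmetic puzzle. It is an illustration only, not code from Karpathy's review or the DeepSeek R1 paper: the function names (`extract_final_answer`, `verifiable_reward`), the sample completions, and the "last integer in the text is the answer" convention are all assumptions.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last integer that appears in the text, or None if there is none.

    Assumption for this sketch: the model's final answer is the last number it writes.
    """
    matches = re.findall(r"-?\d+", completion)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, expected: int) -> float:
    """Score a completion with no human judgment: 1.0 if the final answer is right, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is not None and int(answer) == expected:
        return 1.0
    return 0.0


# Hypothetical completions for the puzzle "What is 17 * 3 + 4?"
samples = [
    "17 * 3 = 51, then 51 + 4 = 55. Answer: 55",
    "17 * 3 = 41, wait, that is wrong: 17 * 3 = 51, so 51 + 4 = 55",
    "Roughly 50.",
]
rewards = [verifiable_reward(s, expected=55) for s in samples]
print(rewards)
```

Note that this reward looks only at the final answer and never at the intermediate steps. Under that assumption, a completion that stumbles and then corrects itself scores the same as a clean one, which is one way to read the point that the model "has to find what works for it."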
Why should we care if they reason or not, as long as we get things done?
Reflecting my pedagogical approach, I’d argue that understanding a tool’s capabilities is key to getting the most out of it. Understanding what Excel is capable of means we wouldn’t ask it for life advice (even with Copilot integrated now, we’d better resist that urge).
I have been discussing this at a higher level in my talk, “Mind the AI Gap,” since 2024. I most recently delivered the talk as a guest speaker for an executive cohort, and our discussion prompted updates to the deck. Hence this 2025 update.
The updated deck is here, with both practical advice and high-level insights.
The key takeaway remains: Steve Jobs’ “bicycle for the mind” analogy is holding up. Today’s LLMs aren’t the mind itself; they’re still the tools that make the mind’s efforts go further. Much further. And we have yet to realize their full potential.
Karpathy concludes, and I concur:
LLMs are emerging as a new kind of intelligence, simultaneously a lot smarter than I expected and a lot dumber than I expected. In any case they are extremely useful and I don’t think the industry has realized anywhere near 10% of their potential even at present capability.