One way is hitting that accept button until the code works. This is like gambling: it works until it doesn’t, and you never know why in either case.
Another way is intentional use. Intentional use is:
- Telling LLMs exactly what to do.
- Inspecting their code, line by line, before accepting.
- Unit testing the solution before it ever sees production (see the sketch after this list).
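
To make that last point concrete, here is a minimal sketch. The helper and its tests are hypothetical, not taken from any particular LLM suggestion: imagine the model proposes a small data-cleaning function, and we pin down its behavior, including the edge case it is most likely to fumble, before accepting the code.

```python
# test_fill_missing.py -- hypothetical example: an LLM-suggested helper
# and the unit tests we write *before* accepting it into the codebase.
import math

import pytest


def fill_missing_with_mean(values: list[float]) -> list[float]:
    """LLM-suggested helper: replace NaNs with the mean of observed values."""
    observed = [v for v in values if not math.isnan(v)]
    if not observed:  # the edge case an over-confident assistant tends to skip
        raise ValueError("all values are missing")
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in values]


def test_replaces_nan_with_mean():
    assert fill_missing_with_mean([1.0, float("nan"), 3.0]) == [1.0, 2.0, 3.0]


def test_leaves_complete_data_untouched():
    assert fill_missing_with_mean([1.0, 2.0]) == [1.0, 2.0]


def test_raises_on_all_missing():
    with pytest.raises(ValueError):
        fill_missing_with_mean([float("nan"), float("nan")])
```

Running `pytest test_fill_missing.py` before merging turns "it looks right" into something checkable, and the all-missing case is exactly the kind of gap that slips through when you only skim the suggestion.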
This means treating LLMs as over-confident, lightning-fast assistants. They are incredibly helpful for boosting productivity: they look things up quickly and deliver working code. They can scan 100 pages of API documentation and surface a solution to your problem in seconds.
Unless we see a structural breakthrough in how language models are built, this is also the best we can hope for: an over-confident assistant. LLMs don’t think or reason; at best they achieve a shallow form of deductive closure. The debate over whether LLMs “think” can be unproductive, but it has one practical implication: LLMs make profoundly inhuman coding mistakes.
The errors LLMs make aren’t the errors a human assistant would make, so working with LLMs requires yet another perspective shift. Understanding this distinction is key to using them effectively: our coding assistant is deeply inhuman.
Otherwise, LLM-driven coding will inevitably lead to more failures in data science. Expect to hear more stories about models breaking unexpectedly.
This post was inspired by the write-up “Here’s how I use LLMs to help me write code,” which nails many other crucial points. It’s worth checking out.