Generative AI for business

It didn’t take long for Tom Fishburne’s cartoon to come true: Generative AI is increasingly treated as a magic trick. Most tricks aren’t really useful or truly “real,” yet they remain entertaining until we figure out, often the hard way, how the illusion is performed.

My conversations show that some executives are cautiously optimistic, integrating only what is truly useful into their business processes and project workstreams, while others are applauding the tricks and wanting more. The former group is already AI-literate, using machine learning and algorithms to augment and automate processes in the broader, more accurate definition of AI. The latter, more easily impressed group seems to lack this foundation.

We can expect that more of the applauding executives will eventually join the cautiously optimistic ones once the magic show ends and we move past the peak of the hype cycle.

A history of LLMs

This is an (almost) complete picture of LLMs, highlighting the underlying math for probability calculations. It’s a great companion for the 64-picture visual summary I curated for a talk earlier this year.

A good reminder from Pierre-Simon Laplace in Théorie Analytique des Probabilités: “Life’s most important questions are, for the most part, nothing but probability problems.”
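To make that concrete, here is a minimal sketch of the probability calculation at the heart of an LLM: next-token prediction is a softmax over scores. The toy vocabulary and logits below are made up for illustration.

```python
import numpy as np

# Toy example: turning a model's raw scores (logits) into next-token probabilities.
vocab = ["cat", "dog", "car", "idea"]
logits = np.array([2.1, 1.3, 0.2, -1.0])  # hypothetical scores for the next token

probs = np.exp(logits) / np.exp(logits).sum()  # softmax
for token, p in zip(vocab, probs):
    print(f"P(next token = {token!r}) = {p:.3f}")
```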

Using LLMs for text extraction

I once quoted Abraham Maslow in my note “Using predictive modeling as a hammer when the nail needs more thinking,” a problem that has since eased as the focus shifts to causal modeling and optimization.

Now, if the problem is extracting handwritten digits from a PDF or an image, what’s the solution? I’d guess good old reliable OCR. There is nothing generative about extraction, so using an LLM may be moot, if not actively detrimental (due to hallucinations). OCR is widely available through cloud services such as AWS Textract and Google Cloud Vision, some of which also offer human validation.
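For reference, the OCR route is only a few lines of code. Here is a minimal sketch using Google Cloud Vision’s handwriting-capable endpoint; the file name is hypothetical and credentials are assumed to be configured. AWS Textract’s detect_document_text call works similarly.

```python
from google.cloud import vision

# Minimal OCR sketch with Google Cloud Vision; assumes credentials are set up
# and "temperature_log.png" is a scanned page (hypothetical file name).
client = vision.ImageAnnotatorClient()
with open("temperature_log.png", "rb") as f:
    image = vision.Image(content=f.read())

response = client.document_text_detection(image=image)  # handwriting-capable OCR
print(response.full_text_annotation.text)
```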

Instead, the Reuters team intriguingly chose the shiniest hammer, an LLM, to extract data from handwritten prison temperature logs. They uploaded 20,326 pages of logs to Gemini 2.5 Pro, followed by manual cleaning and merging. One big problem with this approach is the inevitable made-up text (hallucinations), which required the team to hand-code 384 logs.
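As an aside, the hand-coded logs enable a simple sanity check on hallucinations. A hedged sketch, with hypothetical file and column names, of comparing the LLM output to the manually coded sample:

```python
import pandas as pd

# Compare LLM-extracted temperatures to a hand-coded sample (hypothetical files/columns).
llm = pd.read_csv("llm_extracted.csv")         # columns: log_id, temperature
manual = pd.read_csv("hand_coded_sample.csv")  # same columns, coded by a person

merged = manual.merge(llm, on="log_id", suffixes=("_manual", "_llm"))
error_rate = (merged["temperature_manual"] != merged["temperature_llm"]).mean()
print(f"Disagreement with hand-coded sample: {error_rate:.1%}")
```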

So why an LLM instead of OCR? All I can think of is that LLMs may be useful for extraction when the text is highly unstructured and contextual understanding is needed, neither of which seems to apply here. Surprisingly, the methodological note doesn’t even mention OCR as an option.

Putting the tool choice aside, though, Reuters asks an important question here: “How hot does it get inside prisons?” and the answer is “Very.” I applaud the effort and data-centric journalism, and I recommend reading the story.

Credit for the image goes to Adolfo Arranz.

Source | Project GitHub

New Data Duets post: A look back to look forward

How do you get to new ideas when data is always looking back?

In this latest Data Duets post, we discuss a case from United Airlines and share five key lessons from an interview with Patrick Quayle, Senior VP of Global Network Planning.

The post explores how to go beyond historical data by using:

  1. Transfer learning and clustering
  2. Data triangulation (Spoiler: HBO’s White Lotus informs a business strategy here)
  3. More frequent experimentation
  4. Real-time falsification of new ideas
  5. Combining data science with the art of creativity

Our Director’s Cut expands the discussion to offer insights for retail merchandising.

Enjoy the read and feel free to leave your comments on LinkedIn here.

You’re absolutely right!


This is so hilarious I had to share. A major issue with using LLMs is their overly obsequious behavior. They aren’t much help when they keep telling me I’m right; I don’t want to be right, I want to be corrected.

This project uses a Python script to count how often Claude Code says you’re “absolutely right.” The script doesn’t seem to normalize the counts by usage, which might be a good next step.
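For the curious, that next step fits in a few lines. A minimal sketch, not the project’s actual code, that counts the phrase and normalizes by the number of assistant messages; the transcript path and schema below are assumptions.

```python
import glob
import json
import re

# Count "absolutely right" in saved transcripts and normalize by usage.
# Assumes transcripts are JSON files containing a list of {"role", "content"} messages.
pattern = re.compile(r"absolutely right", re.IGNORECASE)
hits, assistant_messages = 0, 0

for path in glob.glob("transcripts/*.json"):
    with open(path) as f:
        for msg in json.load(f):
            if msg.get("role") == "assistant":
                assistant_messages += 1
                hits += len(pattern.findall(msg.get("content", "")))

rate = hits / assistant_messages if assistant_messages else 0.0
print(f"'absolutely right' per assistant message: {rate:.3f} ({hits}/{assistant_messages})")
```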

Source | Script

Student learning with LLMs

In January, I wrote a short note based on one of my talks: “How to use LLMs for learning in 2025.” In that note, I differentiated between using LLMs (1) to learn and (2) to do. With the new semester now underway, I’ve checked some usage numbers and read the Ammari et al. (2025) paper on how students use ChatGPT. I was particularly interested in the second RQ: “Which usage patterns correlate with continued or increased reliance on ChatGPT over time?”

An over-reliance on any tool, regardless of what it is, is a potential red flag for lasting learning, especially when the goal is comprehension. For example, understanding derivatives and calculating them using a computer are two distinct learning objectives. If reliance on a tool substitutes for understanding, the long-term implications may not be a net positive.
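To illustrate the distinction, the calculation part is a one-liner for a computer (a minimal sympy sketch); the understanding part is not.

```python
import sympy as sp

# A computer "does" the derivative instantly; understanding why is the other objective.
x = sp.symbols("x")
print(sp.diff(x**3 + 2*x, x))  # prints 3*x**2 + 2
```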

The article does not really answer the reliance part of the question. It does, however, report some interesting correlations between LLM behavior and student engagement. Notably, when ChatGPT asks for clarifications, provides unintended or inconsistent answers, or communicates its limitations, students are less likely to continue using it.

Plausible, but what these correlations mean for learning and comprehension is unclear. What is the next step after disengagement? Do they switch to another LLM to get a direct answer without having to answer follow-up questions, or do they go back to figuring it out on their own?

Class of 2029, I guess the answer lies with you. Welcome!

Source | Paper

Using AI at work: Hype vs. reality

A recent New York Times story offers insights into how people are actually using AI at work.

Using LLMs as a “second pair of eyes” or as a fallible assistant seems to work well. Automation also works effectively when the instructions are clear and the objectives are defined unambiguously. In both cases, human agency remains central.

Use case #15 in the article, “Review medical literature,” reminded me of a study I shared earlier (How do LLMs report scientific text?). The study showed that LLMs systematically exaggerate claims they found in the original text. The user in this case is a medical imaging scientist and is aware of the danger. When a tool isn’t foolproof, the user’s expertise and awareness make all the difference.

The high-demand use cases are quickly scaling into independent businesses with more standardized output, often built as wrappers around an LLM core. I suspect some are marketed as “magic,” and to resist that hype, users will need a combination of expertise and awareness.

AI in 64 pictures: A visual journey

If you’re a visual learner looking to deepen your understanding of AI and language models:

I’ve just made the deck from my recent talk, “AI in 64 Pictures,” available. It’s a visual journey through language processing: from word embeddings to RNNs, attention, and transformers.
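For a taste of one stop on that journey, here is a minimal, self-contained sketch of scaled dot-product attention (the core transformer operation) on made-up toy matrices:

```python
import numpy as np

# Scaled dot-product attention on toy data: 3 tokens, 4-dimensional embeddings.
np.random.seed(0)
Q = np.random.randn(3, 4)  # queries
K = np.random.randn(3, 4)  # keys
V = np.random.randn(3, 4)  # values

scores = Q @ K.T / np.sqrt(K.shape[1])                                 # query-key similarity
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax per row
output = weights @ V                                                   # weighted mix of values
print(output.shape)  # (3, 4): one context-aware vector per token
```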

Understanding AI models better helps us discover more use cases and navigate their limitations. And if you’re looking to dive deeper, you can follow the links in the slides.

AI agents and failing projects

Nearly half of all AI agent projects are set to fail (as Gartner predicts here). Why? Unclear business value, inadequate risk controls, and escalating costs.

As I see it, much of this is fueled by hype, leading to existing solutions being relabeled as “Agentic AI” without any rethinking of business processes.

Human creativity is missing in this picture. It’s this creative thinking that should move agent use beyond just automating or augmenting individual tasks with LLMs, leading instead to the redesign of business processes and a vision for how humans and AI can truly complement each other.

The risks and costs are more straightforward to resolve:

– Managers who are most excited about AI agents often do not fully understand the risks and limitations of LLMs. They should invest as much in understanding these models as they do in using them.

– The true cost of scaling proof-of-concept GenAI solutions is often underestimated. Much of this comes down to selecting the right vendor: Gartner estimates only about 130 of the thousands of agentic AI vendors are real.

Everybody lies, but why?

Andrew Gelman’s latest “rant” is worth a read. Everybody lies, but why do people lie even when the data clearly refutes the lie?

It’s interesting to think a little more about why and how people lie, especially when it comes to scientists, medical doctors, and law enforcement officials. Spoiler: the answer is not always money.