Algorithm that doesn’t rot your brain?

This is slightly off-track, but I felt compelled to share this opinion piece. The NYT published an opinion video featuring Jack Conte, musician and CEO of Patreon. The message is simple: algorithms should serve people instead of people serving algorithms.

The piece reminded me of the times when you could reliably follow someone. These days, I see all kinds of content that I didn’t sign up for, and I miss the content from the people I thought I followed. I don’t even see the updates from my connections.

As a workaround, LinkedIn wants you to “double follow” if you want to really follow someone. You need to visit a person’s profile and click on the unlabeled, literally hidden bell in the upper right to get notified when that person shares something.

Isn’t that a little preposterous?

The opinion piece suggests that we must:

  1. Prioritize long-term relationships
  2. Fund art, not ads
  3. Put humans in control

As a technologist, I agree. This may sound like a rant, but it really is not. I think Jack is doing an excellent job making people question the existing design (and offering an alternative?).

I’ve created a gift link so you can access the content without a NYT membership; see here.

Learning, insight, and causality

If the goal of teaching is learning, then how exactly does the brain make a difficult concept instantly clear?

I’ve been a student of how the human brain works for as long as I can remember, particularly since the early days of my teaching. Teaching is moot if actual learning lags. Learning is difficult by definition, and making it sticky is even more challenging.

This article provides a status update on research into what “insight” is, how it is formed, and how it aids learning and long-term memory. Worth a read.

In the age of generative models, a better understanding of how insight is formed and the role of cause-effect triggers (water rises – Eureka!) is increasingly valuable.

Is AI the bicycle or the mind?

[Click title for image]

Is AI the bicycle for the mind (following Steve Jobs), or is it the mind riding the bicycle (quite literally, like the 20-year-old robot pictured here, built well before the Transformer)?

In this article, Tim O’Reilly, countering Jensen Huang’s keynote remarks, frames this as a question of function: Is AI a tool or a worker using other tools? He explores a number of premises and concludes the LLM is “a tool that knows it’s a tool.”

This may actually be an apt way to describe an agent: a tool that knows it’s a tool and can use other tools.

Credit for the picture goes to Koji Sasahara / AP.

Bias-variance tradeoff in matching for diff-in-diff

In matching for causal inference, we often focus too much on reducing bias and too little on variance. This has generalizability implications. This paper, while not focused on external validity, tackles the bias-variance trade-off in matching for diff-in-diff:

While matching on covariates may reduce bias by creating a more comparable control group, this often comes at the cost of higher variance. Matching discards non-comparable control units, limiting the sample and, in turn, jeopardizing the precision of the estimate. That’s a good reminder.

How about matching also on pre-treatment outcomes?

Here, the win is clear: it’s a guaranteed reduction in variance because the sample-size trade-off no longer applies once matching is performed. So, while a reduction in bias isn’t a mathematical certainty, this makes additionally matching on pre-treatment outcomes a potentially optimal strategy when both bias and variance are a concern.
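To make the trade-off tangible, here is a minimal sketch on simulated two-period data: 1:1 nearest-neighbor matching on a covariate, with and without the pre-treatment outcome added to the match, followed by a simple diff-in-diff on the matched sample. The data-generating process, package choice, and variable names are illustrative, not the paper’s setup.

```python
# A minimal sketch (simulated two-period data), not the paper's estimator.
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 2_000
x = rng.normal(size=n)                                 # covariate
treated = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))) # selection depends on x
y_pre = 0.8 * x + rng.normal(size=n)                   # pre-treatment outcome
y_post = y_pre + 0.5 * treated + rng.normal(size=n)    # true effect = 0.5
df = pd.DataFrame({"x": x, "treated": treated, "y_pre": y_pre, "y_post": y_post})

def did_after_matching(df, match_cols):
    """Match each treated unit to its nearest control on match_cols, then DiD."""
    t, c = df[df.treated], df[~df.treated]
    nn = NearestNeighbors(n_neighbors=1).fit(c[match_cols].to_numpy())
    idx = nn.kneighbors(t[match_cols].to_numpy(), return_distance=False).ravel()
    matched_c = c.iloc[idx]
    # Diff-in-diff: (post - pre) for treated minus (post - pre) for matched controls
    return ((t.y_post - t.y_pre).mean()
            - (matched_c.y_post.to_numpy() - matched_c.y_pre.to_numpy()).mean())

print("Match on x only:      ", did_after_matching(df, ["x"]))
print("Match on x and y_pre: ", did_after_matching(df, ["x", "y_pre"]))
```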

The generalizability implications will be part of the matching chapter of the Causal Book.

PS: Yes, matching on pre-treatment outcomes reduces the diff-in-diff estimator to diff-in-means and may introduce bias, but that’s a discussion for another day (and chapter).

Understand Code Before You Vibe It?

[Click title for image]

That tagline with the made-up graph instantly raises a red flag, but the core idea is surprisingly cool. Windsurf’s new owner, Cognition (following the failed OpenAI acquisition), has shipped a new feature called Codemaps.

The idea is to significantly ease codebase understanding. This actually looks incredibly useful, especially when tackling an existing codebase, say, an open-source project, and it might get me to switch over from Cursor.

Source

LLMs vs. Stack Overflow

Did you know about stackoverflow.ai? I must’ve completely missed this. It looks like a great alternative to the search function on the site (or using Google to search it). We seem to have come full circle from LLMs killing Stack Overflow to LLMs powering Stack Overflow for search and discovery. Recommended.

Back to Causal Book: Regression Discontinuity

The intro sections and DAGs for the RD chapter are in. More to come.

I’m looking for interesting datasets for the RD design. I have some candidates, but I’m eager to find more compelling, real data. Ideally, I’d like a business case (rather than policy), such as one on customer loyalty status. The IV chapter already uses policy data (tax on cigarette prices vs. smoking). Please comment with a link if you have ideas beyond the Kaggle datasets.

As a reminder, Causal Book is an accessible, interactive resource for the data science and causal inference audience. It is not meant to substitute for the excellent texts already available, such as The Effect by Nick Huntington-Klein and The Mixtape by Scott Cunningham. This book aims to complement them by focusing on the idea of solution patterns, with code in R and Python, exploring different approaches (frequentist statistics, machine learning, and Bayesian modeling), and clarifying some of the counterintuitive (or seemingly surprising) challenges faced in practice.

Causal Book

Is college old school now?

This is interesting: Palantir has launched a “Meritocracy Fellowship” to hire high-achieving high school graduates right out of school, offering a paid internship with a chance at full-time employment. The company presents this as an alternative to college.

This is a very limited, transactional view of college. College is more than just training for employment; it is where students gain knowledge and broaden their horizons, learning how to think and ask questions, in addition to acquiring practical skills. I doubt a four-week history seminar will make up for all that.

Source

Neo the household robot

I’m terribly sad to learn that the first consumer humanoid robot marketed to load the dishwasher (finally!) is essentially a proxy operated remotely by a human (oh no). The automation it offers is akin to hiring a teenager to mow your lawn remotely, yet it introduces privacy and latency nightmares.

How much longer must we keep loading the dishes? Until Nvidia’s valuation hits $10 trillion? Let’s buy more stonks. I’m losing my patience here.

Source – Neo Order Page

Is our society increasingly rewarding conformity?

Is our society increasingly rewarding conformity? Is AI accelerating this process, and is it simultaneously stripping work of deep meaning? After all, thanks to LLMs, many now define creativity as merely a probabilistic recombination of matrices derived from a training set.

Is science also contributing to this potential lack of deviation in culture, education, arts, architecture, and business? Here’s a take on it (not mine):

You can spot this scientific bland-ification right away when you read older scientific writing. As Roger’s Bacon points out, scientific papers used to have style. Now they all sound the same, and they’re all boring.

Maybe science is supposed to be boring? What happened to the style though?

For definitive answers to these questions, look elsewhere. For a compelling set of data (plus plenty of causal speculation) on many aspects of contemporary society and scientific style, check out this compilation and the essay here.

The greatest thinkers in science (and business) are often prolific authors. They write books, blogs, and copious emails to sharpen ideas. Richard Lewontin, E.O. Wilson, and Paul Graham are but three examples. Dorothy Hodgkin’s scientific correspondence and papers, stacked together, extend 25.85 meters in length. Great thinkers, in other words, write all the time.

Researchers are evaluated by simple measures of productivity or influence — number of papers published, citation count, and grant dollars. In such an environment, it has become exceedingly difficult for scientists to take stylistic risks in their academic writing or to devote significant amounts of time to other forms of creative writing.

When is TSLS Actually LATE?

I first came across this paper while writing the Machine Learning Using IV chapter of the Causal Book. Revisiting it today, I remain struck by its central finding: about 95% of the empirical TSLS (two-stage least squares) models surveyed in the paper claim to estimate the Local Average Treatment Effect (LATE), yet fail to meet the conditions necessary to do so.

The failure is mainly due to not controlling for covariates nonparametrically: unless the covariates enter the specification fully saturated (a dummy for every covariate cell, with the instrument interacted accordingly), the TSLS estimand need not be a proper weighted average of covariate-specific LATEs and can even place negative weight on some groups’ effects. That is to say, in attempting to correct for selection bias (endogeneity) using IVs, causal modelers inadvertently introduce specification bias that theoretically nullifies the LATE interpretation.
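For a concrete sense of the distinction, here is a minimal sketch on simulated data contrasting the common “covariates entered linearly” TSLS with a saturated specification (a dummy per covariate cell, instrument interacted with each cell). The data-generating process, package, and variable names are mine for illustration, not the paper’s.

```python
# A minimal sketch (simulated data) of linear-controls TSLS vs. a saturated spec.
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(0)
n = 5_000
x = rng.integers(0, 4, n)                         # discrete covariate, 4 cells
z = rng.integers(0, 2, n)                         # binary instrument
u = rng.normal(size=n)                            # unobserved confounder
d = ((0.3 + 0.2 * x) * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(int)
y = (0.5 + 0.3 * x) * d + 0.4 * x + u + rng.normal(size=n)  # effect varies by cell
df = pd.DataFrame({"y": y, "d": d, "z": z, "x": x})

# (1) Common practice: the covariate enters the model linearly.
exog_lin = pd.DataFrame({"const": 1.0, "x": df["x"]})
tsls_linear = IV2SLS(df["y"], exog_lin, df["d"], df["z"]).fit(cov_type="robust")

# (2) Saturated: a dummy per covariate cell, instrument interacted with each cell.
cells = pd.get_dummies(df["x"], prefix="cell").astype(float)
z_by_cell = cells.mul(df["z"], axis=0).add_prefix("z_")
tsls_saturated = IV2SLS(df["y"], cells, df["d"], z_by_cell).fit(cov_type="robust")

print("Linear controls:   ", tsls_linear.params["d"])
print("Saturated controls:", tsls_saturated.params["d"])
```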

On a different note, I’ve resumed work on Causal Book. Updates are on the way!

Is AI also innocent until proven guilty?

Today, my feed is full of speculation linking the recent AWS layoffs, driven by increased AI automation, to yesterday’s outage. In reality, we don’t really know if AI caused any of it.

What do we know? I read two articles this morning, and one thing that struck me is that AWS was reportedly not able to diagnose the core issue for 75–90 mins. That’s an absurdly long time.

If this timeline is accurate, the extended delay is compelling evidence that critical expertise was either absent or inaccessible when it was most needed, for whatever reason.

Credit for the image goes to Emil Lendof/WSJ.

Source 1 – Source 2

Update on using LLMs for OCR

Here’s an update on using LLMs for OCR without having to use the same hammer (generic model) for all nails. DeepSeek has released an OCR-focused model: https://github.com/deepseek-ai/DeepSeek-OCR

Check out the deep parsing mode, which parses images within documents through secondary model calls. Very useful for data extraction. The results are pretty impressive too:

Our work represents an initial exploration into the boundaries of vision-text compression, investigating how many vision tokens are required to decode 𝑁 text tokens. The preliminary results are encouraging: DeepSeek-OCR achieves near-lossless OCR compression at approximately 10× ratios, while 20× compression still retains 60% accuracy. These findings suggest promising directions for future applications, such as implementing optical processing for dialogue histories beyond 𝑘 rounds in multi-turn conversations to achieve 10× compression efficiency.
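If you want to try it locally, the repo exposes the model through the Hugging Face trust_remote_code interface. Here is a minimal sketch following the README pattern; the infer() signature, its arguments, and the prompt string are assumptions to verify against the repo.

```python
# A minimal sketch of local inference, assuming the repo's trust_remote_code
# interface. The infer() call and prompt string follow the README pattern and
# may differ in the current release; file paths are illustrative.
from transformers import AutoModel, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval().cuda()

# Plain OCR on a single page; the deep parsing mode additionally routes figures
# found in the document through secondary model calls.
result = model.infer(
    tokenizer,
    prompt="<image>\nFree OCR.",
    image_file="scanned_page.png",
    output_path="./ocr_output",
)
print(result)
```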

Education, AI, and standards

The data on education call for attention:

– 33% of eighth graders are reading at a level that is “below basic”—meaning that they struggle to follow the order of events in a passage or to even summarize its main idea.

– 40% of fourth graders are below basic in reading, the highest share since 2000.

– In 2024, the average score on the ACT, a popular college-admissions standardized test that is graded on a scale of 1 to 36, was 19.4—the worst average performance since the test was redesigned in 1990.

The article speculates on several causal links to explain the declining trend in the metrics, ranging from the effects of COVID to the influence of smartphones and social media.

The point that truly resonates with me as an educator, though, is this: a pervasive refusal to hold children to high standards. Standards are about values, not technology or tools. No tool causes the fading emphasis on rigor.

The article discusses other important aspects, such as the disparity between school districts, the heterogeneity in outcomes based on affluence, and the potential role of AI as a democratizer, but keeps returning to the same line: declining standards and low expectations. And that’s for a good reason:

“Roughly 40 percent of middle-school teachers work in schools where there are no late penalties for coursework, no zeroes for missing coursework, and unlimited redos of tests.”

This is potentially the most important problem facing our society today, and it warrants far more attention.

Source

Generative AI for business

It didn’t take long for Tom Fishburne’s cartoon to come true: Generative AI is increasingly treated as a magic trick. Most tricks aren’t really useful or truly “real,” yet they remain entertaining until we figure out, often the hard way, how the illusion is performed.

My conversations show that some executives are cautiously optimistic, integrating only what is truly useful into their business processes and project workstreams, while others are applauding the tricks and wanting more. The former group is already AI-literate, using machine learning and algorithms to augment and automate processes in the broader, more accurate definition of AI. The latter, more easily impressed group seems to lack this foundation.

We can expect that more of the applauding executives will eventually join the cautiously optimistic ones once the magic show ends and we move past the peak of the hype cycle.

A history of LLMs

This is an (almost) complete picture of LLMs, highlighting the underlying math for probability calculations. It’s a great companion for the 64-picture visual summary I curated for a talk earlier this year.

Good reminder that “Life’s most important questions are, for the most part, nothing but probability problems.” (Pierre-Simon Laplace, Théorie Analytique des Probabilités)
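For readers who want the one-cell version of the probability math the article highlights, here is a toy sketch (made-up vocabulary and logits) of how a model’s raw scores become a next-token distribution via softmax:

```python
# A toy illustration: the final layer produces one logit per vocabulary token,
# and softmax turns these into next-token probabilities. Numbers are made up.
import numpy as np

vocab = ["probability", "problems", "cats", "the"]
logits = np.array([2.1, 1.3, -0.5, 0.2])       # raw scores from the model

probs = np.exp(logits - logits.max())          # subtract max for numerical stability
probs /= probs.sum()

for token, p in zip(vocab, probs):
    print(f"P(next = {token!r}) = {p:.3f}")
```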

Using LLMs for text extraction

I once quoted Abraham Maslow in my note “Using predictive modeling as a hammer when the nail needs more thinking”, a problem that has since eased as the focus has shifted to causal modeling and optimization.

Now, if the problem is extracting handwritten digits from a PDF or an image, what’s the solution? I guess good old reliable OCR? There is nothing generative about extraction, so using an LLM may be moot, if not actively detrimental (due to hallucinations). OCR is widely available through cloud services such as AWS Textract and Google Cloud Vision, some of which also offer human validation.
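For the straightforward case, a cloud OCR call is only a few lines. Here is a minimal sketch using boto3 and AWS Textract’s synchronous endpoint; the file name and region are illustrative.

```python
# A minimal sketch of plain OCR via AWS Textract's synchronous API (boto3).
# Textract also offers async jobs and form/table analysis; this is the simplest call.
import boto3

client = boto3.client("textract", region_name="us-east-1")

with open("temperature_log_page.png", "rb") as f:
    response = client.detect_document_text(Document={"Bytes": f.read()})

# Each block of type LINE is one detected line of text with a confidence score.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(f'{block["Confidence"]:.1f}%  {block["Text"]}')
```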

Instead, the Reuters team intriguingly chose the shiniest hammer, an LLM, to extract data from handwritten prison temperature logs. They uploaded 20,326 pages of logs to Gemini 2.5 Pro, followed by manual cleaning and merging. One big problem with this approach is the inevitable made-up text (hallucinations), which required the team to hand-code 384 logs.

So why an LLM instead of OCR? All I can think of is that LLMs may be useful for extraction when the text is highly unstructured and contextual understanding is needed, neither of which seems to apply here. Surprisingly, the methodological note doesn’t even mention OCR as a solution.

Putting the tool choice aside, though, Reuters asks an important question here: “How hot does it get inside prisons?” and the answer is “Very.” I applaud the effort and data-centric journalism, and I recommend reading the story.

Credit for the image goes to Adolfo Arranz.

Source – Project GitHub

New Data Duets post: A look back to look forward

How do you get to new ideas when data is always looking back?

In this latest Data Duets post, we discuss a case from United Airlines and share five key lessons from an interview with Patrick Quayle, Senior VP of Global Network Planning.

The post explores how to go beyond historical data by using:

  1. Transfer learning and clustering
  2. Data triangulation (Spoiler: HBO’s White Lotus informs a business strategy here)
  3. More frequent experimentation
  4. Real-time falsification of new ideas
  5. Combining data science with the art of creativity

Our Director’s Cut expands the discussion to offer insights for retail merchandising.

Enjoy the read and feel free to leave your comments on LinkedIn here.

You’re absolutely right!

[Click title for image]

This is so hilarious I had to share. A major issue with using LLMs is their overly obsequious behavior. Constant agreement isn’t much help; I don’t want to be told I’m right, I want to be corrected when I’m wrong.

This project uses a Python script to count how often Claude Code says you’re “absolutely right.” The script doesn’t seem to normalize the counts by usage, which might be a good next step.
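A rough sketch of what that normalization might look like is below; the ~/.claude/projects location and the JSONL record shape are assumptions, so adjust to the actual transcript format.

```python
# A rough sketch: count the phrase across Claude Code transcript files and
# divide by the number of assistant messages. Log location and record format
# are assumptions, not verified against the actual Claude Code layout.
import json
import re
from pathlib import Path

PHRASE = re.compile(r"you'?re absolutely right", re.IGNORECASE)

hits = 0
assistant_messages = 0
for path in Path.home().joinpath(".claude", "projects").rglob("*.jsonl"):
    for line in path.read_text(errors="ignore").splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if record.get("type") != "assistant":
            continue
        assistant_messages += 1
        hits += len(PHRASE.findall(json.dumps(record)))  # crude: search whole record

rate = hits / assistant_messages if assistant_messages else 0.0
print(f"{hits} occurrences over {assistant_messages} assistant messages ({rate:.2%})")
```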

Source – Script

Student learning with LLMs

In January, I wrote a short note based on one of my talks: “How to use LLMs for learning in 2025.” In that note, I differentiated between using LLMs (1) to learn and (2) to do. With the new semester now underway, I’ve checked some usage numbers and read the Ammari et al. (2025) paper on how students use ChatGPT. I was particularly interested in the second RQ: “Which usage patterns correlate with continued or increased reliance on ChatGPT over time?”

An over-reliance on any tool, regardless of what it is, is a potential red flag for persistent learning, especially when the goal is comprehension. For example, understanding derivatives and calculating them using a computer are two distinct learning objectives. If reliance on a tool substitutes for understanding, the long-term effect may not be a net positive.

The article does not really answer the reliance part of the question. It does, however, report some interesting correlations between LLM behavior and student engagement. Notably, when ChatGPT asks for clarifications, provides unintended or inconsistent answers, or communicates its limitations, students are less likely to continue using it.

Plausible, but what these correlations mean for learning and comprehension is unclear. What is the next step after disengagement? Do they switch to another LLM to get a direct answer without having to answer follow-up questions, or do they go back to figuring it out on their own?

Class of 2029, I guess the answer lies with you. Welcome!

Source – Paper