Uncategorized – Page 3 – On Data Science, Causal Models, & AI

Computer vision combined with LLMs

Well, this is impressive. Today I found myself visually comparing two scatterplots. Then I asked the latest Claude, GPT, Gemini, and DeepSeek models to identify which of the two was generated parametrically and which semi-parametrically. They reached a consensus (DeepSeek did not receive the images), and they were correct.

When a collective of LLMs can unequivocally diagnose a subtle shift in the data-generating process just by looking at scatterplots, it changes the way you think about looking vs. seeing (and saves you a trip to the optometrist for eyestrain).

Attached are the two datasets if you want to give your eyes a chance.

LLMs learn from the best

[Click title for image]

Our version control debt has finally caught up with us. I’m working on a Jupyter notebook for the Causal Book using a helper LLM, and we seem to be approaching absolute finality here.

For the record, this horrible naming convention is not a result of my input; it’s entirely derived from the training data. And here’s what Gemini says about it:

This is a hilarious catch. It looks like the AI has perfectly replicated the human panic of saving files as Project_Final_Final_v2_REAL_FINAL.doc.

Human panic, or uniquely human optimism for version control, you decide!

Reimagining in-class learning with LLMs

I commend Andrej Karpathy’s pedagogical work, e.g., Eureka Labs’ vision and his instructional videos. His insight that students must be proficient in AI but should be able to exist without it is spot on. He also suggests leaning on in-class evaluation to ensure academic integrity. While a shift to in-class evaluation is clearly necessary, justifying it solely by its grading implications seems too narrow.

In-class time will play a bigger role.

This is one of the reasons we have recently introduced a number of in-class teaching innovations, including hackathons for predictive modeling and reinforcement learning (multi-armed bandits), and LLM-assisted development deployed to HuggingFace.

LLMs can help make learning fun and engaging, starting in the classroom.

The most effective teaching fosters student ownership of learning. This involves showing that learning is fun (and surely challenging). LLMs offer an opportunity to strengthen this message. Learning is even more fun now, and somewhat less challenging: LLMs make it much easier to access material, test understanding, iterate on solutions, experiment, and get quick feedback.

That’s why we will next dedicate more of the in-class time to demonstrating how to use LLMs as life-long learning companions without mindlessly delegating our understanding. To read more about this difference, you can see the slides from my talk Mind the AI Gap: Understanding vs. Knowing here.

In all, yes, in-class time needs to be more strategically used, but making grading the sole driver represents a missed opportunity. Using more of the in-class time to model the joy of discovery and learning with LLMs (“the pleasure of finding things out”) can be a better primary driver.

Causal evidence in the headlines

It’s not every day that causal evidence is quoted in the headlines. Incidentally, we had a similar (unpublished) study on Instagram looking at the effects of “Instagram perfect” on users’ prosocial behavior (also through social comparison as the mechanism), with somewhat parallel results.

So, I am not surprised at all by this finding:

To the company’s disappointment, “people who stopped using Facebook for a week reported lower feelings of depression, anxiety, loneliness and social comparison,” internal documents said.

Source

Algorithm that doesn’t rot your brain?

This is slightly off-track, but I felt compelled to share this opinion piece. The NYT published an opinion video featuring Jack Conte, musician and CEO of Patreon. The message is simple: algorithms should serve people instead of people serving algorithms.

The piece reminded me of the times when you could reliably follow someone. These days, I see all kinds of content that I didn’t sign up for, and I miss the content from the people I thought I followed. I don’t even see the updates from my connections.

As a workaround, LinkedIn wants you to “double follow” if you want to really follow someone. You need to visit a person’s profile and click on the unlabeled, literally hidden bell in the upper right to get notified when that person shares something.

Isn’t that a little preposterous?

The opinion piece suggests that we must:

Prioritize long-term relationships
Fund art, not ads
Put humans in control

As a technologist, I agree. This may sound like a rant, but it really is not. I think Jack is doing an excellent job making people question the existing design (and offering an alternative?).

I’ve created a gift link so you can access the content without a NYT membership; see here.

Learning, insight, and causality

If the goal of teaching is learning, then how exactly does the brain make a difficult concept instantly clear?

I’ve been a student of how the human brain works for as long as I can remember, particularly since the early days of my teaching. Teaching is moot if actual learning lags. Learning is difficult by definition, and making it sticky is even more challenging.

This article provides a status update on research into what “insight” is, how it is formed, and how it aids learning and long-term memory. Worth a read.

In the age of generative models, a better understanding of how insight is formed and the role of cause-effect triggers (water rises – Eureka!) is increasingly valuable.

Is AI the bicycle or the mind?

[Click title for image]

Is AI the bicycle for the mind (following Steve Jobs), or is it the mind riding the bicycle (quite literally like the 20-year-old robot here even before the Transformer)?

In this article, Tim O’Reilly, countering Jensen Huang’s keynote remarks, frames this as a question of function: Is AI a tool or a worker using other tools? He explores a number of premises and concludes the LLM is “a tool that knows it’s a tool.”

This may actually be an apt way to describe an agent: a tool that knows it’s a tool -and- can use other tools.

Credit for the picture goes to Koji Sasahara / AP.

Bias-variance tradeoff in matching for diff-in-diff

In matching for causal inference, we often focus too much on reducing bias and too little on variance. This has generalizability implications. This paper, while not focused on external validity, tackles the bias-variance trade-off in matching for diff-in-diff:

While matching on covariates may reduce bias by creating a more comparable control group, this often comes at the cost of higher variance. Matching discards non-comparable control units, limiting the sample and, in turn, jeopardizing the precision of the estimate. That’s a good reminder.
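
To make the sample-size cost concrete, here is a minimal Python sketch. It is not the paper's method: the caliper rule, the toy covariate values, and the helper names `caliper_match` and `se_of_mean` are all my own assumptions for illustration. It keeps only the control units whose covariate lies within a caliper of some treated unit, then compares the standard error of the control-group mean before and after, assuming independent outcome changes with a common standard deviation.

```python
import math

def caliper_match(treated_x, control_x, caliper):
    """Return indices of control units within `caliper` of at least one treated unit.

    Hypothetical helper for illustration; real matching routines are more involved.
    """
    kept = []
    for j, xc in enumerate(control_x):
        if any(abs(xc - xt) <= caliper for xt in treated_x):
            kept.append(j)
    return kept

def se_of_mean(sd, n):
    """Standard error of a sample mean under i.i.d. outcomes with std. dev. `sd`."""
    return sd / math.sqrt(n)

# Toy covariate values (assumed, not real data).
treated_x = [0.9, 1.0, 1.1]
control_x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]

kept = caliper_match(treated_x, control_x, caliper=0.15)

sd_change = 1.0  # assumed std. dev. of the pre-to-post outcome change
se_before = se_of_mean(sd_change, len(control_x))
se_after = se_of_mean(sd_change, len(kept))

print("controls kept:", len(kept), "of", len(control_x))
print("SE inflation factor:", round(se_after / se_before, 2))
```

In this toy setup only three of the eight controls survive the caliper, so the standard error of the control-group mean grows by a factor of about 1.6, even though the retained controls are far more comparable to the treated units.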

How about matching also on pre-treatment outcomes?

Here, the win is clear: it’s a guaranteed reduction in variance because the sample-size trade-off no longer applies once matching is performed. So, while a reduction in bias isn’t a mathematical certainty, this makes additionally matching on pre-treatment outcomes a potentially optimal strategy when both bias and variance are a concern.

The generalizability implications will be part of the matching chapter of the Causal Book.

PS: Yes, matching on pre-treatment outcomes reduces the diff-in-diff estimator to diff-in-means and may introduce bias, but that’s a discussion for another day (and chapter).
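
A quick sketch of why the estimator collapses, assuming the simplest two-group, two-period setup with a single pre-treatment period and exact matching on the pre-treatment outcome (the notation is mine, not the paper's):

```latex
\begin{align*}
\hat{\tau}_{\text{DiD}}
  &= \left(\bar{Y}^{T}_{\text{post}} - \bar{Y}^{T}_{\text{pre}}\right)
   - \left(\bar{Y}^{C}_{\text{post}} - \bar{Y}^{C}_{\text{pre}}\right) \\
  &= \left(\bar{Y}^{T}_{\text{post}} - \bar{Y}^{C}_{\text{post}}\right)
   - \left(\bar{Y}^{T}_{\text{pre}} - \bar{Y}^{C}_{\text{pre}}\right).
\end{align*}
% Exact matching on the pre-treatment outcome forces
% \bar{Y}^{T}_{\text{pre}} = \bar{Y}^{C}_{\text{pre}} in the matched sample, so
\[
\hat{\tau}_{\text{DiD}}
  = \bar{Y}^{T}_{\text{post}} - \bar{Y}^{C}_{\text{post}}
  = \hat{\tau}_{\text{DiM}}.
\]
```

Here $T$ and $C$ denote the treated and matched control groups, and the bars are sample means. With inexact matching the pre-period gap is only approximately zero, so the two estimators are close but not identical.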

Understand Code Before You Vibe It?

[Click title for image]

That tagline with the made-up graph instantly raises a red flag, but the core idea is surprisingly cool. Windsurf’s new owner, Cognition (following the failed OpenAI acquisition), has shipped a new feature called Codemaps.

The idea is to significantly ease codebase understanding. This actually looks incredibly useful, especially when tackling an existing codebase, say, an open-source project, and it might get me to switch over from Cursor.

Source

LLMs vs. Stack Overflow

Did you know about stackoverflow.ai? I must’ve completely missed this. It looks like a great alternative to the search function on the site (or using Google to search it). We seem to have come full circle, from LLMs killing Stack Overflow to LLMs powering Stack Overflow for search and discovery. Recommended.