New chapter in Causal Book: Oh my! Matching can make DID worse

Just published the latest Causal Book chapter for data scientists.

Matching is controversial in diff-in-diff use cases, and for good reason. In this chapter, we review two studies to delve into the details and understand the optimal decision, which is arguably not to match the treatment and control groups at all.
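One mechanism often raised in this debate is regression to the mean: if the two groups differ in level and you match on a noisy pre-period outcome, the matched controls tend to drift back toward their own group mean afterward, which can break the parallel trends that held before matching. The toy simulation below is a minimal sketch of that idea only. The group means, noise level, effect size, and the `simulate_did` helper are all hypothetical, and none of it is taken from the chapter or the two studies it reviews.

```python
import numpy as np

def simulate_did(n=2000, true_effect=1.0, seed=0):
    """Compare unmatched DID with DID after matching on the pre-period outcome."""
    rng = np.random.default_rng(seed)

    # Assumption: groups differ in level (1.0 vs 0.0) but share a flat trend,
    # so parallel trends holds without any matching.
    treat_pre = 1.0 + rng.normal(0, 1, n)
    treat_post = 1.0 + true_effect + rng.normal(0, 1, n)
    ctrl_pre = rng.normal(0, 1, 5 * n)
    ctrl_post = rng.normal(0, 1, 5 * n)

    treat_change = treat_post.mean() - treat_pre.mean()
    unmatched = treat_change - (ctrl_post.mean() - ctrl_pre.mean())

    # Nearest-neighbor matching (with replacement) on the pre-period outcome.
    order = np.argsort(ctrl_pre)
    sorted_pre = ctrl_pre[order]
    idx = np.clip(np.searchsorted(sorted_pre, treat_pre), 1, len(sorted_pre) - 1)
    left, right = sorted_pre[idx - 1], sorted_pre[idx]
    nearest = np.where((treat_pre - left) <= (right - treat_pre), idx - 1, idx)
    matched_ids = order[nearest]

    # Matched controls look like the treated units before treatment,
    # then fall back toward the control group mean afterward.
    matched_change = ctrl_post[matched_ids].mean() - ctrl_pre[matched_ids].mean()
    matched = treat_change - matched_change
    return unmatched, matched

unmatched, matched = simulate_did()
print(f"true effect: 1.0, unmatched DID: {unmatched:.2f}, matched DID: {matched:.2f}")
```

In this setup the outcomes have no persistent unit-level component, which is the extreme case; with more serial correlation in the outcome, the gap between the two estimates would be expected to shrink.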

Causal Book is an interactive resource for applying modern methods and tools to causal inference. It follows a nonlinear path, unlike a traditional book. Because of this unique structure, the new chapter doesn’t have data just yet (as the preceding chapters are still in progress), so expect an update down the road.

Link to the New Chapter

Cal State Going All in on AI

The California State University system became the largest single-institution deployment of ChatGPT in the world, and last year it paid OpenAI $16.9 million. That’s quite a pivot in resource allocation. This is an interesting story with rich anecdotes to learn from.

At San Jose State — the oldest public university in the California State University system — evidence of the shift toward A.I. is evident across campus. The university now has an A.I. librarian, and its main library features a new A.I. Center for Civic and Social Good. The business school runs an A.I. boot camp for high school students; the campus career hub is sponsored by Adobe; A.I. literacy training is an orientation requirement and, last year, an A.I. agent helped coordinate commencement logistics.

Will these graduates be ahead of the curve in the new A.I. economy, or robbed of a chance to hone their critical thinking skills?

Source: NYT Magazine (gift link)

Token maxxing

Image 1 shows an AI management problem. Image 2 is a solution: token maxxing.

When employees are judged purely by a number, they will optimize for the number, even if it burns the business down around them. On the other hand, businesses often measure only what’s easy to put on a spreadsheet and assume that everything else follows suit (or doesn’t matter).

This contradiction looks like a perfect example of a perverse incentive combined with the streetlight effect.

Sources: Reddit thread, TokenBurn on GitHub

Papers with Code is back

Remember Papers with Code?

Meta acquired it years ago and shut it down last year. Now, the open-source team at Hugging Face seems to have brought it back. While it’s not exactly the same (agents have fully taken over the curation), it still looks like a useful repository for tracking the latest AI work.

New project link: https://paperswithcode.co

Can discovery survive the mean?

We often hear “Claude does this” or “Claude does that,” but how does it perform on an advanced data science task?

This study builds on prior work on variation in causal inference (research I contributed to) and asks Opus 4.5 to replicate the human analysis. Claude performs the same tasks following the same instructions given to human researchers. The tasks follow increasing levels of constraint:

Task 1: Maximum freedom.

Task 2: Data cleaning held constant.

Task 3: Both data cleaning and methodology held constant.

As constraints increase across the tasks, the variation in Claude’s results decreases. Across all stages, Claude consistently shows less dispersion than the researchers. These findings align with the nature of LLMs: they converge toward the mean.

This makes me think: such convergence may be useful for replication and robustness checks, but discoveries often originate in the tails of human variation. How, then, do we keep human variation in the loop?

The answer will depend on the task. Our work in Augmented Data Science, for example, focuses on data science workflows, and we recently posted a method selection agent that aims to retain data scientist variation.

Sources: Original Study, Claude Replication, Replication Project Website

New chapters in the Causal Book: RD and DoubleML

In OTA loyalty programs, customers typically earn “Platinum” status once their spending exceeds a threshold (e.g., Expedia’s One Key or Booking.com’s Genius). Platinum comes with perks to incentivize higher spend.

Regression Discontinuity is an ideal framework to estimate the causal effect: Do customers actually spend more because of the Platinum status, and how much?

In newly published chapters of the Causal Book, we simulate data using the Synthetic Data Vault to estimate the incremental spend driven by Platinum status. While a naïve comparison suggests a $2,100 increase in spend, the RD estimate via rdrobust yields $1,567 at the threshold, remarkably close to the ground truth of $1,500.
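To make the gap between the naïve comparison and the RD estimate concrete, here is a minimal sketch on simulated data. It does not use rdrobust or the Synthetic Data Vault: it swaps in a plain local linear fit with a uniform kernel and a hand-picked bandwidth, and every number except the $1,500 ground-truth jump (the cutoff, slope, noise, bandwidth) is an assumption for illustration, so it will not reproduce the chapter's figures.

```python
import numpy as np

def local_linear_rd(x, y, cutoff, h):
    """Sharp RD jump at `cutoff`: separate linear fits within bandwidth h on each side."""
    xc = x - cutoff
    left = (xc < 0) & (xc >= -h)
    right = (xc >= 0) & (xc <= h)
    # np.polyfit(..., 1) returns [slope, intercept]; with x centered at the
    # cutoff, the intercept is the fitted outcome at the threshold.
    at_cutoff_left = np.polyfit(xc[left], y[left], 1)[1]
    at_cutoff_right = np.polyfit(xc[right], y[right], 1)[1]
    return at_cutoff_right - at_cutoff_left

rng = np.random.default_rng(42)
n = 20_000
cutoff = 5_000.0                                # hypothetical status threshold
prior_spend = rng.uniform(0, 10_000, n)         # running variable
platinum = prior_spend >= cutoff                # sharp assignment
true_jump = 1_500.0                             # ground truth from the text

# Assumption: later spend rises smoothly with prior spend, plus the jump.
spend = 2_000 + 0.25 * prior_spend + true_jump * platinum + rng.normal(0, 800, n)

naive = spend[platinum].mean() - spend[~platinum].mean()
rd = local_linear_rd(prior_spend, spend, cutoff, h=1_000)
print(f"naive difference: {naive:,.0f}")
print(f"local linear RD:  {rd:,.0f}")
```

The naïve difference mixes the jump with the underlying upward slope in spend, while the local fit compares customers just on either side of the threshold. Rerunning with a wider or narrower `h` gives a rough feel for how the estimate and its noise move with the bandwidth.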

We also explore bandwidth selection, highlighting the bias-variance tradeoff when choosing between MSE-optimal and CER-optimal bandwidths.

Next is the role of DoubleML. In a clean, sharp RD design like this, rdrobust already uses a local polynomial fit around the threshold. But do we gain precision from flexible covariate adjustment? We find that DoubleML does worse (see Oh my! DoubleML is worse for the RD design). We then run a diagnostic to understand why.

Anthropic’s advisor strategy

Anthropic’s new Advisor Strategy framework is very promising.

The first image shows how users can now pair a more capable model (like Opus) as a strategic “Advisor” with a faster, cheaper model (like Sonnet or Haiku) as the “Executor” in a single Claude API call. This mirrors the actor-critic logic: the Executor (actor) invokes the Advisor (critic) as needed, with both operating on a shared context to optimize performance and cost.
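The control flow described here can be sketched in a few lines. This is not Anthropic's API: `run_task`, `SharedContext`, and the toy executor and advisor below are hypothetical stand-ins (plain Python functions instead of model calls), meant only to show an executor that escalates to an advisor when needed while both read the same context.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class SharedContext:
    """Conversation state visible to both the executor and the advisor."""
    messages: List[str] = field(default_factory=list)

def run_task(
    task: str,
    executor: Callable[[SharedContext], str],
    advisor: Callable[[SharedContext], str],
    needs_advice: Callable[[str], bool],
    context: Optional[SharedContext] = None,
) -> Tuple[str, SharedContext]:
    """Let the cheap executor work first; call the advisor only when asked for."""
    if context is None:
        context = SharedContext()
    context.messages.append(f"task: {task}")

    draft = executor(context)
    context.messages.append(f"executor: {draft}")

    if needs_advice(draft):
        # The advisor sees everything the executor has seen and produced.
        context.messages.append(f"advisor: {advisor(context)}")
        draft = executor(context)  # executor revises with the advice in context
        context.messages.append(f"executor: {draft}")
    return draft, context

# Toy stand-ins for the two models (hypothetical behavior).
def toy_executor(ctx: SharedContext) -> str:
    if any(m.startswith("advisor:") for m in ctx.messages):
        return "final answer using advice"
    return "UNSURE: draft answer"

def toy_advisor(ctx: SharedContext) -> str:
    return "check that variables are coded as the method assumes"

result, ctx = run_task(
    "clean the data", toy_executor, toy_advisor, lambda d: d.startswith("UNSURE")
)
print(result)
```

Keeping the escalation rule (`needs_advice`) separate from the two models is a design choice in this sketch: it makes the cost trade-off explicit, since the advisor is only invoked on the drafts that trigger it.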

The second image is a Nano Banana illustration for my year-end reflection on LLM interfaces, where I discussed using IDE tools to bridge different model families from multiple providers and benefit from complementary error patterns, a modular take on the same actor-critic setup.

In a modeling workflow, for example, the actor codes for data cleaning, while the critic serves as the reviewer to ensure variables are correctly coded and align with methodological assumptions. Anthropic’s framework automates this workflow within a single provider.

Source

Anthropic issues takedown requests

What a shame to use copyrighted material without permission…

Anthropic accidentally exposed the underlying instructions for its Claude Code AI app, prompting over 8,000 takedown requests.

The leak revealed Anthropic’s proprietary techniques and tools, providing competitors a roadmap to clone Claude Code’s features.

The exposure, caused by human error, was not a security breach but creates risks for hackers to exploit Claude Code software.

Summary by the WSJ. Gift link to the article

How to delete a Claude Cowork session

Turns out Claude Cowork sessions can’t be easily deleted.

I was testing Claude Cowork in the app, and after two sessions, I noticed there is no way to delete the sessions (only an “Archive” button). This appears to be a known open issue, as discussed here.

In Settings, Anthropic actually asks you to send them an email to delete sessions. Ha!

Anyway, after some digging and a few greps, I had to delete my test sessions manually. For the privacy-conscious like me, here’s a two-step solution (close the Claude App first):

  1. Delete the project files in your global .claude/projects directory (if one exists). The folder names here are “friendly” and easy to find.
  2. Delete the related log file and folder in: C:/Users/<username>/AppData/Roaming/Claude/local-agent-mode-sessions (Windows)

The second step is less friendly because the file and folder names are GUIDs (36-character strings). Unless you can grep for the content from the session, look for folders and JSON files that begin with “local_”. For each Claude Cowork session, there is a folder and a matching file. You can delete everything in local-agent-mode-sessions to delete all sessions. Just restart your computer if you receive a VM error when you open the Claude App next time.
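If you would rather not hunt through GUID folders by hand, a small script can list the candidates for you. This is a minimal sketch under a few assumptions: the sessions live under the Windows path above (resolved through the APPDATA environment variable), session entries start with "local_", and the helper names `default_sessions_dir` and `find_session_entries` are made up for illustration. It only lists paths and never deletes anything, so the actual removal stays a manual step.

```python
import os
from pathlib import Path
from typing import List, Optional

def default_sessions_dir() -> Optional[Path]:
    """Windows location from the steps above; returns None if APPDATA is unset."""
    appdata = os.environ.get("APPDATA")  # typically C:/Users/<username>/AppData/Roaming
    if not appdata:
        return None
    return Path(appdata) / "Claude" / "local-agent-mode-sessions"

def find_session_entries(root: Path, needle: Optional[str] = None) -> List[Path]:
    """List folders and files under root whose names start with 'local_'.

    If `needle` is given, keep only JSON files whose text contains it,
    which mimics grepping for something you remember from the session.
    """
    hits = []
    for path in sorted(root.rglob("local_*")):
        if needle is None:
            hits.append(path)
        elif path.is_file() and path.suffix == ".json":
            if needle in path.read_text(encoding="utf-8", errors="ignore"):
                hits.append(path)
    return hits

if __name__ == "__main__":
    root = default_sessions_dir()
    if root is not None and root.is_dir():
        for entry in find_session_entries(root):
            print("folder" if entry.is_dir() else "file  ", entry)
    else:
        print("Sessions directory not found; nothing to list.")
```

Run it with the Claude App closed, review the printed paths, and then delete the folder and matching file for each session you want gone.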

Stop Sloppypasta!

Good reminder and possibly the call of the year: Stop Sloppypasta.

2026 may be the year we finally feel the peak of the “dead internet,” about ten years after the term was coined.

One potential upside is that we might diversify our engagement away from social media as it becomes increasingly bogged down by bot- and AI-generated (and human-copied) content. So, the “death” of the internet might actually be the birth of a more intentional way of connecting. Let’s see if and where the sloppypasta stops.

Source 1 (Stop Sloppypasta) – Source 2 (Dead Internet theory) – Source 3 (Human vs. LLM-generated content)