Gambling on AI in stock markets

Trading AI stocks is starting to look indistinguishable from online gambling. Like sports betting and prediction markets, it has the necessary ingredients:

  • Intermittent Reinforcement: The unpredictable dopamine hit of daily price swings.
  • Social Reinforcement: High-pressure FOMO when “everyone is doing something.”
  • Gamification: Onboarding high schoolers to apps with $14 sign-up bonuses.

[…] Na and her co-workers joke they should sell their underwear to buy more shares. “Even friends who have never touched stocks are getting into it,” Na said. “Everyone’s doing something.”

Accelerating demand for AI-related goods has spurred investors—in Taipei, taxi drivers trade stocks mid-ride—and boosted salaries.

More than 180,000 trading accounts for children 18 or younger were created in the first three months of the year at Toss Securities, a South Korean brokerage. The accounts require parent approval to open and allow children to trade on their own. A recent promotion offered to deposit $14 into new accounts opened by high-school students.

Let’s see how this one will end.

Source: WSJ (gift link)

New post at Data Duets: Data cleaning agent

We have released our newest skill for the Augmented Data Science framework: a data cleaning agent.

Why do we need a skill for data cleaning? Because data cleaning isn’t just an execution task; it requires explicit modeling decisions.

In this series, we are studying the best use of AI in data science. Along the way, we develop and test skills with the ultimate goal of combining them using an orchestrator agent. The aim is not automation but a clear division of roles: data scientists set the intent, target, and method, while LLM agents execute.

Check out the post to see how we tested this skill on a large Amazon purchase dataset for customer behavior modeling (and how it successfully avoided data leakage).
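The post has the actual details. As a generic illustration of why cleaning involves a modeling decision, here is a minimal sketch of one common leakage pattern: learning an imputation value from all rows instead of from the training rows only. The synthetic data, the 80/20 split, and the variable names are assumptions for illustration, not the agent's implementation or the Amazon dataset.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic purchase amounts (hypothetical data), with roughly 10% missing
spend = rng.lognormal(mean=3.0, sigma=1.0, size=1_000)
spend[rng.random(1_000) < 0.1] = np.nan

# Split before cleaning: first 80% of rows for training, the rest for testing
train, test = spend[:800].copy(), spend[800:].copy()

# Leak-free: the fill value is learned from the training rows only
fill_value = np.nanmedian(train)
train_clean = np.where(np.isnan(train), fill_value, train)
test_clean = np.where(np.isnan(test), fill_value, test)

# Leaky alternative, shown only for contrast: the statistic sees the test rows
leaky_fill = np.nanmedian(spend)

print(f"train-only median: {fill_value:.2f}, all-rows median: {leaky_fill:.2f}")
```

The two fill values are usually close, which is why this kind of leakage is easy to miss. Whether to split first, and which rows a statistic may see, is a decision the data scientist has to make explicitly.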

Link to the post

New chapter in Causal Book: Oh my! Matching can make DID worse

Just published the latest Causal Book chapter for data scientists.

Matching is controversial in diff-in-diff use cases, and for good reason. In this chapter, we review two studies to work through the details and arrive at the best decision, which is arguably not to match the treatment and control groups at all.

Causal Book is an interactive resource for applying modern methods and tools to causal inference. It follows a nonlinear path, unlike a traditional book. Because of this unique structure, the new chapter doesn’t have data just yet (as the preceding chapters are still in progress), so expect an update down the road.

Link to the New Chapter

Cal State going all in on AI

The California State University system’s rollout of ChatGPT was the largest single-institution deployment in the world. Last year, it paid OpenAI $16.9 million. That’s quite a pivot in resource allocation. This is an interesting story with rich anecdotes to learn from.

At San Jose State — the oldest public university in the California State University system — evidence of the shift toward A.I. is evident across campus. The university now has an A.I. librarian, and its main library features a new A.I. Center for Civic and Social Good. The business school runs an A.I. boot camp for high school students; the campus career hub is sponsored by Adobe; A.I. literacy training is an orientation requirement and, last year, an A.I. agent helped coordinate commencement logistics.

Will these graduates be ahead of the curve in the new A.I. economy, or robbed of a chance to hone their critical thinking skills?

Source: NYT Magazine (gift link)

Token maxxing

Image 1 shows an AI management problem. Image 2 is a solution: token maxxing.

When employees are judged purely by a number, they will optimize for the number, even if it burns the business down around them. On the other hand, businesses often measure only what’s easy to put on a spreadsheet and assume that everything else follows suit (or doesn’t matter).

Together, these look like a textbook case of a perverse incentive combined with the streetlight effect.

Sources: Reddit thread, TokenBurn on GitHub

Papers with Code is back

Remember Papers with Code?

Meta acquired it years ago and shut it down last year. Now, the open-source team at Hugging Face seems to have brought it back. While it’s not exactly the same (agents have fully taken over the curation), it still looks like a useful repository for tracking the latest AI work.

New project link: https://paperswithcode.co

Can discovery survive the mean?

We often hear “Claude does this” or “Claude does that,” but how does it perform on an advanced data science task?

This study builds on prior work on variation in causal inference (research I contributed to) and asks Opus 4.5 to replicate the human analysis. Claude performs the same tasks following the same instructions given to human researchers. The tasks follow increasing levels of constraint:

Task 1: Maximum freedom.

Task 2: Data cleaning held constant.

Task 3: Both data cleaning and methodology held constant.

As constraints increase across the tasks, the variation in Claude’s results decreases. Across all stages, Claude consistently shows less dispersion than the researchers. These findings align with the nature of LLMs: they converge toward the mean.

This makes me think: such convergence may be useful for replication and robustness checks, but discoveries often originate in the tails of human variation. How, then, do we keep human variation in the loop?

The answer will depend on the task. Our work in Augmented Data Science, for example, focuses on data science workflows, and we recently posted a method selection agent that aims to retain data scientist variation.

Sources: Original Study, Claude Replication, Replication Project Website

New chapters in the Causal Book: RD and DoubleML

In online travel agency (OTA) loyalty programs, customers typically earn “Platinum” status once their spending exceeds a threshold (e.g., Expedia’s One Key or Booking.com’s Genius). Platinum comes with perks to incentivize higher spend.

Regression Discontinuity is an ideal framework to estimate the causal effect: Do customers actually spend more because of Platinum status, and if so, by how much?

In newly published chapters of the Causal Book, we simulate data using the Synthetic Data Vault to estimate the incremental spend driven by Platinum status. While a naïve comparison suggests a $2,100 increase in spend, the RD estimate via rdrobust yields $1,567 at the threshold, remarkably close to the ground truth of $1,500.

We also explore bandwidth selection, highlighting the bias-variance tradeoff when choosing between MSE-optimal and CER-optimal bandwidths.
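As a rough illustration of the naive-versus-RD comparison, the sketch below simulates a sharp RD with a known jump of $1,500 (the ground truth quoted above) and contrasts a naive difference in means with a local linear fit at the threshold. Everything else is an assumption for illustration: the data-generating process, the 5,000 cutoff, the 1,000 bandwidth, and the helper name local_linear_rd. It uses plain NumPy rather than rdrobust, so there is no data-driven bandwidth selection or bias correction, and it is not the chapter's code.

```python
import numpy as np


def local_linear_rd(x, y, cutoff, bandwidth):
    """Jump at the cutoff from a triangular-kernel local linear fit (sharp RD)."""
    xc = x - cutoff
    # Triangular kernel: weight falls linearly to zero at the bandwidth edge
    w = np.clip(1 - np.abs(xc) / bandwidth, 0, None)
    keep = w > 0
    xc, yk, w = xc[keep], y[keep], w[keep]
    d = (xc >= 0).astype(float)  # treatment indicator
    # Separate intercept and slope on each side of the cutoff
    X = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    sw = np.sqrt(w)  # weighted least squares via rescaled rows
    beta, *_ = np.linalg.lstsq(X * sw[:, None], yk * sw, rcond=None)
    return float(beta[1])  # coefficient on d = estimated jump


rng = np.random.default_rng(0)
n = 20_000
cutoff = 5_000        # hypothetical Platinum threshold
true_effect = 1_500   # ground truth from the text

prior_spend = rng.uniform(0, 10_000, n)   # running variable
platinum = prior_spend >= cutoff
future_spend = (
    2_000
    + 0.3 * prior_spend                    # heavier spenders keep spending more
    + true_effect * platinum
    + rng.normal(0, 500, n)
)

naive_estimate = future_spend[platinum].mean() - future_spend[~platinum].mean()
rd_estimate = local_linear_rd(prior_spend, future_spend, cutoff, bandwidth=1_000)

print(f"naive: {naive_estimate:,.0f}  RD: {rd_estimate:,.0f}  truth: {true_effect:,}")
```

The naive comparison is inflated because Platinum customers were already heavier spenders; the local fit compares customers just above and just below the cutoff. Shrinking the bandwidth uses fewer observations and gives a noisier estimate, while widening it adds bias whenever the true relationship is curved, which is the tradeoff that bandwidth selection addresses.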

Next is the role of DoubleML. In a clean, sharp RD design like this, rdrobust already uses a local polynomial fit around the threshold. But do we gain precision from flexible covariate adjustment? We find that DoubleML does worse (see Oh my! DoubleML is worse for the RD design). We then run a diagnostic to understand why.

Anthropic’s advisor strategy

Anthropic’s new Advisor Strategy framework is very promising.

The first image shows how users can now pair a more capable model (like Opus) as a strategic “Advisor” with a faster, cheaper model (like Sonnet or Haiku) as the “Executor” in a single Claude API call. This mirrors the actor-critic logic: the Executor (actor) invokes the Advisor (critic) as needed, with both operating on a shared context to optimize performance and cost.

The second image is a Nano Banana illustration for my year-end reflection on LLM interfaces, where I discussed using IDE tools to bridge different model families from multiple providers and benefit from complementary error patterns, a modular take on the same actor-critic setup.

In a modeling workflow, for example, the actor codes for data cleaning, while the critic serves as the reviewer to ensure variables are correctly coded and align with methodological assumptions. Anthropic’s framework automates this workflow within a single provider.
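The reviewer loop in the modeling example can be sketched as follows. This is a minimal, provider-agnostic illustration with toy stand-ins for the two models; it does not use or reproduce Anthropic's API, and the names run_actor_critic, toy_actor, and toy_critic, as well as the APPROVE/REVISE convention, are hypothetical.

```python
from typing import Callable, List, Tuple

# An agent reads the shared context and returns a message
Agent = Callable[[List[str]], str]


def run_actor_critic(
    task: str, actor: Agent, critic: Agent, max_rounds: int = 3
) -> Tuple[str, List[str]]:
    """Alternate actor drafts and critic reviews over one shared context."""
    context = [f"TASK: {task}"]
    draft = ""
    for _ in range(max_rounds):
        draft = actor(context)
        context.append(f"ACTOR: {draft}")
        review = critic(context)
        context.append(f"CRITIC: {review}")
        if review.startswith("APPROVE"):
            break  # the critic is satisfied; stop spending tokens
    return draft, context


def toy_actor(context: List[str]) -> str:
    # Stand-in for the cheaper model: fixes the coding once asked to revise
    if any(msg.startswith("CRITIC: REVISE") for msg in context):
        return "df['age'] = df['age'].astype(int)"
    return "df['age'] = df['age']"


def toy_critic(context: List[str]) -> str:
    # Stand-in for the stronger model: checks that the variable is coded correctly
    if "astype(int)" in context[-1]:
        return "APPROVE"
    return "REVISE: 'age' must be coded as an integer"


final_draft, transcript = run_actor_critic(
    "Clean the age column", toy_actor, toy_critic
)
print("\n".join(transcript))
```

In practice, each stand-in would be a call to a different model, possibly from a different provider, so that the critic's error patterns complement the actor's.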

Source

Anthropic issues takedown requests

What a shame to use copyrighted material without permission…

Anthropic accidentally exposed the underlying instructions for its Claude Code AI app, prompting over 8,000 takedown requests.

The leak revealed Anthropic’s proprietary techniques and tools, providing competitors a roadmap to clone Claude Code’s features.

The exposure, caused by human error, was not a security breach but creates risks for hackers to exploit Claude Code software.

Summary by the WSJ. Gift link to the article