Language w/o reasoning ≠ understanding

Michael Burry (The Big Short) shared an interesting story today from an 1880 New York Times article titled “Is There Thought Without Language? Case of a Deaf Mute.” The story itself is fascinating, highlighting how far science has progressed in our understanding of deafness.

More to the point, I saw this powerful statement in the 1880 piece that separates understanding from language:

That by which we understand all things must be essentially superior to anything else that is understood by it.

This prompted an update to my “Mind the AI Gap” deck, a framework I initially created in May 2024 for a talk on LLM-assisted learning. Since then, I’ve kept it updated as I’ve continued to discuss the topic.

Burry’s conclusion that “Language without the Capacity for Reason fails at Understanding” mirrors a key argument in the deck. This 1880 case study now sits next to an earlier discussion point from 1980, Steve Jobs’s description of the computer as a “bicycle for the mind,” marking a 100-year interval.

History, it seems, has a lot to teach us about AI.

See the “Mind the AI Gap” deck – Read Michael Burry’s post

Trusting the AI artifact


Looks good, must be right? We seem to scale back our questioning once AI starts building. This is a key finding from the latest Anthropic report on AI fluency.

When AI produces artifacts (apps, code, documents, visuals, or interactive tools), users are significantly less likely to verify the work:

  • 3.7 pp less likely to check facts
  • 3.1 pp less likely to question reasoning
  • 5.2 pp less likely to identify missing context

This suggests an interesting conundrum: as AI moves from being a chat partner to a builder, our skepticism fades. We are more likely to question a text response than a functional piece of code or a formatted document, even though code and documents often require the most oversight.

Source

Human vs. LLM-generated content

Just a heads up: the next post you’ll read is LLM-generated, because this one isn’t 😉.

Skip the next post.

If there’s one causal effect I’m willing to speculate on without any modeling, this is it. This one doesn’t need a diff-in-diff: the trends are parallel; relative time checks out; ready for publication.

New Data Duets post: Using generative models, well, to generate data

I recently shared an underappreciated use case for generative models in data science: creating high-fidelity tabular datasets (OTA data for regression discontinuity).

The model’s success in data synthesis motivated a question: what are some high-value use cases for data science teams using generative models to create datasets? This, in turn, led to our latest Data Duets post: “Using generative models, well, to generate data.”

I walk through using the Synthetic Data Vault to scale a small OTA sample while preserving its statistical properties and the causal discontinuity. Duygu Dagli then weighs in on business implications: creating statistical twins to share data with vendors for solution optimization and benchmarking, simulating product recall data, and solving cold-start problems in retail.

Ultimately the approach here represents a step toward data centricity: using high-fidelity simulations to dissect and validate the assumptions that drive our models.

Link to the full post

Something big is happening?

The title is from a popular post. It was clearly written to be sensational (which it seems to have achieved), yet it makes some valid points and offers useful advice:

Here’s a simple commitment that will put you ahead of almost everyone: spend one hour a day experimenting with AI. Not passively reading about it. Using it. Every day, try to get it to do something new… something you haven’t tried before, something you’re not sure it can handle. Try a new tool. Give it a harder problem. One hour a day, every day.

While following technological progress is always a good idea, the current pace is truly mind-blowing, so it requires more attention. As someone who has been coding since C# first launched (don’t check the date!) and whose day-to-day is full of markdowns, JSONs, and APIs, even I am finding it difficult to keep up lately.

So, both personally and as an educator, I can’t help but agree with the point about “the cost of not experimenting.” We are moving into a world where daily experimentation is as essential as your morning coffee, which you must drink.

Source

Is AI killing B2B SaaS?


Hard to ignore this question; it’s currently moving financial markets. The first comment in this massive 725-comment Hacker News thread makes a compelling case for why the answer is likely no: enterprise SaaS will survive because management simply does not want to be responsible for the vibe-coded alternative.

As a technologist and a professor who has led “buy vs. build” discussions for over a decade, I agree that the death-of-SaaS argument is overblown. At the center of those discussions is the massive gap between building and maintaining, which is underestimated here. And one critical aspect may not be getting enough emphasis: shifting liability.

AI is now driving down the cost of the initial build, but the build is only a fraction of the value an enterprise solution provides. SaaS also provides reliability (uptime) and the boring essentials (security compliance, data integrity). Enterprise SaaS owns the “system of record,” and migrating that is not just a (vibe-)coding problem.

Even if all of this is resolved, the liability bottleneck remains; management won’t want to be responsible. Just because you can build it doesn’t mean you should, and I can see why most enterprises won’t, except for wrappers.

Statistical Inference: The Big Picture

Most modeling failures are caused by flawed (and often implicit) assumptions.

Statistical pragmatism recognizes that all forms of statistical inference make assumptions, assumptions which can only be tested very crudely (with such things as goodness-of-fit methods) and can almost never be verified. This is not only at the heart of statistical inference, it is also the great wisdom of our field.
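A quick Monte Carlo sketch (our illustration, not from the paper) of how crude such checks can be: the analyst assumes a standard normal model, but the data actually come from a Laplace distribution with the same mean and variance. The code estimates how often a one-sample Kolmogorov-Smirnov test rejects the wrong model. The Laplace alternative, the sample sizes, and the use of the asymptotic 5% critical value 1.36/√n are our assumptions.

```python
import math
import random
from statistics import NormalDist

# Hypothetical setup: the analyst assumes N(0, 1), but the data come from a
# Laplace distribution with the same mean (0) and variance (1).
STD_NORMAL = NormalDist(0.0, 1.0)
LAPLACE_SCALE = 1 / math.sqrt(2)  # Laplace(0, b) has variance 2 * b**2 = 1


def laplace_draw(rng: random.Random) -> float:
    """Laplace(0, b) as the difference of two independent Exponential(mean b) draws."""
    rate = 1 / LAPLACE_SCALE
    return rng.expovariate(rate) - rng.expovariate(rate)


def ks_statistic(sample: list[float], cdf) -> float:
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare F with the empirical CDF just after and just before x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def rejection_rate(n: int, n_sims: int = 1000, seed: int = 0) -> float:
    """Share of simulated Laplace samples for which KS rejects normality at ~5%."""
    rng = random.Random(seed)
    critical = 1.36 / math.sqrt(n)  # asymptotic 5% critical value
    rejections = 0
    for _ in range(n_sims):
        sample = [laplace_draw(rng) for _ in range(n)]
        if ks_statistic(sample, STD_NORMAL.cdf) > critical:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for n in (20, 100, 500):
        print(f"n={n}: KS rejects the (wrong) normal model in {rejection_rate(n):.1%} of samples")
```

At small sample sizes the test should rarely flag the wrong model. The point is not the exact rate but that passing a goodness-of-fit check is weak evidence that an assumption holds.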

This is also what we discuss in the Data Centricity Lab (see datacentricity.org for an overview). We underline the role of assumptions in the modeling process and how they dictate the usefulness of models (and the decisions they support).

This paper defends pragmatism over dogma:

  • Using both frequentist (e.g., p-values, confidence intervals) and Bayesian (e.g., posterior probabilities) tools, depending on the problem.
  • Prioritizing the assumptions that connect models to real-world data rather than debating the “true” nature of probability.
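To make the “both toolkits” point concrete, here is a minimal sketch (ours, not the paper’s) that summarizes the same hypothetical data, 30 successes in 100 trials, with a frequentist confidence interval and p-value and with a Bayesian posterior probability. The uniform Beta(1, 1) prior, the Wald interval, and the Monte Carlo posterior estimate are our choices for illustration.

```python
import math
import random
from statistics import NormalDist

# Hypothetical data: 30 successes in 100 trials
SUCCESSES, TRIALS = 30, 100


def wald_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Frequentist: normal-approximation (Wald) 95% confidence interval for p."""
    p_hat = successes / trials
    se = math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - z * se, p_hat + z * se


def two_sided_p_value(successes: int, trials: int, p0: float = 0.5) -> float:
    """Frequentist: z-test of H0: p = p0 using the null standard error."""
    p_hat = successes / trials
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / trials)
    return 2 * NormalDist().cdf(-abs(z))


def posterior_prob_below(threshold: float, successes: int, trials: int,
                         prior_a: float = 1.0, prior_b: float = 1.0,
                         draws: int = 50_000, seed: int = 0) -> float:
    """Bayesian: Monte Carlo estimate of P(p < threshold | data) under a Beta prior."""
    rng = random.Random(seed)
    a = prior_a + successes          # Beta-binomial conjugate update
    b = prior_b + trials - successes
    hits = sum(rng.betavariate(a, b) < threshold for _ in range(draws))
    return hits / draws


low, high = wald_interval(SUCCESSES, TRIALS)
p_value = two_sided_p_value(SUCCESSES, TRIALS)
prob_below_half = posterior_prob_below(0.5, SUCCESSES, TRIALS)
print(f"95% Wald CI for p: ({low:.3f}, {high:.3f})")  # (0.210, 0.390)
print(f"p-value for H0: p = 0.5: {p_value:.1e}")
print(f"Posterior P(p < 0.5 | data): {prob_below_half:.4f}")
```

Both summaries rest on the same binomial model; which one is more useful depends on the question being asked.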

One implication is that we should rethink how we frame the relationship between a sample (reality) and the population (hypothetical). We often describe statistical inference as random sampling from a finite population, but that framing can be misleading. The paper suggests calling the estimand the “theoretical mean” rather than the “population mean.”
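In standard notation (ours, not necessarily the paper’s), the two framings differ in what the estimand is. The sample mean connects to the theoretical mean only through a modeling assumption, here that observations are iid draws from an assumed model f_θ:

```latex
% Finite-population framing: an average over N fixed, enumerable units
\bar{y}_N = \frac{1}{N} \sum_{i=1}^{N} y_i

% Theoretical framing: a parameter of an assumed probability model f_\theta
\mu = \mathbb{E}_{\theta}[Y] = \int y \, f_{\theta}(y) \, dy

% The sample mean targets \mu only through the modeling assumption
\bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i,
\qquad Y_1, \dots, Y_n \overset{\text{iid}}{\sim} f_{\theta}
\;\Longrightarrow\; \mathbb{E}_{\theta}[\bar{Y}_n] = \mu
```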

Why does it matter? The more we emphasize the role of assumptions, the more modelers question whether the theoretical world aligns with the real world that produced the data. As we discuss at Data Duets, when assumptions are sidelined, a misconception takes hold: the idea that methodological rigor can substitute for conceptual accuracy. And causal (semi-)parametric solutions are often more sensitive to this misconception than predictive ones (as we discuss further here).

Kass (2011) Paper

Moltbook is not a community

and there is no emergence. It’s yet another simulation. Here’s a reality check.

Community takes trust and authenticity, a shared purpose and identity, and active participation and interaction. These LLM bots have no concept of trust or a shared purpose. Data shows they don’t even truly interact; they just take parallel actions:

tl;dr: agents post a LOT but don’t really talk to each other. 93.5% of comments get zero replies. conversations max out at a depth of 5. at least as of now, moltbook is less “emergent AI society” and more “6,000 bots yelling into the void and repeating themselves” (Holtz)
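Metrics like these are straightforward to compute from a comment table. Here is a minimal sketch (our illustration, not Holtz’s code) that returns the share of comments with zero replies and the maximum thread depth from (comment_id, parent_id) pairs. The toy data are hypothetical, and the depth convention (a top-level comment has depth 1) is an assumption; Holtz’s exact definitions may differ.

```python
from typing import Optional


def thread_metrics(comments: list[tuple[str, Optional[str]]]) -> tuple[float, int]:
    """Return (share of comments with zero replies, maximum thread depth).

    `comments` holds (comment_id, parent_id) pairs; parent_id is None for a
    top-level comment, and every parent is assumed to appear in the list.
    """
    parent_of = dict(comments)
    has_reply = {parent for _, parent in comments if parent is not None}
    zero_reply_share = sum(cid not in has_reply for cid in parent_of) / len(parent_of)

    depth: dict[str, int] = {}

    def depth_of(cid: str) -> int:
        # Walk up the parent chain until reaching a known depth or the top level
        chain = []
        node = cid
        while node is not None and node not in depth:
            chain.append(node)
            node = parent_of[node]
        d = 0 if node is None else depth[node]
        # Fill in depths from the top of the chain back down to `cid`
        for item in reversed(chain):
            d += 1
            depth[item] = d
        return depth[cid]

    max_depth = max(depth_of(cid) for cid in parent_of)
    return zero_reply_share, max_depth


toy_comments = [
    ("c1", None), ("c2", None), ("c3", None),
    ("c4", "c1"), ("c5", "c4"), ("c6", None),
]
share, max_depth = thread_metrics(toy_comments)
print(f"zero-reply share: {share:.1%}, max depth: {max_depth}")  # zero-reply share: 66.7%, max depth: 3
```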

And emergence requires more than independent entities occupying the same space. Even if the bots truly interacted, emergence takes consistent horizontal influence and downward causation:

One of the emergent properties that a system can have is the power to exert causal influence on the components of that system in a way that is consistent with, but different from, the causal influences that those components exert upon each other. (Newman, 1996)

Bottom line is, Moltbook is an exciting experimental simulation for technologists like me, but it is neither a community nor an emergent society. The community elements and causal loops are currently missing: the agents do not adapt their weights or behaviors based on the collective. They are simply generating tokens into a vacuum.
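A deliberately crude toy (our illustration, not a model of Moltbook and not a definition of emergence) shows the missing loop. Each hypothetical agent holds a scalar “stance” and takes a noisy step every round. With `feedback = 0`, agents act in parallel and the collective plays no causal role; with `feedback > 0`, each agent also moves toward the collective mean, a minimal stand-in for the collective-to-component influence described above. The parameter values are arbitrary.

```python
import random
import statistics


def final_dispersion(feedback: float, n_agents: int = 100, steps: int = 50, seed: int = 0) -> float:
    """Return the spread (population std. dev.) of agent stances after the last round.

    feedback = 0: each agent only adds independent noise (parallel actions).
    feedback > 0: each agent also moves a fraction `feedback` toward the
    collective mean, so the collective state constrains its components.
    """
    rng = random.Random(seed)
    stances = [rng.gauss(0.0, 1.0) for _ in range(n_agents)]
    for _ in range(steps):
        collective_mean = statistics.fmean(stances)
        stances = [s + feedback * (collective_mean - s) + rng.gauss(0.0, 0.1) for s in stances]
    return statistics.pstdev(stances)


parallel = final_dispersion(feedback=0.0)
coupled = final_dispersion(feedback=0.2)
print(f"parallel agents: spread {parallel:.2f}; coupled agents: spread {coupled:.2f}")
```

Only the coupled run develops a collective-level property (a tight consensus) that the components respond to; in the parallel run, the spread just keeps drifting.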

Moltbook – Source 1 (Holtz’s analysis) – Source 2 (Newman, 1996)

[Click title for image] H/t to Ben Lowenstein for the screenshot.

Clawdbot craze explainer

If you’ve heard about ClawdBot but don’t quite understand the hype, this article will help:

  • An agent project named ClawdBot was released by developer Peter Steinberger and quickly gained viral traction (over 60,000 GitHub stars in just a couple of days).
  • The frenzy was so intense that it reportedly caused a surge in Mac mini sales and drove Cloudflare stock up over 20% because the project used Cloudflare Workers (a ~$12 billion increase in market cap).
  • Following a trademark request from Anthropic (the name was too close to Claude AI), the project was renamed to MoltBot.
  • During the 10-second window in which the developer was changing the name on GitHub, crypto scammers hijacked the old name.
  • The scammers used the name to launch a $CLAWD token, which reached a $16 million market cap before crashing within 12 hours.

And all of this happened in just 48 hours. We’re truly living in exciting times (!).

Source

Synthetic data using generative models

Back to data science. What is an excellent use case for generative AI models? Creating synthetic data for causal identification.

Synthetic control is already a popular use case, but there is another: scaling a small dataset realistically to a larger one. Say we have a limited dataset (a new product launch, a new or niche market, or early startup data) and need to expand it for analysis while preserving its properties. Training a generative model on it is a sensible choice.

In a new Causal Book chapter, we use a Gaussian Copula synthesizer to generate data for a regression discontinuity design. We start with a small seed dataset with a causal jump around the cutoff (ideally, the seed data is a real dataset). The challenge? Generative models are “smoothers” by nature; they tend to blur discontinuities.

Our fix was to move the jump outside the black box in a semi-parametric design: we trained the model on residualized data, effectively teaching it everything except the causal break, and then reintroduced the break during reconstruction:

  1. Residualization: We strip out the causal effect before training. By subtracting the deterministic part from the outcome, we isolate a residual without the jump.
  2. Training: We train the Gaussian Copula on the residualized data. This allows the generative model to capture the correlation structure between covariates without getting derailed by the discontinuity.
  3. Reconstruction: We don’t ask the model to generate the outcome directly. Instead, we reconstruct it by applying the functional form (the “ground truth,” which we know in this special case) to the synthetic covariates and the generated residual.
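The three steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the Causal Book’s code: it swaps SDV’s Gaussian Copula synthesizer for a hand-rolled Gaussian copula (empirical marginals plus a correlation matrix of normal scores) so it runs with NumPy alone, and the functional form, the covariate `z`, the bandwidth, and the sample sizes are all hypothetical. For contrast, it also fits the same copula directly on the outcome, which illustrates the smoothing problem.

```python
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()
CUTOFF = 0.0


def normal_scores(col: np.ndarray) -> np.ndarray:
    """Map a column to standard-normal scores through its ranks (empirical CDF)."""
    ranks = np.argsort(np.argsort(col)) + 1  # ranks 1..n
    u = ranks / (len(col) + 1)               # strictly inside (0, 1)
    return np.array([STD_NORMAL.inv_cdf(float(v)) for v in u])


class MinimalGaussianCopula:
    """Hand-rolled Gaussian copula with empirical marginals (stand-in for SDV's synthesizer)."""

    def fit(self, data: np.ndarray) -> "MinimalGaussianCopula":
        self.data = np.asarray(data, dtype=float)
        scores = np.column_stack([normal_scores(col) for col in self.data.T])
        self.corr = np.corrcoef(scores, rowvar=False)  # dependence structure
        return self

    def sample(self, n: int, rng: np.random.Generator) -> np.ndarray:
        z = rng.multivariate_normal(np.zeros(len(self.corr)), self.corr, size=n)
        u = np.vectorize(STD_NORMAL.cdf)(z)
        # Invert each empirical marginal, interpolating between order statistics
        return np.column_stack([np.quantile(col, u[:, j]) for j, col in enumerate(self.data.T)])


def structural_mean(x: np.ndarray) -> np.ndarray:
    """Hypothetical known functional form: y = 1 + 0.5 x + 2 * 1{x >= cutoff}."""
    return 1.0 + 0.5 * x + 2.0 * (x >= CUTOFF)


def rd_jump(x: np.ndarray, y: np.ndarray, bandwidth: float = 0.25) -> float:
    """Local linear RD estimate: separate OLS lines within the bandwidth on each side."""
    def intercept(mask: np.ndarray) -> float:
        design = np.column_stack([np.ones(mask.sum()), x[mask] - CUTOFF])
        coef, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        return float(coef[0])
    left = (x < CUTOFF) & (x >= CUTOFF - bandwidth)
    right = (x >= CUTOFF) & (x <= CUTOFF + bandwidth)
    return intercept(right) - intercept(left)


rng = np.random.default_rng(0)
n_seed, n_synth = 300, 5000

# Small seed data with a true jump of 2 at the cutoff (hypothetical)
x = rng.uniform(-1.0, 1.0, n_seed)
z = 0.6 * x + rng.normal(0.0, 0.3, n_seed)  # covariate correlated with x
y = structural_mean(x) + rng.normal(0.0, 0.3, n_seed)

# Naive: fit the copula on the outcome directly; the jump gets smoothed away
naive = MinimalGaussianCopula().fit(np.column_stack([x, z, y])).sample(n_synth, rng)
naive_jump = rd_jump(naive[:, 0], naive[:, 2])

# 1. Residualization: strip the known deterministic part, including the jump
resid = y - structural_mean(x)
# 2. Training: fit the copula on covariates and residuals only
synth = MinimalGaussianCopula().fit(np.column_stack([x, z, resid])).sample(n_synth, rng)
x_syn, z_syn, r_syn = synth.T
# 3. Reconstruction: re-apply the functional form to synthetic covariates + residuals
y_syn = structural_mean(x_syn) + r_syn
resid_jump = rd_jump(x_syn, y_syn)

print(f"seed jump 2.0 | naive synthetic: {naive_jump:.2f} | residualized synthetic: {resid_jump:.2f}")
```

In the naive run, the estimated jump should land near zero, because a Gaussian copula cannot represent a discontinuity in the conditional mean of y given x. The residualized run recovers the jump because the functional form, break included, is re-applied after sampling. That is also the limitation: the approach relies on knowing the functional form, which holds here only because the seed data were simulated.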

This process forces the treatment effect back into the otherwise smoothed-out data, preserving the causal structure we needed. We used the Synthetic Data Vault library. You can find the Python code and Colab notebook in Chapter 2 of the Causal Book.

Our use case here was specialized, but there are others. We are working on a Data Duets article to discuss the broader business implications.