Stop Sloppypasta!

Good reminder and possibly the call of the year. 2026 may be the year we finally feel the peak of the “dead internet,” about ten years after the term was coined.

One potential upside is that we might diversify our engagement away from social media as it becomes increasingly bogged down by bot- and AI-generated (and human-copied) content. So, the “death” of the internet might actually be the birth of a more intentional way of connecting. Let’s see if and where the sloppypasta stops.

Source 1 (Stop Sloppypasta) – Source 2 (Dead Internet theory) – Source 3 (Human vs. LLM-generated content)

Which LLMs can you run locally?

This project helps you find out which models your machine can handle.

If you’re a data scientist experimenting with local models, it pays to know what your machine can handle before wasting time setting up huge models. The auto-detection isn’t perfect, and the list is missing some hardware combinations, but the convenience still makes it useful.

This is also useful if you’re running a workshop or demo. In my courses like AI Applications, we experiment with LLMs in Docker containers, but performance varies greatly by hardware.

This new web tool is a nice, fast alternative for selecting models. You can also use the more accurate CLI tool llmfit, but that requires an install or a Docker pull and run.
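Under the hood, a fit check like this mostly comes down to arithmetic: the weights take roughly parameter count × quantization width, plus some headroom for the KV cache and runtime buffers. A minimal sketch of that check, assuming a made-up overhead factor and example model sizes (these are my own illustrative numbers, not values from the web tool or llmfit):

```python
def fits_in_memory(params_billion: float, bits_per_weight: int,
                   available_gb: float, overhead: float = 1.2) -> bool:
    """Rough check: weight size * overhead vs. available RAM/VRAM.

    overhead=1.2 is an assumed fudge factor for the KV cache and
    runtime buffers; real tools model this more carefully.
    """
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb * overhead <= available_gb

# Example: a 7B model, 4-bit quantized, on a machine with 16 GB free
print(fits_in_memory(7, 4, 16))    # 7 * 0.5 * 1.2 = 4.2 GB -> fits
print(fits_in_memory(70, 16, 16))  # 70 * 2 * 1.2 = 168 GB -> does not fit
```

The same arithmetic explains why quantization matters so much for workshops: dropping from 16-bit to 4-bit weights cuts the footprint by 4x before any other tricks.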

Autoresearch optimizing random seeds

You may have already seen “AutoResearch,” released by Andrej Karpathy yesterday. It is another interesting experiment: research agents running training experiments on a single-GPU implementation of nanoGPT.

In this context, “research” is mostly hyperparameter tuning, but the agent is fully autonomous. So it can modify the code as it sees fit without a human in the loop.

While checking it out, I saw a session report posted by the agent, making me smile:

Changing random seed from 42→137 improved by 0.0004. Seed 7 was worse. Make of that what you will.

Even though the agent knows that optimizing the seed is pointless, it does it anyway and then tosses the ball back to you. Do whatever you want with that information!

Source 1 (Autoresearch repo) – Source 2 (Discussion link)

Has the worst language won?

Will we ever see a new programming language scale again?

With foundational models trained on the vast existing corpus of Python, what would it take for another general-purpose language to catch up? Does Julia have a chance of moving beyond its current niche?

Some new languages like Mojo are already being marketed as having “Pythonic syntax” (a superset of Python).

The post titled The Worst Language Won was the trigger for my question:

Python is the language of AI. By all conventional measures, it shouldn’t be.

Python is slow. Thousands of times slower than C, it loses benchmarks to languages that died decades ago.

Python is unsafe. With no compiler to catch your mistakes, your code’s flaws are exposed when it breaks in use.

Despite the intro, the post is actually praising Python’s experimental nature.

Language w/o reasoning ≠ understanding

Michael Burry (The Big Short) shared an interesting story today from an 1880 New York Times article titled “Is There Thought Without Language? Case of a Deaf Mute.” The story itself is fascinating, highlighting how far science has progressed in our understanding of deafness.

More to the point, I saw this powerful statement in the 1880 piece that separates understanding from language:

That by which we understand all things must be essentially superior to anything else that is understood by it.

This prompted an update to my “Mind the AI Gap” deck, a framework I initially created in May 2024 for a talk on LLM-assisted learning. Since then I’ve kept it updated as I discussed the topic.

Burry’s conclusion that “Language without the Capacity for Reason fails at Understanding” mirrors a key argument in the deck. This 1880 case study now sits next to the deck’s 1980 discussion point from Steve Jobs (the computer as a bicycle for the mind), marking a 100-year interval.

History, it seems, has a lot to teach us about AI.

See the “Mind the AI Gap” deck – Read Michael Burry’s post

Trusting the AI artifact

[Click title for image]

Looks good, must be right? We seem to scale back our questioning once AI starts building. This is a key finding from the latest Anthropic report on AI fluency.

When AI produces artifacts (apps, code, documents, visuals, or interactive tools), users are significantly less likely to verify the work:

  • 3.7 percentage points less likely to check facts
  • 3.1 percentage points less likely to question reasoning
  • 5.2 percentage points less likely to identify missing context

This suggests an interesting conundrum: as AI moves from being a chat/conversation partner to a builder, our skepticism fades. We are far more likely to question a text response than a functional piece of code or a formatted document, even though the latter often requires the most oversight.

Source

Human vs. LLM-generated content

Just a heads up: the next post you’ll read is LLM-generated, because this one isn’t 😉.

Skip the next post.

If there’s one causal effect I’m willing to speculate on without any modeling, this is it. This one doesn’t need a diff-in-diff: the trends are parallel; relative time checks out; ready for publication.

New Data Duets post: Using generative models, well, to generate data

I recently shared an underappreciated use case for generative models in data science: creating high-fidelity tabular datasets (OTA data for regression discontinuity).

The model’s success in data synthesis motivated a question: what are some high-value use cases for data science teams when using generative models to create datasets? This, in turn, led to our latest Data Duets post: “Using generative models, well, to generate data”

I walk through using the Synthetic Data Vault to scale a small OTA sample while preserving its statistical properties and the causal discontinuity. Duygu Dagli then weighs in on business implications: creating statistical twins to share data with vendors for solution optimization and benchmarking, simulating product recall data, and solving cold start problems in retail.
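The Gaussian-copula idea behind tools like the Synthetic Data Vault can be sketched without the library: fit each column’s marginal via its empirical quantiles, sample correlated standard normals, and map them back through those quantiles so both the marginals and the dependence survive. A minimal two-column sketch in pure Python (this illustrates the idea, not the SDV API; the column names, numbers, and correlation are made up):

```python
import math
import random

def synthesize(xs, ys, n, rho, seed=0):
    """Sample n synthetic (x, y) pairs: correlated standard normals
    pushed through each column's empirical quantile function."""
    rng = random.Random(seed)
    sx, sy = sorted(xs), sorted(ys)

    def quantile(sorted_col, u):
        # Empirical inverse CDF: pick the observed value at rank u.
        return sorted_col[min(int(u * len(sorted_col)), len(sorted_col) - 1)]

    out = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)  # correlate
        # Standard normal CDF maps each normal draw to a uniform rank.
        u1 = 0.5 * (1 + math.erf(z1 / math.sqrt(2)))
        u2 = 0.5 * (1 + math.erf(z2 / math.sqrt(2)))
        out.append((quantile(sx, u1), quantile(sy, u2)))
    return out

# Tiny "OTA-like" sample: price vs. bookings, positively related (made-up numbers).
price    = [80, 95, 110, 120, 135, 150, 170, 200]
bookings = [12, 15, 14, 20, 22, 25, 30, 41]
synthetic = synthesize(price, bookings, n=500, rho=0.9)
print(len(synthetic), synthetic[0])
```

Because the sampler only emits observed values, the synthetic columns stay inside the original ranges while the positive price–bookings relationship is preserved; SDV adds proper marginal models, metadata handling, and quality metrics on top of this skeleton.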

Ultimately the approach here represents a step toward data centricity: using high-fidelity simulations to dissect and validate the assumptions that drive our models.

Link to the full post

Something big is happening?

The title is from a popular post. It was clearly written to be sensational (which it seems to have achieved), yet it makes some valid points and offers useful advice:

Here’s a simple commitment that will put you ahead of almost everyone: spend one hour a day experimenting with AI. Not passively reading about it. Using it. Every day, try to get it to do something new… something you haven’t tried before, something you’re not sure it can handle. Try a new tool. Give it a harder problem. One hour a day, every day.

While following technological progress is always a good idea, the current pace is truly mind-blowing and demands more attention. As someone who has been coding since C# first launched (don’t check the date!) and whose day-to-day is full of Markdown, JSON, and APIs, even I am finding it difficult to keep up lately.

So, personally, and surely as an educator, I can’t help but agree with the point about “the cost of not experimenting.” We are moving into a world where daily experimentation is as essential as your morning coffee (and yes, you have to drink it).

Source

Is AI killing B2B SaaS?

[Click title for image]

Hard to ignore this question; it’s currently moving financial markets. The first comment in this massive 725-comment Hacker News thread makes a compelling case for why the answer is likely no: enterprise SaaS will survive because management simply does not want to be responsible for the vibe-coded alternative.

As a technologist and a professor who has led “buy vs. build” discussions for over a decade, I agree that the death-of-SaaS argument is overblown. At the center of those discussions is the massive gap between building and maintaining, which the thread underestimates. And one critical aspect may not be getting enough emphasis: shifting the liability.

AI is now driving down the cost of the initial build, but the build is only a fraction of the value an enterprise solution provides. SaaS also provides reliability (uptime) and the boring essentials (security compliance, data integrity). Enterprise SaaS owns the “system of record,” and migrating that is not just a (vibe-)coding problem.

Even if all of this is resolved, the liability bottleneck remains: management won’t want to be responsible. Just because you can build it doesn’t mean you should, and for most enterprises, beyond thin wrappers, I can see why they won’t.