Anthropic’s advisor strategy

Anthropic’s new Advisor Strategy framework is very promising.

The first image shows how users can now pair a more capable model (like Opus) as a strategic “Advisor” with a faster, cheaper model (like Sonnet or Haiku) as the “Executor” in a single Claude API call. This mirrors the actor-critic logic: the Executor (actor) invokes the Advisor (critic) as needed, with both operating on a shared context to optimize performance and cost.

The second image is a Nano Banana illustration for my year-end reflection on LLM interfaces, where I discussed using IDE tools to bridge different model families from multiple providers and benefit from complementary error patterns, a modular take on the same actor-critic setup.

In a modeling workflow, for example, the actor writes the data-cleaning code while the critic reviews it, making sure variables are correctly coded and align with methodological assumptions. Anthropic’s framework automates this workflow within a single provider.

Source

Anthropic issues takedown requests

What a shame to use copyrighted material without permission…

Anthropic accidentally exposed the underlying instructions for its Claude Code AI app, prompting the company to issue more than 8,000 takedown requests.

The leak revealed Anthropic’s proprietary techniques and tools, providing competitors a roadmap to clone Claude Code’s features.

The exposure, caused by human error, was not a security breach, but it creates opportunities for attackers to exploit Claude Code.

Summary by the WSJ. Gift link to the article

How to delete a Claude Cowork session

[Click title for image]

Turns out Claude Cowork sessions can’t be easily deleted.

I was testing Claude Cowork in the app, and after two sessions, I noticed there is no way to delete the sessions (only an “Archive” button). This appears to be a known open issue, as discussed here.

In Settings, Anthropic actually asks you to send them an email to delete sessions. Ha!

Anyway, after some digging and a few greps, I had to delete my test sessions manually. For the privacy-conscious like me, here’s a two-step solution (Close the Claude App first):

  1. Delete the project files in your global .claude/projects directory (if one exists). The folder names here are “friendly” and easy to find.
  2. Delete the related log file and folder in: C:/Users/<username>/AppData/Roaming/Claude/local-agent-mode-sessions (Windows)

The second step is less friendly because the file and folder names are GUIDs (36-character strings). Unless you can grep for content from the session, look for the folders and JSON files whose names begin with “local_”. Each Claude Cowork session has one folder and one matching file. To delete all sessions, you can delete everything in local-agent-mode-sessions. If you get a VM error the next time you open the Claude App, just restart your computer.

Stop Sloppypasta!

Good reminder and possibly the call of the year: Stop Sloppypasta.

2026 may be the year we finally feel the peak of the “dead internet,” about ten years after the term was coined.

One potential upside is that we might diversify our engagement away from social media as it becomes increasingly bogged down by bot- and AI-generated (and human-copied) content. So, the “death” of the internet might actually be the birth of a more intentional way of connecting. Let’s see if and where the sloppypasta stops.

Source 1 (Stop Sloppypasta) – Source 2 (Dead Internet theory) – Source 3 (Human vs. LLM-generated content)

Which LLMs can you run locally?

This project helps you find out which models your machine can handle.

If you’re a data scientist experimenting with local models, it’s better to know up front what your machine can handle than to waste time setting up models that are too big. The auto-detection isn’t perfect and the list is missing some hardware combinations, but the convenience still makes it useful.

This is also useful if you’re running a workshop or demo. In my courses like AI Applications, we experiment with LLMs in Docker containers, but performance varies greatly by hardware.

This new webtool is a nice, fast alternative for selecting models. You can also use the more accurate CLI tool llmfit, but that takes an install or a Docker pull and run.
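As a rough sanity check before reaching for any tool, you can estimate a model’s memory footprint from its parameter count and quantization. The sketch below is a back-of-the-envelope rule of thumb (weights plus ~20% overhead for the KV cache and runtime buffers); it is my own approximation, not what the webtool or llmfit actually compute.

```python
def estimate_memory_gb(params_billions: float,
                       bits_per_weight: int = 4,
                       overhead: float = 1.2) -> float:
    """Rough RAM/VRAM estimate for running an LLM locally.

    weights = params * (bits / 8) bytes; `overhead` (~20%) is a crude
    allowance for the KV cache and runtime buffers.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

# By this estimate, a 7B model at 4-bit quantization needs roughly 4 GB,
# while the same model at 16-bit needs roughly 17 GB.
```

It won’t replace proper hardware detection, but it’s enough to rule out models that are obviously too large before you download anything.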

Autoresearch optimizing random seeds

You may have already seen “AutoResearch,” released by Andrej Karpathy yesterday. It’s another interesting experiment: research agents running training experiments on a single-GPU implementation of nanoGPT.

In this context, “research” is mostly hyperparameter tuning, but the agent is fully autonomous. So it can modify the code as it sees fit without a human in the loop.

While checking it out, I saw a session report posted by the agent, making me smile:

Changing random seed from 42→137 improved by 0.0004. Seed 7 was worse. Make of that what you will.

Even though the agent knows that optimizing the seed is pointless, it does it anyway and then tosses the ball back to you. Do whatever you want with that information!
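For intuition on why seed differences like 0.0004 are noise: the seed only shifts the starting point, and on a well-behaved objective the optimizer washes that difference out. Here’s a toy sketch (plain gradient descent on a quadratic; nothing like nanoGPT, just the principle):

```python
import random

def final_loss(seed: int, steps: int = 200, lr: float = 0.1) -> float:
    """Gradient descent on loss(w) = sum(w_i^2); only the init depends on the seed."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(5)]   # seed-dependent initialization
    for _ in range(steps):
        w = [wi - lr * 2.0 * wi for wi in w]      # gradient of wi^2 is 2*wi
    return sum(wi * wi for wi in w)

# The three seeds from the agent's report converge to (numerically) the
# same loss; any remaining gap is far below measurement noise.
losses = {seed: final_loss(seed) for seed in (7, 42, 137)}
spread = max(losses.values()) - min(losses.values())
```

Real training runs are noisier and non-convex, so seeds do produce small measurable gaps, which is exactly why ranking seeds by a 0.0004 difference tells you nothing.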

Source 1 (Autoresearch repo) – Source 2 (Discussion link)

Has the worst language won?

Will we ever see a new programming language scale again?

With foundational models trained on the vast existing corpus of Python, what would it take for another general-purpose language to catch up? Does Julia have a chance of moving beyond its current niche?

Some new languages like Mojo are already being marketed as having “Pythonic syntax” (a superset of Python).

The post titled The Worst Language Won was the trigger for my question:

Python is the language of AI. By all conventional measures, it shouldn’t be.

Python is slow. Thousands of times slower than C, it loses benchmarks to languages that died decades ago.

Python is unsafe. With no compiler to catch your mistakes, your code’s flaws are exposed when it breaks in use.

Despite the intro, the post is actually praising Python’s experimental nature.

Language w/o reasoning ≠ understanding

Michael Burry (The Big Short) shared an interesting story today from an 1880 New York Times article titled “Is There Thought Without Language? Case of a Deaf Mute.” The story itself is fascinating, highlighting how far science has progressed in our understanding of deafness.

More to the point, I saw this powerful statement in the 1880 piece that separates understanding from language:

That by which we understand all things must be essentially superior to anything else that is understood by it.

This prompted an update to my “Mind the AI Gap” deck, a framework I initially created in May 2024 for a talk on LLM-assisted learning. Since then, I’ve kept it updated as I’ve continued discussing the topic.

Burry’s conclusion that “Language without the Capacity for Reason fails at Understanding” mirrors a key argument in the deck. The 1880 case study now sits alongside the 1980 point from Steve Jobs (the computer as a “bicycle for the mind”), marking a 100-year interval.

History, it seems, has a lot to teach us about AI.

See the “Mind the AI Gap” deck – Read Michael Burry’s post

Trusting the AI artifact

[Click title for image]

Looks good, must be right? We seem to scale back our questioning once AI starts building. This is a key finding from the latest Anthropic report on AI fluency.

When AI produces artifacts (apps, code, documents, visuals, or interactive tools), users are significantly less likely to verify the work:

  • 3.7 percentage points less likely to check facts
  • 3.1 percentage points less likely to question reasoning
  • 5.2 percentage points less likely to identify missing context

This suggests an interesting conundrum: as AI moves from being a chat/conversation partner to a builder, our skepticism fades. We are far more likely to question a text response than a functional piece of code or a formatted document, even though the latter often requires the most oversight.

Source

Human vs. LLM-generated content

Just a heads up: the next post you’ll read is LLM-generated, because this one isn’t 😉.

Skip the next post.

If there’s one causal effect I’m willing to speculate on without any modeling, this is it. This one doesn’t need a diff-in-diff: the trends are parallel; relative time checks out; ready for publication.