Student’s t-test during a study leave at Guinness

You may or may not know that the Student’s t-test was named after William Sealy Gosset, head experimental brewer at Guinness, who published under the pseudonym “Student”. This is because Guinness preferred its employees to use pseudonyms when publishing scientific papers.

The part I didn’t know is that Guinness had a policy of granting study leave to technical staff, and Gosset took advantage of this during the first two terms of the 1906-1907 academic year. This sounds like a great idea to encourage boundary spanning.

This article is a very nice account of the story, with lovely visuals (which will definitely make it into the beer preference example in my predictive analytics course).
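For readers who know the name but not the test: here is a minimal sketch of the two-sample t-statistic in Python, in the spirit of a beer preference example. The batch ratings are invented for illustration and have nothing to do with Gosset's actual experiments.

```python
import math
import statistics

# Hypothetical beer-preference data: taste ratings for two brewing batches.
# (Illustrative numbers only, not from Gosset's work.)
batch_a = [5.0, 5.5, 6.0, 5.2, 5.8]
batch_b = [4.0, 4.5, 4.2, 4.8, 4.1]

def student_t(a, b):
    """Two-sample Student's t-statistic with pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb)
    )

print(student_t(batch_a, batch_b))  # a large |t| suggests the batches differ
```

Gosset's contribution was showing how this statistic behaves for the small samples typical of brewery experiments, where the normal approximation breaks down.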

Source

Mind the AI Gap: Understanding vs. Knowing

Next week I will be speaking at the 2024 International Business Pedagogy Workshop on the use of AI in education. This gave me the opportunity to put together some ideas about using LLMs as learning tools.

Some key points:
– Knowing is not the same as understanding, but the two are easily confused.
– Understanding requires the ability to reason and identify causality (using counterfactual thinking).
– This is completely lacking in LLMs at the moment.
– Framing LLMs as magical thinking machines or creative minds is not helpful, because it can easily mislead us into outsourcing our cognition to them.
– The best way to benefit from LLMs is to recognize them for what they are: Master memorization models.
– Their power lies in their sheer processing power and capacity, which can make them excellent learning companions when used properly.
– How can LLMs best be used as learning companions? That’ll be part of our discussion.

Source

Are the two images the same?

To humans, the answer is undoubtedly yes. To algorithms, they can easily be two completely different images with very different characteristics. The image on the right is the 𝘨𝘭𝘢𝘻𝘦𝘥 version of the original image on the left.

Glazed is a product of the SAND Lab at the University of Chicago that helps artists protect their art from generative AI companies. Glaze adds noise to artwork that is invisible to the human eye but misleading to the algorithm.

Glaze is free to use, but understandably not open source, so as not to give art thieves an advantage in adaptive responses in this cat-and-mouse game.

The idea is similar to the adversarial attack famously discussed in Goodfellow et al. (2015), where a panda predicted with low confidence becomes a sure gibbon to the algorithm after adversarial noise is added to the image.
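The fast gradient sign method from that paper can be sketched in a few lines. The model below is a toy logistic regression on a flattened "image" with random weights, not a deep net and not Glaze's actual algorithm; it only illustrates the mechanic of stepping along the sign of the input gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a classifier: logistic regression on a flattened "image".
# (Hypothetical weights; Goodfellow et al. used deep nets, but FGSM is the same idea.)
w = rng.normal(size=64)
x = rng.normal(size=64)  # clean input, treated as the true class (y = 1)

def confidence(v):
    """P(y = 1 | v) under the toy model."""
    return 1.0 / (1.0 + np.exp(-w @ v))

# Gradient of the logistic loss -log P(y=1|x) with respect to the input x.
grad_x = -(1.0 - confidence(x)) * w

# Fast Gradient Sign Method: one epsilon-sized step along sign(gradient).
eps = 0.1
x_adv = x + eps * np.sign(grad_x)

print(confidence(x), confidence(x_adv))  # confidence in the true class drops
```

The key property is that no pixel changes by more than eps, which is why the perturbation can be imperceptible to humans while still moving the model's decision.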

I heard about this cool and useful project a while ago and have been meaning to help spread the word. In the words of the researchers:

Glaze is a system designed to protect human artists by disrupting style mimicry. At a high level, Glaze works by understanding the AI models that are training on human art, and using machine learning algorithms, computing a set of minimal changes to artworks, such that it appears unchanged to human eyes, but appears to AI models like a dramatically different art style. For example, human eyes might find a glazed charcoal portrait with a realism style to be unchanged, but an AI model might see the glazed version as a modern abstract style, a la Jackson Pollock. So when someone then prompts the model to generate art mimicking the charcoal artist, they will get something quite different from what they expected.

The sample artwork is by Jingna Zhang.

Source

Hardest problem in Computer Science: Centering things

This is a must-read/see article full of joy (and pain) for visually obsessed people. It’s a tribute to symmetry and a rebuke to non-random, unexplained errors in achieving it.

Centering is more than a computer science problem. We struggle with centering all the time, from hanging frames on the wall to landscaping. In a different world, centering is also central to data science, as in standardized scores and other rescaling operations. Centering gives us a baseline against which to compare everything else. Our brains love this symmetry (as explained here and elsewhere).
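The standardized-score version of centering fits in a few lines of Python (the scores below are invented for illustration):

```python
import statistics

# Centering as a baseline: z-scores express each value relative to the mean,
# in units of standard deviation. (Illustrative numbers.)
scores = [60, 70, 80, 90, 100]
mean = statistics.mean(scores)     # 80
stdev = statistics.pstdev(scores)  # population standard deviation

z = [(s - mean) / stdev for s in scores]
print(z)  # centered: the z-scores average to zero
```

After this transformation, "zero" means "exactly average", which is the baseline the paragraph above is talking about.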

Source

“Medium is for human storytelling, not AI-generated writing.”

Medium appears to be the first major publishing platform to adopt a policy banning the monetization of articles written by AI, effective May 1, 2024.

Enforcing this policy will be a real challenge, and will likely require human moderators to win what would otherwise be a cat-and-mouse game. This is another area where AI may, ironically, create jobs to clean up the mess it has made.

Source

Why do people use LLMs?

Apparently for anything and everything, including advice of all kinds (medical, career, business), therapy, and Dungeons & Dragons (to create storylines, characters, and quests for players).

The list is based on a crawl of the web (Quora, Reddit, etc.).

Source

How do language models represent relations between entities?

This work shows that the complex nonlinear computation of LLMs for attribute extraction can be well-approximated with a simple linear function…

and more importantly, without a conceptual model.

The study has two main findings:
1. Some of the implicit knowledge is represented in a simple, interpretable, and structured format.
2. This representation is not universally used, and superficially similar facts can be encoded and extracted in very different ways.

This is an interesting study that highlights the simplistic and associative nature of language models and the resulting randomness in their output.
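The flavor of finding 1 can be shown with a toy least-squares fit. Everything here is synthetic: the "subject" and "object" vectors are random, and the ground-truth mapping is affine by construction, so this only sketches the shape of the approximation o ≈ W s + b, not the paper's actual method of estimating it from LLM activations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: pretend we have hidden-state vectors for subjects
# (e.g. "Paris", "Tokyo") and for the objects a relation maps them to
# (e.g. "France", "Japan" for "is the capital of"). All synthetic here.
d, n = 16, 50
S = rng.normal(size=(n, d))  # subject representations

# Ground truth is itself affine plus noise, so a linear fit can succeed;
# the paper's point is that real LLM computations are often this well-behaved.
W_true = rng.normal(size=(d, d))
b_true = rng.normal(size=d)
O = S @ W_true.T + b_true + 0.01 * rng.normal(size=(n, d))

# Fit the affine approximation o ≈ W s + b by least squares.
S1 = np.hstack([S, np.ones((n, 1))])  # append a bias column
coef, *_ = np.linalg.lstsq(S1, O, rcond=None)
O_hat = S1 @ coef

r2 = 1 - np.sum((O - O_hat) ** 2) / np.sum((O - O.mean(0)) ** 2)
print(round(r2, 3))  # close to 1.0: the affine map explains the mapping
```

Finding 2 is the catch: in the real models, this clean linear picture holds for some relations and not for others.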

Source

Google’s new PDF parser

In less sensational but more useful AI news, I’ve just discovered Google’s release of a new PDF parser.

The product was pushed by the Google Scholar team as a Chrome extension, but once installed, it parses any PDF opened in Chrome (it doesn’t have to be an academic article). It creates an interactive table of contents and shows the in-text references, tables, and figures on the spot, without having to go back and forth from top to bottom of the paper. It also has rich citation features.

I love it, but my natural reaction was, why didn’t we have this already?

Source

World’s first fully autonomous AI engineer?

Meet Devin, the world’s first fully autonomous AI software engineer.

We are an applied AI lab focused on reasoning.

We’re building AI teammates with capabilities far beyond today’s existing AI tools. By solving reasoning, we can unlock new possibilities in a wide range of disciplines—code is just the beginning.

Cognition Labs makes some big claims. The demos are impressive, but it is not clear what they mean by “solving reasoning”. There is good reasoning and there is bad reasoning. The latter may be easier to solve. Let’s see what’s left after the smoke clears.

At least they do not claim that Devin is a creative thinker.

Source