“AI in 64 Pictures”
A visual journey

(Updated in July 2025)

Gorkem Turgut (G.T.) Ozer

Paul College of Business and Economics

June 2025

So using “AI” to refer to “LLMs” is a very limited view.
But popularity wins.

I. EARLY DAYS (2010s!)

What we have seen is the quantification of text. Next, we will see exactly how that quantification was done (Hint: not-so-deep learning).

P.S. We skipped a preceding approach: Bag of Words (BoW). That’s because BoW is counting words (more or less).
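To make “counting words” concrete, here is a minimal BoW sketch using only the Python standard library; the two sentences are made up for illustration.

```python
from collections import Counter

# Two toy sentences (made up for illustration).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Bag of Words: each document becomes a vector of word counts,
# ignoring word order entirely.
vocab = sorted({word for doc in docs for word in doc.split()})
for doc in docs:
    counts = Counter(doc.split())
    vector = [counts[word] for word in vocab]
    print(vector)  # e.g., [1, 0, 0, 1, 1, 1, 2] over the sorted vocabulary
```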

So far so good, but word embeddings are static, while human language is quite dynamic (“LLMs” are called “AI” these days!).
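Here is what “static” means in practice, as a toy sketch (the vectors below are random stand-ins, not trained embeddings such as word2vec or GloVe): the word “bank” gets the same vector whether the sentence is about rivers or money.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy embedding table: one fixed vector per word (random stand-ins here;
# real tables come from training, e.g., word2vec or GloVe).
vocab = ["river", "bank", "money", "deposit"]
embeddings = {word: rng.normal(size=8) for word in vocab}

# Static means context-free: "bank" maps to the same vector in both sentences.
sentence_a = ["river", "bank"]
sentence_b = ["money", "bank", "deposit"]
vec_a = embeddings["bank"]
vec_b = embeddings["bank"]
print(np.allclose(vec_a, vec_b))  # True -> no notion of context
```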

Let’s do something about that.


We used RNNs to process word embeddings dynamically, updating each word’s context as the sequence unfolds, and we added attention on top.
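Here is a hedged sketch of that idea in PyTorch, with toy dimensions and random inputs rather than the models from the pictures: a GRU reads the embeddings in order, and a dot-product attention step then weighs its hidden states.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy input: a batch of 1 sentence, 5 words, 16-dimensional embeddings.
embeddings = torch.randn(1, 5, 16)

# An RNN (here a GRU) processes the embeddings in order, so each hidden
# state mixes in the words seen so far -- a dynamic, contextual summary.
rnn = torch.nn.GRU(input_size=16, hidden_size=32, batch_first=True)
hidden_states, last_hidden = rnn(embeddings)              # (1, 5, 32), (1, 1, 32)

# Dot-product attention: score each hidden state against a query
# (here the last hidden state), then take a weighted average.
query = last_hidden.transpose(0, 1)                       # (1, 1, 32)
scores = torch.bmm(query, hidden_states.transpose(1, 2))  # (1, 1, 5)
weights = F.softmax(scores, dim=-1)                       # how much to attend to each word
context = torch.bmm(weights, hidden_states)               # (1, 1, 32)
print(weights.squeeze())  # attention weights over the 5 words, summing to 1
```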

Why is attention so important? Let’s see an application.

In the preceding example of language translation using RNNs with attention, we have a little problem. Any idea what it is?

The Transformer comes to the rescue…

II. SMALL PICTURE: FROM RNNs TO TRANSFORMERS

A confession. One of the simplifications in this deck is that we have not talked about the encoder-decoder pair, which is essential in sequence-to-sequence tasks such as translation.

Let’s fix that and see the complete Transformer architecture.
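As a hedged sketch, PyTorch ships the complete encoder-decoder architecture as a single module, torch.nn.Transformer; the dimensions and tensors below are toy values, not a real translation model.

```python
import torch

torch.manual_seed(0)

# The full architecture: an encoder stack reads the source sequence,
# a decoder stack generates the target sequence while attending to it.
model = torch.nn.Transformer(
    d_model=64, nhead=4,
    num_encoder_layers=2, num_decoder_layers=2,
    batch_first=True,
)

src = torch.randn(1, 7, 64)   # e.g., an embedded source sentence (7 tokens)
tgt = torch.randn(1, 5, 64)   # e.g., the embedded target so far (5 tokens)

# A causal mask keeps the decoder from peeking at future target tokens.
tgt_mask = torch.nn.Transformer.generate_square_subsequent_mask(5)

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([1, 5, 64])
```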

While a picture simplifies the Transformer, its real complexity lies in how it operates. Google Research has a language-translation notebook where you can test the following Transformer:
Next up: GPT - Generative Pre-trained Transformer

III. BIG PICTURE: THE TRANSFORMER EXPLAINED

So far, this is how we’ve arrived at the Transformer:

\(\text{BoW} - \text{BoW} + \text{Embeddings} + \text{RNN} + \text{Attention} - \text{RNN} + \text{More Attention} + \text{Multiple Heads} = \text{Transformer}\)

Let’s unpack the magic box now: the Transformer.
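At the heart of the box is scaled dot-product attention, \(\mathrm{softmax}(QK^\top / \sqrt{d_k})\,V\), run in several heads in parallel. Here is a minimal NumPy sketch with toy sizes and random matrices, not a full Transformer layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

# Toy setup: 5 tokens, model width 16, 2 heads of width 8.
x = rng.normal(size=(5, 16))
heads = []
for _ in range(2):  # "multiple heads": the same mechanism with different projections
    Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
    heads.append(attention(x @ Wq, x @ Wk, x @ Wv))

# Concatenate the heads back to the model width.
multi_head = np.concatenate(heads, axis=-1)
print(multi_head.shape)  # (5, 16)
```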

In addition to Google Research’s language translation notebook, a more detailed visualization is now available here. Just brew a pot of coffee before you click - it’s comprehensive!

Before we continue, let’s look inside a small GPT-2 right here (based on nanoGPT) and see how it works.
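As a hedged taste of what is inside, in the spirit of nanoGPT but not its actual code: the causal mask below is what makes a Transformer block suitable for predicting the next token.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy sizes, loosely in the spirit of nanoGPT (not its actual code).
T, C = 6, 32                      # 6 tokens, 32-dim model width
x = torch.randn(1, T, C)          # pretend these are token embeddings

qkv = torch.nn.Linear(C, 3 * C)   # one projection producing Q, K, V
q, k, v = qkv(x).split(C, dim=-1)

# Causal mask: token t may only attend to tokens 0..t, never the future.
scores = (q @ k.transpose(-2, -1)) / (C ** 0.5)         # (1, T, T)
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
scores = scores.masked_fill(~mask, float("-inf"))
weights = F.softmax(scores, dim=-1)
out = weights @ v                                       # (1, T, C)
print(out.shape)  # each position only saw the past -> the model can predict the next token
```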

IV. FROM PROMPTING GPTs TO ASKING AGENTS

So, is it fair to say LLMs are just next token prediction?

Not quite. Here’s why:
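As a baseline, this is all that plain next-token prediction is: loop the model over its own output. A minimal sketch assuming the Hugging Face transformers library and the small public gpt2 checkpoint; everything that makes the answer “not quite” (instruction tuning, tool use, agents) sits on top of this loop.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the `transformers` library and the small public "gpt2" checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The Transformer is", return_tensors="pt").input_ids

# Next-token prediction, literally: predict one token, append it, repeat.
for _ in range(10):
    with torch.no_grad():
        logits = model(ids).logits                                # (1, seq_len, vocab_size)
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)       # greedy choice
    ids = torch.cat([ids, next_id], dim=-1)

print(tokenizer.decode(ids[0]))
```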

THE AGENTIC ERA: FROM WRITING PROMPTS TO JUST ASKING FOR RESULTS

SIDE NOTE: HOW GPTs ARE TRAINED
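The core of pretraining is plain next-token prediction: shift the text by one token and minimize cross-entropy on the prediction. Below is a hedged PyTorch sketch with a toy stand-in model; real training adds enormous data and scale, plus later stages such as instruction tuning and RLHF.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, width = 100, 32
# A toy stand-in for a GPT: an embedding plus a linear head.
# (A real GPT stacks many Transformer blocks in between.)
embed = torch.nn.Embedding(vocab_size, width)
head = torch.nn.Linear(width, vocab_size)
optimizer = torch.optim.AdamW(list(embed.parameters()) + list(head.parameters()), lr=1e-3)

tokens = torch.randint(0, vocab_size, (1, 9))    # pretend tokenized text
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict the next token

for step in range(3):
    logits = head(embed(inputs))                 # (1, 8, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(step, loss.item())  # the next-token loss that pretraining drives down
```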

Footnotes

  1. By Claude Sonnet 4’s count on VS Code.

  2. Generated using Claude on Perplexity.

  3. Generated using Claude on Perplexity.

  4. Generated using Claude on Perplexity.