Modern macro recording

Remember the ability to “record” Excel macros we were promised back in the 90s that never quite worked? Autotab now does that job as a standalone browser.

It’s basically automation on steroids: it makes training and running a mini-model easier and more accessible, taking the tedium out of everyday repetitive tasks.

This is a great use case for the post-LLM world of AI agents, with a potentially direct positive impact on employee productivity and net value creation. Check it out here.

Quantification bias in decisions

When making decisions, people are systematically biased to favor options that dominate on quantified dimensions.*

The figures show the extent of bias in different contexts. Depending on what information is quantified, our decisions change even though the information content remains about the same. In other words, quantification has a distorting effect on decision making.

This made me think about the implications for data centricity. By prioritizing quantitative over qualitative information, are we failing to stay true to the data?

The study provides some evidence: we overweight salary and benefits and overlook work-life balance and workplace culture in our decisions. We check product ratings but miss the fact that the product lacks that one little feature we really need. It’s discussed in product reviews, but not quantified.

That sounds right. Clearly, we often base our decision to stay at a hotel on the rating rather than the sentiment in the reviews. But will this tendency change? Quite possibly. We have LLMs everywhere. LLMs can help resolve the trade-off between quantification and data centricity.

Using text data for decision making is easier than ever. We can now search product reviews more effectively instead of relying solely on ratings (e.g. Amazon Rufus). Information about work-life balance and workplace culture contained in employee reviews can be quantified more effectively. Currently, Glassdoor applies sentiment analysis to a subset of work-life balance reviews via keyword matching, but it’ll get better. Comparably.com already does better.
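To make this concrete, here is a minimal sketch of how an LLM can turn a qualitative dimension of a review into a number. The model name, prompt, and 1–5 scale are my illustrative assumptions, not what Glassdoor or Amazon actually run:

```python
# Minimal sketch: quantify a qualitative dimension (work-life balance)
# from free-text reviews with an LLM. Model, prompt, and scale are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def score_work_life_balance(review: str) -> int:
    """Ask the model to rate work-life balance on a 1-5 scale; 0 if absent."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any capable chat model works
        messages=[
            {"role": "system",
             "content": ("Rate the work-life balance described in the review "
                         "on a 1-5 scale (5 = excellent). Reply with 0 if "
                         "work-life balance is not mentioned. Reply with a "
                         "single digit only.")},
            {"role": "user", "content": review},
        ],
    )
    return int(response.choices[0].message.content.strip())

reviews = [
    "Great pay, but I answer emails at midnight and weekends are not safe.",
    "Decent salary, and I have never once worked past 5pm. Highly recommend.",
]
scores = [score_work_life_balance(r) for r in reviews]
print(scores)  # e.g. [2, 5]: now a quantified dimension alongside star ratings
```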

It’s time to do better. LLMs offer the opportunity to use qualitative information for more effective, higher quality decisions by staying true to data, or data centricity.

* From the article Does counting change what counts? Quantification fixation biases decision-making.

H/T Philip Rocco for sharing the article. You can learn more about data centricity at datacentricity.org.

TinyTroupe from Microsoft

A new Microsoft Research project comes with a Python library to create AI agents “for imagination enhancement and business insights”. Ha! This follows the Interactive Simulacra work from Stanford and Google last year.

TinyTroupe is an experimental Python library that allows the simulation of people with specific personalities, interests, and goals. These artificial agents – TinyPersons – can listen to us and one another, reply back, and go about their lives in simulated TinyWorld environments. […] The focus is thus on understanding human behavior…

So it’s like a little SimCity where AI agents “think” and act (talk). The product recommendation notebook asks the agents to brainstorm AI features for MS Word. It’s a GPT-4 wrapper after all, so the ideas are mediocre at best, mostly converging on some kind of train/test logic: learn the behavior of the Word user and… (blame the predictive modeling work that dominates the training data)

Are these the most valuable business insights? This project attempts to “understand human behavior”, but can we even run experiments with these agents to simulate the causal links needed for business insights in a counterfactual design? The answer is no: the process, including agent creation and deployment, screams unknown confounders and interference.

It still looks like fun and is worth a try (a minimal usage sketch follows below), even though I honestly thought it was a joke at first. That’s because the project, despite coming from Microsoft Research, has a surprising number of typos and errors in the Jupyter notebooks (and a borderline funny description):

One common source of confusion is to think all such AI agents are meant for assiting humans. How narrow, fellow homosapiens! Have you not considered that perhaps we can simulate artificial people to understand real people? Truly, this is our aim here — TinyTroup is meant to simulate and help understand people! To further clarify this point, consider the following differences:

Source
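If you do give it a try, here is roughly what driving the library looks like, based on the examples in the project’s README at the time of writing. The library is experimental, so treat the exact calls as subject to change:

```python
# Rough sketch based on the examples in the TinyTroupe README;
# the API is experimental, so treat this as illustrative.
from tinytroupe.examples import (
    create_lisa_the_data_scientist,
    create_oscar_the_architect,
)
from tinytroupe.environment import TinyWorld

lisa = create_lisa_the_data_scientist()  # a prebuilt TinyPerson
oscar = create_oscar_the_architect()

# A single agent can listen and act on its own...
lisa.listen_and_act("What AI features would you like to see in MS Word?")

# ...or agents can interact in a shared TinyWorld environment.
world = TinyWorld("Focus group", [lisa, oscar])
world.make_everyone_accessible()
world.run(4)  # simulate four rounds of interaction
```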

AI outlines in Scholar PDF Reader

Google’s Scholar PDF Reader seems to have gotten a new feature:

An AI outline is an extended table of contents for the paper. It includes a few bullets for each key section. Skim the outline for a quick overview. Click on a bullet to deep read where it gets interesting – be it methods, results, discussion, or specific details.

Clearly it’s not an alternative to reading (well, I hope not), but it makes search and discovery a breeze. Sure, one could feed the PDF into another LLM to generate a table of contents and outline, but the value here is the convenience of having them generated right when you open the PDF (not just in Google Scholar, but anywhere on the web). Highly recommended.

If you’re not already using Scholar PDF Reader, I shared this very, very helpful tool when it came out earlier this year.

New chapter in the Causal Book: IV the Bayesian way

New chapter in the Causal Book is out: IV the Bayesian Way. In this chapter we examine the price elasticity of demand for cigarettes and identify the causal treatment effect using state taxes as an instrument. We’ll streamline the conceptual model and data across chapters later.

Basically, the sample question here is: What is the effect of a price increase on smoking? As always, the solution includes complete code and data. This chapter uses the powerful RStan and CmdStanR via brms and ulam, and, unlike the other chapters, doesn’t replicate the solution in Python (due to the added computational cost of the sampling process).
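The chapter itself works in R and Stan, but the IV pattern is easy to sketch in a frequentist way. Below is a minimal two-stage least squares analogue on simulated data; it mirrors the idea (taxes instrument price), not the chapter’s Bayesian implementation, and all numbers are made up:

```python
# Minimal 2SLS sketch of the IV pattern on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5_000

confounder = rng.normal(size=n)        # unobserved demand shock
tax = rng.uniform(0.0, 2.0, size=n)    # instrument: state tax
log_price = 1.0 + 0.5 * tax + 0.3 * confounder + rng.normal(scale=0.1, size=n)
# true price elasticity of demand: -1.2
log_packs = 4.0 - 1.2 * log_price + 0.8 * confounder + rng.normal(scale=0.1, size=n)

# Naive OLS is biased: the confounder moves both price and demand.
ols = sm.OLS(log_packs, sm.add_constant(log_price)).fit()

# 2SLS by hand: stage 1 predicts price from the instrument,
# stage 2 regresses demand on the predicted price.
# (In practice use linearmodels' IV2SLS, which also gets the
# standard errors right; this two-step version does not.)
stage1 = sm.OLS(log_price, sm.add_constant(tax)).fit()
stage2 = sm.OLS(log_packs, sm.add_constant(stage1.fittedvalues)).fit()

print(f"OLS elasticity:  {ols.params[1]:.2f}")     # badly biased
print(f"2SLS elasticity: {stage2.params[1]:.2f}")  # close to -1.2
```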

Causal Book is an interactive resource that presents a network of concepts and methods for causal inference. Due to the nonlinear, networked structure of the book, each new chapter comes with a number of other linked sections and pages. All of this added content can be viewed in the graph view (available only on desktop, in the upper right corner).

This book aims to be a curated set of design patterns for causal inference, with each pattern applied using a variety of methods in three approaches: Statistics (narrowly defined), Machine Learning, and Bayesian. Each design pattern is supported by business cases that use the pattern, and the three approaches are compared using the same data and model. The book also discusses the lesser-known and less-understood details of the modeling process in each pattern.

Ongoing debate: LLMs reasoning or not

There are now so many papers testing the capabilities of LLMs that I increasingly rely on thoughtful summaries like this one.

The word ‘reasoning’ is an umbrella term that includes abilities for deduction, induction, abduction, analogy, common sense, and other ‘rational’ or systematic methods for solving problems. Reasoning is often a process that involves composing multiple steps of inference. Reasoning is typically thought to require abstraction—that is, the capacity to reason is not limited to a particular example, but is more general. If I can reason about addition, I can not only solve 23+37, but any addition problem that comes my way. If I learn to add in base 10 and also learn about other number bases, my reasoning abilities allow me to quickly learn to add in any other base.
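To make the abstraction point in the quote concrete: the carry-based addition procedure really is base-independent, so anything that has the abstract procedure gets all bases for free. A trivial illustration:

```python
# The carry-based addition algorithm is identical in every base;
# only the digit limit changes. A tiny illustration of the kind
# of abstraction the quote describes.
def add_digits(a: list[int], b: list[int], base: int) -> list[int]:
    """Add two little-endian digit lists in the given base."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        result.append(total % base)
        carry = total // base
    if carry:
        result.append(carry)
    return result

# 23 + 37 = 60 in base 10...
print(add_digits([3, 2], [7, 3], base=10))  # [0, 6] -> 60
# ...and the same procedure works unchanged in base 8: 27 + 45 = 74 (octal)
print(add_digits([7, 2], [5, 4], base=8))   # [4, 7] -> 74 (octal)
```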

Abstraction is key to imagination and counterfactual reasoning, and thus to establishing causal relationships. We don’t have it (yet) in LLMs, as the three papers summarized here and others show (assuming robustness is a necessary condition).

Is that a deal breaker? Clearly not. LLMs are excellent assistants for many tasks, and productivity gains are already documented.

Perhaps if LLMs weren’t marketed as thinking machines, we could have focused more of our attention on how best to use them to solve problems in business and society.

Nonetheless, the discussion around reasoning seems to be advancing our understanding of our thinking and learning process vis-à-vis machine learning, and that’s a good thing.

The illusion of information adequacy

A new PLOS One study coined this term to describe people’s strong tendency to believe they always have enough data to make an informed decision – regardless of what information they actually have.

In the study, participants responded to a hypothetical scenario about a water issue involving a school: control participants were given full information, while treatment participants were given about half of it. The study found that treatment participants believed they had comparably adequate information and felt equally competent to make thoughtful decisions based on it.

In essence, the study shows that people assume they have enough information, even when they lack half of the relevant information. This extends to data science, where it is often assumed that the data at hand is sufficient for making decisions, even though assumptions (implicit or explicit) fill the gaps between the data, the models, and the resulting decisions. We briefly discuss this idea of data centricity at datacentricity.org (and more to come).

Image courtesy of learningrabbithole.com.

Programming is solved by LLMs, isn’t it?

AI should virtually eliminate coding and debugging.

This is a direct quote from an IBM report published in 1954 (here, page 2), if you replace AI with Fortran. It didn’t happen, and not because Fortran wasn’t revolutionary at the time: it was the first commercial compiler and took 18 person-years to develop.

Compiling didn’t “solve” programming, and neither do LLMs. LLMs help solve (part of) the problem, but they don’t solve exception handling. I wrote before about exception handling (or the lack thereof) in most machine learning applications. We need to pay more attention to it.

Exception handling is difficult, if not impossible, to automate away because of the complexity and unintended consequences of human-machine (user-model) interactions. LLMs can certainly be useful for generating alternative scenarios and building solutions for them.
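To make the point concrete, here is a minimal sketch of the unglamorous exception handling a production model call needs. The `predict()` interface, `required_features` attribute, and failure modes are my illustrative assumptions:

```python
# Minimal sketch of exception handling around a model call.
# The model interface and failure modes are illustrative assumptions.
import logging

logger = logging.getLogger("ml_service")

def safe_predict(model, features: dict, fallback: float) -> float:
    """Wrap a model call with the exception handling that pattern
    recognition alone doesn't give you."""
    # 1. Validate inputs: users will send things the model never saw in training.
    missing = [k for k in model.required_features if k not in features]
    if missing:
        logger.warning("Missing features %s; returning fallback.", missing)
        return fallback
    try:
        prediction = model.predict(features)
    except Exception:
        # 2. The model call itself can fail (timeouts, version skew, bad encodings).
        logger.exception("Model call failed; returning fallback.")
        return fallback
    # 3. Validate outputs: a "successful" prediction can still be nonsense.
    if not 0.0 <= prediction <= 1.0:
        logger.warning("Out-of-range prediction %r; returning fallback.", prediction)
        return fallback
    return prediction
```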

We will continue to benefit from the models that are increasingly available to us, including LLMs. Just remembering that the problem is not just pattern recognition, but also exception handling, should help us think about how best to use these models to solve problems.

This essay approaches the same point from a software development perspective. From the essay:

You’d think 15 years into the smartphone revolution most people could operate an order kiosk or self-checkout without help. That’s certainly what stores had hoped. But as these are rolling out you can see how these systems are now staffed by people there to handle the exceptions. Amazon Go will surely be seen as ahead of its time, but those stores are now staffed full time and your order is checked on the way out. And special orders at McDonald’s? Head to the counter 🙂

Mathematical Methods in Data Science (with Python)

Just came across this neat resource while looking for an MCMC / Gibbs sampling code example in object recognition (a minimal Gibbs sketch follows below). Self-description of the book:

This textbook on the mathematics of data has two intended audiences:

  • For students majoring in math or other quantitative fields like physics, economics, engineering, etc.: it is meant as an invitation to data science and AI from a rigorous mathematical perspective.
  • For mathematically-inclined students in data science related fields (at the undergraduate or graduate level): it can serve as a mathematical companion to machine learning, AI, and statistics courses.

Not yet published, but you can check it out here.
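Since I was hunting for a Gibbs sampling example anyway, here is a minimal one for a bivariate standard normal, a textbook case where both full conditionals are known in closed form:

```python
# Minimal Gibbs sampler for a bivariate standard normal with
# correlation rho: each full conditional is itself normal,
# x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
import numpy as np

def gibbs_bivariate_normal(rho: float, n_samples: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    samples = np.empty((n_samples, 2))
    x = y = 0.0  # arbitrary starting point
    sd = np.sqrt(1.0 - rho**2)
    for i in range(n_samples):
        x = rng.normal(rho * y, sd)  # draw x from p(x | y)
        y = rng.normal(rho * x, sd)  # draw y from p(y | x)
        samples[i] = x, y
    return samples

draws = gibbs_bivariate_normal(rho=0.8, n_samples=20_000)
burned = draws[1_000:]  # discard burn-in
print(np.corrcoef(burned.T)[0, 1])  # ~0.8, recovering the target correlation
```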

Podcast-style discussions on Data Duets

You should not add 1 before log-transforming zeros. If you don’t believe me, listen to these two experts on how to make better decisions using log-transformed data.

This conversation was produced by NotebookLM based on our discussion about the Log of Zero problem at Data Duets. Duygu Dagli and I have now added a podcast-style conversation to each of our articles. All audio is raw/unedited.

The conversations are usually fun (sometimes for odd reasons). The model adds (1) examples we don’t have in the original content and (2) light banter and some jokes. The examples are hit or miss.

So, besides the usual deep and reinforcement learning backend, what does NotebookLM do? Based on Steven Johnson’s description on the Vergecast:

  1. Start with a draft and revise it
  2. Generate a detailed script of the podcast
  3. Critique the script and create a revised version
  4. Add disfluencies (um, uh, like, you know, c-c-can, sssssee…) to sound convincingly human
  5. Apply Google’s latest text-to-speech Gemini model to add intonation, emphasis, and pacing

Have fun, and don’t add 1 to your variables before applying the log transformation.
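If you’d rather see the problem than hear about it, here is a toy simulation (purely illustrative, not the Data Duets analysis; the setup and numbers are made up). The punchline: regressing log(y + 1) is not scale-invariant, so the “elasticity” you estimate depends on the arbitrary units of y, even when there are no zeros at all:

```python
# Toy illustration: log(y + 1) regressions are not scale-invariant,
# so the estimated "elasticity" depends on the units of y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 10_000
log_x = rng.normal(size=n)
log_y = 2.0 + 0.5 * log_x + rng.normal(scale=0.2, size=n)  # true elasticity 0.5
y = np.exp(log_y)  # strictly positive: no zeros anywhere in this data

X = sm.add_constant(log_x)
for scale, label in [(1.0, "y in original units"), (0.001, "y measured in thousands")]:
    fit = sm.OLS(np.log(y * scale + 1.0), X).fit()
    print(f"{label}: estimated elasticity = {fit.params[1]:.3f}")

# A plain log-log fit recovers 0.5 regardless of units;
# the +1 version gives a different answer for each unit choice.
true_fit = sm.OLS(np.log(y), X).fit()
print(f"plain log-log (no +1): {true_fit.params[1]:.3f}")  # ~0.500
```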