Public trust in generative models

The fact that 73% of consumers trust content created by generative AI models is intriguing. And it’s not just people playing around with a chatbot for trivial conversations:*

– 67% believe they could benefit from receiving medical advice from a generative AI model
– 66% would seek advice from a generative AI model on relationships (work, friendships, romantic relationships) or life/career plans
– 64% are open to buying new products or services recommended by a generative AI model
– 53% trust generative AI-assisted financial planning

To put this into perspective, only 62% of people say they trust their doctor most for medical advice.**

* 2023 survey by Capgemini of 10,000 consumers
** 2023 survey by OnePoll for Bayer of over 2000 adults

Source

tidylog

H/T to Travis Gerke: I’ve just discovered the wonderful work of Benjamin Elbers. tidylog provides feedback for dplyr and tidyr operations in R. The idea is simple and powerful: wrap the dplyr and tidyr functions so that each call also reports what it did. This will help greatly with both teaching and “doing.”
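A minimal sketch of what that feedback looks like: load tidylog after dplyr and its wrappers take over, printing a message for each verb (output shown approximately):

```r
library(dplyr)
library(tidylog)  # masks the dplyr/tidyr verbs with feedback-printing wrappers

mtcars %>%
  filter(cyl == 4) %>%           # how many rows were removed?
  mutate(kpl = 0.425 * mpg) %>%  # what was created or changed?
  select(mpg, kpl, cyl)          # what was dropped?
#> filter: removed 21 rows (66%), 11 rows remaining
#> mutate: new variable 'kpl' (double) with 9 unique values and 0% NA
#> select: dropped 9 variables (disp, hp, drat, wt, qsec, ...)
```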

Source

Pandas AI

Pandas AI is an interesting and somewhat natural direction for embedding large language models into data science/analytics. This is less of a black box than automated exploratory data analysis tools, but still makes things easier.

We will likely see more ideas like Gabriele Venturi’s here. For any serious project, though, we’ll still need skilled humans who understand how the algorithm responds to queries and who can check and confirm that it responds as expected.

Source

Experimental data analysis and the importance of conceptual models

In this new post, Duygu Dagli and I took a quick look at the analysis of experimental data. I really enjoyed writing this piece because Lord’s revelation is one of my favorites (pardon the pun).

Lord’s paradox is related to the better-known Simpson’s paradox, and it highlights the importance of constructing the right conceptual model before moving on to modeling the data. In the post, I speculated about one potential conceptual model and discussed its implications for modeling the data at hand.

Frankly, the example in the post offered much to unpack; I picked up only on the part that relates to causal models and Lord’s paradox. I also ended up touching on an interesting discussion around the use of diff-in-diff vs. lagged regression models.

After running an experiment, how do you estimate the average treatment effect (ATE)? Which model do you choose to use? In this post, we use five different models with different assumptions to answer the same question. We find five different ATEs… Which one is the correct average treatment effect in this experiment? How do we decide?

In this post, Gorkem Turgut Ozer and I explore these questions (and more) by discussing the differences across models and potential implications. We ended up covering an interesting paradox I enjoyed learning about!

To me, the main takeaway is that business value from data is maximized when the right conceptual model meets the right method. For this to happen more often, data science and pricing leaders need the technical skills to ask the right questions. They also need to build a trusting relationship with their teams so they can delegate and learn from them.
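To make the contrast concrete, here is a hedged sketch (simulated data, my own illustration, not the post’s dataset) of two of those models, difference-in-differences and a lagged (ANCOVA) regression, estimating the same ATE:

```r
set.seed(42)
n     <- 500
treat <- rbinom(n, 1, 0.5)                   # randomized treatment indicator
pre   <- rnorm(n, mean = 50, sd = 10)        # pre-treatment outcome
post  <- pre + 2 * treat + rnorm(n, sd = 5)  # true ATE = 2

# Model 1: difference-in-differences, i.e., regress the change score
did <- lm(I(post - pre) ~ treat)

# Model 2: lagged regression (ANCOVA), i.e., control for the baseline
lagged <- lm(post ~ treat + pre)

coef(did)["treat"]     # DiD estimate of the ATE
coef(lagged)["treat"]  # lagged-regression estimate of the ATE
```

Under randomization, both estimates are unbiased; when groups differ at baseline, they can diverge, which is the heart of Lord’s paradox.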

Source

Human learner vs. ChatGPT in taking tests designed for humans

Across the board, ChatGPT is passing exams by answering a mix of short-answer, essay, and multiple choice questions:

– U.S. medical licensing exam (according to the attached study)
– Wharton School MBA exam on Operations Management
– University of Minnesota Law School exams in Constitutional Law, Employee Benefits, Taxation, and Torts

If ChatGPT is able to pass these exams, it is not because ChatGPT is revolutionary (impressive as it surely is) but because they are just bad exams. These exams must lack enough components that require some form of creative thinking and imagination.

Source

ChatGPT excitement

What is demonstrated here is a successful translation from human language to code. OpenAI has another project for this purpose: Codex. Microsoft’s GitHub Copilot serves as a specialized version (both are descendants of GPT-3). DeepMind’s AlphaCode and the open-source PolyCoder also target English-to-code translation.

What is missing (and provided by Marco) is the articulation of a solution that stems from a conceptual model, which, in turn, is informed by causal links. For example: diversification reduces asset-specific risk.

Unless ChatGPT can reasonably limit the weight of each individual stock from the stated objective alone (minimize the standard deviation) without being explicitly instructed to, we’d better curb our enthusiasm here.
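For reference, here is a minimal sketch (toy data, my own illustration, not the solution discussed in the post) of that objective: the unconstrained minimum-variance portfolio, whose closed form spreads weights across assets without any per-stock cap being specified.

```r
set.seed(1)
returns <- matrix(rnorm(250 * 5, sd = 0.01), ncol = 5)  # toy daily returns
Sigma   <- cov(returns)

# Minimum-variance weights (fully invested, shorting allowed):
# w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
ones <- rep(1, ncol(Sigma))
raw  <- solve(Sigma, ones)
w    <- raw / sum(raw)
round(w, 3)  # no single stock dominates; diversification falls out of the objective itself
```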

Source

Just tried out ChatGPT…

Just tried out ChatGPT, the new large language model trained by OpenAI, and I was blown away by its capabilities! It can generate human-like text responses to any prompt, making it a powerful tool for conversation simulation, language translation, and more.

I also had a chance to play around with the code, and it’s surprisingly simple to use. Here’s a quick example of how to generate a response from ChatGPT using the Python API:

Not bad, but a bit too excited (blown away, really?). And then there’s the shameless self-promotion in my voice without any disclosure. We’re not off to a good, trusting start.

“Life’s most important questions are, for the most part, nothing but probability problems.”

Pierre-Simon Laplace in Théorie Analytique Des Probabilités

We need more decision/data scientists to ask “What is the probability of the sun rising tomorrow?” Yet we don’t seem to put a strong emphasis on probability theory, for several reasons.
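Laplace’s own answer to the sunrise question, the rule of succession, makes the point in one line (assuming a uniform prior on the unknown success probability):

```latex
% After observing s successes in n independent trials:
P(\text{success on trial } n + 1 \mid s \text{ successes in } n \text{ trials}) = \frac{s + 1}{n + 2}
% The sun has risen on all n recorded days (s = n), so the probability of
% sunrise tomorrow is (n + 1)/(n + 2): ever closer to 1, but never certain.
```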

Probability is not thoroughly covered in most data science / business analytics programs (ideally, it would be a standalone course). In predictive analytics, common packages/libraries for ensemble methods focus on classification, almost hiding the probability calculations (which some of them distort anyway). Most frequentist reporting is limited to point estimates and errors, again hiding the underlying probabilistic assumptions. Et cetera.

In search of a short reading to share with my students, I’ve come across a recent book (updated in May 2022) by Dirk P. Kroese, Zdravko Botev, Thomas Taimre, and Radislav (Slava) Vaisman. The book is open access, and the appendices serve as a nice refresher; Appendix C, for example, is a little primer on probability.

Source

Have you heard of a synthetic control method but you’re not exactly sure what it is and what kind of problems it may help solve?

In this latest collaboration with Duygu Dagli, we shed some light on it, with more topics to come soon. We plan to bridge conceptual and applied takes on statistical modeling and machine learning methods and the business problems they may help solve. We’ll likely touch on causal inference, predictive analytics, and how firms can organize to be more data-centric. Check it out and stay tuned for the upcoming pieces on experiments and the analysis of experimental data.

Source

Overcorrecting the overcorrected

Jeremy Siegel almost loses it here, for good reason. The Federal Reserve Board called inflation transitory when prices were skyrocketing last fall. When the same Fed (save a couple of members) now argues that inflation is not yet slowing at a reasonable pace, despite the price contractions recent data show, questions arise. The Fed gives the impression that it is now overcorrecting what it overcorrected earlier by loosening monetary policy too much and for too long.

In case after case, our data modeling and inference practices are tested against lags in data. Lacking high predictive power, we resort to overcorrection. Overcorrecting means doing more than enough (versus not doing enough), and it sounds better than coming up short. But then the pendulum swings back a little harder.

What do we learn from such swings? Well, one rather obvious takeaway is to put more emphasis on correctly understanding and modeling lags in time series. Another is to be content with coming up short occasionally, especially when the cost of overcorrecting is much higher than the cost of coming up short.

Source

Killing fish the right way using computer vision

The device in the picture, which looks like a commercial refrigerator or shelf, is an ikejime machine. Ikejime is considered the fastest and most humane method of killing fish. The method also produces the best-tasting fish, because fish are killed instantly, before their bodies go into distress and release lactic acid and ammonia into their muscles. Ikejime involves quickly inserting a spike directly into the hindbrain, causing immediate brain death.

That is, if fishermen know exactly where the hindbrain is for each species, can insert the spike quickly and precisely within minutes of catching a fish, and have time to do so repeatedly. Well, that’s what robots are for.

Shinkei Systems’ machine is a combination of hardware and an edge detection algorithm, the engine behind object recognition in convolutional neural networks. Challenges abound. The machine operates on a fishing boat that tilts around even at zero speed. Apparently, “even in the same species, even with the same contour, the brain can be in a different location” as well.
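Shinkei’s system itself is proprietary; as a rough illustration of the building block the post names, here is a minimal sketch of edge detection by convolution (a hand-rolled toy, not their algorithm), using a Sobel kernel to locate a vertical edge:

```r
sobel_x <- matrix(c(-1, 0, 1,
                    -2, 0, 2,
                    -1, 0, 1), nrow = 3, byrow = TRUE)

# Valid (no-padding) 2D convolution of an image with a kernel
convolve2d <- function(img, kernel) {
  k   <- nrow(kernel)
  out <- matrix(0, nrow(img) - k + 1, ncol(img) - k + 1)
  for (i in seq_len(nrow(out)))
    for (j in seq_len(ncol(out)))
      out[i, j] <- sum(img[i:(i + k - 1), j:(j + k - 1)] * kernel)
  out
}

# Toy "image": dark left half, bright right half, i.e., one vertical edge
img <- cbind(matrix(0, 6, 3), matrix(1, 6, 3))
convolve2d(img, sobel_x)  # large responses mark the edge columns
```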

Working with fishermen in Maine, New Hampshire, and Cape Cod, Shinkei Systems seems to be accomplishing the task on fresh-caught fish at a rate of one every 10-15 seconds. Moving forward, accuracy should increase and time to complete the task should decrease, leading to further opportunities.

Source

BLOOM, the first truly open-science, open-access, and multilingual large language model

“We wanted to make sure people with proximity to the data, their country, the language they speak, had a hand in choosing what language came into the model’s training,” says Jernite.

BLOOM, the first truly open-science, open-access, and multilingual (46 languages) large language model, with 176B parameters (slightly larger than GPT-3), will soon be released as a complete pretrained model. Behind the project is BigScience, a wide-scale collaboration of over 1,000 researchers.

The project is quite impressive overall, both for the extent of the collaboration and for its outcome. It’s also an engineering delight to watch. The model has been trained on 384 A100 GPUs (with 80 GB of memory each) since March 11, 2022.

BigScience provides updates on training every day (having hit its initial target earlier than planned, the model is currently being trained for “a few more days”). See the links in the comments to follow the updates and download the model. The full model will be released on HuggingFace (also a partner of the project).

This is a significant step forward for at least two reasons: the way the training data was collected and the core values behind the initiative. BigScience seems to have prioritized data quality by hand-crafting the training data. In a world of models that favor kitchen-sink approaches (because they can!), this is progress. More obviously, BLOOM paves the way for true democratization by removing the strings that have been attached to the use of such models by OpenAI, Google, and Facebook (apply for API access, accredited researchers only, etc.).

Source

Ordinary lasso vs. fancy lasso

While attending the Symposium on Data Science & Statistics two weeks ago to present our study in the Improving Algorithms for Big Data session, I learned about useful new methods (and met the great people behind them).

One of my favorites is Sparsity Ranked Lasso (SRL) by Ryan Peterson. The paper mainly focuses on lasso but the idea is also extended to other regularization approaches such as elastic net.

Takeaway: Use SRL over the ordinary lasso, especially if your model has interaction terms and polynomials. On average, SRL’s predictive performance is better than the lasso’s across the 112 datasets from the Penn Machine Learning Benchmark database. Ryan also goes on to show that SRL outperforms a random forest (RF) in a case study, in both accuracy and efficiency. Even if SRL merely performed on par with an RF, why not use SRL, as it is both interpretable and explainable?

The part I loved about SRL is its simple yet important challenge to what the authors call “covariate equipoise”: the prior belief that all covariates are equally likely to enter a model. Basically, a model’s simplicity is usually defined by its parsimony, i.e., the number of parameters, regardless of whether a parameter is an interaction (or a polynomial form) of the other terms in the model. This is problematic for obvious reasons, and SRL solves it by penalizing covariate groups differently based on their type.

And yes, there is a package for that: sparseR. Links to the R package and the nicely written paper are below.
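A hedged sketch of the workflow (function and argument names here are from memory and may differ; check the package documentation before relying on them):

```r
library(sparseR)

# Fit a sparsity-ranked lasso: interactions (k = 1 for two-way) and
# polynomials (poly = 2) are generated automatically and penalized
# more heavily than main effects.
fit <- sparseR(mpg ~ ., data = mtcars, k = 1, poly = 2)
summary(fit)
```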

R package – Paper

Where have all the Uber drivers gone?

A seemingly persistent effect of the pandemic on Uber is a 50% decrease in mobility’s share of revenue (rides fell from an 80% share of total revenue to less than 40%). Based on revenue, Uber is now more of a delivery company than a mobility company.

This is data centricity extrapolated: a shift from carrying people to carrying objects while solving pretty much the same data-driven optimization problem. The article is from last year, but the effect persists as of Q1 2022: only 37% of revenue comes from carrying people.

Source

Google Imagen

Now that object detection is almost a solved problem, work on the next frontier, text-to-image generation, has begun to thrive. Google Research’s most recent work on generative models, Imagen, uses text embeddings from a large language model called T5 (similar to GPT-3 and OPT-175B) to encode text for image synthesis.

Interestingly, the study finds that increasing the size of the language model improves performance more than increasing the size of the image diffusion model. Imagen achieves exceptional similarity between real and synthetic images, as measured by the FID distance metric: a score of 7.27 on the COCO dataset. Human raters confirm the model’s performance.

The paper is nicely written with a much-needed ethics discussion at the end, and full of colorful images. Apparently, Imagen does not perform as well when generating images that portray humans.

Synthetic data generation and image restoration are two common use cases of GANs. I will post a link to one such study on medical images in the comments. Arts and crafts is an obvious use case; I can also think of use cases in fashion and, potentially, the personalization of products in retail. What are some other business use cases?

Source

How does the brain learn mental models?

An interesting read and perspective on modeling learning in the hippocampus and potentially applying the model structure to the design and development of algorithms. The clone-structured cognitive graph (CSCG) uses Markov chains and dynamic Markov compression, so CSCGs form a probabilistic sequence model.
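CSCG itself is more elaborate (it clones hidden states so the same observation can belong to different contexts); here is a minimal sketch of just the Markov-chain backbone it builds on, a first-order probabilistic sequence model:

```r
states <- c("A", "B", "C")
P <- matrix(c(0.8, 0.1, 0.1,
              0.2, 0.6, 0.2,
              0.3, 0.3, 0.4),
            nrow = 3, byrow = TRUE,
            dimnames = list(states, states))  # row i: P(next state | current = i)

simulate_chain <- function(P, start, n) {
  out <- character(n)
  out[1] <- start
  for (i in 2:n)
    out[i] <- sample(colnames(P), 1, prob = P[out[i - 1], ])
  out
}

simulate_chain(P, "A", 10)
```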

Source

NeuralProphet puts Facebook’s Prophet on steroids using neural networks

Models remain interpretable to the extent that the components of the original function are retained. The authors claim a 55% to 92% improvement in accuracy for short- to medium-term forecasts, which is impressive if generalizable. Training time increases 4-fold, but prediction time improves 14-fold. It is developed on PyTorch, so it can be parallelized and deployed on GPUs, potentially reducing training time. It has been ported to R, though it still relies on a Python environment.

Looks promising especially for “AI on the edge” type mobile applications.

Source

Open Pretrained Transformer

Meta AI’s release of the Open Pretrained Transformer (OPT-175B), which is on par with OpenAI’s GPT-3 at 175 billion parameters/weights, emphasizes responsible compute and claims one-seventh the computational cost in terms of carbon footprint. The pretrained model weights are free to download (link in the comments). This is good news for open collaboration and better news for the environment.

Source

When reverse causation is more profitable

You may have heard of ESG (Environmental, Social, and Governance) investing, also called “socially responsible investing” when ethics is added to the picture. Public companies are assigned an ESG score, a quantification of their social impact. But what social impact? You would probably expect ESG ratings to quantify the societal impact of (not on) a company, right? Well, you’ll be disappointed. “Socially responsible investing” is a misnomer when associated with ESG ratings, at least those reported by MSCI, a leading global provider of ESG ratings.

MSCI basically quantifies the impact of environmental, social, and governance risks on a company’s operations (not the other way around!). In other words, if we rely on ESG ratings while making investment decisions, we may not be doing any social good. We are essentially ensuring that our investments are protected from the environmental, social, and other risks such as climate change. After all, why would we care about the carbon footprint of our investments on the environment as long as profits are good?

MSCI’s ploy offers some takeaways on how to generate data and model it. Apparently, measuring reverse causation and packaging it to look as if cause and effect are in the right places can be quite profitable. To be fair, MSCI is explicit that its data generation and modeling process resides on the dark side.

Source