On Data Science, Causal Models, & AI

Swimming in data but blindly

Data show that masks can slow down the spread. Getting our economy back on its feet depends on slowing down the spread. Yet, wearing masks is not mandated, not at the federal level, not decisively. We are swimming in data but blindly. In addition to the likely direct effect on the spread, behavioral change following such a mandate can potentially help regain consumer confidence, increase spending, and boost economy (or not, but an experiment worth pursuing given there is little to lose, if any).

Data centricity requires a shift in mindset, no matter whether it is policy making or business strategy making. Without this shift, decision makers may swim in a pool of charts and tables but can’t see.

From lock-in to “Trust us”

Uncategorized

What struck me in this opinion piece is the depiction of how multisided (e.g., two-sided) platforms evolve, in an animated GIF by Ryan Kuo. Platform owners feel the need to say “Trust us” at some point, long after contractual relationships are established.

Platform owners gain power and lock in participants (e.g., sellers, buyers, app developers, users) by accumulating network effects and creating switching costs*. More power leads to governance decisions that are increasingly one-sided (e.g., decisions on application approval, product listings, content sharing, or commission/fees). Conflict of interest arises quickly. Trust deteriorates.

Lack of trust can make data centric companies vulnerable to disruption in the long term, even if network effects offer a protection in the short term. One sure way not to gain trust is having to say “Trust us.”

*Cross-side network effects: The more sellers on a platform, the more value for buyers. More buyers join and more sellers follow. As a seller builds a profile full of five star reviews, switching becomes costly. Lock-in can also arise from same-side network effects. In a social platform, value for a user increases with more users.

Mistaken like a human

Uncategorized

Traditionally, computers process data quite differently from how human brains do so. Computers are designed for precision while human brains rely on intuition. With artificial intelligence (#AI), or more specifically, deep learning and neural networks, one idea is to mimic the way human brains work. Does this mean that the hardware, or the body also needs to change? Are CPUs and GPUs not up to the task anymore?

Graphcore.ai claims so, and argues that CPUs and even GPUs are out, and IPUs are in. Graphcore’s #IPU stands for intelligence processing units and is prone to imprecision by design. It is a high-performance computing unit that processes data very imprecisely.

Consider a task like going to a restaurant. A human brain wouldn’t calculate the GPS coordinates but use associations; e.g., recall the restaurant’s name, its neighborhood, and neighboring shops. The difference resembles one between Boolean logic and fuzzy logic, and is true.

What is under the hood, or hardware, matters. One component of achieving data centricity is building an infrastructure that fits the objective, and successful data centric companies know they need to invest in it.

Analyzing data to do nothing

Uncategorized

With an increasing availability of data and off-the-shelf analysis tools, interventions are thriving.

Interventions rarely create value. Rarity is expected simply because the probability of noise is often disproportionately higher. However, larger amounts of data exacerbate the problem of finding value in interventions while none exists. E.g., a frequentist test using a 0.01 p-value threshold would justify an intervention if the probability of an effect occurring by chance is less than 1%. This probability gets smaller with more data, not because the intervention gains value*. 1% should be a moving target, but it is often treated as a fixed one. It should be adjusted also for other reasons, such as running multiple tests.

More importantly, it should be adjusted for unintended consequences. While quantifying the consequences is difficult, we can incentivize analytics teams for finding out what not to do. Action is visible but inaction is not. Successful data centric companies should not mistake thoughtful inaction for idleness. On the contrary, they should encourage and reward it.

*Assuming the actual effect is not zero. Valid for most (if not all) problems outside natural sciences.

Has Apple become the -old- Microsoft?

Uncategorized

Why old? Well, it would be unfair to compare #Apple with today’s #Microsoft, the owner of #GitHub, a sponsor of Open Source Initiative and proponent of innovation through collaboration and co-creation (!). The exclamation will have to stay for a while.

The fight between #Apple and #Hey (hey.com, a contender to #Gmail) is not a surprise but a reminder that Apple is increasingly in the business of value capture, not creation. The gist of the story is, Apple forces Hey to sell subscriptions on its iOS platform but Hey refuses because the cost of doing so is a 30% commission for every subscriber. You can find the details in Kara Swisher’s article: nyti.ms/3ebfyvL

Apple seems to be stuck with incremental one-sided ideas, another iPhone with a larger screen or “dark mode” on its iOS platform, and have forgotten the value of co-creation, which propelled the company at the first place. Apple should be encouraging not oppressing experiments like Hey. For that, it is time for Apple to analyze its data from a fresh perspective that is not short-sighted on quarterly revenue, and rethink its model to embrace diversity again. That is what a successful data centric company would do.

Digital transformation and data centricity in fast food: The foray of McDonald’s

Uncategorized

McDonald’s CEO is out, but his data-focused legacy is likely to stay. McDonald’s has seemingly invested in digital transformation, including the digitization of operational processes such as in-store ordering through kiosks and home delivery. On the face of it, a shift from ordering at a counter to ordering at a kiosk is straightforward and replaces fast-food workers with giant tablets. Obvious gains include efficiency from increased order processing capacity, and, as the ex-CEO argues, increased spending on kiosks*. There is more to that, though. Kiosks can collect richer customer data and potentially create more customer data. Using the kiosks, McDonald’s can learn which products and extras customers scroll through before making a selection, whether they add an extra patty only to decide later not to keep it in their final order, etc. McDonald’s can also add data products to these kiosks such as a recommender system to create and collect more behavioral data, and guide customers into spending more and better. The company is planning to digitize its drive-through ordering as well, by replacing orders put through fast-food workers with a voice recognition software and possibly conversational AI (WSJ coverage). In addition, McDonald’s plans to make its drive-through menus dynamic, changing based on the weather, traffic, time of day, and potentially personal data (e.g., a drive-through customer’s licence plate number can be a nice unique identifier).

Not to mention the robot fryers and other supply-side digitization efforts, this is already quite a long list of data-focused investments for a fast food chain. A recent Bloomberg article featuring the ex-CEO, however, fails to communicate a coherent strategy behind these efforts, if there is one. The article lists some of the aforementioned investments and ends with the ex-CEO’s take on the changes:“But for all the technological breakthroughs, the deals, and the jousting with franchisees, the company’s guiding light has barely changed. Inside a room beyond a corridor stamped with the word “innovate” in block capital letters, the hum of computers and data processing towers is drowned out by a cacophony of test-kitchen staff running trials on secret processes that aim to shave seconds off a Big Mac’s assembly, much like in the old days, when McDonald’s first upended the food industry.”

“In old-school business logic, the big eats the small. In the modern day, the fast eats the slow,” says the ex-CEO. This tragic interpretation leads to doubt about whether these investments are governed by a coherent, focused strategy, or “the company’s guiding light has barely changed,” as the article suggests. If the latter is true, and if McDonald’s is racing just to do it first and do it fast as the ex-CEO says, it may be far from incorporating its data efforts into its strategy and achieving data centricity. This is because data centricity at its core requires a reformulation of problems rather than blindly solving the same problems, (and so does digital transformation). This is one of a series of posts on “data centricity,” a strategy framework in progress. Comments and feedback are welcome in any form.

*Not sure why spending would increase. It looks like the ex-CEO said “What we’re finding is when people dwell more, they select more.” I wonder whether McDonald’s did an experiment or ran a robust analysis on this problem. The article cites a study showing that hedonistic consumption increases when a touchscreen device is used compared to a desktop. While the kiosk question at McDonald’s requires a comparison between apples and oranges, the cited study compares oranges with grapefruits.

Moving up on the ladder of data centricity: From descriptive statistics to modeling

Uncategorized

Most organizations use data in one way or another. Some still collect the data using mechanical tally counters, but that, as well, is data. Simple counts of visits to different exhibits in a museum can provide the director and curator with valuable insights (or not, depending on how strong the thumb muscles of staff are). Descriptive analysis is probably the most common approach to data analysis across organizations. For example, year-over-year (YoY) comparisons, a favorite across industries, are essentially descriptive. A YoY comparison of revenue is not more informative than the data collected by tally counters in a museum: “More visitors entered Exhibit A this year compared to last year.” These descriptive findings are like figuring out today is hotter than yesterday. They can be useful, albeit to a limited extent, and can be misleading when the difference is falsely attributed to a factor, sometimes just because that is readily available even if it was not measured (e.g., the YoY revenue is down because the sales team is underperforming). Measuring relevant factors and including them in a statistical model is another step up on the ladder of data centricity, no matter how simple the model is.

Most of the finance industry compete on models and algorithms today. Even so, Renaissance Technologies, a pioneer of quantitative trading in hedge funds, manages $110 billion of assets with the help of a linear regression model:

“Nick Patterson, who spent a decade as a researcher at Renaissance, says, “One tool that Renaissance uses is linear regression, which a high school student could understand” (OK, a particularly smart high school student; it’s about finding the relationship between two variables). He adds: “It’s simple, but effective if you know how to avoid mistakes just waiting to be made.””
Source: Computer Models Won’t Beat the Stock Market Any Time Soon

Simplicity helps the interpretation, and is preferable if the desired outcome (e.g., high prediction accuracy) is still achieved. In most cases, however, value creation using data involves causal reasoning and inference, higher steps on the ladder. This is one of a series of posts on “data centricity,” a framework I have been developing. Comments and feedback are welcome in any form.

The rise of unobtrusive learning in the age of big data

Uncategorized

One advantage of living in the age of big data is a diminishing need to ask customers explicitly for feedback. A variety of methods for unobtrusive learning from customers have emerged thanks to digitalization (vs. digitization). For example, customers now write reviews every day about products and services without being asked to do so. The behavior of customers can be captured by tracking their website visits. Sensors are now so cheap that a retailer can put sensors all over the floor in its stores and track the physical movements of customers. Compared to the tools and technologies used in an Amazon Go store, such a data collection initiative can be considered a small step today. Motion sensors for store shelves, neural network-powered cameras, and wireless beacons can easily be added as complements. From a managerial perspective, the phenomenon is more than a shift from a push mindset to a pull mindset. Leveraging it fully requires careful planning and execution. This is probably why “data centric” companies are capturing more value from unobtrusive methods while most retailers still struggle to learn from the reviews on their own product pages. Capturing most of the value also requires a systematic effort, rather than ad hoc attempts, ideally starting from product development into the full product life cycle. For example, when launched in 2004, Yelp required asking friends for recommendations. Users could not write reviews without being explicitly asked for. Yelp switched to the current model four months after the launch, based on the data on how early users behaved at the site. This is a short intro to a series of posts on “data centricity,” a concept I have been developing. Comments and feedback are welcome in any form.