Moving up on the ladder of data centricity: From descriptive statistics to modeling

Most organizations use data in one way or another. Some still collect it with mechanical tally counters, but that is data too. Simple counts of visits to different exhibits in a museum can provide the director and curator with valuable insights (or not, depending on how strong the staff's thumb muscles are). Descriptive analysis is probably the most common approach to data analysis across organizations. For example, year-over-year (YoY) comparisons, a favorite across industries, are essentially descriptive. A YoY comparison of revenue is no more informative than the data collected by tally counters in a museum: “More visitors entered Exhibit A this year compared to last year.” These descriptive findings are like figuring out that today is hotter than yesterday. They can be useful, albeit to a limited extent, and they can mislead when the difference is falsely attributed to a factor that was never measured, sometimes just because that explanation is readily available (e.g., the YoY revenue is down because the sales team is underperforming). Measuring relevant factors and including them in a statistical model is another step up on the ladder of data centricity, no matter how simple the model is.
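To make the contrast concrete, here is a minimal sketch in Python. The numbers are made up purely for illustration, and the measured factor (the number of days the exhibit was open each month) is an assumption, not something from a real museum. The first step is the descriptive YoY comparison; the second fits a simple linear regression that includes the measured factor alongside a year indicator.

```python
import numpy as np

# Hypothetical monthly data for Exhibit A (made-up numbers, illustration only).
# open_days: days the exhibit was open each month (the assumed measured factor).
open_days_last = np.array([20, 22, 21, 20])
open_days_this = np.array([25, 26, 27, 26])

# Visits are constructed as exactly 50 per open day plus 100, in both years,
# so by design the year itself has no effect on visits.
visits_last = 50 * open_days_last + 100
visits_this = 50 * open_days_this + 100

# Descriptive step: year-over-year change in total visits.
yoy_change = visits_this.sum() - visits_last.sum()
print("YoY change in visits:", yoy_change)

# Modeling step: regress visits on an intercept, a year indicator, and open days.
visits = np.concatenate([visits_last, visits_this])
open_days = np.concatenate([open_days_last, open_days_this])
this_year = np.concatenate([np.zeros(4), np.ones(4)])
X = np.column_stack([np.ones(8), this_year, open_days])

coef, *_ = np.linalg.lstsq(X, visits, rcond=None)
intercept, year_effect, per_open_day = coef
print("Year effect after accounting for open days:", round(year_effect, 6))
print("Visits per additional open day:", round(per_open_day, 6))
```

In this toy setup the tally says “more visitors this year,” while the model assigns the whole difference to the extra open days and none to the year itself. That is still an association in constructed data, not a causal claim, but it shows why measuring a relevant factor can change the story a YoY number tells.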

Most of the finance industry competes on models and algorithms today. Even so, Renaissance Technologies, a pioneer of quantitative trading among hedge funds, manages $110 billion of assets with the help of a linear regression model:

“Nick Patterson, who spent a decade as a researcher at Renaissance, says, ‘One tool that Renaissance uses is linear regression, which a high school student could understand’ (OK, a particularly smart high school student; it’s about finding the relationship between two variables). He adds: ‘It’s simple, but effective if you know how to avoid mistakes just waiting to be made.’”
Source: Computer Models Won’t Beat the Stock Market Any Time Soon

Simplicity aids interpretation and is preferable as long as the desired outcome (e.g., high prediction accuracy) is still achieved. In most cases, however, creating value from data involves causal reasoning and inference, which sit on higher steps of the ladder. This is one of a series of posts on “data centricity,” a framework I have been developing. Comments and feedback are welcome in any form.
