{"id":693,"date":"2024-09-30T10:08:35","date_gmt":"2024-09-30T14:08:35","guid":{"rendered":"https:\/\/ozer.gt\/log\/?p=693"},"modified":"2024-09-30T10:08:35","modified_gmt":"2024-09-30T14:08:35","slug":"good-data-science-bad-data-science","status":"publish","type":"post","link":"https:\/\/ozer.gt\/log\/2024\/09\/30\/good-data-science-bad-data-science\/","title":{"rendered":"Good data science, bad data science"},"content":{"rendered":"<p><strong>&#8230;and why the difference matters.<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-698\" src=\"https:\/\/ozer.gt\/log\/wp-content\/uploads\/2024\/09\/flawed_data_2x-1024x428.png\" alt=\"\" width=\"720\" height=\"301\" srcset=\"https:\/\/ozer.gt\/log\/wp-content\/uploads\/2024\/09\/flawed_data_2x-1024x428.png 1024w, https:\/\/ozer.gt\/log\/wp-content\/uploads\/2024\/09\/flawed_data_2x-300x125.png 300w, https:\/\/ozer.gt\/log\/wp-content\/uploads\/2024\/09\/flawed_data_2x-768x321.png 768w, https:\/\/ozer.gt\/log\/wp-content\/uploads\/2024\/09\/flawed_data_2x.png 1159w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><\/p>\n<p>We can call data science the practice of making (high-quality) decisions using data.<\/p>\n<p>The order is (1) making decisions (2) using data, not (1) starting from a decision and (2) looking for data to fit it. Ideally, it&#8217;s not stirring the data pile for evidence to support a decision that has already been made.<\/p>\n<p>That&#8217;s a good place to start. We also need to:<\/p>\n<ol>\n<li>Make a solid business case well in advance. Bringing in a half-baked problem or asking the wrong question won&#8217;t lead to the best insights.<\/li>\n<li>Understand what the models can and cannot do. We certainly need more of this in LLM land. A Gen AI project is cool, but is it what the problem needs?<\/li>\n<li>Stick to the data. Data is real. Models add assumptions. Whether it&#8217;s experimental or observational, predictive or causal, the data must rule.<\/li>\n<li>Divide, focus, and conquer. 
Prioritize the most important needs. You can measure and track all metrics, but that&#8217;s probably not what you really need.<\/li>\n<li>Align the problem and available data with the assumptions embedded in the modeling solution. Testing the assumptions is the only way to know what&#8217;s real and what&#8217;s not.<\/li>\n<li>For long-term value creation, choose the better solution over the faster one, and the simpler solution over the more complicated one. This needs no explanation.<\/li>\n<\/ol>\n<p>These are some rules of good (vs. bad) data science, based on insights from projects I&#8217;ve been involved with in one way or another. #3 and #5 are most closely related to a framework we are working on: <a href=\"https:\/\/datacentricity.org\">data centricity<\/a>.<\/p>\n<p>Image courtesy of <a href=\"https:\/\/xkcd.com\/2494\/\">xkcd.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8230;and why the difference matters. We can call data science the practice of making (high-quality) decisions using data. The order is (1) decision making (2) using data, not (1) decision driven (2) data. So, ideally, it&#8217;s not stirring the data pile for evidence to support a decision. That&#8217;s a good place to start. 
We also [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"cybocfi_hide_featured_image":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-693","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts\/693","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/comments?post=693"}],"version-history":[{"count":78,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts\/693\/revisions"}],"predecessor-version":[{"id":777,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/posts\/693\/revisions\/777"}],"wp:attachment":[{"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/media?parent=693"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/categories?post=693"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ozer.gt\/log\/wp-json\/wp\/v2\/tags?post=693"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}