Looking at the numbers in COVID19
Like many of you, I have focused less on analytics during this crisis and more on family and friends, who, on a more positive note, seem to be gaining greater emphasis as we reassess our priorities. But the barrage of news about this crisis certainly focuses on numbers as a way of gauging when it might end. The discussion revolves around the notion of “flattening the curve” and essentially looks at two key metrics:
- Number of positive COVID19 cases
- Number of deaths
Given my other priorities, I have not paid much attention to the numbers the media displays so prominently. But in a discussion with a very close friend and colleague, it became apparent that the virus numbers being presented to the public do not adhere to the discipline of analytics. Prompted by that conversation to put my analytics hat back on, I felt compelled to write this brief note.
In any analytics exercise, the first principle is to understand how the source data is collected. In digital and direct marketing, analytics has been used for decades to target the right consumer with the right product at the right time. One of the primary strategies of digital and direct marketers is to treat new customers differently from customers with a tenure of one year or more. Different targeting strategies are adopted for the two groups because their behavioural and demographic source data are very different. Yet does the government apply the same rationale to understanding the source data being reported, particularly for reported cases? These numbers are highlighted as signs of whether we are heading in the right direction in mitigating the effects of this pandemic. But is this approach flawed?
As a long-time analytics practitioner, let’s explore the metric of reported cases. The first and foremost question is whether this number comes from a random sample. In other words, is a representative random sample of the population being tested every day? We know that this is not the case. In most cases, testing is done on an as-needed basis. If there appears to be a hotspot of cases, such as a nursing home, cruise ship, or hospital, more people are tested in that hotspot. Another flaw in this approach is that the people who are tested are often those who exhibit the most severe symptoms of the virus. Many individuals may catch the virus without exhibiting severe symptoms and are never tested. This implies that the underlying source data will carry different biases depending on what, when, and where these cases were reported.
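To make the testing bias concrete, here is a minimal Python sketch built entirely on hypothetical assumptions: true prevalence is held constant at 2%, the number of tests doubles each day, and targeted testing of hotspots and symptomatic people makes each test five times more likely to be positive than a random test would be. None of these figures come from real data, and `reported_cases`, `naive_prevalence` and their parameters are illustrative names only.

```python
def reported_cases(tests_per_day, true_prevalence, targeting_factor):
    """Reported positives per day under a hypothetical targeted-testing model.

    Assumption (illustrative only): each targeted test is positive with
    probability true_prevalence * targeting_factor, capped at 1.
    """
    positivity = min(1.0, true_prevalence * targeting_factor)
    return [round(tests * positivity) for tests in tests_per_day]


def naive_prevalence(cases, tests):
    """Share of tests that came back positive, naively read as a population rate."""
    return cases / tests


# Hypothetical inputs: true prevalence never changes, only testing volume does.
tests = [1000, 2000, 4000, 8000]
cases = reported_cases(tests, true_prevalence=0.02, targeting_factor=5)
print(cases)  # [100, 200, 400, 800]: an 8x "rise" with no change in prevalence
print(naive_prevalence(cases[-1], tests[-1]))  # 0.1, i.e. 5x the true 2%
```

Under these assumptions, reported cases track testing volume and targeting rather than the spread of the virus, which is exactly the bias described above.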
As one can surmise, attempting to draw insights and conclusions from reported cases can be very misleading. No digital or direct marketer steeped in the discipline of analytics and measurement would draw conclusions from source data with this kind of bias. Deaths are a more meaningful metric, because that number is not influenced by testing behaviour such as the bias in reported cases described above. Trend analysis of deaths, rather than of reported cases, is therefore a more meaningful way to measure the impact of our efforts in the fight against COVID19.
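A minimal sketch of trend analysis on deaths, assuming only a list of daily death counts (the numbers below are invented for illustration). It smooths the series with a trailing moving average, a common way to damp day-to-day reporting noise, and then compares the last two smoothed values. The 3-day window and the function names are assumptions, not something this note prescribes.

```python
def moving_average(values, window):
    """Trailing moving average; the first window-1 days produce no value."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def trend_direction(smoothed):
    """Compare the last two smoothed values; fewer than two values counts as flat."""
    if len(smoothed) < 2:
        return "flat"
    if smoothed[-1] > smoothed[-2]:
        return "increasing"
    if smoothed[-1] < smoothed[-2]:
        return "decreasing"
    return "flat"


# Hypothetical daily death counts, for illustration only.
daily_deaths = [10, 14, 12, 18, 20, 16, 12]
smoothed = moving_average(daily_deaths, window=3)
print(smoothed)  # approx. [12.0, 14.67, 16.67, 18.0, 16.0]
print(trend_direction(smoothed))  # decreasing
```

A wider window gives a steadier trend at the cost of reacting more slowly to genuine turning points.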
Certainly, I understand the desire to report on cases, as they are a harbinger of resource requirements within our existing health infrastructure. But a flawed number is a flawed number. A better, albeit very directional, approach is to extrapolate health resources consumed against deaths. Can we arrive at a loosely linear relationship between deaths and health resources consumed? Rather than looking at the static number of deaths versus health resources consumed, however, a better measure might be the daily rate of increase or decrease in deaths, using that change as a predictor of health resources consumed. These are just some thoughts from one practitioner, recognizing that there is no tried-and-true answer, since there is no real precedent here except, to a very mild extent, the SARS outbreak of 2002-2004. But relying on reported cases is deeply flawed for any decision-making. Much sounder decisions can be made using reported deaths and, more importantly, the change in deaths from day to day.
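The daily-change idea can be sketched as an ordinary least-squares fit with one predictor: the day-over-day change in deaths explaining health resources consumed. Everything here is a hypothetical assumption for illustration: the numbers, the same-day pairing of changes with resources (no lag), and the names `daily_changes`, `fit_line` and `pearson_r`. The correlation coefficient offers a rough check of whether the relationship is "loosely linear".

```python
from math import sqrt


def daily_changes(daily_deaths):
    """Day-over-day change in deaths (positive = increase, negative = decrease)."""
    return [today - yesterday for yesterday, today in zip(daily_deaths, daily_deaths[1:])]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(x, y):
    """Correlation coefficient: values near +/-1 suggest a roughly linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for illustration only.
daily_deaths = [10, 12, 15, 19, 24]
changes = daily_changes(daily_deaths)  # [2, 3, 4, 5]
resources = [7, 9, 12, 13]  # hypothetical, e.g. extra hospital beds occupied on the same days
slope, intercept = fit_line(changes, resources)
print(round(slope, 2), round(intercept, 2))  # 2.1 2.9
print(round(pearson_r(changes, resources), 3))  # 0.984
# Directional estimate if deaths rise by 6 more than on the previous day:
print(round(slope * 6 + intercept, 1))  # 15.5
```

With only a handful of points, a fit like this is directional at best, in keeping with the caveat in the text.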