amudu: What Model Srilanka going to use to combat COVID19

Friday, March 20, 2020

What Model Srilanka going to use to combat COVID19

The current behavior is that if a country crosses the 100 mark on COVID- 19, the diffusion of cases to reach the 1,000 mark takes five to seven days as per behavioral studies done. This calculation can vary on the culture of the country and density of the population.

China took four days to cross the 1,000 mark after reaching 100, Italy six days, Iran five days, Spain seven days, South Korea six days, Germany and US eight days, which is why the medical fraternity was pressurizing the Government to go for a lockdown so that we keep the numbers below 100. Especially given the ‘family-oriented’ culture that exists in Sri Lanka and the ageing population, the ramifications can be heavy.

However, the President was strongly of the view that location specific curfews and ‘identify and isolate’ is the strategy that will work in Sri Lanka. He went on to state that with this strategy, if implemented with the support of the general public, the COVID-19 curve can be flattened
The Financial Times

This repository contains the source code, models, and example usage of the COVID-19 Vulnerability Index (CV19 Index). The CV19 Index is a predictive model that identifies people who are likely to have a heightened vulnerability to severe complications from COVID-19 (commonly referred to as “The Coronavirus”). The CV19 Index is intended to help hospitals, federal / state / local public health agencies and other healthcare organizations in their work to identify, plan for, respond to, and reduce the impact of COVID-19 in their communities.

Versions of the CV19 Index

There are 3 different versions of the CV19 Index. Each is a different predictive model for the CV19 Index. The models represent different tradeoffs between ease of implementation and overall accuracy. A full description of the creation of these models is available in the accompanying paper, "Building a COVID-19 Vulnerability Index" (http://cv19index.com).

The 3 models are:

Simple Linear - A simple linear logistic regression model that uses only 14 variables. An implementation of this model is included in this package. This model had a 0.731 ROC AUC on our test set.

Open Source ML - An XGBoost model, packaged with this repository, that uses Age, Gender, and 500+ features defined from the CCSR categorization of diagnosis codes. This model had a 0.810 ROC AUC on our test set.

Free Full - An XGBoost model that fully utilizes all the data available in Medicare claims, along with geographically linked public and Social Determinants of Health data. This model provides the highest accuracy of the 3 CV19 Indexes but requires additional linked data and transformations that preclude a straightforward open-source implementation. ClosedLoop is making a free, hosted version of this model available to healthcare organizations. For more information, see http://cv19index.com.

Step 1 — Making a Labeled Data Set

Data on COVID-19 hospitalizations does not yet exist. While data begins to emerge, we can look at the affected populations and events that serve as proxies for the real event. Given that the disease’s worst outcomes are concentrated on the elderly, we can focus on medicare billing data. Instead of predicting COVID-19 hospitalizations, we can instead predict proxy medical events, specifically hospitalizations due to respiratory infections. Examples include Pneumonia, Influenza, and Acute bronchitis. We identify these labels by parsing medical billing data and searching for specific ICD-10 codes that describe these types of events. All predictions are made on a specific day. From a particular day, we look back in time 15 months for features. We exclude any events happening within three months of the prediction date, due to the lag in medical claims data reporting. Any diagnoses within the last year become the features we use in all of our models.

Step 2 — Models

There are hosts of model considerations that need to be made with these kinds of projects. Ultimately, we wanted these models to balance being as effective as possible, and still accessible to healthcare data scientists as quickly as possible. One of the reasons for choosing the data that we used is because medicare claims data is widely available to healthcare data scientists. If your organization has access to additional data sources, you may observe performance increases by incorporating such information. Balancing those considerations led us to create 3 models based on the ease of adoption and model effectiveness.

The first, is a logistic regression model using a small number of features. At ClosedLoop, we use the standard Python data science stack. The motivation for a very simple model is that it can be ported to environments like R or SAS without having to read or write a line of python. At low alert rates, the model performs close to parity with the more sophisticated versions of the model. The aforementioned white paper has all of the weights for the limited feature set, so it can be ported over by hand.

We evaluate the model using a full train/test split. The models are tested on 369,865 individuals. We express model performance using the standard ROC curves, as well as the following metrics: