Bringing Predictive Analytics to Healthcare Challenge
This Challenge encouraged health services and social science researchers to submit applications either independently or jointly with health IT developers, healthcare providers, and others with appropriate expertise to apply predictive analytics and related methods using AHRQ's current data infrastructure, as well as other publicly available data.
About the Challenge
The purpose of this Challenge was to explore how predictive analytics and related methods may be applied to, and contribute to understanding of, healthcare issues. AHRQ invited applicants to use predictive analytics and related methods to estimate hospital inpatient utilization for selected counties in the United States. Building on AHRQ's current data infrastructure, AHRQ provided applicants who had executed the data use agreement required for participation in this Challenge with access to customized analytic files containing information on hospital inpatient discharges for 2011 to 2016.
Challenge Timeline and Prize Amount
The Challenge launched in March 2019 and ended in September 2019.
Total prize amount: $225,000.
Data Resources
AHRQ provided temporary access to data for participation in this Challenge, including relevant resources for participants to design, develop, and run their models. Participants were able to supplement their access to AHRQ data, at their option, by using free, publicly available data sources, such as the Area Health Resources Files (AHRF) and data provided by the U.S. Census Bureau. Participants were required to submit an executed "AHRQ Bringing Predictive Analytics to Healthcare Challenge Data Use Agreement."
Applicants needed to submit three items to respond to the Challenge:
Item 1: Predicted values of the total number of hospital inpatient discharges and the mean length of stay for selected counties in the United States for 2017.
Item 2: Predicted values of the number of hospital inpatient discharges and the mean length of stay for selected counties in the United States for 2016 by applying the model, methods, and analytic approach used to obtain the 2017 estimates.
Item 3: A brief report describing the model, methods, analytic approach, and rationale/logic in sufficient detail that the predicted values could be replicated. Programming code, Excel spreadsheets, analytic files, and other supporting documentation were also required.
Evaluation Criteria
Applications were evaluated based on two broad areas: (1) Reliability and (2) Validity. In general, reliability was assessed by how closely the model or methods deployed predicted the actual utilization rates for 2017. Validity was assessed by how well the model performed on earlier years of data.
Applicants were provided with an Excel spreadsheet containing columns where the predicted values were to be inserted; the completed spreadsheet was returned to AHRQ. For counties where 2017 data were available to AHRQ by June 28, 2019, an overall evaluation metric for the submitted model or method was calculated in the following way (a minimal scoring sketch follows the list):
- For each cell, the absolute percentage difference between the predicted value and the actual value was determined.
- For each cell, the calculated value was weighted by that county's share of the total population of all selected counties in the dataset.
- Each cell was then weighted by the weight assigned to its category. The category weights were assigned as follows:
- Item #1: 80%
- Item #2: 20%
- The sum of scores of all cells determined an applicant's overall score. The lower the score, the more reliable and valid the predictive model.
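A minimal sketch of this scoring logic, in Python, is shown below. The table layout and column names (predicted, actual, county_pop, item) are placeholders for illustration only; AHRQ's actual scoring files and code were not published.

```python
import pandas as pd

def score_submission(cells: pd.DataFrame) -> float:
    """Overall score for one submission.

    Expects one row per predicted cell with columns:
    'predicted', 'actual', 'county_pop', and 'item' (1 or 2).
    Lower scores indicate better (more reliable and valid) predictions.
    """
    category_weight = {1: 0.80, 2: 0.20}     # Item #1 = 80%, Item #2 = 20%

    # Absolute percentage difference between predicted and actual values.
    ape = (cells["predicted"] - cells["actual"]).abs() / cells["actual"]

    # Weight by the county's share of the total population of all
    # selected counties in the dataset.
    pop_share = cells["county_pop"] / cells["county_pop"].sum()

    # Apply the category weight and sum across all cells.
    return float((ape * pop_share * cells["item"].map(category_weight)).sum())
```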
After scoring was complete, AHRQ convened an internal team of experts to review the top-scoring submissions and make final winner determinations. Winners were the applicants with the lowest scores, provided the materials submitted under Item #3 were verified by AHRQ. In particular, AHRQ assessed whether an applicant's predicted values could be replicated.
Challenge Winners
AHRQ selected five winners:
First Place
HCA Healthcare-NC Division, Asheville, NC: Matthew Lundy (team leader), Andrew Johnson, Kaitlyn Bankieris, Hannah Marshal, and Mabelle Krasne
The HCA Healthcare team used the county-level data provided by AHRQ, additional data from the AHRF maintained by the Health Resources and Services Administration (HRSA), and data from the County Health Rankings & Roadmaps Program, a collaboration between the Robert Wood Johnson Foundation and the University of Wisconsin Population Health Institute, to estimate their predictive analytic models using R, a software environment for statistical computing and graphics. After testing multiple combinations of dimension-reduction approaches, missing-value imputation techniques, and predictive methods, the team deployed an XGBoost model with feedforward imputation for their winning predictions.
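As a generic illustration of the approach named above, the sketch below forward-fills missing values within each county (one plausible reading of "feedforward imputation") and fits an XGBoost regressor one year ahead. The file name, feature names, and tuning values are hypothetical, not the team's actual code.

```python
import pandas as pd
from xgboost import XGBRegressor

# Hypothetical county-year panel; the actual AHRQ analytic files are not public.
panel = pd.read_csv("county_year_panel.csv").sort_values(["county_fips", "year"])

features = ["population", "median_income", "hospital_beds"]   # placeholder features

# Forward-fill missing values within each county: carry the last observed value forward.
panel[features] = panel.groupby("county_fips")[features].ffill()

# One-year-ahead framing: features observed in year t predict discharges in year t+1.
panel["discharges_next_year"] = panel.groupby("county_fips")["discharges"].shift(-1)

train = panel[panel["year"] <= 2015].dropna(subset=["discharges_next_year"])
latest = panel[panel["year"] == 2016]          # inputs for the 2017 prediction

model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=4)
model.fit(train[features], train["discharges_next_year"])
pred_2017 = model.predict(latest[features])
```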
Premier, Inc., Charlotte, NC: John Martin (team leader), Michael Duan, Michael Herron, Michael Korvink, and Michael Long
The Premier Healthcare Solutions team used the county-level data provided by AHRQ, along with additional data from the AHRF maintained by HRSA, to estimate their predictive analytic models using R and Python, a general-purpose programming language. The team used elastic net regularization with cross-validation to narrow the selection of variables included in their model, and missing values were imputed using multivariate imputation. The team deployed decision-tree regression with adaptive boosting (an AdaBoost model) for their winning predictions.
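The sketch below chains the building blocks named above (multivariate imputation, cross-validated elastic net for variable selection, and AdaBoost over decision trees) using scikit-learn on synthetic data; it illustrates the techniques, not the Premier team's actual pipeline.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import ElasticNetCV
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a county-level design matrix with missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)
X[rng.random(X.shape) < 0.05] = np.nan

# Multivariate imputation of missing values.
X_imputed = IterativeImputer(random_state=0).fit_transform(X)

# Elastic net with cross-validation to narrow the variable set:
# keep only predictors with nonzero coefficients.
enet = ElasticNetCV(cv=5).fit(X_imputed, y)
selected = np.flatnonzero(enet.coef_)

# Decision-tree regression with adaptive boosting on the selected variables.
model = AdaBoostRegressor(estimator=DecisionTreeRegressor(max_depth=4),
                          n_estimators=200, random_state=0)
model.fit(X_imputed[:, selected], y)
```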
Second Place
Analytics Resource Center, Children's Hospital, Aurora, CO: Anusha Guntupalli (team leader), Alex Brown, Brad Ewald, Charles Huhn, Gordon Teubner, Irene Filatov, Jane Bundy, Jennifer Sadlowski, Kaitlin Calhoun, Marisa Payan, Sadaf Samay, and Todd Miller
The Children's Hospital team used the county-level data provided by AHRQ with additional data from the AHRF maintained by HRSA and from the U.S. Department of Agriculture, U.S. Census Bureau, U.S. Bureau of Labor Statistics, and Centers for Medicare & Medicaid Services (CMS) to estimate their predictive analytic models using R. After testing several machine learning models, the team deployed ensemble and random forest methods to produce the winning predictions.
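For reference, a minimal random-forest regression of the kind named above might look like the following; the features and target are synthetic placeholders rather than the team's inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic placeholder for county-level features and discharge counts.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = 5 * X[:, 0] + X[:, 1] ** 2 + rng.normal(size=300)

# Fit a random forest and check out-of-sample error with cross-validation.
rf = RandomForestRegressor(n_estimators=400, random_state=1)
print(cross_val_score(rf, X, y, cv=5, scoring="neg_mean_absolute_error").mean())
```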
Third Place
Kalman & Company, Inc., Virginia Beach, VA: Brian Kadish (team leader), Zach Pryor, Jacob Walzer, Daniel Mask, and Andrew Onufrychuk
The Kalman & Company team used the county-level data provided by AHRQ with additional data from the U.S. Department of Agriculture, U.S. Census Bureau, U.S. Bureau of Labor Statistics, and CMS to estimate their predictive analytic models using Microsoft Excel and Python. The team deployed a variety of methods, including triple smoothing with seasonality and a random walk, for their winning predictions.
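A brief sketch of the two named techniques appears below, assuming "triple smoothing with seasonality" refers to Holt-Winters (triple) exponential smoothing; the quarterly series, frequency, and settings are all made up for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Made-up quarterly discharge counts for one county, 2011-2016.
rng = np.random.default_rng(0)
quarters = pd.period_range("2011Q1", "2016Q4", freq="Q")
level = np.linspace(3000, 3400, len(quarters))
season = np.tile([120, -80, -60, 20], len(quarters) // 4)
y = pd.Series(level + season + rng.normal(scale=30, size=len(quarters)), index=quarters)

# Triple (Holt-Winters) exponential smoothing: level, trend, and seasonality.
hw = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=4).fit()
hw_forecast_2017 = hw.forecast(4)              # four quarters of 2017

# Random-walk benchmark: each 2017 quarter repeats the same quarter of 2016.
rw_forecast_2017 = y.iloc[-4:].to_numpy()
```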
Ursa Health, Nashville, TN: Colin Beam (team leader), Andrew Hackbarth, and Robin Clarke
The Ursa Health team used the county-level data provided by AHRQ with additional data from the Centers for Disease Control and Prevention and the U.S. Department of Housing and Urban Development to estimate their predictive analytic models using R. The team deployed gradient boosting regression to produce their winning predictions.
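As a generic illustration of gradient boosting regression, the sketch below uses scikit-learn's GradientBoostingRegressor on synthetic data; it does not reproduce Ursa Health's features, data, or tuning.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for county-level features and a utilization target.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 8))
y = 4 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=500)

# Fit a gradient boosting regressor and predict for a few counties.
gbr = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
gbr.fit(X, y)
pred = gbr.predict(X[:5])
```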