Agencies, universities, and private companies increasingly turn to machine learning to speed up healthcare technology testing and improve its predictive accuracy. A group of more than 30 top hospitals and research institutions in Europe and the U.S. formed the OWKIN Loop Network, a shared resource that helps train predictive machine learning models on massive datasets. In February, Global Market Insights predicted 10-fold growth in healthcare AI implementations by 2025, particularly in pharmaceutical development and image analysis.

MIT researchers affiliated with the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Laboratory for Financial Engineering (LFE) published a study on using machine learning with statistical imputation to predict drug approvals. It appeared in the inaugural issue of Harvard Data Science Review, an open-access data science platform.

The MIT scientists used imputation models to fill in missing data in drug-development and clinical-trial data from 2003 to 2015. They included thousands of drug-indication pairs with more than 140 features in 15 disease groups. By imputing the missing values based on observed data in other fields, they constructed the largest dataset of its kind.
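
To make the idea of imputing missing values from observed data concrete, here is a minimal sketch using nearest-neighbor imputation. This is an illustration only: the article does not say which imputation method the researchers used, and the function name `knn_impute`, the toy feature rows, and the choice of `k` are all hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled in.

    Each missing value is replaced by the mean of that feature over the
    k most similar rows that did observe it. Similarity is measured only
    on features both rows have observed. Features are not rescaled here;
    a real pipeline would normally standardize them first.
    """

    def distance(a, b):
        # Compare only the features present in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is observed
            # and at least one feature overlaps with this row.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                new_row[j] = sum(v for _, v in nearest) / len(nearest)
        completed.append(new_row)
    return completed


# Hypothetical rows: [prior_approval, enrolled_patients, trial_terminated]
trials = [
    [1.0, 200.0, 0.0],
    [1.0, 220.0, None],
    [0.0, 50.0, 1.0],
    [1.0, None, 0.0],
]
filled = knn_impute(trials, k=2)
```

The input list is left unchanged, so the original gaps stay visible for comparison with the filled-in copy.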

The MIT study found that machine learning grew more successful at predicting drug approvals across rolling 5-year windows. The researchers attribute the improvements to better data quality and greater quantities of data. The most significant predictors of success were trial outcomes, trial status, trial accrual rates, duration, prior approval for another indication, and sponsor track records. According to the researchers, their model predicted which drugs would advance to approval more successfully than conventional analysis.
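
One way to picture a rolling-window evaluation is sketched below. It assumes a model is trained on five consecutive years of data and tested on the year that follows; the article does not describe the exact protocol, so this split and the helper name `rolling_windows` are hypothetical.

```python
def rolling_windows(first_year, last_year, window=5):
    """Yield (train_years, test_year) pairs.

    Assumed scheme: train on `window` consecutive years, then test on
    the single year immediately after them, sliding forward one year
    at a time until the test year reaches `last_year`.
    """
    for start in range(first_year, last_year - window + 1):
        train_years = list(range(start, start + window))
        test_year = start + window
        yield train_years, test_year


# The study's data spans 2003 to 2015.
splits = list(rolling_windows(2003, 2015))
for train_years, test_year in splits:
    print(f"train {train_years[0]}-{train_years[-1]} -> test {test_year}")
```

Comparing a model's score across successive splits shows whether its predictions improve as later, richer data enters the training window.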

The MIT study established a methodology for evaluating how well machine learning predicts outcomes in drug development and testing. According to the authors, drug development productivity, measured as the ratio of drug approvals to research and development spending, has decreased over the past 50 years as biomedical innovation has become more complex, riskier, and more expensive.

If machine learning can accurately forecast the results of clinical trials and reduce the uncertainties of continued development, productivity should increase. In an ideal world, this could mean knowing when to continue or halt development of specific drugs, and the improved productivity would encourage pharmaceutical developers to increase their investment. The societal benefit would be new and better drugs to treat the full spectrum of diseases and conditions.