During the past two years, we’ve written about a range of machine learning applications in medicine. Machine learning analyzes vast quantities of data rapidly in search of predictive relationships. It becomes more effective as it receives more data, and it is used for diagnostics, treatment plan selection, and aftercare monitoring. Using machine learning to analyze subjective content is trickier than comparing images of marks on the skin, however. We noted a study by Carnegie Mellon’s Center for Cognitive Brain Imaging that detected neural representations of suicidal ideation in adolescents, as well as work at Vanderbilt University on predicting suicide from large quantities of health records, but success in both of those uses started with quantifiable data records.

Scientists at Emory University and Harvard recently published findings in the journal npj Schizophrenia on using machine learning to predict emergent psychosis. The researchers figured out how to quantify the semantic richness, or semantic density, of patients’ language. That step enabled them to identify people who scored low on that measure; low semantic richness is an established indication of psychosis. The second discovery in the joint study was that people at higher risk of psychosis use vocabulary related to sound more often than average. According to Phillip Wolff, the senior author of the study and a professor of psychology at Emory, the researchers developed a machine-learning method for measuring semantic density but were surprised by the results on words associated with sounds.

The scientists trained the machine learning program on conversations from 30,000 Reddit users. Once they had a sufficient baseline, the researchers applied the program to earlier diagnostic interviews that trained clinicians had conducted with forty patients. The program predicted with more than 90% accuracy whether the patients went on to develop psychosis.

Treating psychosis is difficult once it is established. Early identification of people at high risk of developing psychosis or other severe mental illness improves the chances that intervention can delay or even halt the progress of the disease. Further work on detecting semantic indicators and vocabulary usage patterns in conversation holds promise for identifying predictive signs in a range of diseases and conditions where you can’t rely on objective measures such as blood tests or genetic sequencing.