May 5, 2022

New ‘CALF’ Algorithm Identifies Data Patterns, Predicts Risk for Disease


LASSO regression—short for “least absolute shrinkage and selection operator”—is considered a gold standard for identifying patterns in data. Now, researchers from UNC-Chapel Hill’s Renaissance Computing Institute (RENCI) say their new technique—called “CALF”—may surpass LASSO, setting a new standard and paving new roads in disease risk assessment.

The speed of data collection is outpacing the speed and depth of data analysis in many fields, and disease risk assessment is no exception. Researchers can easily produce mountains of data from biological samples, but identifying the relevant markers for disease and interpreting their meaning is another story entirely. LASSO addresses these sorts of problems with penalized regression: it weights the variables so that each sample’s weighted sum comes as close as possible to one (disease presence) or zero (disease absence), while shrinking the weights of uninformative variables to exactly zero.
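
For readers who want to see the idea in code, here is a minimal sketch of LASSO fitted by coordinate descent in plain Python. It is an illustration of the standard LASSO objective, not the researchers' code or any particular library's implementation. The function names, the penalty value `alpha`, and the four-sample toy data are all assumptions made up for the example.

```python
from typing import List, Sequence, Tuple


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X: Sequence[Sequence[float]], y: Sequence[float],
              alpha: float, n_iter: int = 100) -> Tuple[float, List[float]]:
    """Minimize (1/2n) * sum((y - b - Xw)^2) + alpha * sum(|w|) by coordinate descent."""
    n, p = len(X), len(X[0])
    # Center the data so the intercept can be recovered after fitting.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [t - y_mean for t in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the residual that ignores column j
            z = 0.0    # squared norm of column j
            for row, target in zip(Xc, yc):
                partial = target - sum(w[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * partial
                z += row[j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    intercept = y_mean - sum(w[j] * col_means[j] for j in range(p))
    return intercept, w


# Hypothetical toy data: column 0 tracks the label, column 1 is unrelated to it.
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [1.0, 1.0, 0.0, 0.0]  # 1 = disease present, 0 = disease absent

intercept, weights = lasso_fit(X_toy, y_toy, alpha=0.1)
predictions = [intercept + sum(w * v for w, v in zip(weights, row)) for row in X_toy]
print(weights, predictions)
```

On this toy data the unrelated column receives a weight of exactly zero, while the informative column keeps a nonzero weight, so the weighted sums land near one for the cases and near zero for the controls.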

The researchers say that CALF is distinguished by its simplicity: it uses a fraction of the predictors that LASSO does and takes a “greedy” approach, in which the algorithm repeatedly accepts the immediate next-best predictor and stops once no further addition improves the result. The team tested the algorithm on five wide-ranging cases across psychiatric and neurological studies and found that it consistently outperformed LASSO.
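
The greedy approach described above can be sketched in a few lines of plain Python. This is a generic greedy forward selection, not the authors' CALF implementation. Several details are assumptions chosen for illustration and are not taken from the article: each chosen predictor gets a weight of +1 or -1, candidates are scored by AUC, and the search stops when no candidate improves that score. The function names and toy data are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random case outscores a random control (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def greedy_select(X: Sequence[Sequence[float]], y: Sequence[int],
                  max_predictors: int) -> Tuple[List[Tuple[int, int]], float]:
    """Add one (column, weight) pair at a time, always taking the best next step."""
    n_cols = len(X[0])
    chosen: List[Tuple[int, int]] = []
    scores = [0.0] * len(X)  # running weighted sum for each sample
    best = 0.5               # AUC of a constant score
    while len(chosen) < max_predictors:
        used = {col for col, _ in chosen}
        candidate = None
        for col in range(n_cols):
            if col in used:
                continue
            for weight in (1, -1):  # assumed coarse weights
                trial = [s + weight * row[col] for s, row in zip(scores, X)]
                value = auc(trial, y)
                if value > best:  # keep only strict improvements
                    best, candidate = value, (col, weight, trial)
        if candidate is None:  # no remaining predictor helps, so stop
            break
        col, weight, scores = candidate
        chosen.append((col, weight))
    return chosen, best


# Hypothetical toy data: three cases (label 1) followed by three controls (label 0).
X_toy = [
    [3.0, 1.0, 2.0],
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.5, 0.0, 2.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.0, 3.0],
]
y_toy = [1, 1, 1, 0, 0, 0]

selected, final_auc = greedy_select(X_toy, y_toy, max_predictors=3)
print(selected, final_auc)
```

On this toy data the search picks column 0 with weight +1, then column 2 with weight -1, and stops before using the third column because it cannot improve the score further. A greedy search like this is fast and yields a short predictor list, but it does not guarantee the globally best subset.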

“CALF outruns LASSO in the five examples we outlined in the paper,” said Clark Jeffries, a RENCI scientist and lead author of the paper. “The metric values using CALF are superior to those of LASSO when the researcher seeks a small number of collectively informative predictors—five chosen from hundreds, for example. Interrogating the biochemistry or other relationships among the five can then suggest causality.”

One of the test cases assessed DNA markers for Alzheimer’s, identifying promising leads in the data.

“Using data from the Alzheimer’s Disease Neuroimaging Initiative, CALF was able to determine a small set of DNA markers that are highly correlated with the age of onset of Alzheimer’s, indicating great potential for CALF as a simple and reliable research tool,” explained Darius Bost, a PhD student at UNC-Chapel Hill and graduate research assistant at RENCI.

CALF’s new abilities show promise for future studies, and even for reexamining data that might have previously been deemed inconclusive or only moderately helpful.

“It’s likely that there are existing data sets out there that failed to show more than a trend with routine analyses and could show classification significance with CALF,” said Diana Perkins, a psychiatrist at UNC-Chapel Hill. “This could be groundbreaking for the field of psychiatry in improving prediction of patients’ risk for psychosis and other mental illnesses, allowing earlier intervention and overall improved outcomes.”