New Algorithm Tackles Everything from Insects to Earthquakes
At first glance, insects and earthquakes might not seem like they have a lot in common – but to researchers at the University of California, Riverside, they do. A pair of new algorithms tackle massive datasets faster and more efficiently, spotting patterns and allowing researchers to quickly gain insights into a myriad of problems.
“It is difficult to overemphasize how scalable this algorithm is,” said Eamonn Keogh, co-author of the research and professor of computer science at UC Riverside. “To demonstrate this, we did one quintillion—that’s 1 followed by 18 zeros—pairwise comparisons of snippets of earthquake data. Nothing else in the literature comes within one-tenth of a percent of that size.” The researchers’ earthquake analysis of the San Andreas fault uncovered a series of quiet, low-frequency earthquakes that might have been missed by lower-resolution analyses.
This scalability is crucial: continuously recording sensors collect so much data that spotting patterns is often difficult, and running analytics on such large datasets quickly becomes cost-intensive. The new algorithm, called “SCAMP,” was built by Zachary Zimmerman, a doctoral student in computer science at UC Riverside, using a preexisting algorithm by Keogh as a foundation.
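To give a feel for what “pairwise comparisons of snippets” means, here is a minimal brute-force sketch in Python. It is not SCAMP itself, which runs on GPUs and is far more efficient; it only illustrates the underlying idea of comparing every short window of a time series against every other window and recording each window's nearest match. The function names, the use of z-normalized Euclidean distance, the half-window exclusion zone, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D array; a constant window maps to all zeros."""
    std = x.std()
    if std == 0:
        return np.zeros_like(x, dtype=float)
    return (x - x.mean()) / std


def brute_force_matrix_profile(series, m):
    """For each length-m snippet, find its nearest non-overlapping neighbor.

    Returns (profile, index): profile[i] is the distance from snippet i to
    its closest match, and index[i] is where that match starts.
    This is O(n^2) work, which is why scalability matters so much.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - m + 1
    windows = np.array([znorm(series[i:i + m]) for i in range(n)])
    exclusion = m // 2  # assumed: ignore matches that overlap snippet i heavily
    profile = np.full(n, np.inf)
    index = np.full(n, -1)
    for i in range(n):
        dists = np.linalg.norm(windows - windows[i], axis=1)
        lo, hi = max(0, i - exclusion), min(n, i + exclusion + 1)
        dists[lo:hi] = np.inf  # mask out the trivial self-match region
        j = int(np.argmin(dists))
        profile[i], index[i] = dists[j], j
    return profile, index


def demo():
    """Plant the same synthetic pattern twice in noise and recover it."""
    rng = np.random.default_rng(0)
    series = rng.normal(size=300)
    pattern = np.sin(np.linspace(0, 4 * np.pi, 30))
    series[50:80] = pattern    # first copy of the repeated pattern
    series[200:230] = pattern  # second copy
    profile, index = brute_force_matrix_profile(series, 30)
    best = int(np.argmin(profile))
    return best, int(index[best]), float(profile[best])


if __name__ == "__main__":
    print(demo())
```

In this toy setup, the lowest point of the profile lands on the planted pattern, and its recorded neighbor is the second copy. Doing the same thing for a series with about a billion snippets is what produces on the order of a quintillion comparisons.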
“The most fundamental problem in seismology is identifying earthquakes at all. There have been a number of methodological improvements by seismologists applying strategies from computer science to look for similar patterns,” said Gareth Funning, one of the co-authors and an associate professor of seismology at UC Riverside. “The big advance here is that the dataset you can manage is way, way bigger.”
This ability to handle datasets of very different sizes also proves useful in other research fields. The researchers have already applied SCAMP to data from insect motion-detecting sensors in order to analyze the Asian citrus psyllid, a pest that has been plaguing citrus crops. They also attached accelerometers to chickens, using SCAMP to analyze their feeding behavior.
Of course, SCAMP isn’t perfect. “SCAMP requires you to have the entire time series before you search. In cases of mining historic seismology data, we have that. Or in a scientific study, we can run the chicken around for 10 hours and analyze the data after the fact,” said Philip Brisk, one of the co-authors. “But with data streaming right off the sensor, we don’t want to wait 10 hours. We want to be able to say something is happening now.”
Zimmerman also leveraged SCAMP’s earthquake results to train a new algorithm, “LAMP,” which helps to identify the most relevant data as it streams from the sensors. “You can do all your checks in real time because you’re just looking through the important bits,” Zimmerman said.
“A setup like this could potentially do a lot of that discrimination work before it’s transmitted to the system,” said Funning. “You could shave time off the computation required to determine that a damaging event is in progress, buying people a couple extra seconds to drop, cover, and hold on.”
About the research
The paper on SCAMP, “Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond,” was written by Zachary Zimmerman, Kaveh Kamgar, Nader Shakibay Senobari, Brian Crites, Gareth Funning, Philip Brisk and Eamonn Keogh. It was published in the Proceedings of the ACM Symposium on Cloud Computing.