January 25, 2021

MIT Researchers Tackle Time Series Anomalies with Generative Adversarial Networks

Time series data pervades analytics, but making it useful means understanding where and when it diverges from expectations: did those higher sales in November really represent a meaningful fluctuation – and if so, what led to that fluctuation? Answering questions like those is easier said than done, but researchers from MIT and Rey Juan Carlos University in Madrid are working to reduce those barriers with a new anomaly detection approach.

“Time series anomalies can offer information relevant to critical situations facing various fields, from finance and aerospace to the IT, security, and medical domains,” the researchers wrote in their paper, which is available to read on arXiv. “However, detecting anomalies in time series data is particularly challenging due to the vague definition of anomalies and said data’s frequent lack of labels and highly complex temporal correlations.”

In the paper, the researchers propose TadGAN: an “unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs)” that “reconstruct[s] and assess[es] errors in a contextual manner to identify anomalies.” The researchers set out to answer a core question: do new, complex approaches like deep learning actually perform better than baseline statistical methods?
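The general idea of flagging anomalies through reconstruction error can be illustrated with a minimal sketch. This is not TadGAN: a rolling median stands in for the learned GAN reconstruction, and the function name `reconstruction_error_anomalies`, the `window` size, and the `k`-standard-deviation threshold are all assumptions made for illustration only.

```python
import numpy as np


def reconstruction_error_anomalies(signal, window=5, k=3.0):
    """Return indices whose reconstruction error is unusually large.

    The "reconstruction" is a centered rolling median, a simple stand-in
    for a learned model such as a GAN. `window` must be odd.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    signal = np.asarray(signal, dtype=float)
    pad = window // 2
    # Repeat the edge values so every point has a full window around it.
    padded = np.pad(signal, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    reconstructed = np.median(windows, axis=1)
    # Point-wise error between the observed and reconstructed signal.
    errors = np.abs(signal - reconstructed)
    # Illustrative threshold: mean error plus k standard deviations.
    threshold = errors.mean() + k * errors.std()
    return np.flatnonzero(errors > threshold)


# Hypothetical data: a smooth sine wave with one injected spike.
t = np.arange(100)
series = np.sin(t / 10.0)
series[50] += 5.0
anomalies = reconstruction_error_anomalies(series)
```

In this toy series the only point that departs sharply from its reconstruction is the injected spike. A fixed global threshold like this one is the simplest possible choice; the paper's emphasis on assessing errors "in a contextual manner" suggests something more adaptive.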

To test a baseline approach against deep learning approaches – and TadGAN against all of the above – the researchers evaluated eight methods across 11 time series datasets ranging from spacecraft telemetry data to Twitter data. The methods included the baseline ARIMA model, six deep learning methods (HTM, LSTM, LSTM AutoEncoder, MAD-GAN, Microsoft Azure Anomaly Detector, and Amazon DeepAR), and TadGAN itself.

Relative to the ARIMA baseline, only one of the preexisting deep learning models showed an improvement in performance: LSTM, with a 4.1% increase in average F1 score. Some fared dramatically worse, like the Microsoft Azure Anomaly Detector (-63.4%).

Percent change in average F-scores across the models relative to the baseline ARIMA model. Image courtesy of the researchers.

TadGAN, however, did not suffer the same fate. Instead, it showed a strong 15.3% increase in average F1 score relative to ARIMA. “TadGAN outperformed all the baseline methods by having the highest averaged F1 score across all the datasets, and showed superior performance over baseline methods in 6 out of 11 datasets,” the researchers wrote, adding that they were able to reduce the number of false positives and increase the number of true positives.
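For readers unfamiliar with the metric, the F1 score combines true positives, false positives, and false negatives into a single number, and a percent change relative to a baseline follows directly from it. The sketch below uses hypothetical counts that are not taken from the paper; the function names are likewise illustrative.

```python
def f1_score(tp, fp, fn):
    """F1 score: the harmonic mean of precision and recall.

    Uses the equivalent closed form 2*TP / (2*TP + FP + FN).
    """
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0
    return 2 * tp / denominator


def percent_change(score, baseline):
    """Percent change of `score` relative to `baseline`."""
    return 100.0 * (score - baseline) / baseline


# Hypothetical counts for illustration only (not results from the paper).
baseline_f1 = f1_score(tp=6, fp=4, fn=4)
candidate_f1 = f1_score(tp=8, fp=2, fn=2)
change = percent_change(candidate_f1, baseline_f1)
```

Fewer false positives and more true positives both push the F1 score up, which is why the researchers' remark about reducing one and increasing the other lines up with the higher averaged score.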

The code for TadGAN is open source and available for benchmarking anomaly detection on time series datasets. The paper, titled “TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks,” was written by Alexander Geiger, Dongyu Liu, Sarah Alnegheimish, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. It is available to read on arXiv.