Amid COVID-19 Cybercrime Spike, ML Becomes Indispensable
Cybercriminals are exploiting the coronavirus pandemic to significantly expand their malicious activities against individuals and organizations. Amid the deluge of Trojans, ransomware, and phishing attacks, cybersecurity professionals are turning to machine learning to stay on top of the situation.
COVID-19 ushered in an unprecedented shift in how hundreds of millions of people interact with the world. In many cases, employees have become almost entirely reliant on the Internet for work, while schools have shifted to an online-only footing to educate children.
Cybercriminals, ever the opportunists, are taking advantage of this surge of activity outside of corporate networks (not to mention fear of COVID-19 itself). Data collected by security firms shows an unprecedented increase in malicious activity around the world.
For example, the security company RiskIQ detected the creation of 317,000 suspicious COVID-19 websites in the two weeks in March following the lockdown order. Google is blocking 18 million COVID-19 scam emails daily, while the FBI says cybercrime reports have increased 400% since the pandemic started, with a 350% bump in phishing attacks.
AT&T Cybersecurity detected a 2,000% increase in cyberattacks from February to March, according to Jaime Blasco, the vice president and chief scientist for the Alien Labs group within AT&T Cybersecurity. “We have seen a huge increase in threat actors exploiting the COVID-19 situation,” Blasco says.
Some of the most common COVID-19 attacks involve phishing attempts, where cybercriminals try to trick a victim into sharing personal information under the pretext of helping them get answers to COVID-19 questions. Other threat vectors involve directing unsuspecting users to maliciously crafted websites that infect the person’s device with malware.
Alien Labs, which was formed following AT&T’s acquisition of AlienVault in 2018, provides threat intelligence to a software platform in AT&T’s Security Operations Centers that helps AT&T customers detect malicious threats on their networks. But staying on top of all the malicious activity is no simple matter. According to Blasco, up to 296 petabytes of data flow through the AT&T network per day, and out of that traffic, Alien Labs has detected upwards of 100 billion attempts by bad actors to probe customers’ systems.
It’s the job of Blasco and his team of 15 researchers to analyze 20 million threat indicators per day, from which they isolate about 400,000 malicious URLs and 370,000 pieces of malware. That malicious activity is used to create signatures, which are then pushed out to the MTDR offering from AT&T Cybersecurity.
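The article does not say what form these signatures take. As a rough illustration only, the simplest kind of signature is a cryptographic hash of a known-bad file, which a detection product can compare against the files it encounters. The sketch below assumes that hash-based form; the `SignatureSet` class and the sample bytes are hypothetical, and production signatures are typically much richer (pattern rules, behavioral indicators, and so on).

```python
import hashlib


def make_signature(sample: bytes) -> str:
    """Return the SHA-256 hex digest of a sample (assumed signature form)."""
    return hashlib.sha256(sample).hexdigest()


class SignatureSet:
    """Hypothetical in-memory collection of hash signatures."""

    def __init__(self):
        self._hashes = set()

    def add_sample(self, sample: bytes) -> str:
        """Record a known-malicious sample and return its signature."""
        signature = make_signature(sample)
        self._hashes.add(signature)
        return signature

    def matches(self, data: bytes) -> bool:
        """True if the data is byte-for-byte identical to a recorded sample."""
        return make_signature(data) in self._hashes


# Placeholder bytes standing in for a real malware sample.
signatures = SignatureSet()
signatures.add_sample(b"placeholder bytes for a known-bad file")

print(signatures.matches(b"placeholder bytes for a known-bad file"))
print(signatures.matches(b"some unrelated file contents"))
```

An exact hash only catches identical copies, so changing a single byte defeats it. That limitation is one reason a team would turn to models that generalize to modified samples.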
“It’s about how you can find the needle in the haystack from all that data that is in your environment, and all the security information that you are collecting,” Blasco says. “We have systems that are tracking infrastructure on the Internet, new IP addresses, new services that are created, and we bring all that information together and apply them to machine learning models, both supervised and unsupervised, to understand all the activity that’s happening on the Internet and classify infrastructure as malicious when we find it.”
For example, to evade detection, cybercriminals will continually change the websites they’re using to run their phishing campaigns (which is why you should always verify that a URL is legitimate before clicking on a link). Alien Labs uses a machine learning model that combines many aspects of a domain (who it’s associated with, where the SSL certificate is registered, etc.) to determine the likelihood that a new domain is legitimate or malicious.
“We create more than 100 features from a domain name and then we train models to… automatically label those as malicious based on previous malicious activity that we have seen,” he says. “When we see a domain we haven’t seen, we can classify the probability that that domain is going to be used for malicious purposes.”
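To make the idea concrete, here is a minimal sketch of the general technique: turn a domain name into numeric features, train a classifier on labeled examples, then score an unseen domain. It is not Alien Labs’ model. The five lexical features, the hand-written logistic regression (standing in for a library model), the function names, and the training domains are all assumptions made up for illustration. The model Blasco describes uses more than 100 features, including registration and certificate data that this sketch ignores.

```python
import math


def domain_features(domain):
    """Map a domain name to a small numeric feature vector (illustrative only)."""
    name = domain.lower().rstrip(".")
    n = len(name)
    # Shannon entropy of the characters: random-looking names tend to score higher.
    entropy = -sum(
        (name.count(ch) / n) * math.log2(name.count(ch) / n) for ch in set(name)
    )
    return [
        n / 50.0,                              # scaled length
        sum(ch.isdigit() for ch in name) / n,  # share of digits
        name.count("-") / n,                   # share of hyphens
        name.count(".") / 5.0,                 # scaled number of dots
        entropy / 5.0,                         # scaled character entropy
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def malicious_probability(domain, weights, bias):
    """Score a domain with the trained model; higher means more suspicious."""
    x = domain_features(domain)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Invented training data: label 1 = malicious, 0 = legitimate.
TRAINING = [
    ("example.com", 0),
    ("example.org", 0),
    ("python.org", 0),
    ("wikipedia.org", 0),
    ("covid19-free-test-kits-2020.example", 1),
    ("secure-login-covid-relief4u.example", 1),
    ("c0r0na-vaccine-signup-now.example", 1),
    ("covid-19-stimulus-check-claim.example", 1),
]

weights, bias = train(
    [domain_features(d) for d, _ in TRAINING],
    [label for _, label in TRAINING],
)

# Score two domains the model has not seen before.
for candidate in ("example.net", "covid-19-relief-fund-apply-2020.example"):
    print(candidate, round(malicious_probability(candidate, weights, bias), 3))
```

With so few examples, the toy model mostly learns that long, hyphenated, digit-heavy names look suspicious. A real system would need far more labeled data and non-lexical signals to avoid flagging legitimate long domains.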
Alien Labs uses a range of data science and machine learning tools and techniques to tackle the domain registration problem and other cybersecurity challenges. Blasco’s team prefers XGBoost and TensorFlow for creating machine learning models, often in a Jupyter data science notebook, and uses its own containerized microservices framework to host them on AWS. It also uses natural language processing (NLP) to read and “understand” the latest literature published on the topic of security threats.
Getting security data into a usable format continues to be a struggle, as the industry has yet to agree on a standard format. When it comes to data storage, Blasco’s group uses a range of databases to store data about malware, its features, and the signatures they create for the MTDR product, including MySQL, MongoDB, Elasticsearch, and Snowflake, while AWS S3 serves as a data lake for storing raw, unstructured data.
Given the data volumes involved, automation is a critical component for modern cybersecurity professionals, according to Blasco, and machine learning is core to that automation. “If we were to use human analysts, it would take millions of them to really keep track of all these threats, so we have a lot of automation in terms of executing malware, understanding what the malware does, generating models, and training models that can detect further modifications of those malware.”
However, that doesn’t mean that machine learning alone is a panacea for the security problem. Cybercriminals are human, and as such are creative and adept at thwarting whatever new obstacles cybersecurity professionals like Blasco put in their way. That’s why the good guys need to keep a constant eye on what the bad guys are doing, so they can more quickly adapt in the ongoing cat-and-mouse game.
Cybercriminals may be using COVID-19 as cover and pretext to spread their ill will and malware, and data standards in cybersecurity may be a mess. But it’s not all bad news. According to Blasco, Alien Labs has seen an unprecedented surge in cooperation with the larger security community to collect threat intelligence related to the COVID-19 surge and push back with better security software.
“It’s always hard to tell [who’s winning],” Blasco tells Datanami. “What I can tell you is the amount of energy and the amount of effort that community has pulled together to help the companies of all sizes mitigate these issues is — it’s unbelievable.”