March 2, 2023

Cyberspooks Need Big Data Portability, Too


The problem of how to effectively move and manage large amounts of data is one that impacts all organizations of a certain size, including U.S. Government agencies working in cybersecurity. Now a new partnership between two software companies could ease the big data portability problem when it comes to test case repeatability for the government cyber pros protecting our nation’s digital pathways.

Last month, Sylabs and DeciSym announced a partnership that will see them collaborate to deliver a big data portability solution through the Department of Defense’s Test Resource Management Center (TRMC). The two companies signed a 12-month tender contract, shepherded by Trade Winds, to develop a “virtual data fabric” that should dramatically simplify how government cybersecurity teams package, transmit, and work with large data sets used for testing purposes.

Folks in the HPC community will recognize Sylabs as the developer of the Singularity line of containerization solutions, which help to package code and application runtimes for deployment on supercomputers and other batch-oriented infrastructure. SingularityPRO and Singularity Enterprise are already well-established in industrial, government, and academic institutions, and provide a known quantity to build from.

DeciSym, on the other hand, is a relative newcomer to the field. Founded in April 2022 by Don Pellegrino, DeciSym aims to tackle thorny big data management problems facing government agencies and industrial teams. The partnership with Sylabs is designed to leverage Singularity's containerization capabilities and extend them to data, rather than using them only for code and application runtime portability.

“We’re creating custom technology to make it easier to package that data up, communicate it with others to share that information, and then ease the workflow steps to working with it in those sort of not traditional enterprise environments, where you would have air gaps or security controls or it crosses system boundaries,” Pellegrino tells Datanami.

Cybersecurity professionals today struggle to create testbeds for the solutions they're developing. They may put many hours into building a test environment for a given application, complete with all of the test data required to ensure that the application behaves as expected. When testing is done, they usually move on to other things, and the test environment is mothballed. When another round of testing is needed, they typically must recreate that environment from scratch, wasting resources and dragging out the project.

The virtual data fabric developed jointly by Sylabs and DeciSym will enable the government cybersecurity professionals to retain that testbed of data, thereby giving them a jumpstart on new projects and improving reproducibility and application quality to boot.

“The intent is to increase the efficiency,” Pellegrino says. “Their starting point is better informed because they’re able to bring those digital assets to bear in ways that significantly reduce the manual effort and improve quality.”

SQLite was originally developed to improve data availability on US Navy destroyers (Photos_Footage_11111/Shutterstock)

For example, if government cyber professionals were developing a new machine learning model to automatically detect malicious network activity, they would need a large amount of data to train and test that model. Today, the process of preparing large amounts of network traffic data that may or may not contain signals of malicious activity is largely one-off and bespoke. By combining Sylabs' Singularity containerization with DeciSym's data management capability, a large testbed of known network traffic could be maintained in a known state and spun up quickly, improving project delivery and, ideally, model accuracy too.

"The real challenge here is when they set up to test a new system today, they're not making great use of the work done in the past," says Sylabs CTO Adam Hughes. "[If] they had great test data three years ago for a similar system and they actually developed some novel test software three years ago, today it's quite hard for them to just bring those onto their test range for version two or the next version of something similar."

Pellegrino likens the data virtualization solution to SQLite, the embedded database that works in a self-contained manner. SQLite was created by D. Richard Hipp while he was working on a General Dynamics contract for the U.S. Navy. Hipp was working on the damage-control system for destroyers, which originally used an Informix database running on HP-UX. After routinely running into database server connection problems with the damage-control system, he decided to bypass the Informix database on the Unix server and instead devise a tiny database that could be embedded in any system able to accommodate its minuscule 250KB footprint.

Like SQLite, the virtual data fabric is intended to make large amounts of cyber data shareable just about anywhere.

“We’re packaging collections of data into a single file that sits on file system and then piggybacks off of the most bare computational baseline,” Pellegrino says. “You could stand up any Linux system from nothing and then go to work by just adding some files on top. That’s how we manifest, ideally.”
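To make the single-file idea concrete, here is a minimal sketch, assuming nothing about how Sylabs and DeciSym actually build their fabric, that uses Python's built-in `sqlite3` module to bundle a directory of test-data files into one SQLite file with a checksum per entry. The helper names `pack_directory` and `read_file` and the `files` table layout are hypothetical, chosen only for illustration.

```python
import hashlib
import sqlite3
from pathlib import Path


def pack_directory(src_dir: str, bundle_path: str) -> int:
    """Hypothetical helper: store every file under src_dir as a row in one SQLite file.

    Returns the number of files packed.
    """
    root = Path(src_dir)
    conn = sqlite3.connect(bundle_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            " path TEXT PRIMARY KEY,"  # path relative to src_dir
            " sha256 TEXT NOT NULL,"   # checksum for verifying integrity after transfer
            " data BLOB NOT NULL)"     # raw file contents
        )
        count = 0
        for f in sorted(root.rglob("*")):
            if f.is_file():
                blob = f.read_bytes()
                conn.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                    (f.relative_to(root).as_posix(),
                     hashlib.sha256(blob).hexdigest(),
                     blob),
                )
                count += 1
        conn.commit()
    finally:
        conn.close()
    return count


def read_file(bundle_path: str, rel_path: str) -> bytes:
    """Hypothetical helper: fetch one file from the bundle and verify its checksum."""
    conn = sqlite3.connect(bundle_path)
    try:
        row = conn.execute(
            "SELECT sha256, data FROM files WHERE path = ?", (rel_path,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise KeyError(rel_path)
    digest, blob = row
    if hashlib.sha256(blob).hexdigest() != digest:
        raise ValueError(f"checksum mismatch for {rel_path}")
    return blob
```

In this sketch, a folder of captured network traffic could be packed into a single file such as `testbed.db`, carried across a system boundary, and read back on any Linux machine with a stock Python install. The per-file checksum stands in for the kind of integrity checking such transfers would plausibly need.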

The joint project with the TRMC began in October and is slated to run through September. If all goes as planned, the virtual data fabric could become a standard used throughout the Department of Defense. Eventually, the solution could also make its way back to the private sector, where enterprise IT teams face similar problems manipulating large pools of data for operational use cases, analytics, and machine learning.

A virtual data fabric bears some similarity to a data lake: in both cases, users want to simplify access to large amounts of centrally stored and managed data. For now, though, Sylabs and DeciSym's joint mission is to enable the cybersecurity professionals working for the U.S. Government to share test data more efficiently.

“That’s the big mission here. We’re in discovery process where the focus is on the war fighter or the warfighter support teams in these tests, and we have to figure out how they work and leverage points to make them more effective,” Pellegrino says. “We’re inventing these capabilities so that their day-to-day work ultimately shifts focus from data handling, data maneuvering to a focus on the system that they’re testing, and the quality of the tests and the test results and building a community practice on that shared knowledge as opposed to one-offs in isolation.”


Editor’s note: This article has been corrected. SingularityPRO and Singularity Enterprise don’t use Docker and Kubernetes under the covers. Datanami regrets the error.