The problem of how to effectively move and manage large amounts of data is one that impacts all organizations of a certain size, including U.S. Government agencies working in cybersecurity. Now a new partnership between two software companies could ease the big data portability problem when it comes to test case repeatability for the government cyber pros protecting our nation’s digital pathways.
Last month, Sylabs and DeciSym announced a partnership that will see them collaborate to deliver a big data portability solution through the Department of Defense’s Test Resource Management Center (TRMC). The two companies signed a 12-month tender contract, shepherded by Trade Winds, to develop a “virtual data fabric” that should dramatically simplify how government cybersecurity teams package, transmit, and work with large data sets used for testing purposes.
Folks in the HPC community will recognize Sylabs as the developer of the Singularity line of containerization solutions, which help to package code and application runtimes for deployment on supercomputers and other batch-oriented infrastructure. SingularityPRO and Singularity Enterprise are already well-established in industrial, government, and academic institutions, and provide a known quantity to build from.
DeciSym, on the other hand, is a relative newcomer to the field. Founded in April 2022 by Don Pellegrino, DeciSym aims to tackle thorny big data management problems impacting governmental agencies and industrial teams. The partnership with Sylabs is designed to leverage the containerization capabilities of Singularity, and to extend it to data as opposed to using it just for code and application runtime portability.
“We’re creating custom technology to make it easier to package that data up, communicate it with others to share that information, and then ease the workflow steps to working with it in those sort of not traditional enterprise environments, where you would have air gaps or security controls or it crosses system boundaries,” Pellegrino tells Datanami.
Cybersecurity professionals today struggle to create testbeds for the solutions they’re developing. They may put many hours into building a test environment for a given application, replete with all of the test data required to assure that the application behaves as expected. When the testing is done, they typically move on to other things, and the test environment is mothballed. When the need for another round of testing arises, they typically must recreate that environment from scratch, wasting resources and dragging out the project.
The virtual data fabric developed jointly by Sylabs and DeciSym will enable the government cybersecurity professionals to retain that testbed of data, thereby giving them a jumpstart on new projects and improving reproducibility and application quality to boot.
“The intent is to increase the efficiency,” Pellegrino says. “Their starting point is better informed because they’re able to bring those digital assets to bear in ways that significantly reduce the manual effort and improve quality.”
For example, if government cyber professionals were developing a new machine learning model that could automatically detect malicious network activity, they would need a large amount of training data to test their models. Today, the process of preparing large amounts of network traffic data that may or may not contain signals of malicious activity is largely one-off and bespoke. By combining Sylabs’ Singularity containerization with DeciSym’s data management capability, a large testbed of known network traffic can be maintained in a known state and spun up quickly, thereby improving project delivery and hopefully accuracy too.
“The real challenge here is when they set up to test a new system today, they’re not making great use of the work done in the past,” says Sylabs CTO Adam Hughes. “[If] they had great test data three years ago for a similar system and they actually developed some novel test software three years ago, today it’s quite hard for them to just bring those onto their test range for version two or the next version of something similar.”
Pellegrino likens the data virtualization solution to SQLite, the embedded database that works in a self-contained manner. SQLite was created by D. Richard Hipp while on a General Dynamics contract for the U.S. Navy. Hipp needed a way to test the damage-control system on destroyers, which originally used an Informix database running on HP-UX. When Hipp routinely encountered database server connection problems with the ship’s damage-control system, he decided to bypass the main Informix database running on the Unix server and instead devise a tiny database that could be embedded anywhere that can handle its minuscule 250KB footprint.
Like SQLite, the virtual data fabric will enable the sharing of large amounts of cyber data just about anywhere.
“We’re packaging collections of data into a single file that sits on file system and then piggybacks off of the most bare computational baseline,” Pellegrino says. “You could stand up any Linux system from nothing and then go to work by just adding some files on top. That’s how we manifest, ideally.”
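To make the SQLite comparison concrete, here is a minimal Python sketch of the general single-file idea, using only the standard library’s sqlite3 and hashlib modules. It is not the Sylabs and DeciSym implementation, whose format has not been described; the `artifacts` table, the `pack` and `unpack` helpers, and the sample file names are all hypothetical, chosen purely for illustration.

```python
import hashlib
import sqlite3


def pack(db_path, items):
    """Store named byte payloads in one SQLite file, with a SHA-256 digest per item.

    Hypothetical helper for illustration; not part of any Sylabs or DeciSym product.
    """
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS artifacts ("
                "name TEXT PRIMARY KEY, sha256 TEXT NOT NULL, payload BLOB NOT NULL)"
            )
            con.executemany(
                "INSERT OR REPLACE INTO artifacts VALUES (?, ?, ?)",
                [
                    (name, hashlib.sha256(data).hexdigest(), data)
                    for name, data in items.items()
                ],
            )
    finally:
        con.close()


def unpack(db_path):
    """Read every payload back, checking each one against its stored digest."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name, sha256, payload FROM artifacts ORDER BY name"
        ).fetchall()
    finally:
        con.close()

    restored = {}
    for name, digest, payload in rows:
        payload = bytes(payload)
        if hashlib.sha256(payload).hexdigest() != digest:
            raise ValueError(f"checksum mismatch for {name}")
        restored[name] = payload
    return restored
```

Calling `pack("bundle.db", {"capture.pcap": raw_bytes})` on one machine and `unpack("bundle.db")` on another would, under these assumptions, restore the same bytes on any system with a stock Python install, which is roughly the “just adding some files on top” workflow Pellegrino describes.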
The joint project with the TRMC began in October and is slated to run through September. If all goes as planned, the virtual data fabric could become a standard used throughout the Department of Defense. Eventually, the solution could also make its way back to the private sector, where enterprise IT teams face similar problems manipulating large pools of data for operational use cases, analytics, and machine learning.
A virtual data fabric bears some similarity to a data lake. In each case, users are looking to simplify access to large amounts of centrally stored and managed data. For now, Sylabs and DeciSym’s joint mission is finding a way to enable the cybersecurity professionals working for the U.S. Government to share test data more efficiently.
“That’s the big mission here. We’re in discovery process where the focus is on the war fighter or the warfighter support teams in these tests, and we have to figure out how they work and leverage points to make them more effective,” Pellegrino says. “We’re inventing these capabilities so that their day-to-day work ultimately shifts focus from data handling, data maneuvering to a focus on the system that they’re testing, and the quality of the tests and the test results and building a community practice on that shared knowledge as opposed to one-offs in isolation.”
Editor’s note: This article has been corrected. SingularityPRO and Singularity Enterprise don’t use Docker and Kubernetes under the covers. Datanami regrets the error.