May 13, 2024

DataForge Sets New Standard for the Future of Data Platforms

Data engineering often means writing SQL scripts to transform data inside the database. This approach tends to produce lengthy scripts, repeated copy-paste patterns, schema changes that ripple across pipelines, and potential data loss from SQL joins.

These issues compound the complexity of both the code and the data engineering pipelines, and as that complexity grows, so does the difficulty of managing and evolving them.

Until now, teams have relied on building monolithic data platforms using old coding patterns. That approach is inefficient: it adds complexity to data platforms and significantly increases costs as demand for data and analytics continues to rise.

DataForge, a leading systems integrator that develops, builds, and distributes IT solutions, may have found a reliable and efficient answer to these challenges. The company has announced it is open-sourcing a new framework for developing and managing data transformations: DataForge Core.

Applying modern software engineering concepts to data engineering, DataForge Core aims to redefine how data platforms are developed and how transformation code is managed. The new framework is tailor-made for high-growth companies that build rapidly evolving data products.

The DataForge Core framework operates on the principle of Inversion of Control (IoC). Rather than a developer's script calling each step of a pipeline, the framework takes control of execution and invokes the developer's transformation logic at the right points. Specific tasks can be delegated to modules or components within the framework, simplifying and streamlining data management.
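To make the IoC idea concrete, here is a minimal, hypothetical sketch in Python: instead of a script calling each transformation in turn, a small framework owns the execution loop and invokes registered user functions. The class and function names are illustrative, not the actual DataForge Core API.

```python
# Hypothetical IoC sketch for a data pipeline: the framework, not the
# user's script, controls when and how transformations run.

class PipelineFramework:
    def __init__(self):
        self._steps = []

    def register(self, func):
        """Decorator: hand a transformation over to the framework."""
        self._steps.append(func)
        return func

    def run(self, rows):
        # Control flow lives here, inside the framework.
        for step in self._steps:
            rows = [step(row) for row in rows]
        return rows


pipeline = PipelineFramework()

@pipeline.register
def normalize_name(row):
    return {**row, "name": row["name"].strip().title()}

@pipeline.register
def add_total(row):
    return {**row, "total": row["price"] * row["qty"]}

result = pipeline.run([{"name": "  ada lovelace ", "price": 2.5, "qty": 4}])
```

The user code only declares *what* each transformation does; ordering, iteration, and invocation are the framework's job, which is the inversion the paragraph above describes.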

“By bringing DataForge Core to the open-source community, we are reaffirming our belief that innovation happens through collaboration, not isolation,” said Matt Kosovec, co-founder and CEO of DataForge. “We have just scratched the surface of what is possible by thinking differently and believe we will need the help of both data engineering and computer science communities to evolve DataForge quickly enough to keep up with the demand for data and AI products.”

DataForge Core enables data engineers to focus on generating business value from data by eliminating tedious data plumbing chores. The new framework uses functional programming to simplify translating business logic into code and extending existing code as needed.
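The functional style described above can be sketched as follows: business rules become small pure functions that compose, so new logic is appended rather than spliced into existing code. This is an illustrative example, not DataForge's actual interface.

```python
# Sketch: business rules as composable pure functions (illustrative only).
from functools import reduce

def compose(*funcs):
    """Chain pure transformations left to right over a record."""
    return lambda record: reduce(lambda acc, f: f(acc), funcs, record)

def to_cents(rec):
    return {**rec, "amount_cents": round(rec["amount"] * 100)}

def flag_large(rec):
    return {**rec, "is_large": rec["amount_cents"] >= 10_000}

transform = compose(to_cents, flag_large)
enriched = transform({"amount": 125.50})

# Adding a rule later means composing one more function,
# without editing the functions already in production:
transform_v2 = compose(to_cents, flag_large, lambda r: {**r, "version": 2})
```

Because each function is pure (no shared state, output depends only on input), rules can be tested in isolation and reordered or extended safely.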


With native integration with Spark SQL and Databricks, DataForge Core simplifies the work of data scientists looking to create high-quality data pipelines. The framework is particularly useful for batch inference and feature engineering.

In addition, the platform’s easy-to-follow patterns simplify data preparation. Instead of juggling numerous preparation scripts that quickly become difficult to manage, data scientists can focus their expertise on developing and refining ML models.
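One way to picture the pattern-driven approach replacing piles of prep scripts: feature rules declared as data, compiled into a single SQL statement. The rule format and function below are hypothetical, shown only to illustrate declarative preparation over hand-written scripts.

```python
# Hypothetical sketch: declarative feature rules compiled to one SQL
# SELECT, instead of many ad-hoc preparation scripts.

rules = [
    {"name": "clean_email", "expr": "lower(trim(email))"},
    {"name": "age_bucket",  "expr": "CASE WHEN age < 30 THEN 'young' ELSE 'other' END"},
]

def compile_select(table, rules):
    """Turn rule definitions into a single SQL statement."""
    cols = ", ".join(f"{r['expr']} AS {r['name']}" for r in rules)
    return f"SELECT {cols} FROM {table}"

sql = compile_select("customers", rules)
```

Adding a feature means appending one rule; the generated SQL stays consistent and reviewable in one place rather than scattered across scripts.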

Governance and auditability are key aspects of data management, helping with risk mitigation, data quality, and regulatory compliance. DataForge Core uses a metadata repository that stores a compiled copy of the code in database tables, so teams can use plain SQL queries to search the repository and quickly locate the code snippets needed for audits, analysis, and other use cases.
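A hedged sketch of that metadata-repository idea, using SQLite as a stand-in: compiled transformation code lands in ordinary database tables, so an audit becomes a plain SQL query. The schema and contents here are hypothetical, not DataForge's actual repository layout.

```python
# Sketch: compiled pipeline code stored in a table and searched with SQL.
# Schema and rows are illustrative, not DataForge's real metadata model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compiled_code (rule_name TEXT, target_table TEXT, sql_text TEXT)"
)
conn.executemany(
    "INSERT INTO compiled_code VALUES (?, ?, ?)",
    [
        ("clean_email", "customers", "lower(trim(email))"),
        ("total_price", "orders", "price * qty"),
    ],
)

# An auditor can locate every rule touching a given table with one query:
rows = conn.execute(
    "SELECT rule_name, sql_text FROM compiled_code WHERE target_table = ?",
    ("orders",),
).fetchall()
```

Storing code as queryable rows means lineage and audit questions ("what logic writes to this table?") are answered with SQL rather than by grepping repositories.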
