Soda Launches Open Data Monitoring
The roster of data monitoring platform vendors continues to grow with the release of open source SQL management tools by Belgian startup Soda.
Earlier this month, Brussels-based Soda announced a $17.7 million Series A and seed funding round led by European venture investor Singular along with existing seed funders DCF, Hummingbird Ventures, Point Nine Capital and angel investors.
The two-year-old startup said the funding round marks the first stage of an initiative to release open source management tools that would help identify and monitor large data sets, data lakes and data warehouses. The suite includes tools for screening data pipelines while monitoring data via configurable tests.
Soda said Tuesday (Feb. 9) it is developing an open source data testing and monitoring suite dubbed Soda SQL that includes developer tools for data frames and streaming data. Those tools are promoted as operating across data workloads and query engines.
Environments covered include Kafka, Spark, Amazon Web Services S3 and Redshift, Microsoft Azure Blob Storage and Azure Synapse, Google Cloud Datastore and BigQuery, Presto and Snowflake.
Soda SQL's configuration options are designed to let data engineers set up and monitor the tests that screen for bad data. Metrics are included for evaluating the test results.
The data management platform issues SQL queries to extract column profiles and metrics. Those queries are controlled through declarative YAML configuration files. Soda SQL tests run across data pipelines and trigger alerts when bad data is detected.
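The general pattern described here (compute column metrics with SQL, then check them against declared tests) can be illustrated with a short sketch. This is not Soda SQL's own API or file format: the CONFIG dictionary stands in for a YAML file, the table and column names are made up, and SQLite is used only so the example runs with the Python standard library.

```python
import operator
import sqlite3

# Hypothetical declarative config (an assumption for illustration);
# a real tool would load something like this from a YAML file.
CONFIG = {
    "table": "orders",
    "column": "amount",
    "tests": [
        ("row_count", ">", 0),
        ("missing_count", "==", 0),
        ("min", ">=", 0),
    ],
}

# Comparison operators allowed in test declarations.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
}


def column_metrics(conn, table, column):
    """Compute a small column profile with a single SQL query."""
    # Identifiers come from trusted config here; never interpolate
    # untrusted input into SQL this way.
    sql = (
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"MIN({column}), MAX({column}) FROM {table}"
    )
    row_count, missing_count, min_val, max_val = conn.execute(sql).fetchone()
    return {
        "row_count": row_count,
        "missing_count": missing_count or 0,  # SUM is NULL on an empty table
        "min": min_val,
        "max": max_val,
    }


def run_tests(metrics, tests):
    """Return a description of every test that failed."""
    failures = []
    for metric, op, threshold in tests:
        value = metrics[metric]
        # A metric with no value (e.g. MIN of an empty column) counts as a failure.
        if value is None or not OPS[op](value, threshold):
            failures.append(f"{metric} {op} {threshold} (actual: {value})")
    return failures


def scan(conn, config):
    """Profile the configured column and evaluate its tests."""
    metrics = column_metrics(conn, config["table"], config["column"])
    return run_tests(metrics, config["tests"])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 19.5), (2, None), (3, -4.0)],
    )
    for failure in scan(conn, CONFIG):
        print("ALERT:", failure)
```

With the sample rows above, the missing-value test and the minimum-value test fail, so two alerts are printed. In a pipeline, a scan like this would typically run after each load step, with the alert routed to whatever notification channel the team uses.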
“The subject of testing is not code, but the data,” said Maarten Masschelein, Soda’s CEO and co-founder. The goal is detecting data quality issues early, while boosting collaboration to maintain quality via an open source management tool.
“Businesses are moving from using data to being built entirely on data, but there are challenges in terms of how they collect, process and maintain this data,” Masschelein added.
“You need to test and monitor data to stay on top of it, but most companies don’t have the capabilities and engineering resources to do this.”
Soda SQL runs either in the cloud or on premises, and is available free on GitHub.