October 13, 2022

Matillion Debuts Data Integration Service on K8S


Matillion yesterday debuted the Data Productivity Cloud, a new service that brings all of the vendor’s ETL and data integration tools together in a single software as a service (SaaS) offering running atop Kubernetes in the AWS cloud.

The idea behind the Data Productivity Cloud is to provide a one-stop shop for all of an organization’s analytic data integration needs, ranging from batch ETL/ELT to real-time streaming data via change data capture (CDC), says Ciaran Dynes, chief product officer at Matillion.

“It’s anything from enterprise applications to databases, bringing all the information to your data cloud, using Matillion’s transformation capabilities,” Dynes says. “It’s managing that data, merging that data together to create those insights, and then easily from there going to connect that data back into those applications. But it’s all 100% orchestrated in a single, SaaS user experience.”

Matillion has made a name for itself in the ETL/ELT space by catering to enterprises moving large amounts of transactional and operational data into cloud data warehouses, specifically those offered by Snowflake, AWS (Redshift), and Databricks, on which it runs push-down, SQL-based data transformations in the ELT style.

The company offers more than 100 pre-built connectors to extract data from source systems, and its software can automatically load data into the schema expected by the cloud warehouses. It supports basic data transformations in a no-code environment, and also offers a low-code method of customizing transformations for SQL-loving analysts.

Redeveloping its entire suite of data integration software to run atop Kubernetes has long been a goal for Matillion. “We’ve rewritten the Matillion ETL layer” for Kubernetes, Dynes says. “Customers won’t even see it. But under the covers, we’ve adapted the Matillion jobs to run horizontally in clusters.”

Matillion’s new Data Productivity Cloud is currently available on AWS

Now that the rewrite is complete, customers can take advantage of the automated provisioning and load-balancing capabilities that come with running serverless workloads in containers on Kubernetes.

When the Data Productivity Cloud detects increased demand for processing power, it automatically provisions the additional capacity needed. Similarly, when workload demand decreases, the cluster scales itself down, effectively to zero.

The move to K8S allows Matillion to take the pain and hassle of manually managing and scaling infrastructure off the backs of its customers, Dynes says.

“We’re actually managing all the infrastructure,” he says. “You simply configure it. You design your job inside Matillion, you run your job inside Matillion containers and workloads, and it’s actually our SREs and IT operations teams that manage all of that on your end.”

A consumption-based pricing scheme has also been introduced with the Data Productivity Cloud. Customers start out by buying a set amount of credit with Matillion. Then, as they process data, the credits are debited from their account. As the integration workload goes up, they pay incrementally more. But if there’s no work going on, they pay nothing.
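The article does not say how Matillion meters usage or what a credit buys, so the following is only a minimal Python sketch of the prepaid-credit model described above. It assumes a hypothetical metered unit (for example, worker-hours) and a hypothetical per-unit rate; `CreditAccount` and `debit` are illustrative names, not part of any Matillion API.

```python
from dataclasses import dataclass


@dataclass
class CreditAccount:
    """Hypothetical prepaid credit balance that is debited as data is processed."""

    balance: int  # credits purchased up front

    def debit(self, units_used: int, credits_per_unit: int) -> int:
        """Charge for metered usage and return the credits debited.

        units_used is a hypothetical metering unit (e.g. worker-hours) and
        credits_per_unit a hypothetical rate. Zero usage costs nothing.
        """
        if units_used < 0 or credits_per_unit < 0:
            raise ValueError("usage and rate must be non-negative")
        # Cost grows linearly with the workload: more work, more credits.
        cost = units_used * credits_per_unit
        if cost > self.balance:
            raise RuntimeError("insufficient credits")
        self.balance -= cost
        return cost


account = CreditAccount(balance=100)
print(account.debit(units_used=0, credits_per_unit=2))   # idle period: prints 0
print(account.debit(units_used=10, credits_per_unit=2))  # busier period: prints 20
print(account.balance)                                   # prints 80
```

The linear charge mirrors the article's description: as the integration workload grows, customers pay incrementally more, and an idle period debits nothing.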

To prevent big data from resulting in massive bills, Matillion allows customers to set limits on how far the Data Productivity Cloud can scale.

“You can set high and low water marks. How much scale do you want?” Dynes says. “You may have a threshold where, based on the volume of the data, maybe you’re happy to double or quadruple or more the actual scale of the cluster. Maybe you want to put some ceiling limits on it. But it’s fully configurable on the customer side in terms of what they want to do. On our side, then we will configure the Kubernetes container to scale based on that configuration.”
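To illustrate how customer-set bounds could cap automatic scaling, here is a minimal Python sketch. It assumes a hypothetical policy that sizes the worker pool to the current job backlog and then clamps the result between a floor and a ceiling; a floor of 0 permits scaling to zero when idle. `ScalingLimits`, `desired_workers`, and `jobs_per_worker` are hypothetical names, and this is not Matillion's actual scaling logic or configuration interface.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingLimits:
    """Hypothetical customer-configured bounds (the "low and high water marks")."""

    min_workers: int  # floor; 0 allows scale-to-zero when there is no work
    max_workers: int  # ceiling that caps how far (and how expensively) the cluster grows

    def __post_init__(self) -> None:
        if self.min_workers < 0 or self.max_workers < self.min_workers:
            raise ValueError("require 0 <= min_workers <= max_workers")


def desired_workers(pending_jobs: int, jobs_per_worker: int, limits: ScalingLimits) -> int:
    """Return how many workers to run for the backlog, clamped to the limits."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    # Enough workers to cover the backlog; an empty backlog needs none.
    needed = math.ceil(pending_jobs / jobs_per_worker)
    # Clamp between the customer's floor and ceiling.
    return max(limits.min_workers, min(needed, limits.max_workers))


limits = ScalingLimits(min_workers=0, max_workers=8)
print(desired_workers(0, 4, limits))    # idle: prints 0
print(desired_workers(10, 4, limits))   # moderate load: prints 3
print(desired_workers(100, 4, limits))  # heavy load, capped: prints 8
```

Kubernetes' own Horizontal Pod Autoscaler applies a similar clamp between its `minReplicas` and `maxReplicas` settings, which is one plausible way a customer-facing ceiling could be translated into cluster configuration.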

While the cloud data warehouse handles the bulk of the processing for data transformations in the ELT scheme, plenty of work remains for Matillion itself, and that is where customers can realize the benefits of automated scaling and provisioning: parallel loading, preprocessing and data orchestration, and reverse ETL, Dynes says.

“The collective capability is, everything is serverless,” he says. “Out of four workloads, three of them massively benefit from the Matillion serverless capability, whereas the fourth one [transformation], we already leverage the technology of the cloud platform themselves.”

The new offering is available on AWS now and will be made available on Microsoft Azure and Google Cloud in the future, Dynes says. Eventually, Matillion’s goal is to enable users to manage batch and streaming data integration workloads on all clouds and on-prem from a single centralized console, he says.

The launch of the Data Productivity Cloud marks the completion of a major project for Matillion, which has been working on developing a K8S-based serverless data integration offering for years. “It’s a pretty significant and a pretty momentous thing in the history of Matillion,” Dynes says.

Matillion made a couple of other announcements yesterday. For starters, it’s now supporting dbt, which will give customers another option for programming data transformation tasks besides SQL and Python.

SQL is, by far, the dominant language, but Matillion has been rolling out support for Python, which is popular among data scientists. The dbt tool, which is backed by dbt Labs, is extremely popular, so it behooved Matillion to make it easier for customers to use it in its ETL offering.

The company also rolled out new connectors for SAP, Workday, and Anaplan.
