December 8, 2021

Matillion Unveils Streaming CDC in the Cloud


Matillion made its initial entry into the world of cloud-based ETL at the AWS re:Invent conference in 2015. So it was fitting that the company chose last week’s re:Invent as the venue to announce Matillion Data Loader 2.0, the latest component of the company’s burgeoning data operating system, which includes a new cloud-based, streaming change data capture (CDC) capability.

The Matillion extract, transform, and load suite has grown significantly since that initial product launch six years ago. Initially developed for AWS’s Redshift data warehouse, the cloud-based Matillion ETL offering has been adapted to support the delivery of enriched data from more than 100 transactional systems (on prem or cloud) into all of the major cloud data warehouses, including Snowflake, Google Cloud BigQuery, and Microsoft Azure Synapse.

Funded with $100 million from a Series D round in February, which it topped in September with a $150 million Series E, the company has continued to invest in R&D to deliver what customers want. According to CEO Matthew Scullion, part of what customers want is the flexibility to choose which components they use, which led to the delivery of Matillion Data Loader 2.0.

An overhaul of the first release delivered at re:Invent two years ago, Matillion Data Loader 2.0 is designed to help enterprises move large amounts of data from transactional systems into cloud-based analytic systems, either through streaming methods or via batch.

The new offering includes two major components: a streaming CDC capability that utilizes Apache Kafka and other technologies to move data in real time from source systems, and a new no-code environment for building custom data connectors.

“Matillion CDC, our write-ahead log-based changed data capture [is] like proper, grown up CDC,” Scullion said during an interview at re:Invent in Las Vegas last week. “A lot of people say they’ve got CDC and what they’re really talking about is a diff. ‘My tables changed. I’m trying to figure out how the tables changed, I’ll just replicate the changes as a diff.’”

However, that form of CDC does not deliver the level of accuracy that enterprises demand, in part because it cannot handle deleted data, Scullion said.

“You can’t do a ‘diff’ over a delete, but you can see what’s happened in the write-ahead log,” he told Datanami. “The change log says what’s happened in the database. You can read that and apply the same changes to a target database.”
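The distinction Scullion draws can be illustrated with a minimal Python sketch. It is not Matillion's implementation: the `ChangeEvent` type, both function names, and the `updated_at` column are hypothetical, and the "diff" shown is assumed to be the common timestamp-based variant that copies only rows modified since the last sync. Under those assumptions, a deleted row never shows up in the diff, while replaying an ordered change log reproduces the delete on the target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """One entry of a (hypothetical) change log read from the source database."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row image; None for deletes


def incremental_diff_sync(target: dict, source: dict, since: int) -> dict:
    """Timestamp-based 'diff': copy rows modified after `since`.

    Rows deleted from the source are simply absent, so nothing tells
    the target to remove them.
    """
    for key, row in source.items():
        if row["updated_at"] > since:
            target[key] = dict(row)
    return target


def apply_change_log(target: dict, log: list) -> dict:
    """Log-based CDC: replay every change, in order, against the target."""
    for event in log:
        if event.op in ("insert", "update"):
            target[event.key] = dict(event.row)
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return target


# Source table before and after two changes: row 1 is updated, row 2 is deleted.
source_before = {
    1: {"name": "Ada", "updated_at": 1},
    2: {"name": "Bob", "updated_at": 1},
}
source_after = {1: {"name": "Ada L.", "updated_at": 2}}
change_log = [
    ChangeEvent("update", 1, {"name": "Ada L.", "updated_at": 2}),
    ChangeEvent("delete", 2),
]

# Both targets start as a copy of the source as of the last sync (time 1).
diff_target = incremental_diff_sync(
    {k: dict(v) for k, v in source_before.items()}, source_after, since=1
)
log_target = apply_change_log(
    {k: dict(v) for k, v in source_before.items()}, change_log
)

print(sorted(diff_target))         # [1, 2]: the deleted row 2 is still present
print(log_target == source_after)  # True: the delete was replayed
```

A real log-based reader would consume the database's own write-ahead log rather than an in-memory list, but the replay logic on the target side follows the same shape.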

Scullion said that the Matillion CDC offering delivers the type of functionality that enterprises are accustomed to with established CDC solutions from GoldenGate (Oracle), Attunity (Qlik), and HVR Software (now owned by Fivetran), with the caveat that the Matillion solution is designed to run in the cloud.

“There’s been really great write-ahead log-based CDC products around for years. Golden Gate, Attunity, HVR–those are all great write ahead log-based CDC products,” he said. “Lots of people need CDC. But it’s also a common pain point, because if you’re building a modern enterprise cloud data stack–let’s say you’re using Snowflake, Matillion, and Dataiku–and you need CDC, then you have this outlier on the side of that stack that’s literally 30 years old or 25 years old. Does that scale? Is it managed? Is it deployed in the same way as the born-on-the-cloud native products? Obviously not.”

Matthew Scullion is founder and CEO of Matillion

Matillion identified the need to reinvent (so to speak) the CDC layer in the stack about 18 months ago, and is now delivering the beta of the product, with plans to ship it as a generally available product in 2022. The software, which features Kafka and a smattering of other technologies under the hood, will support the standard mix of relational databases used for transactional systems, such as Oracle, PostgreSQL, and others, as well as NoSQL databases that support write-ahead logs.

“For all the same reasons that all cloud technologies have disrupted previous technology, this was one that was in the queue to be fixed up,” Scullion said. “We feel we have pedigree on doing this because it’s the exact same situation we were in in 2014, 2015 when we were using public cloud and cloud data warehouses, but we were using it with pre-cloud, legacy data integration software.

“It’s a little bit like watching an ultra-HD Blu-ray movie on a standard definition set,” the CEO continued. “You know it’s really high quality, but if you’re watching it through something else, you’re obfuscated from it–you can’t tell the difference.”

Rounding out the Matillion Data Loader 2.0 offering is the new Universal Connectivity feature. Like other ETL providers, Matillion already sported a number of pre-built data connectors for all the usual suspects, including ERP systems, marketing and CRM applications, productivity tools, and many other types of applications. But keeping up with customer demands for connectors is not easy for independent software vendors (ISVs) in the data integration game.

“We all know the same thing, which is there is a very fast degradation curve in how much they use,” Scullion said. “So the top 50 are used by everybody all the time. The next 50 are used by some of the people some of the time, and the next 100 after that are used by hardly anyone hardly any of the time.

“That is a massive job,” he continued. “No ISV can deliver all the connectors that everybody needs, and no customer can find an ISV that has all the right connectors. The problem that that leaves then is how do you square that circle. And the answer is Universal Connectivity, launched in Matillion Data Loader this week.”

Universal Connectivity builds on Matillion ETL’s pre-existing capability by providing a no-code environment for generating bespoke data connectors. All that’s required is that the data source can be reached via a REST API, and the software does the rest.

“As a non-technical user, you can just press the button, configure the wizard, and it builds you a secure, scalable, high performant connector,” Scullion said. “Once it’s built, the connector is there forever. If the API changes, you just go and tweak it. What it means from a business value point of view is the answer to the question ‘Do you have a connector for that?’ is now always yes.”
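To make the idea of a configuration-driven REST connector concrete, here is a rough Python sketch. It does not reflect how Universal Connectivity works internally, which the article does not describe. `ConnectorConfig`, `read_records`, the `items` response field, the page-number pagination scheme, and the `api.example.com` endpoint are all assumptions made for illustration. The point is only that a handful of settings, of the kind a wizard might collect, can be enough to describe how to pull records from an API, and that adapting to an API change means editing those settings rather than rewriting code.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterator
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen


@dataclass
class ConnectorConfig:
    """Settings a wizard might collect (all field names are hypothetical)."""
    base_url: str
    path: str
    records_field: str = "items"       # JSON key that holds the list of records
    page_param: str = "page"           # query parameter carrying the page number
    page_size_param: str = "per_page"  # query parameter carrying the page size
    page_size: int = 100


def http_get_json(url: str) -> dict:
    """Default fetcher: HTTP GET the URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def read_records(
    config: ConnectorConfig,
    fetch: Callable[[str], dict] = http_get_json,
) -> Iterator[dict]:
    """Yield records page by page until the API returns an empty page."""
    page = 1
    while True:
        query = urlencode(
            {config.page_param: page, config.page_size_param: config.page_size}
        )
        payload = fetch(f"{config.base_url}{config.path}?{query}")
        records = payload.get(config.records_field, [])
        if not records:
            return
        yield from records
        page += 1


def fake_fetch(url: str) -> dict:
    """Stand-in for a real API so the example runs without network access."""
    pages = {
        "1": {"items": [{"id": 1}, {"id": 2}]},
        "2": {"items": [{"id": 3}]},
    }
    page = parse_qs(urlparse(url).query)["page"][0]
    return pages.get(page, {"items": []})


demo_config = ConnectorConfig(
    base_url="https://api.example.com", path="/v1/orders", page_size=2
)
demo_records = list(read_records(demo_config, fetch=fake_fetch))
print(demo_records)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Passing the fetcher in as a parameter keeps the paging logic separate from the transport, so authentication or retries could be added in one place without touching the configuration.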

The delivery of Matillion Data Loader is occurring amid what Scullion called the “extreme decomposition” of the data integration space. Silicon Valley-funded startups have raised hundreds of millions of dollars by specializing in a single piece of the data integration game. Matillion offers the full gamut with its flagship Matillion ETL offering, but it will also offer more targeted solutions for customers that want just one component.

“If you look at Matillion ETL, which is the full-featured enterprise data integration product, it loads data, transforms data, synchronizes data, and orchestrates data,” Scullion said. “The last four words in each of those has 100 features under each of them.”

Like the cockpit of a modern airliner, there are many levers and dials in the ETL offering, and for good reason. But not every customer wants to fly a modern airliner. So for them, Matillion will offer point solutions that will also work with the other elements of what Matillion dubs its data operating system.

“We’ve gone from product to product range and now to platform,” Scullion added, “and where we’re going is to data operating system, a single cohesive platform that takes care of all aspects of how customers can make data useful.”

Datanami