June 15, 2022

Exploring the Top Options for Real-Time ELT

Rajkumar Sen


Competitive advantage in today’s world rests on a company’s ability to innovate and adapt to a rapidly changing environment. To do that, organizations must adopt real-time thinking in the way they approach the design, development and maintenance of their data infrastructure.

Above all, that means dispensing with point-to-point integration and outdated batch processing methods that simply lack the speed and agility a fast-moving business requires.

Real-time extract, load, transform (ELT) software addresses a critical missing piece of the integration puzzle. While there are a lot of workflow-oriented SaaS integration tools on the market, virtually none of them address the need to extract high-volume transactional data from backbone systems like ERP and deliver it to cloud analytics platforms where it can be put to immediate use.

Change data capture (CDC) is the common starting point for this kind of high-volume, real-time integration. CDC is fast and efficient because it’s driven by log activity rather than attempting to compare and synchronize large datasets. Unfortunately, there are only a small handful of ELT solutions that can check all of the boxes for the kind of immediate, high-volume transactional integration that today’s enterprises need.
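To make the contrast concrete, here is a minimal, purely illustrative Python sketch of why log-based CDC is efficient: the replicator tails a transaction log and does work proportional to the number of changes, rather than scanning and diffing entire tables. The `ChangeEvent` structure, LSN numbering, and toy "warehouse" target are assumptions for illustration, not any real product's API.

```python
from dataclasses import dataclass
from typing import Iterator

# A hypothetical change event as it might appear in a database's
# transaction log: an LSN (log sequence number), an operation, and the row.
@dataclass
class ChangeEvent:
    lsn: int
    op: str          # "insert", "update", or "delete"
    table: str
    row: dict

def tail_log(log: list, from_lsn: int) -> Iterator[ChangeEvent]:
    """Read only entries written after `from_lsn` -- the cost scales with
    the number of changes, not the size of the source tables."""
    for event in log:
        if event.lsn > from_lsn:
            yield event

def apply_to_target(target: dict, event: ChangeEvent) -> None:
    """Replicate one change into a toy key-value 'warehouse' table."""
    key = (event.table, event.row["id"])
    if event.op == "delete":
        target.pop(key, None)
    else:
        target[key] = event.row

# Simulated log: three committed changes to an 'orders' table.
log = [
    ChangeEvent(1, "insert", "orders", {"id": 1, "total": 40}),
    ChangeEvent(2, "update", "orders", {"id": 1, "total": 55}),
    ChangeEvent(3, "insert", "orders", {"id": 2, "total": 12}),
]

warehouse = {}
for event in tail_log(log, from_lsn=0):
    apply_to_target(warehouse, event)

print(warehouse)
```

A timestamp- or checksum-based approach would instead have to re-query or re-hash the source tables on every cycle, which is where the load on production systems comes from.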

What to Look for in a Real-Time ELT Solution

Fortunately, it’s easy to identify the right ELT tool by filtering on the key characteristics that address gaps in the modern data stack. Here are the questions you should ask:

  • Does it offer a wide range of enterprise connectors? The ecosystem surrounding the modern data stack offers a range of different tools to integrate with SaaS applications, but there are relatively few connectors available for enterprise data stores like ERP, systems of records, or other large-scale databases. A true enterprise-grade ELT tool offering should include pre-built connectors for all of your systems, including OLTP, OLAP, and cloud platforms. This is a core requirement because it eliminates the data silos that drive the ELT imperative in the first place. A wide array of data connectors also serves to future-proof your enterprise as it grows, giving you the flexibility to adopt a range of new systems without worrying about interoperability.
  • Does it guarantee against data loss? Look for an ELT tool that provides built-in data consistency and data validation. When pipelines crash, does data integrity suffer because of missed transactions or duplicates? Or does the solution guarantee 100% complete and accurate data transfer, with zero data loss? Ask whether the tool has built-in checkpointing and restart capability so your business never misses a transaction. Each change must be delivered from the source to the target exactly once, with complete accuracy. Data loss can be especially disastrous as companies rely more heavily on artificial intelligence and machine learning. Even small amounts of data drift can erode the accuracy of these technologies, leading to negative business outcomes.
  • Does it degrade performance in the source application? A good ELT tool should be capable of performing change data capture based on transaction logs. It should not rely on an endless stream of queries against the source database in order to detect changes. The best ELT solutions will not degrade source system performance and won’t write timestamps back to production tables as they read data. CDC solutions can be log-based, timestamp-based or checksum-based. Log-based CDC works without adversely affecting the source because it only reads transactional change streams and logs. It’s fast, reliable, secure and low impact.


  • Does it allow for zero maintenance of streaming pipelines? With some integration platforms, schema changes can result in a need to stop the flow of data and manually reconfigure the schema on both ends of the pipe. Typically, this requires a team of engineers to be on call, monitoring for changes and fixing the pipeline when it breaks. The best ELT solutions make it easy to maintain data pipelines by handling schema changes and evolutions automatically.
  • How secure is it? Data must be encrypted in transit in order to protect personally identifiable information (PII) data and other sensitive information. A good ELT solution will simplify this process so this data can be handled effectively and efficiently, in full compliance with regulatory guidelines.
  • Will it scale? As an organization grows, so will its data integration requirements. If your ELT solution chokes on high volumes of data, your entire data infrastructure will be put at risk. A robust ELT solution should offer built-in autoscaling and performance optimization features to accommodate growth. It should be capable of handling high-volume, high-velocity and high-variety data. In the cloud era, businesses must be able to automatically scale resources up and down based on their needs. Your ELT platform is no different.
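The checkpointing and restart behavior described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor's implementation: the sink records the last applied LSN, and after a crash the replayed stream is deduplicated against that checkpoint, so each change lands exactly once with no duplicates.

```python
# Hypothetical sketch of exactly-once apply via a durable checkpoint.
# After a crash, the source replays events; anything at or below the
# saved LSN is skipped, so replays never create duplicates.

class CheckpointedSink:
    def __init__(self):
        self.rows = {}          # target "table"
        self.checkpoint = 0     # last applied LSN (durable in a real system)

    def apply(self, lsn: int, row_id: int, row: dict) -> bool:
        if lsn <= self.checkpoint:
            return False        # already applied -- skip the replay
        self.rows[row_id] = row
        self.checkpoint = lsn   # advanced atomically with the write in practice
        return True

sink = CheckpointedSink()
events = [(1, 1, {"total": 40}), (2, 1, {"total": 55}), (3, 2, {"total": 12})]

# First run crashes after the second event...
for lsn, rid, row in events[:2]:
    sink.apply(lsn, rid, row)

# ...and the restarted pipeline replays the whole stream. Events 1 and 2
# are deduplicated against the checkpoint; only event 3 is newly applied.
applied = [sink.apply(lsn, rid, row) for lsn, rid, row in events]
print(applied)  # -> [False, False, True]
```

In a production system the checkpoint advance and the target write must commit together (for example, in the same transaction); otherwise a crash between the two reintroduces duplicates or gaps.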
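The zero-maintenance point about schema changes can also be made concrete. The sketch below is an assumption-laden toy, not a real connector: when an incoming record carries a column the target has never seen, the pipeline widens the target schema and backfills old rows with nulls instead of stopping for manual reconfiguration.

```python
# Hypothetical sketch of automatic schema evolution: new source columns
# widen the target schema on the fly rather than breaking the pipeline.

def evolve_and_apply(schema: set, rows: list, record: dict) -> None:
    new_cols = set(record) - schema
    if new_cols:
        schema |= new_cols          # e.g. ALTER TABLE ... ADD COLUMN on a real target
        for old in rows:            # backfill existing rows with nulls
            for col in new_cols:
                old.setdefault(col, None)
    rows.append({col: record.get(col) for col in schema})

schema = {"id", "total"}
rows = []
evolve_and_apply(schema, rows, {"id": 1, "total": 40})
# The source adds a 'currency' column mid-stream; data keeps flowing.
evolve_and_apply(schema, rows, {"id": 2, "total": 12, "currency": "USD"})
print(rows)
```

Destructive changes (dropped or retyped columns) are harder to automate safely, which is why it is worth asking a vendor exactly which classes of schema change their tool handles without intervention.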

As you ask these questions, you’re likely to see some of your initial ELT candidates fall off the list. That isn’t to say there are no good ELT solutions to choose from, but most have at least one or two major shortcomings, and you’ll need to zero in on the factors that matter most to you.

There are a few very good contenders in the ELT space, but relatively few cloud-native CDC offerings that can handle high volumes of transactional data with guaranteed delivery. Since ELT plays such a pivotal role in the modern data stack, it’s worth drilling down on the details before you commit.

About the author: Rajkumar Sen is the founder and chief architect at Arcion, the only cloud-native, CDC-based data replication platform. In his previous role as director of engineering at MemSQL, he architected the query optimizer and the distributed query processing engine. Raj also served as a principal engineer at Oracle, where he developed features for the Oracle database query optimizer, and a senior staff engineer at Sybase, where he architected several components for the Sybase Database Cluster Edition. He has published over a dozen papers in top-tier database conferences and journals and is the recipient of 14 patents. For more information on Arcion, visit www.arcion.io/, and follow the company on LinkedIn, YouTube and @ArcionLabs.

Related Items:

Fivetran Raises $565 Million, Buys CDC Vendor HVR

In Search of the Modern Data Stack

Can We Stop Doing ETL Yet?
