January 20, 2020


ETL has been the industry standard for enterprise data management for several decades. Organizations rely on ETL processes to Extract data from relational databases, Transform it to a standard, consumable, and business-ready schema, and Load it into another relational database sitting in the enterprise data warehouse. This creates a single source of truth for the enterprise, and a repeatable process for delivering updated data to the business for analytics and reporting.

The processes and infrastructure around enterprise data management, however, have fundamentally changed. No longer are organizations working with structured transactional data for quarterly or monthly reporting. The era of smartphones, IoT, AI, and cloud computing has brought with it increasing volumes of diverse data, and strategic decisions are increasingly driven by predictive insights rather than reporting on past performance. The demand for agile analytics and machine learning has organizations struggling to adapt legacy processes to this new paradigm. Is ETL to blame? Or does it just need a reboot?

The T is Changing. That’s a Good Thing

With the adoption of modern cloud infrastructure and an ecosystem of cloud platforms, E → T → L no longer makes much sense as an order of operations. Instead, organizations are reconfiguring the order of operations to be more in line with EL → T, using integration platforms to Extract and Load data into the data lake or data warehouse, and using a preparation platform like Trifacta to Transform the data for its downstream uses. This has several benefits. As transformation moves downstream and into the hands of business stakeholders, those stakeholders can take a more agile approach to analysis and prediction. IT is no longer bogged down by requests from the business, and can instead focus on infrastructure, governance, security, and process optimization. The new approach looks like this:

  1. Ingestion – Data is Extracted from a variety of sources, such as applications and databases, and Loaded into the centralized data lake or data warehouse. This process is then typically automated.
  2. Exploration – The data now needs to be explored and understood. Using Trifacta, users see rich data profiles of column level distributions and data quality statistics. Users can easily digest new datasets and assess the contents, a prerequisite for doing any meaningful transformation work.
  3. Preparation – Once users understand their data’s contents, they inevitably need to manipulate that data to prepare it for its use downstream. This could encompass cleaning up data quality issues, extracting important information, blending with other datasets, aggregating or reshaping data, or performing feature engineering. By moving this process closer to the analysis, and creating user experiences aimed at empowering anyone who works with or relies on data for their jobs, organizations are more capable of adapting to changes in their business processes and discovering more insightful ways to use data.
  4. Operation – If a data preparation process turns out to be fruitful, the next step is to automate it to get continuous, repeatable value from data. Platforms like Trifacta allow users to operationalize data wrangling pipelines on an ongoing basis, and to monitor and tweak them over time. Trifacta also auto-generates metadata and lineage information, allowing organizations to track all of the steps leading up to analysis for data lifecycle management, auditing and compliance, and other needed data management operations.
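To make the preparation step above concrete, here is a minimal sketch of cleaning, blending, and aggregating data after it has landed in a central store. It uses pandas, and the datasets and column names (`orders`, `customers`, `amount`, `region`) are hypothetical illustrations, not anything from Trifacta or the article:

```python
import pandas as pd

# Hypothetical raw data already Extracted and Loaded (the EL step).
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "amount": ["100.5", "n/a", "42.0", "7.25"],  # note the data quality issue
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "region": ["EMEA", "AMER", "APAC"],
})

# Clean: coerce amounts to numeric and drop rows that fail to parse.
orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")
orders = orders.dropna(subset=["amount"])

# Blend: join order data with customer attributes.
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregate: total spend per region, ready for downstream analysis.
spend_by_region = merged.groupby("region")["amount"].sum().reset_index()
print(spend_by_region)
```

Once a script like this proves useful, the operation step amounts to scheduling it to run as new data arrives and monitoring its output over time.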

So is this the end of life for ETL? Not exactly. More accurately, it is a realignment to something more compatible with modern cloud computing.

Want to try Trifacta? Sign up for a test drive today!