March 2, 2020

5 Key Differences Between a Data Lake vs Data Warehouse

A data lake is not a direct replacement for a data warehouse; they are supplemental technologies that serve different use cases with some overlap. Most organizations that have a data lake will also have a data warehouse. Let’s compare the properties of a data lake with those of a traditional BI architecture (a data warehouse and a separate ETL server).

1. Data in data lakes is stored in its native format

In a data lake, data can be loaded and accessed more quickly because it does not need to go through an initial transformation process. In a traditional relational database, by contrast, data must be processed and transformed before it is stored.

2. Data in data lakes can be accessed flexibly

Data scientists, engineers, and analysts can access data much more quickly in a data lake than in a traditional BI architecture. Data lakes increase agility and provide more opportunities for data exploration, proof-of-concept activities, and self-service business intelligence, all within your privacy and security settings.

3. Data lakes provide schema-on-read access

Traditional data warehouses employ schema-on-write technology. This requires an up-front data modeling exercise to define the schema for the data. All data requirements, from all data users, need to be known before modeling to ensure that the models and schemas produce usable data for all parties. As you unearth new requirements, you may have to redefine your models. Schema-on-read, conversely, allows the schema to be developed and tailored on a case-by-case basis. The schema is developed and projected on the data sets required for a particular use case. Once the schema has been developed, it can be kept for future use or discarded when no longer needed.
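The schema-on-read idea can be illustrated with a minimal Python sketch. It is not tied to any particular data lake product: the sample events, the two schemas, and the `read_with_schema` helper are all hypothetical names invented for illustration. The raw records are kept exactly as they arrived, and each use case projects only the fields it needs at read time.

```python
import json

# Raw events stored exactly as they arrived (hypothetical sample data).
# Note that the records do not all share the same fields.
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.99", "country": "PT", "device": "ios"}',
    '{"user": "bo", "amount": "5.00", "country": "US"}',
    '{"user": "cy", "amount": "12.50", "device": "web", "coupon": "SPRING"}',
]

# Each schema maps a field name to (converter, default used when absent).
# Both schemas are illustrative assumptions, one per use case.
REVENUE_SCHEMA = {"user": (str, None), "amount": (float, 0.0)}
DEVICE_SCHEMA = {"user": (str, None), "device": (str, "unknown")}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines and project only the fields named in `schema`."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            row[field] = convert(record[field]) if field in record else default
        rows.append(row)
    return rows


# Two different schemas applied to the same untouched raw data.
revenue_rows = read_with_schema(RAW_EVENTS, REVENUE_SCHEMA)
device_rows = read_with_schema(RAW_EVENTS, DEVICE_SCHEMA)
total_revenue = sum(row["amount"] for row in revenue_rows)
```

Neither schema had to exist when the events were written, and a new requirement (say, analyzing coupons) only needs a new schema rather than a reload of the data.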

4. Data lakes provide decoupled storage and compute

When you separate storage from compute, you can better optimize costs by matching storage tiers to access frequency. The separation lets your business archive raw data on less expensive tiers while keeping transformed, analytics-ready data on faster ones. As a result, you can run experiments and exploratory analysis with new technologies much more easily. Traditional data warehouses and ETL servers, by contrast, have tightly coupled storage and compute: if you need to increase storage capacity, you must also expand compute, and vice versa.
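To make the coupling argument concrete, here is a rough sizing sketch in Python. The node capacities and the workload are made-up numbers chosen only for illustration, and the function names are hypothetical; real systems size and price resources in more nuanced ways.

```python
import math

# Hypothetical capacity of one node in a tightly coupled system
# (illustrative values, not taken from any real product).
NODE_STORAGE_TB = 10
NODE_COMPUTE_CORES = 16


def coupled_provisioned(storage_tb, cores):
    """Resources provisioned when every node bundles storage and compute."""
    nodes_for_storage = math.ceil(storage_tb / NODE_STORAGE_TB)
    nodes_for_compute = math.ceil(cores / NODE_COMPUTE_CORES)
    # The larger requirement dictates the node count for both dimensions.
    nodes = max(nodes_for_storage, nodes_for_compute)
    return {
        "nodes": nodes,
        "storage_tb": nodes * NODE_STORAGE_TB,
        "cores": nodes * NODE_COMPUTE_CORES,
    }


def decoupled_provisioned(storage_tb, cores):
    """Resources provisioned when storage and compute scale independently."""
    return {"storage_tb": storage_tb, "cores": cores}


# Assumed workload: a large archive of raw data with a modest query load.
need = {"storage_tb": 200, "cores": 32}

coupled = coupled_provisioned(**need)
decoupled = decoupled_provisioned(**need)

# Compute that was provisioned only because storage demanded more nodes.
idle_cores = coupled["cores"] - need["cores"]
```

Under these assumptions the coupled system needs 20 nodes to hold the data and therefore carries 320 cores for a workload that needs 32, while the decoupled setup provisions each dimension to match demand.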

5. Data lakes go with cloud data warehouses

While data lakes and data warehouses are both part of the same overall strategy, data lakes go better with cloud data warehouses. ESG research shows that roughly 35 to 45 percent of organizations are actively considering cloud for functions like Hadoop, Spark, databases, data warehouses, and analytics applications. This trend will only continue because of the benefits of cloud computing: massive economies of scale, reliability and redundancy, security best practices, and easy-to-use managed services. Cloud data warehouses combine these benefits with traditional data warehouse functionality to deliver increased performance and capacity and to reduce the administrative burden of maintenance.

To learn more about data lakes and how to optimize your data analytics, download our eBook, ‘The Essential Guide to Data Lakes: Designing Data Lakes to Optimize Analytics’.