July 20, 2020

Building an Open Cloud Data Lake Future

The explosion of data and the need for business agility to leverage that data for competitive advantage are driving a massive surge of data lake innovation. We’ve moved past first-generation on-premises Hadoop-based data lakes to focus on building next-generation data platforms in the cloud. Organizations of all sizes recognize that cloud data lakes, with separation of compute and data, give them the flexibility and freedom they need both today and tomorrow.

A key advantage of cloud data lakes is their open architecture, which minimizes the risk of vendor lock-in as well as the risk of being locked out of future industry innovation. As the cloud data lake evolves to support a wide range of production analytical and data processing use cases, it’s important to ensure that it maintains this open architecture in the future. A rich ecosystem of open source projects, technology vendors and cloud providers has emerged to make that a reality.

What’s been missing is a rallying point for this rapidly expanding community — an industry event dedicated to showcasing the newest innovations and the best ways to put them to work. And that’s why we’re excited to introduce and host Subsurface, the industry’s first cloud data lake conference.

Subsurface is an industry conference spanning the entire open cloud data lake architecture. It’s the only event designed to bring the broader community together around the fast-growing world of cloud data lakes. A crucial part of that community is the open source creators and committers whose innovations will fuel the next generation of cloud data lakes. At Subsurface, you’ll be able to dive into innovative open source projects such as Apache Arrow, Iceberg, Parquet, Marquez and Superset, and learn how companies like Expedia and Netflix are using them to build open cloud data lakes.

Open source sessions include:

  • Technical Keynote: The Future of Intelligent Storage in Big Data – Daniel Weeks, Big Data Compute Team Lead at Netflix and Apache Iceberg and Parquet committer
  • Apache Arrow: A New Gold Standard for Dataset Transport – Wes McKinney, director at Ursa Labs, Pandas creator and Apache Arrow co-creator
  • Functional Data Engineering: A Set of Best Practices – Maxime Beauchemin, CEO and co-founder at Preset, Apache Superset creator and Airflow creator
  • Data Lineage and Observability with Marquez – Julien Le Dem, CTO and co-founder at Datakin and Apache Parquet co-creator
  • Lessons Learned From Running Apache Iceberg at Petabyte Scale – Anton Okolnychyi, Apache Iceberg PMC member and Apache Spark contributor
  • Hiveberg: Integrating Apache Iceberg with the Hive Metastore – Adrian Woodhead, principal engineer, and Christine Mathiesen, software developer at Expedia Group

And it’s not just about the technical sessions — Subsurface is the catalyst for a long-term cloud data lake community. We’re creating a dedicated Slack instance for Subsurface, which you’ll be able to use both during and after the conference. You’ll be able to jump into topic-based Slack channels with attendees, speakers, event sponsors and open source project leads to get questions answered, watch demos and collaborate on making your cloud data lake initiatives a success.

So, whether you are looking to expand your technical knowledge or hear from your peers about their cloud data lake use cases and architectures, Subsurface provides plenty of opportunities to learn, network and be inspired. We’ll even have a little fun along the way.

Register for Subsurface today!