January 12, 2023

It’s Time for the All-in-One Data Stack

Saket Saurabh

“Best-of-breed” or “all-in-one”? This is a common trade-off when choosing technology solutions. However, every once in a while it becomes possible for the two to converge into a single product.

The iPhone (or the smartphone in general) is a classic example. Until the iPhone debuted in 2007 and became mainstream within a few years, it was common for people to buy the best-of-breed personal devices of the era: phones, music players, PDAs, cameras, GPS units, and more. In fact, each of these categories had its own leaders: Motorola and Nokia in phones, Apple in music players, BlackBerry in email devices, Garmin in navigation, Nikon and Sony in point-and-shoot cameras, and so on. It was inconceivable that a single device could replace all of them.

We are on the cusp of a similar evolution within the data tools ecosystem. For decades, ETL, ELT, iPaaS, streaming, data preparation/transformation, data quality, cataloging, and monitoring have been separate tools. Until recently, best-of-breed has been the way to go. However, the convergence into powerful all-in-one tools has already begun.

So let’s take a look at when and how all-in-one solutions start to take over from best-of-breed solutions.

The Trade-off Between Best-of-Breed and All-in-One

The trade-off here comes from the fact that all-in-one tools are generally woefully inadequate in any one capability when compared to best-of-breed solutions catering to that capability. Best-of-breed solutions are built for one problem. That focus allows their makers to learn, innovate, build, and address the wide range of user needs within that area. For the most part, an all-in-one approach doesn’t allow for that level of focus, leaving such tools unable to serve all user needs as comprehensively.

iPhone is an example of all-in-one tech (charnsitr/Shutterstock)

The drawback of best-of-breed solutions, however, comes when trying to stitch together full enterprise-grade solutions with five or six best-of-breed solutions. This creates gaps in integration between tools and requires training with different interfaces, as well as implementing policy and governance across tools. While each area is individually serviced, there is an overall lack of growth potential or planning for future unmet needs.

All-in-One For The Win

To understand when the tables turn and all-in-one approaches become as good as or even better than best-of-breed ones, let’s analyze the driving factors, going back to the example of the iPhone.

1. New technology or approach: The iPhone took a new approach: a general-purpose computer architecture packed into a phone form factor with many built-in peripherals, including display, touch input, audio, networking, cellular connectivity, and GPS. The PC-like hardware was supported by a PC-like software ecosystem in the form of a general-purpose Unix OS with an SDK (Software Development Kit) and App Store.

2. Maturity of underlying technology: Much of the success of the iPhone, and other smartphones, came from the fact that the underlying technology, including cellular modems, cameras, audio chips, GPS, memory, and storage, had matured in both technology and supply chain to the point that smartphone makers could skip designing, engineering, and manufacturing those components themselves.

3. Assembling vs. building: When (1) and (2) come together, i.e. a new technology approach with the ability to leverage mature components, building a top-end, all-in-one product becomes possible. As skills shift towards assembling and stitching, and away from building every individual capability from the ground up, innovation becomes faster and easier. User needs are also better understood for each individual application. The win for Apple came from being able to create a much better integration glue. In this case, iOS and the App Store brought together the right set of ingredients in terms of hardware components and applications.

The strategy of pursuing an all-in-one product pays off when the product can be 90% as good as (and sometimes even better than) any single best-of-breed solution. Add to that the smooth integrated experience, ease of use, a consistent user interface, unified governance and policy management, and you have a true winner.

A more relatable example in enterprise software is Amazon Web Services, a classic all-in-one example in infrastructure and enterprise technology that brings together compute, networking, storage, databases, and several other infrastructure building blocks under a single umbrella.

All-in-One Data Stack

What is making the all-in-one data stack possible?

1. New Technology and Approaches

The trajectory of a product is often driven by the initial target user, use cases, and the resulting architecture. While data tools have gone through many iterations in the last 30 years, the current approach by some companies is centered around a new type of user: one who understands the data and knows what to do with it, but is only semi-technical when it comes to data systems.

This has led to:

  • Auto-generating connectors, instead of writing code for each;
  • Logical data products: These serve as the connective glue, similar to iOS in the Apple example;
  • Multiple interfaces to underlying product services through no-code UI, SDK, and CLI, allowing the product to serve multiple stakeholders.

2. Maturity of Underlying Technology: Batch Processing, Streaming, and Scheduling

In the past decade, data processing itself has matured in the form of high-quality batch and stream processing systems led by Spark and Kafka. Combined with the maturity of cloud infrastructure, this allows for capabilities such as unified batch and stream processing of data, as well as multi-cloud and hybrid-cloud architectures made possible by the maturity of container-based architectures.

3. Assembling vs. Building

Combine (1) and (2) with the right connecting glue and it becomes possible to build a very compelling all-in-one product. Logical data products that represent data and metadata in a shareable unit with many flexible consumption patterns and interfaces have proven to be a very effective glue for different users and applications. Some examples include the ability to:

  • Auto-generate a data product from read connectors;
  • Transform, combine, enrich, and validate data products to create new derivative data products;
  • Share data products with other users, who may further prepare their own derivative data products and share them with downstream users or even outside data consumers;
  • Connect data products to a warehouse, stream, API, spreadsheet, or email for any consumption pattern;
  • Populate a catalog with metadata from data products.
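
As a rough illustration of the list above, here is a minimal Python sketch of a logical data product. Every name in it (`DataProduct`, `from_records`, `derive`, `catalog_entry`) is hypothetical and not taken from any particular product; it only shows the shape of the idea: generate a product from whatever a read connector returns, derive a validated product from it, and emit catalog metadata.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical logical data product: records plus descriptive metadata."""

    name: str
    records: Tuple[Record, ...]
    schema: Dict[str, str]          # field name -> Python type name
    parent: Optional[str] = None    # lineage: the product this one came from

    @staticmethod
    def from_records(
        name: str, records: List[Record], parent: Optional[str] = None
    ) -> DataProduct:
        # "Auto-generate": infer the schema from the records a read
        # connector returned (first type seen for each field wins).
        schema: Dict[str, str] = {}
        for record in records:
            for key, value in record.items():
                schema.setdefault(key, type(value).__name__)
        return DataProduct(name, tuple(records), schema, parent)

    def derive(
        self,
        name: str,
        transform: Callable[[Record], Record],
        is_valid: Callable[[Record], bool],
    ) -> DataProduct:
        # Transform every record, then keep only those that pass validation.
        transformed = (transform(record) for record in self.records)
        kept = [record for record in transformed if is_valid(record)]
        return DataProduct.from_records(name, kept, parent=self.name)

    def catalog_entry(self) -> Dict[str, Any]:
        # Metadata a catalog could index without touching the data itself.
        return {
            "name": self.name,
            "schema": self.schema,
            "parent": self.parent,
            "record_count": len(self.records),
        }


# Usage: stand-in for records returned by a read connector.
orders = DataProduct.from_records(
    "orders",
    [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "-5.00"}],
)
clean = orders.derive(
    "orders_clean",
    transform=lambda r: {**r, "amount": float(r["amount"])},
    is_valid=lambda r: r["amount"] >= 0,
)
catalog = [product.catalog_entry() for product in (orders, clean)]
```

The derived product records its parent, so the catalog entries carry lineage as well as schema. A real system would add the sharing and write-connector steps from the list; those are omitted here to keep the sketch short.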

I have seen first-hand how some of the largest enterprises have been able to take advantage of this all-in-one product approach. We are still early in this evolution of the new data stack, but from a user’s perspective, a compelling all-in-one solution means users can shift their energy away from operating multiple tools and towards thinking about the layer above. Data is only the means, never the end, in an enterprise. As the data stack stabilizes and becomes simpler to use and manage, teams can focus more of their energy on using data for their analytics, AI, BI, operations, and their own customers’ needs and the applications used to serve them.

Building Blocks of Data Tools

How the all-in-one stack will evolve remains to be seen, but for reference, here is a breakdown of the common building blocks of most data tools. Once we understand the components, we can then see how they would fit together for an all-in-one execution.

  1. Connectors serve the purpose of reading and writing data to data systems;
  2. Data processing is about applying computation to data. This leads to capabilities such as data transformation, data preparation, and data validation;
  3. Data transport is about moving data, whether in batch, stream, or real-time;
  4. Metadata is data about the data, such as schema, location of data, glossary, documentation, and data characteristics;
  5. Access Control should allow for every unit to have control of user and group level permissions.

Let’s now translate these building blocks into the data tools we understand very well.

ELT: Read Connectors for SaaS + Batch Transport + Metadata mapping to Tables + Write Connectors to Warehouse + Data Processing with SQL Functions

iPaaS: Read Connectors for API + Streaming Transport + Metadata mapping to API + Write Connectors for API

ETL: Read Connectors to DB and Files + Batch Transport + Data Processing with Transform Functions + Write Connectors for DB and Data Lake

Data Quality: Data Processing with Validation Functions + Stream Transport of Metadata and Errors + Write Connectors to Notifications and Alerts

Data Catalog and Governance: Read Connectors for Metadata + Metadata search and organization + Access Control

Data Preparation: Data Processing with Transform and SQL functions + Batch and Stream processing
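
One way to see how much these tools overlap is to write the breakdown above down as data. The following Python sketch is illustrative only: the names `TOOLS`, `block_usage`, and `tools_covered` are hypothetical, each tool is reduced to the block categories it uses, and "Batch and Stream processing" under Data Preparation is assumed to count as transport.

```python
from collections import Counter
from typing import Dict, List, Set

# The five building-block categories listed earlier.
BLOCKS = ("connectors", "processing", "transport", "metadata", "access_control")

# Each tool reduced to the block categories in its breakdown above.
# Assumption: "Batch and Stream processing" for Data Preparation is
# treated as transport.
TOOLS: Dict[str, Set[str]] = {
    "ELT": {"connectors", "transport", "metadata", "processing"},
    "iPaaS": {"connectors", "transport", "metadata"},
    "ETL": {"connectors", "transport", "processing"},
    "Data Quality": {"processing", "transport", "connectors"},
    "Data Catalog and Governance": {"connectors", "metadata", "access_control"},
    "Data Preparation": {"processing", "transport"},
}


def block_usage(tools: Dict[str, Set[str]]) -> Counter:
    """Count how many tools rely on each building block."""
    return Counter(block for blocks in tools.values() for block in blocks)


def tools_covered(platform_blocks: Set[str], tools: Dict[str, Set[str]]) -> List[str]:
    """Names of the tools whose blocks are all provided by the platform."""
    return sorted(
        name for name, blocks in tools.items() if blocks <= platform_blocks
    )


usage = block_usage(TOOLS)
for block in BLOCKS:
    print(f"{block}: used by {usage[block]} of {len(TOOLS)} tools")

# A platform that implements every block once can express every tool.
print(tools_covered(set(BLOCKS), TOOLS))
```

Under these assumptions, connectors and transport each appear in five of the six tools, which is the overlap an all-in-one product can implement once and reuse.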

The Future With All-in-One

All-in-one executions are few and far between because they are hard to build. However, when they really work, they are beautiful and end up dominating their ecosystem. For example, AWS takes an ‘all-in-one’ approach to data center infrastructure.

Over time, the winners of the all-in-one strategy start to dominate their market through two approaches. First, they expand the scope of their product beyond the initial mix of sub-products. An example in the smartphone world is the digital wallet. Second, they create verticalized technology to go beyond what best-of-breed solutions have achieved in that segment, thus going further than simply assembling technology. It is no wonder that Apple is now making its own chips for its devices. How this will all play out in the data stack is yet to be seen, but one thing is for sure: there is a lot of innovation coming to empower the data user in the next few years.

About the author: Saket Saurabh is the Co-founder and CEO of Nexla, a data engineering automation company. Nexla’s customers include JPMorgan, LinkedIn, Instacart, DoorDash, Eddie Bauer, Bed Bath & Beyond, Poshmark, and other major brands. Saket previously worked at Nvidia and led a mobile ad tech startup towards a successful IPO. Saket holds an MBA from The Wharton School and a BTech in Computer Science and Engineering from IIT Kanpur.
