December 17, 2021

2021 Big Data Year in Review: Part 1

It’s mid-December, and the news has started to slow. As presents start to gather under the tree and the calendar approaches 2022, it’s worth taking a look back on the year that was 2021.

The battle raged on between data warehouses and data lakes. Old-school analytics professionals prefer the predictable performance and relative order of a data warehouse, in particular those running on the cloud, which a 2020 IDG survey found was favored by 77% of decision makers. A new generation of ETL vendors, such as Fivetran and Matillion, is rising up to support these cloud data warehouses, and even Teradata hit a two-year high in stock value.

However, a new generation of data pros says S3-compatible object stores (or even–gasp!–Hadoop!) provide everything you need. Presto vendors, such as Ahana and Varada, as well as Starburst (Trino) and Dremio, support data lakes, too.

There are also those who support data lakehouses, which blend the benefits of both approaches and are the favored approach for Databricks. While Gartner says 90% of data lakes eventually fail, others look to automation as a way to avoid quagmires. Gartner was also kind enough to throw us a life preserver and share its data lake tips.

Did We Mention Unicorns?

2021 had unicorns galore. Dremio hit a $1 billion valuation with a $135 million Series D round in early January. Not to be outdone, Starburst announced a $100 million round at a $1.2 billion valuation the same day, leading the way among the Presto (and Trino) vendors. Cockroach brought in a $160 million Series E round in January at a $2 billion valuation, making it a database unicorn.

Lakehouse purveyor Databricks brought home a cool $1 billion in a late-stage round in February at a $28 billion valuation. Then in August, it completed a $1.6 billion Series H at a $38 billion valuation. In December, it became clearer what the company intends to do with all that money when it launched Databricks Ventures.

Unicorns abounded in big data this year (Sergey_T/Shutterstock)

We don’t know if tigers like unicorns, but TigerGraph brought in $105 million in a Series C round in February, the same month that ETL vendor Matillion announced $100 million. Matillion would raise another $150 million later in the year at a valuation of $1.5 billion, making it an honest-to-goodness unicorn.

Yugabyte, the distributed SQL database vendor, had a $48 million round in March (but it was not yet a unicorn). NoSQL database vendor Redis achieved unicorn-hood in April with a $110 million round at a $2 billion valuation. Alation cataloged a $110 million Series D round at a $1.2 billion valuation in April, putting it in unicorn status.

In June, Fishtown Analytics changed its name to dbt Labs and completed a $150-million Series C round at a $1.5 billion valuation. This month, the company was reportedly in the midst of a Series D round at a $6 billion valuation.

Neo4j, the oldest and largest graph database vendor, reached double unicorn status in June with a $325 million round of funding that values the company at $2 billion. Collibra completed a Series F round of funding worth $112.5 million in April, at a $2.3 billion valuation, then followed that up in November with a Series G round of funding worth $250 million at a $5.25 billion valuation.

DataRobot solidified its presence in the AutoML market with two moves in late July: completing a $300 million Series G round at a $6.3 billion valuation, and nabbing MLOps software provider Algorithmia at the same time. Not to be outdone, its competitor Dataiku hauled in $400 million in a Series E round at a $4.6 billion valuation in early August, giving it more capital to grow its machine learning and analytics business.

In September, Fivetran raised $565 million in a Series D round with a $5.6 billion valuation, then turned around and nabbed change data capture (CDC) vendor HVR Software for $700 million the same day. SingleStore also raked in $80 million that month at a $940 million valuation, putting it on the cusp of unicorn-hood, while Sisu Data brought in $62 million.

Scale-out SQL database vendor Yugabyte became a unicorn in October when it raised a $188 million Series C at a $1.3 billion valuation. Alluxio brought home $50 million in a Series C, but did not disclose its valuation.

Our last unicorn of the year is Anyscale, the RISELab spinout that announced a $100 million round at an even $1 billion valuation in December.

IPOs, Exits, and Spinouts

Confluent went public in June and raised more than $800 million on the NASDAQ with a market capitalization of $13 billion. Its stock is up about 36% since then, and the Kafka company today is worth around $16 billion.

Confluent co-founders Jun Rao, Jay Kreps, and Neha Narkhede ring the opening bell for Nasdaq on June 24 (image courtesy Nasdaq)

Informatica became a public company for the second time in late October with an IPO on the New York Stock Exchange. It raised about $840 million with a market cap of about $8 billion; today the company is worth about $10 billion.

NoSQL database vendor Couchbase joined the ranks of publicly traded companies in July, when it IPO’ed on the NASDAQ. By mid-December, however, its stock was down about 33% from its debut.

SAS was spotted looking for an exit earlier this year, when the Wall Street Journal reported it was in talks to sell itself to Broadcom in a deal worth up to $20 billion. The analytics giant later walked away from the table, and a couple of weeks later, unveiled a new plan: an IPO in the year 2024.

Cloudera completed the rare reverse-unicorn in April when it announced it was going private in a $5.3 billion deal with Wall Street firms.

The Russian search giant Yandex raised a few eyebrows in September when it spun out ClickHouse, its distributed, column-oriented analytics database, into its own company with a $50 million investment.

Feature Stores

Feature stores–those indispensable components of a machine learning system that are used for developing, maintaining, and monitoring the data features–hit the big time in 2021.

Among the feature store vendors making news in 2021 was Molecula, which raked in a $17.6 million round to help build a commercial version of the open source Pilosa project. H2O.ai teamed up with AT&T to co-develop a feature store this fall.

Cloud Costs

Cost became a big deal as big data workloads moved to the cloud. In February, Pepperdata released a study that found 20% to 40% of companies were on pace to spend at least 40% more than they had allocated for cloud. A study by Anodot in June found that cloud costs jumped by about 50% in 2020 for nearly a third of data analytics professionals.

Sensing a market opportunity, vendors like Archera have stepped up to be the middleman between clouds and customers and prevent unneeded instances from running.

Data Governance, Privacy, Security, and Ethics

The big data industry continued to grapple with thorny issues around data governance, privacy, security, and ethics. In March, Okera unveiled its approach to “distributed stewardship” while state privacy laws threatened to proliferate. Boston Consulting Group’s ethics chief gave us a six-part plan for avoiding ethical fails in April.

BigID aimed to prevent misuse of data with its 4 Cs approach–catalog, classification, cluster analysis, and correlation. Privacera aimed to address the dual mandates of data democratization and maintaining data governance. Confluent, meanwhile, got the governance bug and released a collection of governance tools for streaming data this fall.

GDPR turned 3 in May. China passed a tough new data privacy law in August, and the number of states and countries with specific data privacy laws continued to grow. Meanwhile, homomorphic encryption emerged as the potential salvation for utilizing that big pool of data you have collected.

Stay tuned for the second part of the 2021 big data year in review, coming soon to a Web browser near you.
