August 19, 2020

Pachyderm Gains Microsoft Funding, Launches Hub

A startup launched as a Hadoop alternative in the form of a container-based big data platform continues to attract investors to its open source data science framework.

Pachyderm Inc. said this week its $16 million Series B funding round was led by M12, Microsoft’s (NASDAQ: MSFT) venture fund. New investors include Decibel Ventures, which is backed by Cisco Systems (NASDAQ: CSCO). Returning investors include Benchmark and Y Combinator.

Benchmark was an early investor in other big data startups that have gone on to be successful, including GitHub, MongoDB (NASDAQ: MDB) and Elastic.

The San Francisco-based data science vendor also on Wednesday (Aug. 19) announced general availability of Pachyderm Hub, a managed service that had been offered in public beta form since last November.

The platform allows users to comply with emerging AI legal standards while ensuring that machine learning developers can accurately recreate and repeat data science experiments. The ability to deliver “data lineage” is seen as a key step toward explainable AI.

Pachyderm’s platform targets machine learning pipelines and ETL workflows, managing data and models while tracing outputs directly back to the input datasets from which they were created. The result is data provenance. Promoted as “Git for data science,” a reference to the Git version control system widely used for source code, the service provides data science teams with the kind of version control long available in software development tools.
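To make the idea of tracing outputs back to inputs concrete, here is a minimal Python sketch of content-addressed lineage tracking. It is an illustration only and does not use or mirror Pachyderm's actual API: the `LineageStore` class, its methods, and the two pipeline steps are hypothetical names invented for this example.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Identify a dataset by the SHA-256 digest of its bytes."""
    return hashlib.sha256(data).hexdigest()


class LineageStore:
    """Hypothetical in-memory store (an assumption, not Pachyderm's API)."""

    def __init__(self):
        self.blobs = {}    # content hash -> raw bytes
        self.parents = {}  # output hash -> {"step": name, "inputs": [hashes]}

    def put(self, data: bytes) -> str:
        """Store data and return its content hash."""
        h = content_hash(data)
        self.blobs[h] = data
        return h

    def run(self, step_name, func, input_hashes):
        """Apply func to the stored inputs and record where the output came from."""
        inputs = [self.blobs[h] for h in input_hashes]
        out_hash = self.put(func(*inputs))
        # Skip the record if a step returned one of its inputs unchanged,
        # so that lineage() can never loop back on itself.
        if out_hash not in input_hashes:
            self.parents[out_hash] = {"step": step_name, "inputs": list(input_hashes)}
        return out_hash

    def lineage(self, h):
        """Walk back from an output to the source datasets it was derived from."""
        record = self.parents.get(h)
        if record is None:
            return {"hash": h, "source": True}
        return {
            "hash": h,
            "step": record["step"],
            "inputs": [self.lineage(i) for i in record["inputs"]],
        }


# Two-step toy pipeline: sort comma-separated values, then sum them.
store = LineageStore()
raw = store.put(b"3,1,2")
cleaned = store.run("sort", lambda b: b",".join(sorted(b.split(b","))), [raw])
total = store.run(
    "sum", lambda b: str(sum(int(x) for x in b.split(b","))).encode(), [cleaned]
)
tree = store.lineage(total)
```

Because each dataset is identified by a hash of its contents, rerunning the same step on the same input yields the same identifier, which is what makes an experiment repeatable in this sketch. A real system would also version the pipeline code and keep a separate record for every run; this toy store keeps only one record per output.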

Pachyderm also announced new members of its board, including Nagraj Kashyap, head of Microsoft’s venture fund. Data versioning is “an imperative for organizations of all sizes,” Kashyap said.

Founded in 2014, Pachyderm announced a $10 million funding round in November 2018. It has so far raised about $28 million. The company said it would use the new venture funding to scale development of its hub as well as to accelerate hiring.

Pachyderm burst onto the scene in 2014, catching the application container wave as a way to reimagine data analytics infrastructure in general and Hadoop in particular. It embraced tools like Docker containers and orchestration tools from CoreOS, which was eventually acquired by Red Hat, now a unit of IBM. The combination made data platforms more modular and able to scale from individual users to an entire company.

Since then, the startup’s list of customers has grown to include Shell (NYSE: RDS.A) and AgBiome along with pharmaceutical and bioinformatics companies and government agencies.

Recent items:

Inside Pachyderm, a Containerized Alternative to Hadoop

Container Specialist Tops Strata Startup List