March 17, 2015

Tachyon Nexus Gets $7.5M to Productize Big Data File System

Tachyon Nexus, the company founded to productize the Tachyon in-memory file system developed at the AMPLab, has received $7.5 million in venture capital funding from Andreessen Horowitz, the Wall Street Journal reported today.

Tachyon is a new distributed file system that aims to dramatically speed up big data analytics applications built on today's big data frameworks, including Apache Spark, Shark, Hadoop MapReduce, and Apache Flink. The software, which doctoral student Haoyuan Li first unveiled in 2013, quickly became the most popular project in AMPLab history, surpassing even Spark.

As a replacement for HDFS, Tachyon allows multiple applications to access data stored in memory, and to do so without giving up the fault tolerance that writing to spinning disk via HDFS has provided. As the project's website puts it:

“Tachyon caches working set files in memory, thereby avoiding going to disk to load datasets that are frequently read. This enables different jobs/queries and frameworks to access cached files at memory speed.”
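The caching idea in that description can be illustrated with a minimal read-through cache. This is a hypothetical Python sketch, not Tachyon's API or implementation: the `MemoryCachedStore` class, the dict standing in for disk, the byte-based RAM budget, and the least-recently-used eviction policy are all assumptions chosen for illustration. It shows the two properties the quote describes: frequently read files stop hitting the slow store, and separate jobs sharing one cache benefit from each other's reads.

```python
from collections import OrderedDict


class MemoryCachedStore:
    """Hypothetical read-through cache (not Tachyon's API): keeps recently
    read files in RAM and falls back to a slower under-store on a miss."""

    def __init__(self, under_store, capacity_bytes):
        self.under_store = under_store        # dict standing in for disk/HDFS
        self.capacity_bytes = capacity_bytes  # RAM budget for cached files
        self.cache = OrderedDict()            # path -> bytes, oldest use first
        self.used_bytes = 0
        self.disk_reads = 0                   # how often we "went to disk"

    def read(self, path):
        if path in self.cache:
            # Memory hit: mark as most recently used and serve from RAM.
            self.cache.move_to_end(path)
            return self.cache[path]
        # Miss: load from the slower under-store.
        data = self.under_store[path]
        self.disk_reads += 1
        if len(data) <= self.capacity_bytes:
            # Evict least recently used files until the new one fits.
            while self.used_bytes + len(data) > self.capacity_bytes:
                _, evicted = self.cache.popitem(last=False)
                self.used_bytes -= len(evicted)
            self.cache[path] = data
            self.used_bytes += len(data)
        return data


disk = {"/logs/day1": b"x" * 40, "/logs/day2": b"y" * 40, "/logs/day3": b"z" * 40}
store = MemoryCachedStore(disk, capacity_bytes=100)

# Two different (hypothetical) jobs read the same working set via one cache.
for job in ("spark_job", "mapreduce_job"):
    store.read("/logs/day1")
    store.read("/logs/day2")

print(store.disk_reads)  # 2: the second job was served entirely from memory
```

LRU eviction is only one plausible policy here; a real memory-centric file system also has to handle persistence and recovery, which this sketch deliberately leaves out.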

The project attracted the attention of Peter Levine, who was an early developer of the Veritas File System before joining Andreessen Horowitz. “Tachyon’s memory-centric architecture represents the future of storage,” Levine wrote today on the venture firm’s blog, Software Is Eating The World.

“Until now, we have always had tiered memory [RAM, disk drives, tape], which served to trade off cost and performance, and led to an entire computing architecture that has persisted for the past 40 years,” Levine writes. “Memory-centric computing will flatten this memory hierarchy and completely up-end compute architectures. It’s a revolutionary concept, one that has never existed before… and we are finally close to achieving this milestone.”

According to Levine, the Tachyon file system will enable a 100x speed boost over HDFS, even HDFS running in memory. That kind of performance, combined with backwards compatibility with HDFS and NFS, will change the dynamics of big data analytics, he says.

Tachyon is currently used by more than 50 companies, including production deployments as large as 100 nodes, according to Levine, who sits on the board of Tachyon Nexus, the Berkeley-based company that Li founded to productize the open source software.

"With Tachyon as the 'memory-centric' storage layer, our investments [including Databricks and Mesosphere] in the memory-centric infrastructure out of the 'badass' Berkeley Data Analytics Stack (BDAS), from big data storage to compute, is complete," Levine writes.

Related Items:

Apache Flink Takes Its Own Route to Distributed Data Processing

AMPLab’s Tachyon Promises to Solidify In-Memory Analytics

Databricks Takes Apache Spark to the Cloud, Nabs $33M