February 3, 2015

Tachyon Support Coming to Big Data Hypervisor

Alex Woodie

Organizations that are deploying Apache Spark to do data science on big data may be inclined to invest in Tachyon, the in-memory file system that was developed alongside Spark at the AMPLab. Getting Spark and Tachyon spun up and deployed on bare metal can be a hassle, but that hassle is a business opportunity for BlueData, which is aiming to be the VMware of big data.

Tachyon is a distributed, in-memory file system designed to enable reliable file sharing at memory speed across cluster frameworks. The software, which sits above HDFS in the AMPLab diagram, is emerging as a potentially key component for building big data analytics pipelines that touch many different engines, such as Spark, Hive, and MapReduce.

The folks at BlueData see a lot of potential in Tachyon, so much so that the company decided to support the file system in EPIC, its big data virtualization platform that lets non-technical users spin up big data clusters with just a few mouse clicks. BlueData, which took home first place in Strata’s recent Startup Showcase, already supports Hadoop distributions from Cloudera and Hortonworks in its software, in addition to Apache Spark, HDFS, and other file systems like Gluster and NFS.

With support for Tachyon now set to be formally unveiled as a tech preview at the upcoming Strata show, BlueData feels it’s well-positioned to help companies remove the shackles preventing them from riding their on-premises clusters into big data’s wild blue yonder, with all the agility, flexibility, and extensibility that is normally afforded only to cloud deployments.

“We always felt that Tachyon had a very high potential to provide the underlying in-memory file system” for emerging big data applications, says Kumar Sreekanti, CEO and co-founder of Mountain View, California-based BlueData. “We recognize that, as much as Hadoop has garnered interest, we think that in-memory or real-time processing will be here to stay. And there will be new frameworks and new applications that will be coming.”

Tachyon clearly is one of those core infrastructure components that BlueData is betting will gain traction as real-time analytics becomes more prevalent. Before founding BlueData with his former VMware colleague Tom Phelan, Sreekanti spent time at the AMPLab, the University of California, Berkeley project that gave rise to Apache Spark and Tachyon. Ion Stoica, the co-director of the AMPLab and CEO of Spark-backer Databricks, is also an adviser to BlueData.

Getting Tachyon running is not a trivial matter, but BlueData says it can take a lot of the headache and hassle out of the process by managing Tachyon as a virtual asset that can be easily created, duplicated, moved, and destroyed without impacting the actual hardware resources that sit underneath it. It is doing that today for Hadoop and Spark, and will soon be doing that for Tachyon with EPIC.

“The real value proposition here is you spin up a Tachyon file system once in the platform, and multiple clusters and multiple users can leverage that shared in-memory file system,” says Anant Chintamaneni, vice president of products for BlueData.

“What happens today is that most people will bring up a Hadoop cluster then manually get the Tachyon code and integrate it with their Hadoop cluster,” he continues. “If they move from that Hadoop cluster to another Hadoop cluster, then they have to spin the Tachyon file system up all over again. If they want to enable these environments for developers, it’s very tedious to integrate Tachyon into each of these environments. Today none of the Hadoop management tools or anybody else out there (Cloudera Manager, Ambari, what have you) can support Tachyon today in their install process.”

While it was developed as part of the AMPLab’s stack (with Mesos as the underlying resource manager and Spark providing high-level interfaces for SQL computing, stream processing, graph databases, R, and others), there’s nothing preventing Tachyon from being adopted by other engines in the Hadoop big data stack. “Tachyon is HDFS API compliant,” Chintamaneni says. “You can store data in Tachyon and run a Hive job against it or a MapReduce job against it, in addition to Spark.”
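One practical consequence of HDFS API compatibility is that a job written against an hdfs:// path can, in principle, be pointed at Tachyon by changing little more than the URI. The Python sketch below illustrates only that URI rewrite. The tachyon:// scheme, the host name, and port 19998 are assumptions made for illustration and do not come from BlueData or from this article, and the helper name to_tachyon_uri is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed values for illustration only; not taken from the article.
TACHYON_SCHEME = "tachyon"
TACHYON_MASTER = "tachyon-master.example.com:19998"  # hypothetical host and port


def to_tachyon_uri(hdfs_uri: str, master: str = TACHYON_MASTER) -> str:
    """Rewrite an hdfs:// URI so it points at a Tachyon master instead.

    Only the scheme and authority change; the file path is kept as-is.
    """
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {hdfs_uri!r}")
    return urlunsplit(
        (TACHYON_SCHEME, master, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(to_tachyon_uri("hdfs://namenode.example.com:8020/logs/2015/02/03.txt"))
```

Whether a given Spark, Hive, or MapReduce job actually accepts the rewritten path depends on the cluster having the Tachyon client configured, which is the integration work the article describes.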

While the entire Hadoop stack may benefit, it will be Spark that ultimately drives Tachyon adoption, Chintamaneni says. “I think Tachyon and Spark are made for each other,” he says. “As you see Spark gaining momentum, and as folks start using Spark for more real use cases with larger data volumes, I think they’ll start seeing some of the issues with Spark that are going to be the driver for bringing Tachyon into the stack.”

Applications: Complex Event Processing, Data Mining, Enterprise Analytics

Technologies: Middleware

Sectors: Financial Services, Healthcare, Retail

Vendors: BlueData, Cloudera, Databricks, Hortonworks, VMware

Tags: big data, Hadoop, hypervisor, in-memory, Spark, Tachyon
