December 11, 2013

Finding Big Data Treasure in the Cloud

Alex Woodie

Heading into 2014, one of the big data trends that will intensify is the transition toward end-to-end data analytic services hosted in the cloud. One of the promising big data cloud services is Treasure Data, a Silicon Valley company that offers an interesting mix of MapReduce, columnar databases, and intelligent agent technology that’s aimed at helping clients get a quick return on their big data investments.

Treasure Data was launched two years ago by former Red Hat engineer Hiro Yoshikawa and Kaz Ohta, who helped implement one of the largest Hadoop clusters in Japan. Ohta was amazed by Hadoop’s capabilities, but not so thrilled about the level of technical complexity it entailed, says Rich Ghiossi, vice president of marketing for the company.

“He came out of that experience thinking this is really cool technology, but there are things that we can do differently,” Ghiossi says. “He said, ‘Let’s take all the difficulty away. Let’s not have to hire an army of people who know MapReduce or an army of people who know how to deploy a particular distribution on this particular hardware subset. Let’s make that all transparent to the user.’”

CTO Ohta and CEO Yoshikawa have strived to do that with Treasure Data. The offering runs on Amazon’s cloud, and combines the MapReduce component of Cloudera’s CDH with Plazma, its own multi-tenant columnar database, which it uses instead of HDFS. Treasure Data also developed its own intelligent agent technology, called Treasure Agents, which pre-process and transform data before it’s loaded into the database for analysis.
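To give a rough sense of what "pre-process and transform before load" can mean in practice, here is a minimal Python sketch. It is not Treasure Data's agent code or API; the function names, the JSON input format, and the field names (`player_id`, `ts`, `action`) are all hypothetical, chosen only for illustration. The sketch drops malformed lines, normalizes the remaining fields, and groups clean records into batches that a loader could then write to a database.

```python
import json
from datetime import datetime, timezone


def transform(raw_line):
    """Parse one raw JSON log line and normalize it; return None if unusable."""
    try:
        event = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # drop malformed input instead of loading it
    if not isinstance(event, dict) or "player_id" not in event or "ts" not in event:
        return None  # drop records missing the (hypothetical) required fields
    return {
        "player_id": str(event["player_id"]),
        # convert epoch seconds to an ISO 8601 UTC timestamp
        "time": datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat(),
        "action": str(event.get("action", "unknown")).lower(),
    }


def batches(lines, size):
    """Yield lists of up to `size` cleaned records, ready for a bulk load."""
    batch = []
    for line in lines:
        record = transform(line)
        if record is None:
            continue
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly smaller, batch


# Hypothetical sample input: two usable events and two that get dropped.
raw = [
    '{"player_id": 7, "ts": 0, "action": "LEVEL_UP"}',
    'not json',
    '{"player_id": 8, "ts": 60}',
    '{"ts": 120}',
]
loaded = list(batches(raw, size=2))
```

Cleaning at the edge like this means the database only ever sees well-formed, consistently typed rows, which is one plausible reason to put the transformation step ahead of the load.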

“Our approach is to streamline that whole data pipeline, from data acquisition to storage to analysis,” Ghiossi says. “We do it in a relatively easy process, and start delivering value within days.” It typically takes less than 14 days for companies to start getting meaningful data out of their Treasure-hosted Hadoop cluster, the company says.

As we reported in August, Treasure Data’s approach was validated with $5 million in Series A venture capital funding earlier this year. Since launching the service in 2012, the company has attracted more than 90 customers, and is now storing 2PB of data for its customers. That corresponds to about 2 trillion rows of data, an amount that doubled in the past eight weeks, Ghiossi says.
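As a back-of-the-envelope check on those figures, the short Python sketch below works out the average row size and the weekly growth rate they imply. Two assumptions are ours, not the company's: a petabyte is taken as 10^15 bytes, and growth is treated as steady compounding over the eight weeks.

```python
PETABYTE = 10**15  # decimal petabyte; an assumption, the article does not specify

stored_bytes = 2 * PETABYTE  # 2PB stored, per the article
rows = 2 * 10**12            # about 2 trillion rows, per the article

# Average storage footprint of a single row
bytes_per_row = stored_bytes / rows

# If volume doubles in 8 weeks at a steady compounding rate r, then (1 + r)^8 = 2
doubling_weeks = 8
weekly_growth = 2 ** (1 / doubling_weeks) - 1

print(f"average bytes per row: {bytes_per_row:.0f}")
print(f"implied weekly growth: {weekly_growth:.1%}")
```

Under these assumptions the numbers come out to roughly 1,000 bytes per row and about 9 percent growth per week.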

Today, Treasure Data announced a partnership with big data darling Tableau Software that will see Tableau’s popular data visualization software integrated into Treasure’s service. The companies have had joint customers in the past, but the new partnership will undoubtedly bring Tableau’s brand of hands-on visualization to more Treasure customers.

Customers are free to access their Treasure data in any way they want, but most either use a BI tool like Tableau’s or get to it through SQL, HiveQL, Pig, or MapReduce. As customers’ data builds up in Treasure, it can become more difficult to keep track of. So last month, the company unveiled a low-end visualization tool called Treasure Viewer that makes it easier for users to get a quick glimpse of their data. It also unveiled the Treasure Query Accelerator, which is a version of Cloudera Impala that was customized to work with its columnar database. The Treasure Query Accelerator can boost query performance by anywhere from 6x to 60x, Ghiossi says.

Treasure Data has customers in a variety of industries, including several Fortune 500 firms. But so far it’s found its best traction in the online gaming and advertising spaces. One particular online gaming firm continuously feeds its Treasure Data environment with information about its players, including which customers are playing games, how long they’ve been playing, and what stage of the game they’re in. The Treasure Data service sucks all this data in, and updates models about the players, which the company uses to help it sell ads.

Treasure Data keeps this company’s model updated every two minutes or so, which is as close to “real time” as the customer needs it. “As long as [they] get that data in a couple minutes they can keep those models pretty much real time,” Ghiossi says. “What they’re looking for is to make sure those models are good, and that the models are dynamic, and that’s what we’re feeding.”

This type of use case, keeping a large data model continuously fed with the latest sensor or machine data streaming in from the environment, will undoubtedly become more common as organizations move their Hadoop clusters from development into production. Depending on the industry, there will be different ways that an organization can monetize the vast amounts of sensor, machine, and clickstream data. Building an IT infrastructure to do this sort of thing is no easy task, which is why Treasure Data senses that such a promising market opportunity is about to unfold.

“There are a lot of technologies that can deal with big data. Even traditional database environments can deal with big data,” Ghiossi says. “But the part about big data that deals with sensor or log data or clickstream type data, and getting that into a service or on premise, is not an easy task. Being able to do that and provide value to the customer in a matter of days is a significant asset.”

Data Scientists–Who Needs Them Anyway?

Treasure Data Gains New Steam for Cloud-based Big Data

Applications: Complex Event Processing, Data Mining

Technologies: Cloud

Vendors: Startups and More...

Tags: Hadoop, Hive, impala, mapreduce
