August 23, 2013

Big Data Big Five

Isaac Lopez

Another interesting week following the big data trends… MongoDB rides the elephant, as 10gen announces new features in its Hadoop connector; cloud BI player Birst gains 38 million reasons to sell more software; Talend takes some air out of the big data hype balloon; and more…

MongoDB Now Compatible with Hive

MongoDB got a boost this week, with 10gen announcing an upgrade to its Hadoop connector, which moves data between the two new-age data platforms. The upgrade adds significant new abilities, including the ability to run MapReduce jobs on live application data from MongoDB. Along with this, 10gen says it has added support for Apache Hive, allowing live querying across MongoDB data sets with the popular SQL-like query engine.
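For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal, illustrative sketch in plain Python. It does not use the 10gen connector, Hadoop, or Hive. The sample documents and the function names (`map_phase`, `reduce_phase`, `run_map_reduce`) are assumptions made up for the illustration, standing in for a collection of JSON-style records.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for a MongoDB collection.
documents = [
    {"machine": "press-1", "status": "ok"},
    {"machine": "press-1", "status": "fault"},
    {"machine": "lathe-2", "status": "ok"},
    {"machine": "press-1", "status": "ok"},
]


def map_phase(doc):
    """Emit one (key, value) pair per document: (machine name, 1)."""
    yield doc["machine"], 1


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def run_map_reduce(docs):
    """Group mapper output by key (the 'shuffle'), then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_map_reduce(documents)
print(counts)
```

A SQL-like engine expresses the same aggregation declaratively, roughly as a `GROUP BY machine` with a `COUNT(*)`, which is the convenience that Hive-style querying offers over hand-written jobs.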

This added functionality is good news for vendors that use the popular database in conjunction with Hadoop. (Note: we recently covered a company that implemented MongoDB with Hadoop to provide real-time analysis of manufacturing machine data.) As a JSON-friendly system, MongoDB has garnered a reputation for ease of use, and adding functionality that underlines this strength can be seen as a solid step in the right direction.

10gen also said this week that data compiled from third-party sources, including Google Trends, Stack Overflow, and LinkedIn, shows that MongoDB has the largest community among big data databases. It is a lofty claim, but one that has received support from 451 Research.

“Our research into the NoSQL database space indicates that interest in MongoDB is outpacing interest in other NoSQL database technologies by some margin,” said Matt Aslett, research director, data management and analytics, 451 Research. “These results illustrate that, whichever way you look at it, MongoDB is a driving force behind the emerging Big Data technology landscape.”

Despite the claims of having the most active community, the move to better position itself with Hadoop shows who the real elephant in the room is, so to speak. While MongoDB has gained acclaim for its ease of use, its performance at scale is where it tends to fall down, creating an opening for technologies such as HBase or Cassandra.

Birst Raises $38 Million

Cloud BI provider Birst announced this week that it has received new cash in its coffers to attack legacy on-premise analytics services with its brand of distributed business analytics. The new round, led by Sequoia Capital, rang the register to the tune of $38 million and includes all previous investors, as well as new investor Northgate Capital.

The new funding is said to power the San Francisco-based company’s push into new international markets, including Europe, Asia, and the Middle East, as well as drive further product development. The company already has a mature and well-rounded suite of BI services that provides analysis tools for a wide variety of data sources, including both traditional RDBMSs and big data-oriented systems such as Hadoop and Cassandra.

The company says it has added 13 new partners to its ecosystem in 2013, including firms such as Acumen, Analytics8, Audaxium, CorSource, Eagle Creek, The Pedowitz Group, Projectline, TeleTech and 3Coast, and has formed new strategic alliances with Amazon, NetSuite, and Marketo.

Talend Survey Takes Some Air Out of the Big Data Hype Balloon

Open source data integration software vendor Talend this week announced the results of a survey that it claims pours some cold water on big data hype. According to the survey, while enthusiasm for big data is growing, the actual implementation of big data technologies is coming along slowly.

Talend says that last year, when respondents were asked about interest in big data within their organizations, a large portion (61 percent) said that there was none. That number fell to 24 percent this year, showing a dramatic change in the discussion within organizations as they wake up to the trend. However, despite this change, only 19 percent of those surveyed say that they are at the planning or appraisal stage. On the upside, this number has more than doubled from the previous year, when it was at 8 percent.

With more tire-kickers in the market, has this led to more pilot projects as well? The answer, according to the survey, is no. Talend says that the number of big data pilots has remained static at 4 percent year-over-year. That said, there is good news on actual large-scale roll-outs: 10 percent of the respondents say their companies have engaged in one, up from 2 percent the previous year.
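To put the survey figures side by side, here is a small Python sketch of the arithmetic. The percentages are the ones Talend reported as described above; the dictionary layout and the `change` helper are assumptions made for illustration only.

```python
# Share of respondents (percent) per stage: (last year, this year),
# as reported by Talend.
survey = {
    "no interest": (61, 24),
    "planning or appraisal": (8, 19),
    "pilot": (4, 4),
    "large-scale roll-out": (2, 10),
}


def change(prev, curr):
    """Return the change in percentage points and the ratio curr / prev."""
    return curr - prev, curr / prev


changes = {stage: change(prev, curr) for stage, (prev, curr) in survey.items()}

for stage, (points, ratio) in changes.items():
    print(f"{stage}: {points:+d} points, {ratio:.2f}x last year's share")
```

By this reckoning the planning-or-appraisal share grew roughly 2.4 times and large-scale roll-outs grew fivefold, while pilots did not move.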

Budgetary constraints lead the list of barriers to big data, with skills also being a challenge for companies looking to move along with the trend.

“It is encouraging that the number of businesses rolling out big data strategies has increased, but overall adoption of big data strategies remains slow,” says Yves de Montcheuil, Vice President, Marketing, Talend. “There is still a significant gap between those businesses expressing an interest and those taking the plunge and actually implementing the approach. It is a gap that the industry needs to address and close if the promise of big data is to be fulfilled.”

Tresata Launches Open Source Algorithm Library for Mahout & Hadoop

Hadoop software vendor Tresata announced this week that it has developed the first open source algorithm library built completely on Scalding and designed to work with Mahout and Hadoop. Scalding, a Scala library, abstracts away low-level Hadoop complexities, aiming to make it easier for developers to specify Hadoop MapReduce jobs. The library is used as the core API for all of Tresata’s Hadoop-focused software.

Tresata boasts that its new library, dubbed “ganitha,” is the first open source implementation of machine learning and statistical techniques on Scalding. “The core idea behind ganitha was to make complex pieces of MapReduce logic available in a much more clean, simple and powerful abstraction that allows you to run real world algorithms at massive scale,” explained Koert Kuipers, CTO of the company, in a blog post this week. Kuipers says that the need to have sparse vectors available in Scalding, with compact in-memory and serializable representations, drove the integration of Mahout vectors into Scalding.
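To illustrate what a compact, serializable sparse vector means in practice, here is a minimal Python sketch. It is not the Mahout or ganitha API; the `SparseVector` class, its methods, and the JSON encoding are all hypothetical choices made for this illustration.

```python
import json


class SparseVector:
    """Stores only the non-zero entries of a fixed-size vector."""

    def __init__(self, size, entries=None):
        self.size = size
        # Map index -> value, dropping explicit zeros to stay compact.
        self.entries = {i: v for i, v in (entries or {}).items() if v != 0}

    def dot(self, other):
        """Dot product that iterates over the smaller set of entries."""
        small, large = sorted((self.entries, other.entries), key=len)
        return sum(v * large[i] for i, v in small.items() if i in large)

    def to_json(self):
        """Serialize as the logical size plus (index, value) pairs."""
        return json.dumps({"size": self.size,
                           "entries": sorted(self.entries.items())})

    @classmethod
    def from_json(cls, text):
        """Rebuild a vector from the output of to_json."""
        data = json.loads(text)
        return cls(data["size"], {i: v for i, v in data["entries"]})


# A million-dimensional vector costs only two stored entries.
a = SparseVector(1_000_000, {3: 2.0, 999_999: 1.5})
b = SparseVector(1_000_000, {3: 4.0, 10: 7.0})
product = a.dot(b)
restored = SparseVector.from_json(a.to_json())
```

The point of such a representation is that memory and serialized size scale with the number of non-zero entries rather than with the vector's nominal dimension, which matters when vectors are shuffled between MapReduce stages.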

Abhishek Mehta, CEO and co-founder of Tresata, explained that the decision to open source ganitha stems from a fundamental belief that the intellectual property isn’t in the algorithms, but in how and where they are applied.

Ganitha can be found at its home on GitHub.

Vertica a Shining Star for HP

With HP CEO Meg Whitman warning that the technology giant is facing a no-growth year, the company has undergone a shake-up in its Enterprise Group, reported Wikibon’s Jeff Kelly in an article this week.

Despite revenues being down 8% year-over-year, HP had one bright spot in its Vertica big data analytics platform, bringing its software revenues up 1% year-over-year to hit $982 million. Whitman says that growth for the columnar storage platform reached triple digits for the second quarter.

HP acquired Vertica three years ago, and is now reaping the rewards, after what Kelly says was a prolonged period of sales force alignment, with much larger deals coming in than in years past. Moving forward, Kelly says the company will market Vertica as part of its larger big data platform, HAVEn, which was announced at HP Discover this past June.


Datanami