Too many big data initiatives are science projects that take months of effort, risk failure and require highly trained data scientists with scarce skills. According to a CSC survey, 55 percent of big data projects aren’t completed and many others fall short of their objectives.
Vendors
The Hungarian firm Radoop today unveiled the second version of its eponymous product, which integrates RapidMiner’s data mining and predictive analytics tools atop Apache Hadoop. Radoop 2.0 adds support for many of the features available in Hadoop version 2, including YARN, as well as new operators and easier scoring of models.
Enterprises eager for a competitive edge are turning to in-memory stream processing technologies to help them analyze big data in real time. The Apache Spark and Storm projects have gained lots of momentum in this area, as have some analytic NoSQL databases and in-memory data grids. Another streaming technology worth keeping an eye on is DataTorrent.
The surge in popularity of genetic testing is creating a tidal wave of data that people are eager to use to improve their health. But as we saw with 23andMe, there is a concern that raw genetic information can be dangerous in the wrong hands. Today BaseHealth emerged from stealth mode with an innovative approach to this problem that blends data from genetic testing, scientific research, and a patient’s own medical history, all under the supervision of a doctor.
The big data industry has made a compelling case for Hadoop as the core platform for a big data strategy. But deploying and maintaining a well-run Hadoop environment is challenging, to say the least. This difficulty has driven the Hadoop-as-a-Service (HaaS) market. However, not all HaaS offerings are the same. Here is a look at the different types of HaaS available today, as well as the nine key criteria that will make your HaaS evaluation a success.
Splunk has enjoyed the first-mover advantage when it comes to analyzing machine-generated data for fun and profit. But as the Internet of Things begins to take off and the machine-generated data seriously begins to fly, the developer of proprietary software is finding increased competition from the open source realm, namely from Elasticsearch, which just snatched away a Splunk VP.
News In Brief
At first glance, the partnership that Cloudera and MongoDB unveiled today is a bit of a head scratcher. While the two companies are arguably the biggest software vendors in the nascent space, they swim in opposite ends of the big data pool. It turns out, that’s exactly why the companies felt they needed to work together.
An in-memory big data analytics appliance dubbed NumaQ is said to allow up to 32 terabytes of data to be loaded and analyzed in memory, helping to eliminate a major bottleneck in data analytics performance.
A Washington, D.C., startup is touting an analytics platform that sifts through state and federal legislative data to predict whether proposed bills will actually become law.
Social media behemoth Twitter said it is acquiring long-time data partner Gnip in a move designed to leverage aggregated Twitter data to pinpoint consumer trends and link with customers.
A speedy operational database integrated with a real-time analytics platform widely used by telecommunications service providers could help carriers extract information about millions of mobile subscribers, then monetize data through revenue generators like contextual advertising.