Features

Training Day: CrowdFlower Sets Human-Generated Data Free

Mar 4, 2015

Data scientists who are looking for high-quality sets of curated data on which to train their machine learning models may want to check out CrowdFlower, which today unleashed a veritable treasure trove of free human-generated data. CrowdFlower released about 40 data sets as part of its Data for Everyone campaign (see http://www.crowdflower.com/data-for-everyone). But over the coming weeks, the San Francisco company expects to make thousands of data sets available for download from its website, covering millions of records. Read more…

How to Get a ‘Network Effect’ from Your Big Data Lake

Mar 3, 2015

One of the hidden benefits of being a data-driven organization is a so-called “network effect” that occurs around data and analytics. When an organization has several successful big data analytics projects under its belt, it often becomes easier to see how data can be used to benefit the organization in profound new ways. Creating a Hadoop-based data lake is often the first step in going down the big data analytics road. Without data and a place to put it—often a Read more…

The 3 Key Steps to Building a Predictive App with Machine Learning

Mar 3, 2015

Machine learning is the technology that allows businesses to make sense of vast quantities of data, make better decisions, and ultimately bring better services to consumers. From personalized recommendations to fraud detection, from sentiment analysis to personalized medicine, machine learning provides the technology to adapt services to individual needs. For all the value that it brings, machine learning technology has a high cost. Building a predictive application is a multi-stage and iterative process that requires a plethora of people, systems Read more…

Novetta Throws Entity Analytics Hat Into Hadoop Ring

Mar 2, 2015

One of the new big data analytic vendors exhibiting at the recent Strata + Hadoop World conference was Novetta, a firm that’s well known in the Washington, D.C., area for its cyber analytic offerings. But now the company is widening its reach into the commercial market with a Hadoop-based solution called Novetta Entity Analytics. One of Novetta’s first customers in the big data space was an unnamed government security agency that was having trouble pulling useful information out of an 8-billion Read more…

Rating the Advanced Analytics Vendors

Feb 27, 2015

There are several ways you can go about obtaining the advanced analytic capabilities needed to extract insights from large amounts of data. You can outsource the whole thing to a services firm, you can buy pre-built applications for a specific industry, or you can buy tools that will let you build what you need. Last week, Gartner rated the top 16 such build-it-yourself tools in the advanced analytics category. The “Magic Quadrant for Advanced Analytics Platforms” that Gartner delivered last Read more…

Big Data So Easy a Caveman Could Do It?

Feb 26, 2015

Let’s face it: big data isn’t easy. If you’re building a big data application today, you’re up to your eyeballs in things like R and Java, MapReduce and Pig, and Storm and Kafka. There’s a reason data scientists are so hard to find that they’re compared to unicorns. But in the future, the big data application assembly process may be dumbed down to the point where, as the insurance commercial says, even a caveman could do it. That’s the approach Read more…

Spark Steals the Show at Strata

Feb 25, 2015

There was a lot of good stuff on display at last week’s Strata + Hadoop World conference. But if there was one product or technology that stood out from the pack, that would have to be Apache Spark, the versatile in-memory framework that is taking the big data world by storm. At Strata, Spark creator Matei Zaharia showed how the technology will get even more powerful in the months to come. Spark has garnered an incredible amount of momentum, largely running Read more…

Making Sense of the ODP—Where Does Hadoop Go From Here?

Feb 24, 2015

It was no coincidence that Hortonworks and Pivotal unveiled the Open Data Platform last week at the start of Strata + Hadoop World, which is Cloudera’s semi-annual parade to everything Hadoop. But now that the dust has settled on that bombshell, let’s look a little closer at the ODP, the organization’s key members, and what it means to the Hadoop stack and ecosystem going forward. To recap: the ODP was unveiled one week ago by Pivotal, Hortonworks, IBM, and 12 other Read more…

The Wild West and Last Frontier of Big Data

Feb 23, 2015

We are in the Wild West of big data. The speed of processing keeps getting faster, while the volume of data that can be processed is beyond what could have been imagined just a few years ago. The Last Frontier of big data, meanwhile, is the discovery of value hidden in disparate data sources that have yet to be blended and harmonized. Just like the gold-seeking pioneers from centuries past, big data pioneers who embrace this challenge and blaze their Read more…

Outsmarting Wine Snobs with Machine Learning

Feb 20, 2015

For most of us, picking good wine is a bit like picking ponies: Everybody has a method, but at the end of the day, the results aren’t much better than chance. But one budding wine connoisseur/hacker at a big data analytics firm thinks he may have landed upon an approach to predicting the quality of wine. His secret? Machine learning. When he’s not solving big data problems or riding his road bike, H2O’s Alex Tellez enjoys exploring the world of Read more…