Big Data • Big Analytics • Big Insight

Features

Training Day: CrowdFlower Sets Human-Generated Data Free

Mar 4, 2015

Data scientists looking for high-quality sets of curated data on which to train their machine learning models may want to check out CrowdFlower, which today unleashed a veritable treasure trove of free human-generated data. The company released about 40 data sets as part of its Data for Everyone campaign (see http://www.crowdflower.com/data-for-everyone). Over the coming weeks, the San Francisco company expects to make thousands of data sets available for download from its website, covering millions of records. Read more…

How to Get a ‘Network Effect’ from Your Big Data Lake

Mar 3, 2015

One of the hidden benefits of being a data-driven organization is a so-called “network effect” that occurs around data and analytics. When an organization has several successful big data analytics projects under its belt, it often becomes easier to see how data can be used to benefit the organization in profound new ways. Creating a Hadoop-based data lake is often the first step in going down the big data analytics road. Without data and a place to put it—often a Read more…

The 3 Key Steps to Building a Predictive App with Machine Learning

Mar 3, 2015

Machine learning is the technology that allows businesses to make sense of vast quantities of data, make better decisions, and ultimately bring better services to consumers. From personalized recommendations to fraud detection, from sentiment analysis to personalized medicine, machine learning provides the technology to adapt services to individual needs. For all the value that it brings, machine learning technology has a high cost. Building a predictive application is a multi-stage and iterative process that requires a plethora of people, systems Read more…

Big Data So Easy a Caveman Could Do It?

Feb 26, 2015

Let’s face it: big data isn’t easy. If you’re building a big data application today, you’re up to your eyeballs in things like R and Java, MapReduce and Pig, and Storm and Kafka. There’s a reason data scientists are so hard to find that they’re compared to unicorns. But in the future, the big data application assembly process may be dumbed down to the point where, as the insurance commercial says, even a caveman could do it. That’s the approach Read more…

Spark Steals the Show at Strata

Feb 25, 2015

There was a lot of good stuff on display at last week’s Strata + Hadoop World conference. But if there was one product or technology that stood out from the pack, that would have to be Apache Spark, the versatile in-memory framework that is taking the big data world by storm. At Strata, Spark creator Matei Zaharia showed how the technology will get even more powerful in the months to come. Spark has garnered an incredible amount of momentum, largely running Read more…

News In Brief

Actian Claims ‘Permanent Performance Advantage’ with SQL-on-Hadoop Tool

Mar 2, 2015

The SQL-on-Hadoop sweepstakes are by no means over. What’s been dubbed the “gateway drug” for Hadoop is just starting to gain traction. But according to Actian, its SQL-on-Hadoop offering, dubbed Vortex, is out to an early (and permanent) lead in the performance department. At the recent Strata + Hadoop World show, Actian pitted Vortex against Cloudera’s Impala right in the booth, where it largely re-created the results of a 2014 TPC Decision Support (TPC-DS) benchmark test that showed Vortex completing a job Read more…

‘Data and Goliath’ A Portrait of Big Data Abuses

Mar 2, 2015

A new book by security expert Bruce Schneier is raising serious questions about the state of privacy in the big data age, and whether giving corporations and government access to the most intimate details of our lives in exchange for convenience and security is a tradeoff we should be making. Since 9/11, Schneier has been an outspoken critic of the government’s sometimes ham-handed approach to security. Take the airport security checkpoints, for example. Is the economic loss from asking everybody Read more…

Apache Spark Ecosystem Continues To Build

Feb 25, 2015

Apache Spark was everywhere at the recent Strata + Hadoop World conference. From Tableau’s new Spark interface to the new Spark as a service (SaaS) offerings and Intel’s new Spark initiative, the big data framework was very hard to miss. Intel jumped on Spark’s bandwagon last week when it announced it was forming a new initiative around the in-memory framework. “We have engaged with Databricks, one of the pioneers of Apache Spark, to advance analytics capability for the Spark on Read more…

Snowflake Differentiates Itself in Strata Startup Showcase

Feb 23, 2015

Snowflake Computing, a big data warehousing-as-a-service provider, took home top honors at the Startup Showcase event held during last week’s Strata + Hadoop World conference. The award is a boost to the Silicon Valley company, which aims to be a one-stop shop for analyzing data generated on the cloud. Snowflake emerged from stealth mode in October with $26 million in cash and a vision to create an “elastic data warehouse” that lives in the cloud. The company, Read more…

Cloudera Brings Kafka Under Its ‘Data Hub’ Wing

Feb 18, 2015

Cloudera is making Apache Kafka a supported part of its Hadoop distribution, the company announced today. While Kafka still doesn’t run on Hadoop, Cloudera says the changes it is instituting will help CDH customers build real-time analytics applications that span Hadoop and Kafka. Kafka is an open source message broker that’s designed to handle massive flows of streaming, real-time data, such as log data. The software was originally developed at LinkedIn, which uses it to process hundreds of millions of Read more…
