Features

Enforcing Hadoop SLAs in a Big YARN World

Jul 23, 2014

The Apache Hadoop community has done a truly amazing job developing a scalable and versatile platform for big data analytic workloads. And with the recent introduction of YARN in Hadoop 2, we’re now able to run multiple analytic engines on our clusters simultaneously. Unfortunately, the potential for resource contention has also gone up, and that will likely increase demand for service level agreement (SLA) enforcement. YARN made its big introduction just as companies started to move their Hadoop deployments out Read more…

FDA Mines Billing Data for Drug Interaction Insight

Jul 22, 2014

The Food and Drug Administration is five years into a pilot program aimed at identifying hazardous drug interactions by mining the medical billing records of millions of Americans. The program, dubbed Mini-Sentinel, is a creative application of big data technologies that has the potential to improve people’s lives. The FDA started funding the Mini-Sentinel project in 2009 with the goal of coming up with a better way to monitor for unintended side effects of prescription drugs. It’s impossible to eliminate all Read more…

Streaming Analytics Ready for Prime Time, Forrester Says

Jul 22, 2014

Analytic platforms that generate insights from data in real time are mature enough for enterprises to begin adopting them, Forrester says in its latest report. While open source streaming analytic products like Apache Storm are proving popular, Forrester says they lack key functionality found in the offerings of proprietary vendors, such as top-rated Software AG. You don’t need a Forrester analyst to know that streaming analytics is red hot at the moment. If Hadoop has opened our eyes to what Read more…

Slicing and Dicing Music Data for Fun and Profit

Jul 21, 2014

The advent of big data analytics promises to have a profound impact on many aspects of human life, including how we work and play. Big data is even influencing the arts, where the field of music data science is rearranging our relationship with music. We’re in the midst of a boom in music data science that can be traced back to 1999, when two important events occurred. First, Shawn Fanning unleashed Napster to the world, thereby giving people the power Read more…

For Esri, Analytics All About Location, Location, Location

Jul 18, 2014

Certain analytic tools excel at manipulating certain types of data. When it comes to data with a geographic bent, there may be no more influential vendor than Esri, a Southern California company that has quietly gobbled up a majority share of the geographic information systems (GIS) market. But now the company is positioning GIS as a powerful way to visualize all types of data. Esri owns anywhere from 40 to 70 percent of the market for GIS software, according to Read more…

Inside Sibyl, Google’s Massively Parallel Machine Learning Platform

Jul 17, 2014

If you’ve ever wondered how your spam gets identified in Gmail or where personal video recommendations come from on YouTube, the answer is likely Sibyl, a massively parallel machine learning system that Google developed to make predictions and recommendations with user-specific data culled from its Internet applications. Dr. Tushar Chandra, a distinguished Google Research engineer, recently shared some information on Sibyl in a keynote presentation at the annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). Sibyl is not Read more…

Software-Defined Storage Takes Off As Big Data Gets Bigger

Jul 16, 2014

The ongoing explosion of data is forcing users to adapt their storage methodologies beyond traditional file- and block-level storage. Object stores and software-defined storage mechanisms, in particular, are quickly gaining footholds at organizations that need to store massive unstructured data sets. Meanwhile, some vendors are promoting a twist on object stores with new “data-defined” storage techniques. File-level storage works well with traditional structured data, such as what you might find in a commercial accounting system using direct-attached storage devices (i.e. Read more…

Can You Trust Your Algorithms?

Jul 15, 2014

Algorithms are critical to how we interact with data. And as the volume and variety of data increase, so does our reliance on algorithms to give us the answers we seek. But how much faith should you put into those algorithms, and how can you be sure they’re not misleading you? They’re not simple questions, but through the use of algorithmic differentiation techniques, data scientists can get more precise answers. Algorithmic differentiation, sometimes called automatic differentiation, is a technique used Read more…

HP Throws Trafodion Hat into OLTP Hadoop Ring

Jul 14, 2014

Hewlett-Packard last month quietly unveiled Trafodion, an ANSI-compliant relational SQL database that’s now available as an open source product. With two decades of development at HP and the new capability to run on top of HBase, Trafodion could provide a big boost to efforts to run transactional workloads on Hadoop. The database technology behind Trafodion (which is Welsh for “transaction”) has been around for a long time at Hewlett-Packard, but it was dancing perilously close to the waste bin of Read more…

Where Does Spark Go From Here?

Jul 11, 2014

The excitement behind Apache Spark reached an apex last week during the 2014 Spark Summit put on by Databricks, the company behind the in-memory analytics phenomenon. With a large community of users and growing support from software vendors, the future for Spark certainly appears bright. But there’s a large amount of work ahead to fulfill the promise of Spark, including hardening various components. Providing an easier-to-use alternative to MapReduce is the first use case for Spark, which is said to Read more…