Big Data • Big Analytics • Big Insight

Tag: Hadoop

Big Data So Easy a Caveman Could Do It?

Feb 26, 2015 |

Let’s face it: big data isn’t easy. If you’re building a big data application today, you’re up to your eyeballs in things like R and Java, MapReduce and Pig, and Storm and Kafka. There’s a reason data scientists are so hard to find that they’re compared to unicorns. But in the future, the big data application assembly process may be dumbed down to the point where, as the insurance commercial says, even a caveman could do it. That’s the approach Read more…

Making Sense of the ODP—Where Does Hadoop Go From Here?

Feb 24, 2015 |

It was no coincidence that Hortonworks and Pivotal unveiled the Open Data Platform last week at the start of Strata + Hadoop World, Cloudera's semi-annual celebration of all things Hadoop. But now that the dust has settled on that bombshell, let's look a little closer at the ODP, the organization's key members, and what it means to the Hadoop stack and ecosystem going forward. To recap: the ODP was unveiled one week ago by Pivotal, Hortonworks, IBM, and 12 other Read more…

Snowflake Differentiates Itself in Strata Startup Showcase

Feb 23, 2015 |

Snowflake Computing, a big data warehousing as a service provider, took home top honors at the Startup Showcase event held during last week’s Strata + Hadoop World conference. The award is a boost to the Silicon Valley company, which aims to be a one-stop shop for analyzing data generated on the cloud. Snowflake emerged from stealth mode in October with $26 million in cash and a vision to create an “elastic data warehouse” that lives in the cloud. The company, Read more…

Cloudera Brings Kafka Under Its ‘Data Hub’ Wing

Feb 18, 2015 |

Cloudera is making Apache Kafka a supported part of its Hadoop distribution, the company announced today. While Kafka still doesn’t run on Hadoop, Cloudera says the changes it is instituting will help CDH customers build real-time analytics applications that span Hadoop and Kafka. Kafka is an open source message broker that’s designed to handle massive flows of streaming, real-time data, such as log data. The software was originally developed at LinkedIn, which uses it to process hundreds of millions of Read more…
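Kafka's core abstraction is an append-only log per topic: producers write records to the end of the log, and each consumer reads forward from its own offset, which is how it handles massive streaming flows without coupling writers to readers. The toy sketch below illustrates that idea in plain Python; it is not the actual Kafka API, and the class and message names are invented for the example.

```python
# Toy illustration of Kafka's core idea: an append-only log that
# producers write to, with each consumer tracking its own read offset.
# Plain Python only; not the real Kafka client API.

class ToyLog:
    def __init__(self):
        self.records = []               # append-only list of messages

    def append(self, msg):
        """Producer side: append a record, return its offset."""
        self.records.append(msg)
        return len(self.records) - 1

    def read(self, offset):
        """Consumer side: return every record at or after `offset`."""
        return self.records[offset:]

log = ToyLog()
log.append("page_view user=1")
log.append("page_view user=2")

# Two consumers at different offsets see different slices of the stream.
print(log.read(0))   # full history
print(log.read(1))   # only records after the first
```

Because records are never modified in place and consumers manage their own offsets, many independent readers (say, a real-time dashboard and a batch loader into Hadoop) can replay the same stream at their own pace, which is the property that makes Kafka useful for log data.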

MapR Delivers Bi-Directional Replication with Distro Refresh

Feb 18, 2015 |

A new release of the MapR Distribution including Hadoop unveiled today will enable companies to perform real-time, bi-directional data replication between Hadoop clusters that are thousands of miles apart. The new table replication feature was added to MapR-DB, the NoSQL database included with the high-end edition of MapR’s commercial Hadoop offering. As Hadoop adoption grows, companies are finding it increasingly difficult to ensure that they’re acting on the latest, freshest data. This fast-data problem is particularly evident in organizations that Read more…

Plugging Leaks in Big Data Lakes

Feb 17, 2015 |

The big data lake phenomenon is in full swing at the moment, with Hadoop playing a central role in the storage and processing of massive amounts of data. But without certain processes in place, a data lake will not stand the test of time. Unfortunately, most of those processes must be implemented manually today. Many users expect too much out of Hadoop, and are therefore setting themselves up for failure. While Hadoop provides the basic structure for storing and analyzing Read more…

Pivotal Throws in with Hortonworks and Open Source

Feb 17, 2015 |

Pivotal today pulled the plug on its proprietary big data strategy and uncorked a major repositioning that involves making core products like HAWQ, Greenplum, and GemFire open source and aligning its Hadoop fortunes with one-time rival Hortonworks. The software company was also revealed to be a founding member of the new Open Data Platform, which also launched today. Pivotal's partnership with Hortonworks has several components, including product integration, joint engineering, and technical support. While Pivotal will continue to develop its Read more…

Why ‘Data Lakes’ May Create Drowning Risks

Feb 16, 2015 |

Many organizations tackling big data projects find themselves swimming in uncharted waters, but the concept of a "data lake" may be at least one way to keep them from wading in too deep. A data lake can be defined as an environment where a data warehouse resides within Hadoop. The idea is to bring greater efficiency to managing unstructured information. The trade-off is that those using the data lake approach are putting all of their eggs in one basket, which Read more…

Project Myriad Brings Hadoop Closer to Mesos

Feb 12, 2015 |

One of the challenges of running Hadoop is resource management. The process of spinning up and managing hundreds, if not tens of thousands, of server nodes in a Hadoop cluster (and spinning them down, moving them, and so on) is far too hard to do manually. Automation must come to the table to help Hadoop take the next step forward in its evolution. The big question is how it will unfold. One answer to that question came to the forefront yesterday when a Read more…

How Advances in SQL on Hadoop Are Democratizing Big Data–Part 1

Feb 10, 2015 |

September 2014 marked the anniversary of Edgar F. Codd's 1969 introduction of "A Relational Model of Data for Large Shared Data Banks," a body of research and theory that ultimately provided the foundation for modern relational SQL databases. Since the start of widespread adoption of relational databases, analytics professionals have invested decades of experience in SQL expertise. More recently, Hadoop and other emerging platforms have disrupted the relational paradigm, introducing unfamiliar concepts for both data management and analysis. Read more…
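Those decades of SQL expertise carry over because the relational operations Codd described (declarative queries over tables, joins, aggregation) are exactly what SQL-on-Hadoop engines aim to expose. A minimal illustration of the paradigm, using Python's built-in SQLite as a stand-in for any SQL engine; the table and column names are invented for the example:

```python
import sqlite3

# In-memory database as a stand-in for any relational SQL engine,
# including SQL-on-Hadoop systems that expose the same query language.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# The declarative style Codd's relational model enabled: state *what*
# result you want, and let the engine decide *how* to compute it.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
):
    print(region, total)   # east 150.0, then west 250.0
```

The point of SQL-on-Hadoop is that an analyst who can write the query above can, in principle, run the same statement against data stored in Hadoop, rather than learning MapReduce or Pig to express the equivalent logic.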