Tag: Hadoop

Hadoop Data Virtualization from Cask Now Open Source

Sep 25, 2014

Continuuity, a big data startup that seeks to drive complexity out of Hadoop by virtualizing data and applications, today announced that it’s changing its name to Cask and making its software open source. The company also open-sourced a streaming engine now named Tigon, and announced the hiring of a former Intel executive as COO. Former Facebook engineer Jonathan Gray co-founded Continuuity with former Yahoo engineer Nitin Motgi about three years ago to address the challenges they saw enveloping the Hadoop…

Self-Provision Hadoop in Five Clicks, BlueData Says

Sep 17, 2014

Forget the data science: in some organizations, just getting access to a Hadoop cluster is a major obstacle. With today’s launch of EPIC, the software virtualization company BlueData says analysts and data scientists can self-provision a virtual Hadoop cluster in a matter of seconds, enabling them to iterate faster and in a more agile fashion. If things go as planned, BlueData’s new EPIC product will usher in a new level of flexibility for Hadoop users around the world. “If you want…

Three Things Apache Spark Needs to Out-Hadoop Hadoop

Sep 15, 2014

It’s only September, but it’s clear that 2014 will go down as the Year of Apache Spark. While the open source processing framework has gathered an enormous amount of momentum within the Hadoop ecosystem, there are three areas the Spark community should focus on if it’s going to shine brighter in 2015. Apache Spark stormed the big data scene early in the year, becoming the Hot New Thing in an industry that generates Hot New Things at increasingly breakneck…

Comcast Develops Advanced Advertising Platform to Handle Real Time Big Data

Sep 15, 2014

Comcast is working with national, regional and local advertisers to use data in meaningful and privacy-compliant ways to inform their advertising strategies and maximize their advertising spend. For Nathaniel Auvil, a Distinguished Engineer with the company’s Engineering and Platform Services Group, that work means applying the latest in high performance computing (HPC) capabilities to Comcast’s advertising offerings. Specifically, he has been designing and developing systems that enable Comcast to analyze data. For example, he has developed systems that enable advertising on Comcast’s…

How a Web Analytics Firm Turbo-Charged Its Hadoop ETL

Sep 10, 2014

The Web analytics firm comScore knows a thing or two about managing big data. With tens of billions of data points added to its 400-node Hadoop cluster every day, the company is no stranger to scalability challenges. But one ETL optimization trick in particular helped comScore save petabytes of disk space and improve data processing times along the way. ComScore is one of the biggest providers of Web analytics used by publishers, advertising firms, and their clients. If you…

MapR Reports Accelerated OpenTSDB Performance

Sep 9, 2014

Eyeing new Internet of Things (IoT) applications, MapR Technologies said its open-source distribution of Apache Hadoop “ingested” more than 100 million data points per second. The performance benchmark for the MapR distribution with its in-Hadoop NoSQL database, MapR-DB, was achieved using only four nodes of a ten-node cluster. By accelerating the open-source OpenTSDB software by a factor of 1,000 on a small cluster, MapR claimed, the performance clears the way for managing huge amounts of data along with IoT and other…
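
As a rough per-node figure, and assuming (the teaser does not say so) that the ingest load was spread evenly across the four nodes used, the reported rate works out to:

```latex
\text{ingest rate per node} > \frac{100 \times 10^{6}\ \text{data points/s}}{4\ \text{nodes}} = 25 \times 10^{6}\ \text{data points/s per node}
```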

Survey: Big Data Deployments Reaching ‘Tipping Point’

Sep 8, 2014

Most big data deployments are still in the evaluation stage but may be approaching a “tipping point” as they move toward production, according to big data market researcher Wikibon. Wikibon analysts stressed that much of the firm’s recent survey data represents the sentiments of early adopters of big data analytics, adding that analytics technologies remain “relatively immature.” A separate Wikibon survey last year concluded that nearly half of big data practitioners had yet to realize a return on their data analytics investment. Despite…

Five Steps to Running ETL on Hadoop for Web Companies

Sep 1, 2014

Mention ETL (Extract, Transform and Load) and eyes glaze over. The thought goes: “That stuff is old and meant for clunky enterprise data warehouses. What does it have to do with my Internet/Web/ecommerce application?” Quite a lot, actually. ETL did originate in enterprise IT, where data from online databases is Extracted, then Transformed to normalize it, and finally Loaded into enterprise data warehouses for analysis. Although Internet companies feel they have no use for expensive, proprietary data warehouses, the fact…
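
A minimal sketch of the three ETL stages described in the teaser above, using Python's built-in sqlite3 module to stand in for both the online database and the warehouse. The table names (orders, fact_orders), the columns, and the normalization rules are hypothetical illustrations, not taken from the article.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list[tuple]:
    """Extract: read raw rows from a hypothetical operational 'orders' table."""
    return source.execute(
        "SELECT user_id, email, amount_cents FROM orders"
    ).fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    """Transform: normalize types and formats so analysts get consistent data."""
    normalized = []
    for user_id, email, amount_cents in rows:
        normalized.append((
            int(user_id),           # IDs may arrive as text; store as integers
            email.strip().lower(),  # trim whitespace and case-fold emails
            amount_cents / 100.0,   # convert cents to currency units
        ))
    return normalized


def load(warehouse: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: append normalized rows to a hypothetical warehouse fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(user_id INTEGER, email TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> None:
    """Run the pipeline end to end: extract, then transform, then load."""
    load(warehouse, transform(extract(source)))


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    # Untyped columns, so values keep whatever type the application wrote.
    src.execute("CREATE TABLE orders (user_id, email, amount_cents)")
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("7", "  Alice@Example.COM ", 1999), ("8", "bob@example.com", 500)],
    )
    wh = sqlite3.connect(":memory:")
    run_etl(src, wh)
    print(wh.execute("SELECT * FROM fact_orders ORDER BY user_id").fetchall())
```

The sketch only illustrates the data flow between the three stages; it says nothing about how those stages would be distributed across a Hadoop cluster.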

Hadoop Labor Update: Cloudera Talks Impala 2.0 as Hortonworks Previews Kafka

Aug 29, 2014

Say what you will about Hadoop (and we do), the big data platform is evolving at an incredible rate. This week, two of the biggest Hadoop distributors, Hortonworks and Cloudera, shared how they’re working to improve two key aspects of the platform: real-time data pipelining via Apache Kafka and SQL-based data warehousing via Impala. Let’s start with Cloudera. The Hadoop distributor announced that the upcoming release of Impala 2.0 will add much more complete SQL functionality to CDH,…

Why Hadoop Isn’t the Big Data Solution You Think It Is

Aug 26, 2014

Hadoop carries a lot of promise in the IT world for the way it has democratized access to massively parallel storage and computational power. But the level of hype that surrounds Hadoop is disproportionate to its present capabilities, raising the possibility of a big data letdown of elephantine proportions. The emergence of Hadoop as a next-generation platform for parallel computing has piqued the interest of customers and investors alike. What mid-sized company looking for a big data edge wouldn’t want…