Tag: Hadoop

Five Steps to Running ETL on Hadoop for Web Companies

Sep 1, 2014 |

Mention ETL (Extract, Transform and Load) and eyes glaze over. The thought goes: “That stuff is old and meant for clunky enterprise data warehouses. What does it have to do with my Internet/Web/ecommerce application?” Quite a lot, actually. ETL did originate in enterprise IT where data from online databases is Extracted, then Transformed to normalize it and finally Loaded into enterprise data warehouses for analysis. Although Internet companies feel they have no use for expensive, proprietary data warehouses, the fact Read more…

Hadoop Labor Update: Cloudera Talks Impala 2.0 as Hortonworks Previews Kafka

Aug 29, 2014 |

Say what you will about Hadoop (and we do), the big data platform is evolving at an incredible rate. This week, two of the biggest Hadoop distributors, Hortonworks and Cloudera, shared how they’re working to improve two key aspects of the platform: real-time data pipelining via Apache Kafka and SQL-based data warehousing via Impala. Let’s start with Cloudera. This week, the Hadoop distributor announced that the upcoming release of Impala 2.0 will add much more complete SQL functionality to CDH, Read more…

Why Hadoop Isn’t the Big Data Solution You Think It Is

Aug 26, 2014 |

Hadoop carries a lot of promise in the IT world for the way it has democratized access to massively parallel storage and computational power. But the level of hype that surrounds Hadoop is disproportionate to its present capabilities, raising the possibility of a big data letdown of elephantine proportions. The emergence of Hadoop as a next-generation platform for parallel computing has piqued the interest of customers and investors alike. What mid-sized company looking for a big data edge wouldn’t want Read more…

Microsoft Blends NoSQL with Relational DB in the Cloud

Aug 22, 2014 |

Microsoft yesterday unveiled Azure DocumentDB, a new cloud-hosted database that adds elements of a relational database, such as SQL-like queries and transactional processing, to a document-oriented NoSQL database. The software giant also added HBase support to its hosted Hadoop solution and revealed a preview of Azure Search. The flexible schemas and distributed scalability of NoSQL databases have garnered them plenty of action lately, particularly when it comes to running large Websites and mobile applications. Document-oriented databases, such as those offered Read more…

TPC Crafts More Rigorous Hadoop Benchmark From TeraSort Test

Aug 18, 2014 |

While Moore’s Law has made computing and storage capacity less expensive with each passing year, the amount of data that companies are storing and the number and sophistication of the algorithms that they want to employ on that data to perform analytics are growing faster than the prices are dropping. And that means the bang for the buck of the underlying hardware and the analytics software that runs atop it matters. The trouble is that benchmarking systems takes far too Read more…

AMPLab’s Tachyon Promises to Solidify In-Memory Analytics

Aug 14, 2014 |

U.C. Berkeley’s AMPLab first landed on the radar screens of data scientists with Apache Spark, which promises to provide an in-memory data processing framework to replace or augment MapReduce. More recently, the tech whizzes at AMPLab have whipped up Tachyon, a new distributed file system that sits atop HDFS and aims to allow multiple Hadoop or Spark applications and jobs to access the same data at memory speeds without fears of corrupting it. The rapid rise of Apache Spark demonstrates Read more…

Here’s Another Option for Hadoop Enterprise Search

Aug 8, 2014 |

The software stacks of many Hadoop distributions feature Apache Lucene and Solr as the enterprise search component. But the folks at the French firm Sinequa say Hadoop customers will get more actual work done–and quickly analyze massive amounts of poly-structured data from dozens of other sources in multiple languages–by using its enterprise search solution. Hadoop, machine learning algorithms, and graph databases may get most of the headlines in our big data world, but good old search engines continue to be Read more…

Dremel Builder Gets $7M for SQL-Based Supertool

Aug 5, 2014 |

Big data startup Metanautix emerged from stealth mode today by announcing a $7-million round of venture funding to further development of a SQL-based power tool. Led by the former Google engineer who headed the development of Dremel, the company aims to dissolve product and technology barriers by “re-imagining” SQL at the heart of an emerging big data supply chain. SQL is enjoying a renaissance as the big data boom continues to reverberate throughout the IT and business sectors. While emerging Read more…

Are Data Lakes All Wet?

Aug 4, 2014 |

Enterprise data management platforms known as “data lakes” are being promoted as, among other things, a potential solution to “information siloes” by combining different managed collections of data in an unmanaged data lake. The theory is that data consolidation will increase use and sharing of information while reducing storage and server costs. However, a new market study dismisses most of those claims as a “fallacy,” arguing instead that enterprises still require secure data repositories, in other words, data warehouses. At Read more…

How Streaming Analytics Helps Telcos Overcome the Data Deluge

Jul 30, 2014 |

Real-time streaming analytics is all the rage these days, as organizations seek to wring value from their data as quickly as possible. While the technology is bleeding edge for many, it’s commonplace in the telecommunications industry, where vendors like Guavus are leveraging the power of Hadoop and streaming analytics to help telcos not only survive the data deluge, but thrive within it. Things are a bit different in telecommunications. While companies in other fields may experiment with new technologies, tier-one Read more…