
Tag: mapreduce

Univa Gives ‘Pause’ to Big Data Apps

Scheduling workloads on today's big analytic clusters can be a big challenge. Your team may have everything carefully lined up, only to have a last-minute change leave your schedule in shambles. One company that's close Read more…

Pentaho Eyes Spark to Overcome MapReduce Limitations

Pentaho today announced it's supporting Apache Spark with its suite of data analytic tools. While supporting Spark gives Pentaho performance advantages over MapReduce when executing data transformations and running queri Read more…

How Hadoop Solved BT’s Data Velocity Problem

Like most large corporations with millions of customers, BT (British Telecom) has an extensive collection of databases, and is constantly moving data in and out of them. But when data growth maxed out a critical ETL serv Read more…

Wanted: A Plug-In Architecture for Hadoop Development

Hadoop is hard. There's just no way around that. Setting up and running a cluster is hard, and so is developing applications that make sense of, and create value from, big data. What Hadoop really needs now, says former Read more…

Google Cloud Dataflow Now Open for Business

Google today formally took the wraps off Cloud Dataflow, the hosted offering designed to allow developers with average Java and Python skills to build sophisticated analytic "pipelines" that process huge amounts of data. Read more…

From Spiders to Elephants: The History of Hadoop

Have you ever wondered where this thing called Hadoop came from, or even why it's here? Marko Bonaci has wondered such things, too. In fact, he wondered about them so much that he decided to write a History of Hadoop chapt Read more…

AtScale Claims to Mask Hadoop Complexity for OLAP-Style BI

AtScale came out of stealth mode today with new software designed to trick business intelligence tools into thinking that Hadoop is a standard database upon which they can perform OLAP-style analysis, as opposed to the h Read more…

Novetta Throws Entity Analytics Hat Into Hadoop Ring

One of the new big data analytic vendors exhibiting at the recent Strata + Hadoop World conference was Novetta, a firm that's well-known in the Washington D.C. area for its cyber analytic offerings. But now the company i Read more…

Microsoft Readies Major Push Into Big Data

Microsoft has a lot of irons in the fire. Always has and always will. But judging from its recent acquisition of Revolution Analytics, the early success of its hosted machine learning service, and the forthcoming public Read more…

Apache Flink Takes Its Own Route to Distributed Data Processing

Apache Flink, a distributed in-memory data processing framework project born out of Germany, this week graduated from the Apache Incubator stage and became a Top-Level Project at the open source software foundation, paving th Read more…

Spark Just Passed Hadoop in Popularity on the Web–Here’s Why

Interest in Apache Spark surpassed Apache Hadoop for the first time last month, according to Google Trends. While it's not a definitive statement of Spark's actual impact on big data processing in the real world, it does Read more…

Spark Smashes MapReduce in Big Data Benchmark

Databricks today released benchmark results for Apache Spark running the Sort Benchmark, a competition for measuring the sorting performance of large clusters. Spark running on Hadoop sorted 100 TB of data in 23 minutes, Read more…

Hortonworks Hatches a Roadmap to Improve Apache Spark

Hortonworks today issued a broad and detailed roadmap outlining the investment it would like to see made to Apache Spark, the in-memory processing framework that has become one of Hadoop's most popular subprojects. The p Read more…

MapR Puts Apache Drill into Hadoop Distro

Organizations today demand tools that provide familiar SQL-based access to data stored on HDFS. Today, MapR Technologies gave its customers yet another SQL interface when it announced support for Apache Drill 0.5 in the Read more…

Three Things Apache Spark Needs to Out-Hadoop Hadoop

It's only September, but it's clear that 2014 will go down as the Year of Apache Spark. While the open source processing framework has gathered an enormous amount of momentum within the Hadoop ecosystem, there are three Read more…

Inside Sibyl, Google’s Massively Parallel Machine Learning Platform

If you've ever wondered how your spam gets identified in Gmail or where personal video recommendations come from on YouTube, the answer is likely Sibyl, a massively parallel machine learning system that Google developed Read more…

Apache Spark Gets YARN Approval from Hortonworks

Hortonworks today announced that Apache Spark is certified to work with YARN, the quarterback calling plays in next-gen Hadoop v2 clusters. The YARN stamp of approval clears the way for Hortonworks to fully support Spark Read more…

Google Re-Imagines MapReduce, Launches DataFlow

It's well known in the industry that more than 10 years ago Google invented MapReduce, the technology at the heart of first-generation Hadoop. It's less well known that Google moved away from MapReduce several years ago. Read more…

Moving Beyond ‘Traditional’ Hadoop: What Comes Next?

The phrase "traditional Hadoop" was heard early and often at this week's 2014 Hadoop Summit. While first-generation Hadoop technologies unlocked previously unseen potential in big data sets, they pale in comparison to wha Read more…

Yahoo: We Run the Whole Company on Hadoop

Hadoop is absolutely critical to the operations of Yahoo, executives with the company said this week at the Hadoop Summit. While the company, which spun out Hortonworks in 2011, is moving away from “traditional” Hado Read more…

Datanami