
Tag: mapreduce

Apache Spark Is Great, But It’s Not Perfect

Apache Spark is one of the most widely used tools in the big data space, and will continue to be a critical piece of the technology puzzle for data scientists and data engineers for the foreseeable future. With that said, …

What Makes Apache Spark Sizzle? Experts Sound Off

Apache Spark is one of the most popular open source projects in the world, and has lowered the barrier to entry for processing and analyzing data at scale. We asked some of the leaders in the big data space to give us th…

Data Catalogs Scale in the Cloud

Data cataloging software for Hadoop and other big data systems emerged as a hot item at last year's Strata + Hadoop World Expo. Among the proponents of data cataloging, which is designed to help classify and organize…

Yahoo’s Massive Hadoop Scale on Display at Dataworks Summit

Yahoo put its massive Hadoop investment on display this week at Dataworks Summit, the semi-annual big data conference that it co-hosts with Hortonworks. While Hadoop is no longer the conference headliner that it once…

Pepperdata Takes On Spark Performance Challenges

Apache Spark has revolutionized how big data applications are developed and executed since it emerged several years ago. But troubleshooting slow Spark jobs on Hadoop clusters is not an easy task. In fact, it may even be…

Cloudera Unveils Altus to Simplify Hadoop in the Cloud

Running Hadoop, whether on-premises or in the cloud, is neither simple nor easy. Administrators with specialized skills are needed to configure, manage, and maintain the clusters for their clients, who are data scientists…

Google/ASF Tackle Big Computing Trade-Offs with Apache Beam 2.0

Trade-offs are a part of life, in personal matters as well as in computers. You typically cannot have something built quickly, built inexpensively, and built well. Pick two, as your grandfather would tell you. But appare…

Meet Ray, the Real-Time Machine-Learning Replacement for Spark

Researchers at UC Berkeley's RISELab have developed a new distributed framework designed to enable Python-based machine learning and deep learning workloads to execute in real-time with MPI-like power and granularity. Ca…

Dr. Elephant Steps Up to Cure Hadoop Cluster Pains

Getting jobs to run on Hadoop is one thing, but getting them to run well is something else entirely. With a nod to the pain that parallelism and big data diversity bring, LinkedIn unveiled a new release of Dr. Elephant…

Can Hadoop Be Simple Again?

In the beginning, Hadoop had two pieces: HDFS and MapReduce. Developers knew how to use them to build applications, and IT teams knew what it took to operate them. Fast forward to 2016, and developers have a cornucopia o…

Hadoop Past, Present, and Future

Every few years the technology industry seems to be consumed with a shiny new object that gets hyped far beyond reality. At worst, the inevitable bursting of the hype bubble leads to the disappearance of the technology f…

Apache Beam’s Ambitious Goal: Unify Big Data Development

If you're tired of using multiple technologies to accomplish various big data tasks, you may want to consider Apache Beam, a new distributed processing tool from Google that's now incubating at the ASF. One of the cha…

Overcoming Spark Performance Challenges in Enterprise Hadoop Environments

Interest in Apache Spark is ballooning as word spreads about the real advantages it brings to the world of big data analytics. But like most new technologies, adopting Spark is not always smooth sailing, particularly if…

Survey Sees Spark Emerging in 2016

This is the "Year of Spark," asserts a new big data survey on analytics priorities. The survey of more than 250 data scientists and architects, IT managers and business intelligence analysts released on Tuesday (Jan. …

Picking the Right SQL-on-Hadoop Tool for the Job

SQL is, arguably, the biggest workload many organizations run on their Hadoop clusters. And there's good reason why: The combination of a familiar interface (SQL) along with a modern computing architecture (Hadoop) enabl…

ScaleOut Pushes the Bottleneck in Latest IMDG Update

Each computer architecture, by definition, has a bottleneck that prevents it from performing faster. With the latest release of its in-memory data grid (IMDG) for performing data-parallel analytics, ScaleOut Software has…

Cutting: Spark an ‘All-Around Win’ for Hadoop

Hadoop co-creator Doug Cutting said today that Apache Spark is "very clever" and is "pretty much an all-around win" for Hadoop, adding that it will enable developers to build better and faster data-oriented applications…

Spark Is the Future of Hadoop, Cloudera Says

Apache Spark should be considered the default engine for Hadoop workloads going forward, taking the job that MapReduce held for many years, Cloudera announced today. The Hadoop distributor also announced its "One Platfor…

Google Releases Dataflow, Announces Partners

Google is taking the wraps off its Dataflow hosted cloud service while announcing a batch of partnerships and third-party developers as part of an effort to reduce the operational hurdles associated with traditional data…

Does InfiniBand Have a Future on Hadoop?

Hadoop was created to run on cheap commodity computers connected by slow Ethernet networks. But as Hadoop clusters get bigger and organizations press the upper limits of performance, they're finding that specialized gear…

Datanami