Tag: mapreduce

Apache Spark Is Great, But It’s Not Perfect

Apache Spark is one of the most widely used tools in the big data space, and will continue to be a critical piece of the technology puzzle for data scientists and data engineers for the foreseeable future. With that said Read more…

What Makes Apache Spark Sizzle? Experts Sound Off

Apache Spark is one of the most popular open source projects in the world, and has lowered the barrier of entry for processing and analyzing data at scale. We asked some of the leaders in the big data space to give us th Read more…

Data Catalogs Scale in the Cloud

Data cataloging software for Hadoop and other big data systems emerged as a hot item at last year's Strata + Hadoop World Expo. Among the proponents of data cataloging, which is designed to help classify and organize Read more…

Yahoo’s Massive Hadoop Scale on Display at Dataworks Summit

Yahoo put its massive Hadoop investment on display this week at Dataworks Summit, the semi-annual big data conference that it co-hosts with Hortonworks. While Hadoop is no longer the conference headliner that it once Read more…

Pepperdata Takes On Spark Performance Challenges

Apache Spark has revolutionized how big data applications are developed and executed since it emerged several years ago. But troubleshooting slow Spark jobs on Hadoop clusters is not an easy task. In fact, it may even be Read more…

Cloudera Unveils Altus to Simplify Hadoop in the Cloud

Running Hadoop, whether on-premises or in the cloud, is neither simple nor easy. Administrators with specialized skills are needed to configure, manage, and maintain the clusters for their clients, who are data scientists Read more…

Google/ASF Tackle Big Computing Trade-Offs with Apache Beam 2.0

Trade-offs are a part of life, in personal matters as well as in computers. You typically cannot have something built quickly, built inexpensively, and built well. Pick two, as your grandfather would tell you. But appare Read more…

Meet Ray, the Real-Time Machine-Learning Replacement for Spark

Researchers at UC Berkeley's RISELab have developed a new distributed framework designed to enable Python-based machine learning and deep learning workloads to execute in real-time with MPI-like power and granularity. Ca Read more…

Dr. Elephant Steps Up to Cure Hadoop Cluster Pains

Getting jobs to run on Hadoop is one thing, but getting them to run well is something else entirely. With a nod to the pain that parallelism and big data diversity bring, LinkedIn unveiled a new release of Dr. Elephant Read more…

Can Hadoop Be Simple Again?

In the beginning, Hadoop had two pieces: HDFS and MapReduce. Developers knew how to use them to build applications, and IT teams knew what it took to operate them. Fast forward to 2016, and developers have a cornucopia o Read more…
