Tag: mapreduce

Apache Spark Is Great, But It’s Not Perfect

Apr 3, 2019

Apache Spark is one of the most widely used tools in the big data space, and will continue to be a critical piece of the technology puzzle for data scientists and data engineers for the foreseeable future. Read more…

What Makes Apache Spark Sizzle? Experts Sound Off

Mar 11, 2019

Apache Spark is one of the most popular open source projects in the world, and has lowered the barrier of entry for processing and analyzing data at scale. We asked some of the leaders in the big data space to give us their take on why Spark has achieved sustained success when so many other frameworks have fizzled. Read more…

Data Catalogs Scale in the Cloud

Nov 10, 2017

Data cataloging software for Hadoop and other big data systems emerged as a hot item at last year’s Strata + Hadoop World Expo.

Among the proponents of data cataloging, which is designed to help classify and organize most everything thrown into data lakes, is Waterline Data. Read more…

Yahoo’s Massive Hadoop Scale on Display at Dataworks Summit

Jun 16, 2017

Yahoo put its massive Hadoop investment on display this week at Dataworks Summit, the semi-annual big data conference that it co-hosts with Hortonworks.

While Hadoop is no longer the conference headliner that it once was, the platform is still critical for the daily operations of Yahoo, which officially became part of Verizon Communications this week when the $4.5 billion acquisition finally closed. Read more…

Pepperdata Takes On Spark Performance Challenges

May 24, 2017

Apache Spark has revolutionized how big data applications are developed and executed since it emerged several years ago. But troubleshooting slow Spark jobs on Hadoop clusters is not an easy task. Read more…

Cloudera Unveils Altus to Simplify Hadoop in the Cloud

May 24, 2017

Running Hadoop, whether on-premises or in the cloud, is neither simple nor easy. Administrators with specialized skills are needed to configure, manage, and maintain the clusters for their clients, who are data scientists, engineers, and analysts. Read more…

Google/ASF Tackle Big Computing Trade-Offs with Apache Beam 2.0

May 19, 2017

Trade-offs are a part of life, in personal matters as well as in computers. You typically cannot have something built quickly, built inexpensively, and built well. Pick two, as your grandfather would tell you. Read more…

Meet Ray, the Real-Time Machine-Learning Replacement for Spark

Mar 28, 2017

Researchers at UC Berkeley’s RISELab have developed a new distributed framework designed to enable Python-based machine learning and deep learning workloads to execute in real-time with MPI-like power and granularity. Read more…

Dr. Elephant Steps Up to Cure Hadoop Cluster Pains

Mar 7, 2017

Getting jobs to run on Hadoop is one thing, but getting them to run well is something else entirely. With a nod to the pain that parallelism and big data diversity bring, LinkedIn unveiled a new release of Dr. Read more…

Can Hadoop Be Simple Again?

Sep 19, 2016

In the beginning, Hadoop had two pieces: HDFS and MapReduce. Developers knew how to use them to build applications, and IT teams knew what it took to operate them. Fast forward to 2016, and developers have a cornucopia of technologies and frameworks at their disposal. Read more…