Tag: mapreduce

Data Catalogs Scale in the Cloud

Nov 10, 2017

Data cataloging software for Hadoop and other big data systems emerged as a hot item at last year’s Strata + Hadoop World Expo.

Among the proponents of data cataloging, which is designed to help classify and organize most everything thrown into data lakes, is Waterline Data. Read more…

Yahoo’s Massive Hadoop Scale on Display at Dataworks Summit

Jun 16, 2017

Yahoo put its massive Hadoop investment on display this week at Dataworks Summit, the semi-annual big data conference that it co-hosts with Hortonworks.

While Hadoop is no longer the conference headliner that it once was, the platform is still critical for the daily operations of Yahoo, which officially became part of Verizon Communications this week when the $4.5 billion acquisition finally closed. Read more…

Pepperdata Takes On Spark Performance Challenges

May 24, 2017

Apache Spark has revolutionized how big data applications are developed and executed since it emerged several years ago. But troubleshooting slow Spark jobs on Hadoop clusters is not an easy task. Read more…

Cloudera Unveils Altus to Simplify Hadoop in the Cloud

May 24, 2017

Running Hadoop, whether on-premises or in the cloud, is neither simple nor easy. Administrators with specialized skills are needed to configure, manage, and maintain the clusters for their clients, who are data scientists, engineers, and analysts. Read more…

Google/ASF Tackle Big Computing Trade-Offs with Apache Beam 2.0

May 19, 2017

Trade-offs are a part of life, in personal matters as well as in computers. You typically cannot have something built quickly, built inexpensively, and built well. Pick two, as your grandfather would tell you. Read more…

Meet Ray, the Real-Time Machine-Learning Replacement for Spark

Mar 28, 2017

Researchers at UC Berkeley’s RISELab have developed a new distributed framework designed to enable Python-based machine learning and deep learning workloads to execute in real-time with MPI-like power and granularity. Read more…

Dr. Elephant Steps Up to Cure Hadoop Cluster Pains

Mar 7, 2017

Getting jobs to run on Hadoop is one thing, but getting them to run well is something else entirely. With a nod to the pain that parallelism and big data diversity bring, LinkedIn unveiled a new release of Dr. Read more…

Can Hadoop Be Simple Again?

Sep 19, 2016

In the beginning, Hadoop had two pieces: HDFS and MapReduce. Developers knew how to use them to build applications, and IT teams knew what it took to operate them. Fast forward to 2016, and developers have a cornucopia of technologies and frameworks at their disposal. Read more…

Hadoop Past, Present, and Future

May 17, 2016

Every few years the technology industry seems to be consumed with a shiny new object that gets hyped far beyond reality. At worst, the inevitable bursting of the hype bubble leads to the disappearance of the technology from relevance (remember Internet browsing on your TV?), but more often the hype subsides until a real but narrower focus for the technology is found. Read more…

Apache Beam’s Ambitious Goal: Unify Big Data Development

Apr 22, 2016

If you’re tired of using multiple technologies to accomplish various big data tasks, you may want to consider Apache Beam, a new distributed processing tool from Google that’s now incubating at the ASF. Read more…