Tag: apache spark

Data Lakes Get Structured

Oct 7, 2019

The explosion of unstructured and partially structured data has made traditional data lakes harder to manage. Adding to the challenge are “brittle” data pipelines that are both time-consuming to create and ephemeral. Read more…

StreamSets Eases Spark-ETL Pipeline Development

Sep 5, 2019

Apache Spark gives developers a powerful tool for creating data pipelines for ETL workflows, but the framework is complex and can be difficult to troubleshoot. StreamSets is aiming to simplify Spark pipeline development with Transformer, the latest addition to its DataOps platform. Read more…

Program Synthesis Moves a Step Closer to Reality

Jul 8, 2019

As data scientists and software developers sort through the plethora of tools and APIs ranging from Python to Apache Spark, automation schemes are emerging to help programmers navigate those tools and the accompanying infrastructure that machine learning and other apps run on. Read more…

Understanding Your Options for Stream Processing Frameworks

May 30, 2019

Real-time stream processing isn’t a new concept, but it’s experiencing renewed interest from organizations tasked with finding ways to quickly process large volumes of streaming data. Luckily for you, there are a handful of open source frameworks that could give your developers a big head start in building your own custom stream-processing application. Read more…

Apache Spark Is Great, But It’s Not Perfect

Apr 3, 2019

Apache Spark is one of the most widely used tools in the big data space, and will continue to be a critical piece of the technology puzzle for data scientists and data engineers for the foreseeable future. Read more…

Startup MemVerge Combines Memory, Storage

Apr 2, 2019

A startup combining persistent memory and data storage has emerged from stealth mode with a platform running on chip maker Intel’s Optane architecture.

MemVerge claims to have invented what it calls “memory-converged infrastructure” that eliminates barriers between memory and storage. Read more…

Here’s What Doug Cutting Says Is Hadoop’s Biggest Contribution

Apr 1, 2019

Apache Hadoop isn’t the center of attention in the IT world anymore, and much of the hype has dissipated (or at least regrouped behind AI). But the open source software project still has a place for on-premises workloads, according to Hadoop co-creator Doug Cutting, who says Hadoop will be remembered most of all for a single contribution it made to IT. Read more…

How Walmart Uses Nvidia GPUs for Better Demand Forecasting

Mar 22, 2019

During a presentation at Nvidia’s GPU Technology Conference (GTC) this week, the director of data science for Walmart Labs shared how the company’s new GPU-based demand forecasting model achieved a 1.7% increase in forecast accuracy compared to the existing approach. Read more…

What Makes Apache Spark Sizzle? Experts Sound Off

Mar 11, 2019

Apache Spark is one of the most popular open source projects in the world, and has lowered the barrier to entry for processing and analyzing data at scale. We asked some of the leaders in the big data space to give us their take on why Spark has achieved sustained success when so many other frameworks have fizzled. Read more…

A Decade Later, Apache Spark Still Going Strong

Mar 8, 2019

Don’t look now, but Apache Spark is about to turn 10 years old. The project began quietly at UC Berkeley in 2009 before emerging as an open source project in 2010. Read more…
