Tag: apache spark

Here’s What Doug Cutting Says Is Hadoop’s Biggest Contribution

Apache Hadoop isn't the center of attention in the IT world anymore, and much of the hype has dissipated (or at least regrouped behind AI). But the open source software project still has a place for on-premises workloads, Read more…

How Walmart Uses Nvidia GPUs for Better Demand Forecasting

During a presentation at Nvidia's GPU Technology Conference (GTC) this week, the director of data science for Walmart Labs shared how the company's new GPU-based demand forecasting model achieved a 1.7% increase in forec Read more…

What Makes Apache Spark Sizzle? Experts Sound Off

Apache Spark is one of the most popular open source projects in the world, and has lowered the barrier of entry for processing and analyzing data at scale. We asked some of the leaders in the big data space to give us th Read more…

A Decade Later, Apache Spark Still Going Strong

Don't look now, but Apache Spark is about to turn 10 years old. The open source project began quietly at UC Berkeley in 2009 before emerging as an open source project in 2010. For the past five years, Spark has been on an Read more…

Microsoft Invests in Databricks

Databricks, the high-flying analytics startup founded by the creators of Apache Spark, announced yet another venture funding haul this week as it hustles to meet what it says is growing demand for its analytics platform. Read more…

Google Brings Kubernetes Operator for Spark to GCP

Those looking to run Apache Spark on clusters managed with Kubernetes will be interested in the new Spark operator for Kubernetes unveiled by Google today. The software, which is in beta, will be supported on the Google Read more…

Google Updates Cloud Database, Developer Tools

Google unleashed a batch of updated tools this week aimed at cloud-based big data and storage options along with the beta release of a developer tool designed to ease use of Apache Spark with the R programming language. Read more…

Databricks Upgrades Spark Support, Adds ML Runtime

Databricks announced support this week for the latest version of Spark, integrating it into its enterprise analytics platform. Along with support for version 2.4 of the stream processing framework integrated as part of D Read more…

Databricks, Talend Expand Cloud Access to Spark

Databricks and Talend, the cloud data integration vendor, are joining forces to help data jockeys scale their integration efforts using the Apache Spark analytics engine hosted on Talend’s cloud. Databricks, the cre Read more…

Hot DataRobot Raises a Bundle

The bucks keep rolling in from technology investors pouring cash into machine learning and data science startups. The latest beneficiary is DataRobot, the machine learning automation vendor. Formed in 2012, the compan Read more…

New Open-Source Projects Emerge for Machine Learning

Two open-source projects contributed by Chinese tech giants Baidu and Tencent will focus on machine and deep learning advances with the long-term goal of making the AI technologies easier to use while advancing cloud ser Read more…

Anaconda: Data Science Exiting Hadoop for the Cloud

Data scientists are embracing cloud-native frameworks as they move on from on-premises data infrastructure previously dominated by Hadoop, concludes a survey on the state of data science. The shift is driven in part b Read more…

Databricks Open Sources MLflow to Simplify Machine Learning Lifecycle

Databricks today unveiled MLflow, a new open source project that aims to provide some standardization to the complex processes that data scientists oversee during the course of building, testing, and deploying machine le Read more…

Project Hydrogen Unites Apache Spark with DL Frameworks

The folks behind Apache Spark today unveiled Project Hydrogen, a new endeavor that aims to eliminate barriers preventing organizations from using Spark with deep learning frameworks like TensorFlow and MXNet. It's tou Read more…

Google Cloud Adds Cask Data

Leading cloud providers continue to snap up analytics startups with an eye toward expanding access to big data technologies. Cask Data, developers of an application platform that among other things integrates Hadoop and Read more…

Apache Zeppelin Launches Latest Data Science Notebook

ZEPL, the startup founded by the creators of interactive data analytics tool Apache Zeppelin, has moved its multi-tenant analytics platform out of beta, announcing its general availability this week. The platform is a Read more…

Top 3 New Features in Apache Spark 2.3

It's tough to find a big data project that's had as much impact as Apache Spark over the past five years. The folks at Databricks, who contribute heavily to Spark (along with the wider Spark community), are keeping the pr Read more…

Data Lakes Crest In Drive to Boost Quality

As more data moves to the cloud, the composition of data lakes is shifting to new sources such as NoSQL databases while cloud data repositories emerge amid hybrid deployments, according to a big data survey. The year- Read more…

The Data Science Behind Dollar Shave Club

Dollar Shave Club burst onto the men's hygiene scene in 2011 with a hilarious video and preposterous business plan: selling subscriptions for razor blades at a ridiculously low price. Six years later, the company keeps g Read more…

Datanami