Tag: Spark

Presto Use Surges, Qubole Finds

Don't look now, but Presto, the SQL engine developed by Facebook as a follow-on to Hive, is starting to catch on in a big way. According to a new survey of big data-as-a-service customers by Qubole, Presto logged impress Read more…

Making Hadoop Relatable Again

There has been much debate over the future of Hadoop in recent months. Should it work more like a cloud object store? Should it support GPUs and FPGAs, Docker or Kubernetes (or both)? Should compute and storage be separa Read more…

Weighing Open Source’s Worth for the Future of Big Data

The open source software movement began in earnest 20 years ago, when a group of technology leaders in Silicon Valley coined the term as an alternative to the repugnant "free software." Fast forward to 2018, and the conc Read more…

DataTorrent Glues Open Source Componentry with ‘Apoxi’

Building an enterprise-grade big data application with open source components is not easy. Anybody who has worked with Apache Hadoop ecosystem technology can tell you that. But the folks at DataTorrent say they've found Read more…

The Hybrid Database Capturing Perishable Insights at Yiguo

Yiguo.com is the largest B2C fresh produce online marketplace in China, serving close to 5 million users and more than 1,000 enterprise customers. We have long devoted ourselves to providing fresh food for ordinary consu Read more…

ParallelM Aims to Close the Gap in ML Operationalization

A startup named ParallelM today unveiled new software aimed at relieving data scientists of the burden of manually deploying, monitoring, and managing machine learning pipelines in production. Dubbed MLOps, Parall Read more…

Snowflake Taps Qubole for Deep Machine Learning in the Cloud

Organizations storing big data in Snowflake's cloud data warehouse can now run machine learning and deep learning algorithms against that data thanks to a new partnership with Qubole. The two companies today announced Read more…

Dr. Elephant Leads the Performance Parade

I started working on big data infrastructure in 2009 when I joined Cloudera, which at the time was a small startup with about 10 engineers. It was a fun place to work. My colleagues and I got paid to work on open source Read more…

Databricks Puts ‘Delta’ at the Confluence of Lakes, Streams, and Warehouses

Databricks today launched a new managed cloud offering called Delta that seeks to combine the advantages of MPP data warehouses, Hadoop data lakes, and streaming data analytics in a unifying platform designed to let user Read more…

Containerized Spark Deployment Pays Dividends

Hadoop has emerged as a general purpose big data operating system that can perform a range of tasks and run all kinds of processing engines. But all that power and flexibility comes with a cost, which is something that o Read more…

DataRobot Reaches Out to SAS, Financial Services

Companies that use DataRobot's software to automate data science tasks can now output models directly from SAS, the dominant analytics company whose software is widely deployed in enterprises around the world. The upstar Read more…

Taking the Data Scientist Out of Data Science

If you were a data scientist three years ago, you could pretty much write your own ticket. Everybody in the industry, it seemed, either wanted to hire a data scientist, or wanted to be one. But today, thanks to a conflue Read more…

IBM Bolsters Spark Ties with Latest SQL Engine

IBM is extending its commitment to Apache Spark as a key component of in-memory analytics with the latest release of its SQL engine for Hadoop. The new version of IBM Big SQL released last week also solidifies the com Read more…

Hadoop Engines Compete in Comcast Query ‘Smackdown’

Who rules the ring when it comes to Hadoop SQL query engine performance? Can flashy newcomers like Presto and Spark take an established giant like MapReduce to the mat? Comcast recently held a competition to crown the b Read more…

Yahoo’s Massive Hadoop Scale on Display at Dataworks Summit

Yahoo put its massive Hadoop investment on display this week at Dataworks Summit, the semi-annual big data conference that it co-hosts with Hortonworks. While Hadoop is no longer the conference headliner that it once Read more…

Hortonworks Shifts Focus to Streaming Analytics

Hortonworks started life providing a Hadoop distribution that allowed customers to process big data at rest. But these days, the company has shifted much of its attention and resources to streaming analytics, or proc Read more…

Spark’s New Deep Learning Tricks

Imagine being able to use your Apache Spark skills to build and execute deep learning workflows to analyze images or otherwise crunch vast reams of unstructured data. That's the gist behind Deep Learning Pipelines, a new Read more…

Pepperdata Takes On Spark Performance Challenges

Apache Spark has revolutionized how big data applications are developed and executed since it emerged several years ago. But troubleshooting slow Spark jobs on Hadoop clusters is not an easy task. In fact, it may even be Read more…

Cloudera Unveils Altus to Simplify Hadoop in the Cloud

Running Hadoop, whether on-premise or in the cloud, is neither simple nor easy. Administrators with specialized skills are needed to configure, manage, and maintain the clusters for their clients, who are data scientists Read more…

Google/ASF Tackle Big Computing Trade-Offs with Apache Beam 2.0

Trade-offs are a part of life, in personal matters as well as in computers. You typically cannot have something built quickly, built inexpensively, and built well. Pick two, as your grandfather would tell you. But appare Read more…
