Tag: Matei Zaharia

To Centralize or Not to Centralize Your Data–That Is the Question

Should you strive to centralize your data, or leave it scattered about? It seems like it should be a simple question, but it’s actually a tough one to answer, particularly because it has so many ramifications for how d Read more…

Spark 3.0 Brings Big SQL Speed-Up, Better Python Hooks

Apache Spark 3.0 is now here, and it’s bringing a host of enhancements across its diverse range of capabilities. The headliner is a big bump in performance for the SQL engine and better coverage of ANSI specs, while e Read more…

Databricks Brings Data Science, Engineering Together with New Workspace

Data scientists and software engineers work in different ways and use different tools. But both personas will feel more comfortable developing applications in the new version of Databricks Data Science Workspace, which t Read more…

Will Databricks Build the First Enterprise AI Platform?

Ali Ghodsi might have one of the best jobs in technology right now. As the CEO of Databricks, Ghodsi just completed an oversubscribed $400 million round of funding that gave the company a $6.2 billion valuation. Better s Read more…

Apache Spark Is Great, But It’s Not Perfect

Apache Spark is one of the most widely used tools in the big data space, and will continue to be a critical piece of the technology puzzle for data scientists and data engineers for the foreseeable future. With that said Read more…

What Makes Apache Spark Sizzle? Experts Sound Off

Apache Spark is one of the most popular open source projects in the world, and has lowered the barrier of entry for processing and analyzing data at scale. We asked some of the leaders in the big data space to give us th Read more…

Databricks Open Sources MLflow to Simplify Machine Learning Lifecycle

Databricks today unveiled MLflow, a new open source project that aims to provide some standardization to the complex processes that data scientists oversee during the course of building, testing, and deploying machine le Read more…

Spark 2.0 to Introduce New ‘Structured Streaming’ Engine

The folks at Databricks last week gave a glimpse of what's to come in Spark 2.0, and among the changes that are sure to capture the attention of Spark users is the new Structured Streaming engine that leans on the Spark Read more…

Spark Steals the Show at Strata

There was a lot of good stuff on display at last week's Strata + Hadoop World conference. But if there was one product or technology that stood out from the pack, that would have to be Apache Spark, the versatile in-memo Read more…
