
Tag: Matei Zaharia
Should you strive to centralize your data, or leave it scattered about? It seems like it should be a simple question, but it’s actually a tough one to answer, particularly because it has so many ramifications for how data systems are architected, particularly with the rise of cloud data lakes. Read more…
Apache Spark 3.0 is now here, and it’s bringing a host of enhancements across its diverse range of capabilities. The headliner is an big bump in performance for the SQL engine and better coverage of ANSI specs, while enhancements to the Python API will bring joy to data scientists everywhere. Read more…
Data scientists and software engineers work in different ways and use different tools. But both personas will feel more comfortable developing applications in the new version of Databricks Data Science Workspace, which the company unveiled today at Spark + AI Summit. Read more…
Ali Ghodsi might have one of the best jobs in technology right now. As the CEO of Databricks, Ghodsi just completed an oversubscribed $400 million round of funding that gave the company a $6.2 billion valuation. Read more…
Apache Spark is one of the most widely used tools in the big data space, and will continue to be a critical piece of the technology puzzle for data scientists and data engineers for the foreseeable future. Read more…
Apache Spark is one of the most popular open source projects in the world, and has lowered the barrier of entry for processing and analyzing data at scale. We asked some of the leaders in the big data space to give us their take on why Spark has achieved sustained success when so many other frameworks have fizzled. Read more…
Databricks today unveiled MLflow, a new open source project that aims to provide some standardization to the complex processes that data scientists oversee during the course of building, testing, and deploying machine learning models. Read more…
The folks at Databricks last week gave a glimpse of what’s to come in Spark 2.0, and among the changes that are sure to capture the attention of Spark users is the new Structured Streaming engine that leans on the Spark SQL API to simplify the development of real-time, continuous big data apps. Read more…
There was a lot of good stuff on display at last week’s Strata + Hadoop World conference. But if there was one product or technology that stood out from the pack, that would have to be Apache Spark, the versatile in-memory framework that is taking the big data world by storm. Read more…