Tag: apache spark

A Dozen Questions for Databricks CTO Matei Zaharia

Matei Zaharia is a very busy man. When he’s not helping to shape the future of Databricks as its CTO, he is helping to shape the future of computer science as an assistant professor at Stanford University. He also fi Read more…

Is Real-Time Streaming Finally Taking Off?

Like commercial fusion reactors, real-time streaming is a tantalizing technology, but one that perpetually needs just a few more years (or decades) of R&D. But some in the industry are sensing that something has shif Read more…

Databricks Bolsters Governance and Secure Sharing in the Lakehouse

Data governance is one of the four pillars necessary for the future of AI, along with past-looking analytics, future-looking AI, and real-time decision-making. To that end, Databricks rolled out several new governance ca Read more…

It’s Not ‘Mobile Spark,’ But It’s Close

On April 1, 2015, Apache Spark PMC member Reynold Xin wrote a compelling blog detailing plans to deliver a mobile version of Spark. It was all a joke, of course: Spark was a heavy bit of code designed for distributed sys Read more…

Databricks Scores ACM SIGMOD Awards for Spark and Photon

Databricks announced it has won two awards at the ACM SIGMOD (Association for Computing Machinery’s Special Interest Group on Management of Data) Conference in Philadelphia. Apache Spark was awarded the SIGMOD Sy Read more…

Spark Gets Closer Hooks to Pandas, SQL with Version 3.2

The Apache Spark community last week announced Spark 3.2, a significant new release of the distributed computing framework. Among the more exciting features are deeper support for the Python data ecosystem, including the Read more…

Machine Learning, from Single Core to Whole Cluster

The demand for production-quality software for mining insights from datasets across scales has exploded in the last several years. The growing size of datasets throughout industry, government, and other fields has increa Read more…

Meet Sean Knapp, a 2021 Datanami Person to Watch

Getting data to the right place at the right time has never been more important than it is now. But for many organizations, the data movement task largely remains a manual affair. Sean Knapp founded Ascend.io because he Read more…

ML Scaling Requires Upgraded Data Management Plan

Successful data strategies are built on a foundation of meticulous data management, creating enterprise architectures that “democratize” data access and usage, yielding measurable results from machine learning platfo Read more…

Cloudera, Nvidia Team to Speed Cloud AI via Spark

Cloud access to GPUs for AI development will expand under a partnership between Cloudera and Nvidia that calls for the data cloud provider to integrate Nvidia’s accelerated Apache Spark 3.0 platform as a way to scale d Read more…

No-Coder Upsolver Aims to Ease Use of Cloud Data Lakes

Upsolver, the no-code data lake platform vendor, has closed a $25 million funding round this week, boosting total venture funding for its cloud analytics tools to about $42 million. The financing round announced Tuesd Read more…

Databricks Plotting IPO in 2021, Bloomberg Reports

Databricks, which runs a unified data platform in the cloud and is the driving force behind Apache Spark, is preparing for an initial public offering (IPO), possibly in the first half of 2021, according to a report in Bl Read more…

Big Data Apps Wasting Billions in the Cloud

Many organizations have shifted to a cloud-first mentality for deploying their big data applications. But without expending effort to optimize or tune these cloud apps, customers will waste billions of dollars’ worth o Read more…

To Centralize or Not to Centralize Your Data–That Is the Question

Should you strive to centralize your data, or leave it scattered about? It seems like it should be a simple question, but it’s actually a tough one to answer, particularly because it has so many ramifications for how d Read more…

Google Cloud’s Dataproc Gets a GPU-Powered Spark Boost

Google Cloud’s Dataproc – its big data platform that allows users to run Apache Hadoop and Spark jobs – is getting a boost. Apache Spark 3 and Hadoop 3 have launched general availability, enhancing users’ data an Read more…

Spark 3.0 Brings Big SQL Speed-Up, Better Python Hooks

Apache Spark 3.0 is now here, and it’s bringing a host of enhancements across its diverse range of capabilities. The headliner is a big bump in performance for the SQL engine and better coverage of ANSI specs, while e Read more…

Databricks Brings Data Science, Engineering Together with New Workspace

Data scientists and software engineers work in different ways and use different tools. But both personas will feel more comfortable developing applications in the new version of Databricks Data Science Workspace, which t Read more…

Databricks Cranks Delta Lake Performance, Nabs Redash for SQL Viz

Today at its Spark + AI Summit, Databricks unveiled Delta Engine, a new layer in its Delta Lake cloud offering that uses several techniques to significantly accelerate the performance of SQL queries. The company also ann Read more…

Spark 3.0 to Get Native GPU Acceleration

NVIDIA today announced that it’s working with Apache Spark’s open source community to bring native GPU acceleration to the next version of the big data processing framework. With Spark version 3.0, which is due out n Read more…

Kaskada Accelerates ML Workflow with Its Feature Store

There’s a lot of surface area in the typical data science workflow for the purveyors of automation to attack. What moves the needle for the folks at the startup Kaskada is the feature engineering and deployment stage, Read more…
