Tag: apache spark

ML Scaling Requires Upgraded Data Management Plan

Apr 16, 2021

Successful data strategies are built on a foundation of meticulous data management, creating enterprise architectures that “democratize” data access and usage, yielding measurable results from machine learning platforms.

The reality, according to an examination of the emerging “AI organization,” is that few data-driven organizations are able to deliver on their data strategy. Read more…

Cloudera, Nvidia Team to Speed Cloud AI via Spark

Apr 13, 2021

Cloud access to GPUs for AI development will expand under a partnership between Cloudera and Nvidia that calls for the data cloud provider to integrate Nvidia’s accelerated Apache Spark 3.0 platform as a way to scale data science workflows. Read more…

No-Coder Upsolver Aims to Ease Use of Cloud Data Lakes

Apr 6, 2021

Upsolver, the no-code data lake platform vendor, has closed a $25 million funding round this week, boosting total venture funding for its cloud analytics tools to about $42 million.

The financing round announced Tuesday (April 6) was led by Scale Venture Partners. Read more…

Databricks Plotting IPO in 2021, Bloomberg Reports

Oct 26, 2020

Databricks, which runs a unified data platform in the cloud and is the driving force behind Apache Spark, is preparing for an initial public offering (IPO), possibly in the first half of 2021, according to a report in Bloomberg last week. Read more…

Big Data Apps Wasting Billions in the Cloud

Jul 29, 2020

Many organizations have shifted to a cloud-first mentality for deploying their big data applications. But without expending effort to optimize or tune these cloud apps, customers will waste billions of dollars’ worth of computing resources, according to a new report. Read more…

To Centralize or Not to Centralize Your Data–That Is the Question

Jul 14, 2020

Should you strive to centralize your data, or leave it scattered about? It seems like a simple question, but it’s a tough one to answer because it has so many ramifications for how data systems are architected, particularly with the rise of cloud data lakes. Read more…

Google Cloud’s Dataproc Gets a GPU-Powered Spark Boost

Jul 7, 2020

Google Cloud’s Dataproc – its big data platform that allows users to run Apache Hadoop and Spark jobs – is getting a boost. Apache Spark 3 and Hadoop 3 have reached general availability, enhancing users’ data analytics capabilities with a series of new features – and naturally, those features are now available on Google Cloud’s Dataproc image version 2.0. Read more…

Spark 3.0 Brings Big SQL Speed-Up, Better Python Hooks

Jun 25, 2020

Apache Spark 3.0 is now here, and it’s bringing a host of enhancements across its diverse range of capabilities. The headliner is a big bump in performance for the SQL engine and better coverage of ANSI specs, while enhancements to the Python API will bring joy to data scientists everywhere. Read more…

Databricks Brings Data Science, Engineering Together with New Workspace

Jun 25, 2020

Data scientists and software engineers work in different ways and use different tools. But both personas will feel more comfortable developing applications in the new version of Databricks Data Science Workspace, which the company unveiled today at Spark + AI Summit. Read more…

Databricks Cranks Delta Lake Performance, Nabs Redash for SQL Viz

Jun 24, 2020

Today at its Spark + AI Summit, Databricks unveiled Delta Engine, a new layer in its Delta Lake cloud offering that uses several techniques to significantly accelerate the performance of SQL queries. Read more…
