Tag: apache spark

Spark 3.0 Brings Big SQL Speed-Up, Better Python Hooks

Jun 25, 2020

Apache Spark 3.0 is now here, and it’s bringing a host of enhancements across its diverse range of capabilities. The headliner is a big bump in performance for the SQL engine and better coverage of ANSI specs, while enhancements to the Python API will bring joy to data scientists everywhere. Read more…

Databricks Brings Data Science, Engineering Together with New Workspace

Jun 25, 2020

Data scientists and software engineers work in different ways and use different tools. But both personas will feel more comfortable developing applications in the new version of Databricks Data Science Workspace, which the company unveiled today at Spark + AI Summit. Read more…

Databricks Cranks Delta Lake Performance, Nabs Redash for SQL Viz

Jun 24, 2020

Today at its Spark + AI Summit, Databricks unveiled Delta Engine, a new layer in its Delta Lake cloud offering that uses several techniques to significantly accelerate the performance of SQL queries. Read more…

Spark 3.0 to Get Native GPU Acceleration

May 14, 2020

NVIDIA today announced that it’s working with Apache Spark’s open source community to bring native GPU acceleration to the next version of the big data processing framework. With Spark version 3.0, which is due out next month, organizations will be able to speed up all of their Spark workloads, from ETL jobs to machine learning training, without making wholesale changes to their code. Read more…

Kaskada Accelerates ML Workflow with Its Feature Store

Feb 5, 2020

There’s a lot of surface area in the typical data science workflow for the purveyors of automation to attack. What moves the needle for the folks at the startup Kaskada is the feature engineering and deployment stage, which it’s seeking to streamline with a new automated feature store. Read more…

Data Lakes Get Structured

Oct 7, 2019

The explosion of unstructured and partially structured data has made traditional data lakes harder to manage. Adding to the challenge are “brittle” data pipelines that are both time-consuming to create and ephemeral. Read more…

StreamSets Eases Spark-ETL Pipeline Development

Sep 5, 2019

Apache Spark gives developers a powerful tool for creating data pipelines for ETL workflows, but the framework is complex and can be difficult to troubleshoot. StreamSets is aiming to simplify Spark pipeline development with Transformer, the latest addition to its DataOps platform. Read more…

Program Synthesis Moves a Step Closer to Reality

Jul 8, 2019

As data scientists and software developers sort through the plethora of tools and APIs ranging from Python to Apache Spark, automation schemes are emerging to help programmers navigate those tools and the accompanying infrastructure that machine learning and other apps run on. Read more…

Understanding Your Options for Stream Processing Frameworks

May 30, 2019

Real-time stream processing isn’t a new concept, but it’s experiencing renewed interest from organizations tasked with finding ways to quickly process large volumes of streaming data. Luckily for you, there are a handful of open source frameworks that could give your developers a big head start in building your own custom stream-processing application. Read more…

Apache Spark Is Great, But It’s Not Perfect

Apr 3, 2019

Apache Spark is one of the most widely used tools in the big data space, and will continue to be a critical piece of the technology puzzle for data scientists and data engineers for the foreseeable future. Read more…
