Databricks Offers Automation Throughout the End-to-End Data and Machine Learning Lifecycle
SAN FRANCISCO, August 21, 2019 — Databricks, the leader in Unified Analytics and original creators of Apache Spark, today announced that its Unified Analytics Platform now offers automation and augmentation throughout the machine learning lifecycle. The broader augmented analytics offering not only automates machine learning model building, but also extends to automated data preparation and model deployment. The new automated machine learning (AutoML) capabilities empower expert and citizen data scientists alike.
“Gartner predicts by 2020, more than 40% of data science tasks will be automated, resulting in increased productivity and broader use by citizen data scientists.” To accelerate this automation and help data science teams deliver value to their business, Databricks’ Unified Analytics Platform uses machine learning to augment data preparation, visualization, feature engineering, hyperparameter tuning, model search, automatic model tracking, reproducibility, and deployment. Centered on an integration with the open source framework MLflow, this AutoML offering enables citizen data scientists, not just experts, to augment their data science and machine learning workflows at scale.
“Data scientists and machine learning engineers are continuously looking for ways to accelerate and scale their machine learning initiatives,” said Adam Conway, vice president of product management at Databricks. “By introducing the concept of ‘low-code’ and ‘no-code’, AutoML represents a fundamental shift in the way organizations approach machine learning and data science. With the right automation, AutoML can dramatically shorten time-to-value for data science teams.”
This offering provides AutoML capabilities at different levels of control and automation:
- AutoML Toolkit: An automated end-to-end machine learning pipeline, covering feature engineering, model search, and deployment, available via Databricks Labs custom solutions. AutoML Toolkit executions are automatically tracked in MLflow.
- Automated Model Search: Optimized and distributed conditional hyperparameter search with enhanced Hyperopt and automated tracking to MLflow.
- Automated Hyperparameter Tuning: Optimized and distributed hyperparameter search with enhanced Hyperopt and automated tracking to MLflow. Deep integration with PySpark MLlib’s Cross Validation to automatically track MLlib experiments in MLflow.
- Integration with Azure Machine Learning: Building upon the open source MLflow collaboration between Databricks and Microsoft announced in April, this integration allows customers access to the automated machine learning capabilities offered by Azure Machine Learning.
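As a rough illustration of what a conditional hyperparameter search with automatic tracking involves, the following minimal Python sketch samples a model family first, then samples only that family's hyperparameters, and records every trial. It is not the Hyperopt or MLflow API: the search space, the `toy_loss` objective, and the in-memory `runs` list are hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
import random

# Hypothetical conditional search space: which hyperparameters are sampled
# depends on the model family chosen first.
SEARCH_SPACE = {
    "ridge": {"alpha": (0.01, 10.0)},  # float range
    "knn": {"k": (1, 15)},             # integer range
}


def sample_config(rng):
    """Pick a model family, then sample only that family's hyperparameters."""
    model = rng.choice(sorted(SEARCH_SPACE))
    params = {}
    for name, (low, high) in SEARCH_SPACE[model].items():
        if isinstance(low, int):
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return {"model": model, "params": params}


def toy_loss(config):
    """Stand-in objective; a real one would train and validate a model."""
    params = config["params"]
    if config["model"] == "ridge":
        return (params["alpha"] - 1.0) ** 2
    return abs(params["k"] - 5) / 5.0


def search(n_trials, seed=0):
    """Random search that logs every trial to an in-memory list."""
    rng = random.Random(seed)
    runs = []  # stand-in for a tracking store
    for trial in range(n_trials):
        config = sample_config(rng)
        runs.append({"trial": trial, **config, "loss": toy_loss(config)})
    best = min(runs, key=lambda run: run["loss"])
    return best, runs


if __name__ == "__main__":
    best, runs = search(n_trials=25, seed=42)
    print(f"{len(runs)} trials logged; best: {best}")
```

In this sketch, random sampling takes the place of the optimized, distributed search described in the list above, and appending to `runs` takes the place of automated tracking; both are deliberate simplifications.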
Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the original creators of Apache Spark, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks has secured investments from Andreessen Horowitz, Coatue Management, Microsoft, New Enterprise Associates (NEA), Battery Ventures, Green Bay Ventures, and Geodesic, among others, and has a global customer base that includes Viacom, Shell and HP.