April 12, 2017

Databricks Eyes Data Engineers With Spark Cloud

George Leopold


Databricks, the company founded by the creators of Apache Spark, rolled out a new version of its Spark-based cloud platform that specifically targets data engineering workloads.

The company said Wednesday (April 12) the new platform would enable data engineers to combine SQL, structured streaming, ETL and machine learning workloads running on the cluster-computing framework. The goal is to accelerate the secure deployment of data pipelines in production, the San Francisco-based company said.

The data engineering platform also seeks to move Spark deeper into enterprises by delivering what Databricks calls a “unified data analytics platform” that promotes collaboration among data scientists and decision makers. With that in mind, the cloud platform integrates with the company’s data science “workspaces” to streamline the “transition between data engineering and interactive data science workloads.”

The new Databricks platform also appears to address the growing demand for data engineers, a relatively new position that is a kind of hybrid between data analysts and data scientists. Data engineers excel at manipulating huge amounts of data and ensuring the entire big data software stack can scale to support massive workloads.

Databricks maintains that organizations face challenges building Spark-based systems that meet the demands of data engineering tasks such as data cleansing and analysis. Hence, it is offering a “unified environment” to boost collaboration between data engineers and data scientists, along with Spark performance increases aimed at developing, for example, intelligent algorithms for automating business processes.

The company claims the optimized version of Spark in its cloud platform, tuned for a variety of instance types, can deliver as much as a ten-fold performance boost. It also offers an accelerated access layer for Amazon Web Services’ (NASDAQ: AMZN) Simple Storage Service. Meanwhile, tools and services such as Amazon Redshift data warehousing, along with machine learning frameworks like TensorFlow, connect via REST APIs that also launch clusters and jobs.
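Launching clusters and jobs over REST means the platform can be driven entirely by scripts. The sketch below only constructs such a request; the workspace URL, token, endpoint path and payload field names are assumptions for illustration, not Databricks' documented API contract.

```python
# Hypothetical sketch of launching a job cluster via a REST call.
# The URL, token, endpoint path and payload fields are assumed for
# illustration -- consult the vendor's API documentation for the real shape.
import json
from urllib import request

DOMAIN = "https://example.cloud.databricks.com"  # hypothetical workspace URL
TOKEN = "dapi-REDACTED"                          # hypothetical API token

payload = {
    "run_name": "nightly-etl",            # invented job name
    "new_cluster": {                      # cluster spun up just for this run
        "spark_version": "2.1.x",
        "node_type_id": "r3.xlarge",      # one of the supported instance types
        "num_workers": 4,
    },
    "notebook_task": {"notebook_path": "/etl/cleanse"},
}

req = request.Request(
    url=DOMAIN + "/api/2.0/jobs/runs/submit",    # assumed endpoint path
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": "Bearer " + TOKEN,
             "Content-Type": "application/json"})

# request.urlopen(req) would actually launch the cluster and job;
# omitted here since the credentials above are placeholders.
print(req.full_url)
```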

At the same time, the expanding Spark community has been working to boost the performance of applications running under the SQL and Dataframe APIs, which have been stabilized.

Databricks said pricing for its data-engineering platform is based on workload type, such as ETL and automated jobs, and works out to 20 cents per Databricks unit (DBU) plus the cost of the underlying AWS resources.
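As a back-of-envelope illustration of that rate, consider a wholly invented workload; only the 20-cents-per-DBU figure comes from the article, and actual DBU consumption and EC2 rates vary by instance type.

```python
# Rough cost sketch for the quoted data-engineering rate:
# $0.20 per Databricks unit (DBU), plus the underlying AWS bill.
DBU_RATE = 0.20              # dollars per DBU, from the article

# Invented example: a 4-node cluster consuming 1 DBU per node-hour,
# running a 3-hour ETL job on 30 nights a month.
node_hours = 4 * 3 * 30      # 360 node-hours
dbus = node_hours * 1        # 360 DBUs at 1 DBU per node-hour

databricks_cost = dbus * DBU_RATE
aws_cost = node_hours * 0.266   # hypothetical EC2 on-demand hourly rate

print(f"Databricks: ${databricks_cost:.2f}/mo, AWS: ${aws_cost:.2f}/mo")
# -> Databricks: $72.00/mo, AWS: $95.76/mo
```

The point of the example is that the platform fee is metered on usage, so it scales with the same node-hours that drive the AWS bill.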

The data-engineering platform is among a raft of new Spark features and enhancements planned for this year. Apache Spark creator Matei Zaharia said recently they include the introduction of a standard binary data format, better integration with Kafka, and even the capability to run Spark on a laptop. Automated creation of continuous applications in Spark remains a long-term goal, Zaharia said during a recent company event.

Recent items:

What’s In the Pipeline For Apache Spark?

Databricks CEO on Streaming Analytics, Deep Learning and SQL
