December 2, 2014

Databricks Debuts Online Classes to Teach Distributed Analytics Using Spark

BERKELEY, Calif., Dec. 2 — Databricks, the company founded by the creators of the popular open-source Big Data processing engine Apache Spark, today announced the launch of two massive open online courses (MOOCs) focused on distributed analytics using Apache Spark. The courses will be made available in Spring 2015 via BerkeleyX, in collaboration with the MOOC provider and online learning platform, edX.

With explosive demand for and rising adoption of Apache Spark, the two five-week courses augment Databricks’ efforts to grow the Spark community, enabling students to gain hands-on experience with Spark’s combination of sophisticated analytics and real-time capabilities to deliver deeper insights, faster. The launch of these courses comes on the heels of a series of Apache Spark training offerings from Databricks, including the Spark Certification Program for System Integrators and the Spark Certification Program for Developers.

“Spark is the most active open source project in the Big Data ecosystem, and continues to be deployed by enterprises across multiple verticals due to its speed and efficiency, ease of use, and single unified system for the complete data analytics pipelines,” said Matei Zaharia, co-founder and CTO at Databricks. “As we continue to foster and grow the Spark community to meet that demand, we are excited to launch these two MOOCs, making hands-on, practical courses available to a community that will advance Spark’s adoption with greater ease.”

Both courses will use the Python interface to Spark, making them widely accessible to data scientists and developers. The courses include:

Introduction to Big Data with Apache Spark – Students will learn how to apply data science techniques using parallel programming in Spark to explore big (and small) data. The course will identify the most common responsibilities of data scientists and teach students how to use Spark to deliver against these expectations.

When: February 23 – March 27, 2015
Professor: Anthony D. Joseph, Professor in Electrical Engineering and Computer Science at UC Berkeley and Technical Advisor at Databricks

Scalable Machine Learning – The course will present the underlying statistical and algorithmic principles required to develop scalable machine learning pipelines and provide hands-on experience using Apache Spark. Students will use Spark to implement scalable algorithms for fundamental statistical models while tackling key real-world problems from various domains.

When: April 14 – May 18, 2015
Professor: Ameet Talwalkar, Assistant Professor of Computer Science at UCLA and Technical Advisor at Databricks

Both courses are available to the public for free and are now open for enrollment on the edX website. edX Verified Certificates are also available for a fee. For more information, visit: https://www.edx.org/
