
July 16, 2014

Cloudera Introduces New Apache Spark Training Course

PALO ALTO, Calif., July 16 - Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop, today announced the first hands-on Apache Spark training course that will enrich developers’ experience with this groundbreaking new processing engine. The three-day course, called Cloudera Developer Training for Apache Spark, will prepare developers and software engineers to build complete, unified applications that combine batch, streaming, and interactive analytics on all of their data. With Cloudera Developer Training for Apache Spark, data professionals can use this next-generation framework’s speed, ease of use, and advanced analytics to enable faster business decisions and better user outcomes.

Spark is an open source data analytics framework, originally developed in the AMPLab at the University of California, Berkeley, that complements Hadoop as part of an enterprise data hub. Broadly embraced by the open source community, Big Data vendors, and data-intensive enterprises for its stream processing capabilities and its support for complex, iterative algorithms, Spark offers performance gains that enable applications to run on the data in a Hadoop cluster at speeds up to 100 times faster than traditional MapReduce programs. Cloudera was the first company to offer commercial support for Spark as part of a Cloudera Enterprise subscription and recently announced a collaboration with Databricks, IBM, Intel, and MapR to broaden support for Spark as the standard data processing engine for the Hadoop ecosystem.

Through instructor-led discussions and interactive, hands-on exercises, participants will dive deep into the technical applications of Spark to understand how it relates to the rest of the Hadoop ecosystem and to write sophisticated parallel applications. Developers will learn real-world best practices drawn from Cloudera’s work with Spark on some of the largest clusters in development and production. Topics include:

  • Using the Spark shell for interactive data analysis
  • The features of Spark’s Resilient Distributed Datasets
  • How Spark runs on a cluster
  • Parallel programming with Spark
  • Writing Spark applications
  • Processing streaming data with Spark

“Spark offers clear benefits for realizing sophisticated analytics and is quickly becoming the future of data processing on Hadoop,” said Sarah Sproehnle, vice president, Education Services, Cloudera. “With Spark, customers can realize immediate business advantages. For example, Spark Streaming enables businesses to process live data as it arrives in the enterprise data hub, rather than having to wait to batch-process it later. The fact that the same codebase can be used for streaming data and data-at-rest significantly reduces development time for Big Data applications, speeding up time-to-insight by several orders of magnitude and decreasing the need for expensive specialized systems. This is just one case where the benefits of Spark have a direct impact on a company’s bottom line.”