February 18, 2014

Spark Graduates Apache Incubator

Tiffany Trader

As we’ve touched on before, Hadoop was designed as a batch-oriented system, and its real-time capabilities are still emerging. Those eagerly awaiting this next evolution will be pleased to hear about the graduation of Apache Spark from the Apache Incubator. On Sunday, the Apache Spark Project committee unanimously voted to promote the fast data-processing tool out of the Apache Incubator.

Spark was created in 2009 at the University of California, Berkeley’s AMPLab and open sourced in 2010. Databricks, the company formed to support Spark, has raised nearly $14 million from venture firm Andreessen Horowitz to commercialize both Spark and its sister project, the SQL query engine Shark.

Following graduation, a project management committee will be established for the big data software, according to a report from The Register, and Databricks co-founder and CTO Matei Zaharia will take on the role of Vice President, Apache Spark.

Databricks refers to Apache Spark as “a powerful open source processing engine for Hadoop data built around speed, ease of use, and sophisticated analytics.” The computing framework supports Java, Scala, and Python and comes with a set of more than 80 high-level operators built in.
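To give a rough sense of what chaining high-level operators looks like, here is a minimal sketch in plain Python. It does not use Spark at all: `MiniDataset` is a hypothetical, single-machine stand-in written for this illustration, and its method names only loosely echo Spark operators such as map, filter, flatMap, and reduceByKey. The sample sentences are likewise made up.

```python
from typing import Any, Callable, Iterable, List


class MiniDataset:
    """Toy in-memory collection (hypothetical; NOT Spark's API).

    Each operator returns a new MiniDataset, so calls can be chained
    in the style of a high-level data-processing pipeline.
    """

    def __init__(self, items: Iterable[Any]):
        self._items = list(items)

    def map(self, fn: Callable[[Any], Any]) -> "MiniDataset":
        # Apply fn to every element.
        return MiniDataset(fn(x) for x in self._items)

    def filter(self, pred: Callable[[Any], bool]) -> "MiniDataset":
        # Keep only elements for which pred is true.
        return MiniDataset(x for x in self._items if pred(x))

    def flat_map(self, fn: Callable[[Any], Iterable[Any]]) -> "MiniDataset":
        # Apply fn to every element and flatten the results.
        return MiniDataset(y for x in self._items for y in fn(x))

    def reduce_by_key(self, fn: Callable[[Any, Any], Any]) -> "MiniDataset":
        # Merge the values of (key, value) pairs that share a key.
        merged = {}
        for key, value in self._items:
            merged[key] = fn(merged[key], value) if key in merged else value
        return MiniDataset(merged.items())

    def collect(self) -> List[Any]:
        # Return the elements as an ordinary list.
        return list(self._items)


# Word count expressed as a chain of operators (sample input is made up).
lines = MiniDataset(["spark runs on hadoop", "spark is fast"])
counts = (
    lines.flat_map(str.split)
    .map(lambda word: (word, 1))
    .reduce_by_key(lambda a, b: a + b)
    .collect()
)
word_counts = dict(counts)
print(word_counts)
```

The point of the sketch is only the programming style: a job is expressed as a short chain of operators rather than as hand-written map and reduce classes. A real Spark program would distribute the same steps across a cluster.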

Spark runs on top of existing Hadoop clusters and is being pitched as a “more general and powerful alternative to Hadoop’s MapReduce.” Spark promises to run up to 100 times faster than Hadoop MapReduce on in-memory datasets, and up to 10 times faster when running on disk.

It supports SQL queries, streaming data, and complex analytics, and it can combine them in a single engine, handling workloads that previously required separate engines (e.g., MapReduce, SQL, and machine learning).

The promotion of Spark out of the Apache Incubator is one more sign of a maturing Hadoop ecosystem. Where MapReduce is best suited to the high-latency batch model, Spark extends Hadoop’s viability for real-time transactional databases. The software is fully compatible with the Hadoop Distributed File System (HDFS) and HBase, as well as any other Hadoop-supported storage system. It also has built-in scripts for running on Amazon EC2.

Spark has gained traction quickly. Cloudera is shipping Spark as part of its Hadoop distro. Spark’s active developer community includes over 100 contributors from more than 30 organizations. Users of the framework include Alibaba, Baidu, Intel, IBM’s Almaden research group, TrendMicro, Yahoo, and other companies both large and small.

The Apache site offers Spark project downloads as well as installation instructions, video tutorials, and documentation. The current version, Spark 0.9.0, was released February 2, 2014.

Related Items:

Rethinking Real-Time Hadoop

Cloudera Shuffles Its Product Deck in Pursuit of ‘Data Hub’ Strategy

Databricks Partners with Cloudera for Analytics

Applications: Enterprise Analytics

Technologies: Middleware

Sectors: Other

Vendors: Cloudera

Tags: Hadoop, mapreduce, Shark, Spark
