Survey: Spark Going ‘Mainstream’
That rumbling sound you hear is Apache Spark entering production deployments in public clouds along with surging use of the cluster-computing framework’s streaming and machine learning capabilities, according to a new vendor survey that also found more diverse users and use cases.
Databricks Inc., the San Francisco-based startup behind Apache Spark, released survey results on Tuesday (Sept. 27) revealing steady momentum as the Spark user community more than tripled over the past year to 225,000 members.
“The results indicate that Spark has moved well beyond the early-adopter phase at high-tech companies and is now mainstream in large data-driven enterprises,” the startup asserted.
As developer participation soared, the Databricks poll of respondents from 900 organizations found that Spark deployments in the public cloud are surging as more industries shift to cloud computing. Public cloud deployments of Spark jumped 10 percent over last year to 61 percent in 2016, while Spark deployments on on-premises cluster managers continued to decline.
Spark is also fueling the growth of fast data analytics: more than half of the more than 1,600 respondents pointed to data streaming as a key component for deploying real-time streaming and analytics platforms. Production use of Spark Streaming surged 57 percent over the past year, while adoption of MLlib, Spark's machine learning library, grew 38 percent year-on-year.
Along with Spark-based streaming and machine learning applications entering production, the Databricks survey found that deployments of other Spark components, such as DataFrames, more than doubled over the last year. A DataFrame is a distributed collection of data organized into named columns; production deployments of DataFrames rose to 38 percent of respondents.
Spark SQL deployments, meanwhile, rose 16 percent year-on-year to 40 percent of those polled.
Based on its survey results, Databricks said it expects Spark momentum to continue building as a diverse set of new users embraces the data-processing engine. One reason is simplicity, a characteristic found lacking in another recent industry survey that cited “inflexibility” in current data analytics infrastructure as a key reason for many failed big data projects.
Accordingly, Databricks executives noted that ease of use, along with better performance, headed the list of Spark features most often cited by users. Respondents also cited the accessibility of common programming languages supported by Spark, including R and SQL, "suggesting new users are not only data engineers but data analysts," the company said.
"These attributes make Spark an attractive engine for performing advanced analytics across industry verticals in solving complex data problems, by users from different functional roles," Reynold Xin, Databricks' chief architect and co-founder, noted in a statement releasing the survey results.
Spark usage among Windows users also increased by 9 percent over the previous year to 32 percent of those surveyed, Databricks reported.