Survey Sees Spark Emerging in 2016
This is the “Year of Spark,” asserts a new big data survey on analytics priorities.
The survey of more than 250 data scientists and architects, IT managers and business intelligence analysts released on Tuesday (Jan. 19) found that nearly 70 percent of users expressed interest in deploying Apache Spark in the coming year. While current leader MapReduce is expected to remain the dominant compute framework in production, survey sponsor Syncsort noted that the “high level of interest should translate into more Spark deployments, mostly running on Hadoop.”
“The ability to design data transformations once and then run them anywhere—across Hadoop MapReduce, Spark, Linux, Windows, or Unix, on premise or in the cloud—is critical,” the survey concluded.
Another conclusion drawn from the survey was that the “ability to transform and prepare data in flight will be more important, eliminating the need for staging increasing volumes of data.”
Added Tendü Yoğurtçu, general manager of Syncsort’s Big Data unit: “Though challenging, this will also create an opportunity to deliver next generation data integration products, future-proofing user’s applications while taking advantage of highly scalable and distributed platforms like Apache Hadoop and Spark,” either on-premise or in the cloud.
Syncsort, Woodcliff Lake, N.J., also said its “Hadoop Perspectives” survey found that the number of users switching to Hadoop will continue to increase, driven by a range of cost and operational benefits. Among them is the desire to make more data available to more business users across organizations.
At the same time, more respondents said they want to leverage Hadoop for “advanced use cases” like crunching unstructured data from social media sources as well as rolling out Internet of Things (IoT) strategies. Data executives also cited the desire to make greater use of predictive analytics and visualization to gain deeper insights into customers’ preferences.
Meanwhile, 40 percent of respondents said they currently use Hadoop as a cheaper alternative for storage and processing in their data warehouses. The survey also noted that Hadoop has “yet to be leveraged for mobile apps and software,” with a mere 4.9 percent of respondents reporting utility for those use cases.
Based on the results of the survey conducted late last year, Syncsort also predicted greater use of streaming, real-time data sources along with greater emphasis on data governance and security as the pace of production deployments quickens in 2016.
In the first instance, “the best business decisions often require the most recent data available,” the survey found. The most popular use cases included fraud detection, analytics on telemetry and security data, insurance claim validation and IoT deployments.
Meanwhile, the survey predicted that more organizations would adopt a “Hadoop first” approach to data management, “skipping traditional and more expensive platforms and applying metadata, lineage, security, and other data management measures on Hadoop from the start.”