Investments in Fast Data Analytics Surge
Companies are quickly ramping up their investments in fast data analytics and real-time stream processing frameworks and lowering spending on batch technologies in an attempt to get on top of growing data volumes and velocities, a new survey says.
According to OpsClarity’s 2016 State of Fast Data & Streaming Applications Survey, 89 percent of more than 4,000 survey respondents say they’re currently using batch analytics, compared to 65 percent who say they’re using “near real-time pipelines.”
Looking forward, 92 percent of survey respondents indicate they plan on increasing their investments in streaming data applications over the next year, while nearly 79 percent say they plan on reducing or eliminating investments in batch processing.
Those numbers, more than any others in the survey, demonstrate the shift that’s occurring in the big data analytics space, as companies look to take advantage of surging data flows to create a competitive advantage for themselves.
“Real-time data and stream processing is becoming central to how a modern company harnesses data,” Apache Kafka creator Jay Kreps says in a statement that accompanied the release of the survey. “For modern companies, data is no longer just powering stale daily reports. It’s being baked into an increasingly sophisticated set of applications, from detecting fraud to powering real-time analytics, to guiding smarter customer interactions.”
OpsClarity, which develops software for monitoring stream processing applications, also examined the most popular tools and frameworks used in the emerging fast data ecosystem. Not surprisingly, Apache Kafka was the hands-down winner when it came to message brokers, with 86% saying they use this technology, followed by Apache Flume (22%), RabbitMQ (21%), Amazon SQS (11%) and Amazon Kinesis (11%).
Apache Spark (70%) was the most popular data processing engine. According to the survey, Spark outmuscled MapReduce (50%), Apache Storm (27%) and Apache Samza (4%) in the data processing category. Interestingly, while Kafka Streams was not yet released when the survey was conducted this spring, 32% of survey respondents say they were interested in using it for stream processing. Fast data professionals are also keeping their eyes on Apache Flink and Apache Apex, according to the survey.
The Hadoop Distributed File System (HDFS) was the most popular data sink, as 54% of survey respondents report using this core Hadoop component. Apache Cassandra came in second at 42%, followed by Elasticsearch (38%), relational database management systems (38%), HBase (34%) and MongoDB (21%). Most survey respondents report using two to three data stores on average, the survey says.
Open source technology is preferred by 91% of survey respondents, with nearly half (47%) reporting that they use only open source software. Only 9% of survey respondents report using only commercial software; the remaining 44% report using both commercial and open source software.
A lack of experience is the biggest barrier to creating fast data pipelines, according to the survey, followed closely by a lack of visibility into performance and reliability; tedious, time-consuming issue resolution; and a lack of coordination between development and operations.
OpsClarity surveyed more than 4,000 professionals who are active in the big data/fast data space, including developers, data architects, data scientists, DevOps professionals, and senior IT managers. The survey was conducted on-site at industry events like Strata Hadoop World, SRECon, and Kafka Summit, as well as through emails sent to a targeted list of professionals.
You can read the full report here.