In October, Cloudera acquired Eventador, which developed a SQL-based streaming data analysis solution based on Apache Flink. With the work to integrate that product with the Cloudera Data Platform (CDP) complete, the company today re-launched it as Cloudera SQL Stream Builder.
Cloudera SQL Stream Builder lets customers use SQL to develop queries for streaming data. Eventador originally developed the software, which builds upon Apache Flink to continuously execute SQL queries against data streaming in platforms like Apache Kafka. This SQL-based approach eliminates the need for specialty Java or Scala skills to analyze such data, the company says.
According to Cloudera, the software enables users to perform sophisticated data analysis through a simple interface. In addition to supporting syntax checking, error reporting, schema detection, and query creation, its materialized view engine can output live aggregated datasets that can be accessed by other applications via a simple REST API.
Streaming analytics enables companies to extract value from data within a given time window, before the data decays and loses its value. However, accomplishing this task with SQL, rather than high-level languages like Java or Scala, is not easy, says Dinesh Chandrasekhar, head of product marketing for Cloudera.
“Unlike database tables which typically have a fixed number of rows at any given point in time, streams are unbounded,” Chandrasekhar writes in a blog post. “This means that they are continuous by nature and have no limit. They also don’t come in sequentially. Some messages can come in late or out of order too. This makes it challenging to adopt SQL as-is to query data streams.”
To address the challenge of using SQL to analyze timestamped data within specific windows of time, SQL Stream Builder adds several additional keywords to the basic SQL grammar, Chandrasekhar writes.
Cloudera SQL Stream Builder is designed to make streaming data analytics simple
“They look and function like regular SQL but you also have a lot of additional constructs to allow you to group the streams over a specific time window,” he writes. “It also supports a range of aggregation functions so that you can perform various enrichment tasks on the streams like finding averages, sums, counts etc. This immediately allows the data analysts and data scientists in the organization to query data streams using SQL! This is what we call the democratization of real-time data within the organization.”
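To make the idea of grouping a stream over a time window more concrete, here is a minimal Python sketch of a tumbling (fixed, non-overlapping) window with count, sum, and average aggregates. It is an illustration only and does not use SQL Stream Builder or Flink. The event shape, the function name `tumbling_window_stats`, and the sample data are all hypothetical. It also runs over a finite list, whereas a real streaming engine works on an unbounded stream and needs a policy for how long to wait for late events.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (event_time_in_seconds, key, value)
Event = Tuple[int, str, float]


def tumbling_window_stats(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], Dict[str, float]]:
    """Aggregate events per (window start, key) using each event's own timestamp."""
    acc = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, key, value in events:
        # Assign the event to its window by event time, not arrival order,
        # so late or out-of-order events still land in the right bucket.
        window_start = (ts // window_seconds) * window_seconds
        bucket = acc[(window_start, key)]
        bucket["count"] += 1
        bucket["sum"] += value
    return {
        k: {"count": b["count"], "sum": b["sum"], "avg": b["sum"] / b["count"]}
        for k, b in acc.items()
    }


# Illustrative data: the third event arrives after a later one (out of order).
events = [
    (0, "sensor-a", 10.0),
    (65, "sensor-a", 30.0),
    (20, "sensor-a", 20.0),
    (70, "sensor-b", 5.0),
]
result = tumbling_window_stats(events, window_seconds=60)
for (window_start, key), stats in sorted(result.items()):
    print(window_start, key, stats)
```

Because windows are keyed on the timestamp carried by each event, the out-of-order reading at second 20 is still counted in the first one-minute window alongside the reading at second 0.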
Following the acquisition of Eventador, it took Cloudera about five months to integrate SQL Stream Builder with CDP and its Shared Data Experience (SDX) module. The integration with SDX assures SQL Stream Builder users that their data is secured and governed in the same manner, using the same rules and configurations, as data residing in other components of CDP.
According to Cloudera, SQL Stream Builder augments Cloudera DataFlow (CDF), the Apache NiFi-based product (formerly known as Hortonworks Data Flow) that ingests, curates, and analyzes data at high volume and high scale. Cloudera also develops a product called Streams Messaging, which is largely based on Apache Kafka and is offered within both CDF and CDP.
SQL Stream Builder gives Cloudera customers another option for developing queries against data flowing through Streams Messaging and CDF. And while SQL Stream Builder is based on Apache Flink, users can also use Flink or Kafka Streams directly to develop their queries.