April 15, 2020

Pepperdata Adds Kafka Monitoring to Tune Queries

A new tool for tracking data analytics performance adds monitoring capabilities based on the Apache Kafka streaming data platform. The combination aims to provide better visibility across analytics stacks deployed in hybrid configurations.

The result is said to be improved understanding of query execution and database performance.

Pepperdata, the analytics performance specialist, said this week its Kafka-based monitoring tool targets mission-critical streaming applications. The Streaming Spotlight tool integrates the Kafka distributed event streaming platform to boost visibility into Kafka cluster metrics.

Kafka serves as a central hub for messaging systems, often handling trillions of messages daily. The complexity of managing data pipelines has grown, requiring monitoring tools to help maintain optimal efficiency, the Silicon Valley company said Tuesday (April 14).

“The ability to be alerted to faults, and then pinpoint those issues in near real time for remediation is no longer a luxury in mission-critical environments,” said Pepperdata CEO Ash Munshi. Hence, the Spotlight framework provides a new monitoring capability as Kafka emerges as a de facto standard for streaming applications.

The company notes that the shift from traditional processing on databases to streaming architectures is creating requirements for tracking Kafka performance as it is adopted for agile microservices communications.

The new tool provides alerts on key Kafka performance metrics, which can be used to detect anomalies in IT infrastructure. It also allows Kafka metrics to be correlated with application and infrastructure metrics from other analytics frameworks, including HBase, Hive, Impala and Spark.
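To illustrate what alerting on a Kafka metric can look like in practice, here is a minimal Python sketch that flags outliers in a series of consumer-lag readings using a trailing-window z-score. It is not Pepperdata's method, which the company has not described; the function name `find_lag_anomalies`, the window size, the threshold and the sample readings are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_lag_anomalies(lag_samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from the trailing-window mean
    by more than `threshold` standard deviations (hypothetical helper)."""
    anomalies = []
    for i in range(window, len(lag_samples)):
        history = lag_samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if lag_samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(lag_samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical consumer-lag readings (messages behind), one per minute.
lag = [100, 104, 98, 101, 103, 99, 102, 950, 101, 100]
anomalies = find_lag_anomalies(lag)
print(anomalies)  # prints [7]
```

A production monitor would read these values from the cluster rather than a hard-coded list, and would likely use a more robust baseline, since a single spike inflates the trailing window for the samples that follow it.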

The monitor also tracks Kafka data streaming capacity requirements to maintain performance while forecasting IT resource requirements. “Kafka is not only a highly distributed environment, it’s also high volume, high velocity and diverse in terms of events or records coming in,” Charles Marker, Pepperdata’s vice president of engineering, noted in a blog post. “Since Kafka events are real time, you need real-time visibility.”

Among the outputs from application, platform and query “spotlights” is an assessment of query execution and database performance, the company said.

The Spotlight platform provides insights required “to tune analytics workloads, and we’re seeing more and more queries make up the bulk of workloads in these environments,” said Kirk Lewis, a Pepperdata field engineer.

In one example, the monitoring tool “captures all the application query stages, and shows the ones that were not executed,” Lewis added. The visualization is designed to show which query workloads are ripe for tuning, debugging and optimization for better performance.

Those capabilities are touted as allowing IT operators to spot resource-intensive queries, then “optimize query performance at scale,” Pepperdata said. Meanwhile, developers could use Spotlight to spot problems with queries, identify bottlenecks and resolve application issues.

The framework combines root cause analysis with greater visibility into query workloads, “including delayed and most expensive queries as well as wasted CPU and memory queries,” the company said.
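As a rough illustration of how "most expensive" and "wasted memory" queries might be surfaced from collected statistics, the Python sketch below ranks a handful of query records by CPU time and by the gap between memory requested and memory used. The `QueryStats` record, its fields, the `top_by` helper and the sample figures are hypothetical and do not reflect Pepperdata's data model.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Hypothetical per-query resource record."""
    query_id: str
    cpu_seconds: float
    memory_requested_gb: float
    memory_used_gb: float

    @property
    def wasted_memory_gb(self) -> float:
        # Memory that was reserved but never used, floored at zero.
        return max(self.memory_requested_gb - self.memory_used_gb, 0.0)


def top_by(queries, key, n=2):
    """Return the ids of the n queries with the largest value of `key`."""
    ranked = sorted(queries, key=key, reverse=True)
    return [q.query_id for q in ranked[:n]]


# Made-up sample data for illustration only.
queries = [
    QueryStats("q1", cpu_seconds=120.0, memory_requested_gb=16.0, memory_used_gb=15.0),
    QueryStats("q2", cpu_seconds=900.0, memory_requested_gb=32.0, memory_used_gb=8.0),
    QueryStats("q3", cpu_seconds=45.0, memory_requested_gb=64.0, memory_used_gb=4.0),
    QueryStats("q4", cpu_seconds=300.0, memory_requested_gb=8.0, memory_used_gb=7.5),
]

most_expensive = top_by(queries, key=lambda q: q.cpu_seconds)
most_wasteful = top_by(queries, key=lambda q: q.wasted_memory_gb)
print(most_expensive)  # prints ['q2', 'q4']
print(most_wasteful)   # prints ['q3', 'q2']
```

The two rankings differ: q3 is cheap in CPU terms but reserves far more memory than it uses, which is the kind of query a cost-only view would miss.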

Along with Hive, other frameworks that can be monitored include Amazon Redshift, IBM Big SQL and the Snowflake cloud-based analytics platform.
