April 9, 2013

Erecting Operational Intelligence Using Machine Data

Isaac Lopez

In 2011, rains from the monsoon season in Thailand triggered flooding that caught residents by surprise, resulting in hundreds of deaths and billions in economic damages. Today sensors and real-time analytics are being used to develop early warning systems that authorities hope will save lives and resources in the event of future floods.

After post-flood analysis reportedly found that a significant amount of the loss was preventable, a networking reseller in Bangkok decided to use its expertise to build a flood warning system powered by Splunk Inc.’s machine data software platform.

“When they did the analysis on what caused the financial damage from the floods, they realized that the biggest problem was not the flooding itself, but the fact that people could not get out of the way in time,” explained Tapan Bhatt, Senior Director of Product Marketing at Splunk.

As a result, “Water Alert!” was born: a real-time water level monitoring system that collects, indexes, and analyzes open data made available through the “Department of Drainage and Sewerage” in Bangkok. Using the Splunk architecture, TCS has implemented a system that monitors water levels and alerts subscribers as the levels rise so that they can prepare and evacuate as necessary.
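The alerting idea described above can be illustrated with a minimal Python sketch. It is not the Water Alert! implementation or Splunk code; the station names, thresholds, and the `Reading` and `stations_to_alert` names are hypothetical, chosen only to show a level being compared against a per-station limit.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    station: str    # hypothetical monitoring station name
    level_m: float  # latest water level in metres


def stations_to_alert(readings, thresholds, default_threshold=1.5):
    """Return the stations whose level meets or exceeds their threshold."""
    alerts = []
    for reading in readings:
        # Fall back to an assumed default when a station has no threshold.
        limit = thresholds.get(reading.station, default_threshold)
        if reading.level_m >= limit:
            alerts.append(reading.station)
    return alerts


# Made-up sample data for illustration only.
readings = [Reading("canal-a", 1.2), Reading("canal-b", 2.1)]
thresholds = {"canal-a": 1.5, "canal-b": 2.0}
print(stations_to_alert(readings, thresholds))
```

In a real deployment the readings would arrive continuously and the alert would go out to subscribers rather than being printed.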

It’s an interesting use case for Splunk’s machine data search and aggregation software that has the potential to save lives, but it’s not necessarily how Splunk founders Erik Swan and Rob Das envisioned their product would be used when they launched it in 2006.

“It’s our customers who are taking us in those directions,” said Bhatt, explaining that the original vision for Splunk was as a sort of “Google for IT data” to help administrators monitor their end-to-end infrastructure and pinpoint performance issues within a network. Once customers got their hands on the Splunk tools, Bhatt says, they started to piece together their own ideas using the platform. “Suddenly, our customers started realizing that they can do a lot more things beyond that, and they started using us for all different types of use cases, which has evolved into the position that we have today.”

That position is as a sort of erector set for machine data. Splunk boasts a virtual library of creative use cases stretching across industry verticals, including government, online services, financial services, retail, and telecoms, and covering everything from security, payment processing, and fraud detection to financial trading system efficiency and troubleshooting.

Bhatt says that customers often use the system to create real-time operational dashboards for whatever type of data they are monitoring. So whether it is water levels, web traffic, network or security monitoring, the chief limitation is having the sensor data to feed into Splunk’s three-tiered architecture, explains Bhatt.

The technology works by setting up what Bhatt referenced as data “forwarders,” the first tier of the Splunk architecture. These forwarders are essentially lightweight Splunk instances that typically reside where the data originates. “It’s not unusual for one of our customers to have 5,000 forwarders,” explains Bhatt.

Once set up, these forwarders consume the data, performing operations such as tagging metadata (source, sourcetype, and host), buffering, compressing the data, and securing it with SSL, and then send the data to the second tier of the Splunk architecture, the indexers.
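As a rough illustration of the forwarder steps described above, the following Python sketch tags each raw line with source, sourcetype, and host, then batches and compresses the result. It is a simplified stand-in, not Splunk’s forwarder or wire protocol; the function names and the JSON-lines payload format are assumptions made for the example, and the SSL step is left out.

```python
import gzip
import json


def tag_event(raw_line, source, sourcetype, host):
    """Attach the three metadata fields to one raw line of machine data."""
    return {"source": source, "sourcetype": sourcetype,
            "host": host, "event": raw_line}


def build_payload(lines, source, sourcetype, host):
    """Tag a buffered batch of lines and compress it for sending."""
    events = [tag_event(line, source, sourcetype, host) for line in lines]
    # One JSON object per line (an assumed format, not Splunk's).
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    return gzip.compress(body)


def read_payload(payload):
    """Reverse of build_payload, as a receiving tier might do."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


# Hypothetical sample batch.
batch = ["level=1.2 station=canal-a", "level=2.1 station=canal-b"]
payload = build_payload(batch, "/var/log/water.log", "water_level", "sensor-01")
print(read_payload(payload)[0]["host"])
```

A production forwarder would also send the compressed payload over an encrypted connection, which this sketch does not attempt.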

Bhatt explains that the indexers do as you would expect: they index the data so that the search head (the third tier of the system) can access it. The indexers can be distributed, so multiple indexers can receive data from different forwarders, which makes the setup fault tolerant and bakes resiliency into the system, explains Bhatt. If one indexer goes down, another can pick up the load.
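The failover behavior Bhatt describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Splunk distributes or replicates data; the `Indexer` class and `send_with_failover` function are hypothetical names, and a real system would involve network connections, acknowledgements, and load balancing.

```python
class Indexer:
    """Toy stand-in for an indexer that may or may not be reachable."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.events = []

    def index(self, event):
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        self.events.append(event)


def send_with_failover(event, indexers):
    """Try each indexer in order; return the name of the one that accepted."""
    for indexer in indexers:
        try:
            indexer.index(event)
            return indexer.name
        except ConnectionError:
            continue  # this indexer is down, so try the next one
    raise RuntimeError("no indexer available")


# Hypothetical two-indexer setup with the first one down.
primary = Indexer("indexer-1", up=False)
backup = Indexer("indexer-2")
print(send_with_failover("level=2.1 station=canal-b", [primary, backup]))
```

The point of the sketch is only that a sender with more than one destination can keep delivering data when a single indexer fails.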

“We have customers who might be using Splunk to index a gigabyte of data per day, and then we have customers using essentially the same technology to scale to hundreds of terabytes of data per day,” says Bhatt.

One of the major use cases that Splunk highlights is application management, explained Bhatt. In line with that use case, the company this week announced version 5.0 of the Splunk App for monitoring Microsoft Windows Server, which it claims simplifies Windows server monitoring. Using the app, Splunk says, enterprises can monitor everything from their Windows servers to the thousands of Windows-based laptops and PCs they might have in the field.

Beyond this sort of pre-packaged use case, which may be very useful (if a little dry), Bhatt says that Splunk is always looking forward to seeing what its customers might do next with the system as the big data scene develops. Splunk boasts 5,200+ customers for its platform, but Bhatt says he still sees plenty of room for maturity in the machine-generated big data arena.

“I think what might happen over time is that there will be much more maturity in terms of people understanding exactly the right type of use cases and how people can use [big data technology], because that is lacking a lot right now.”
