December 3, 2013

Splunk Gets ‘Lucid’ About Solr-Powered Search

Alex Woodie

Splunk and LucidWorks today announced a partnership that aims to help Splunk customers get more value out of the machine data they’re storing. The partnership does this by hooking Splunk into LucidWorks’ Solr-powered search engine and delivering contextual connections between Splunk data streams and data stored in other systems.

The new Solr Monitor App for Splunk Enterprise offering is aimed at existing Splunk customers. Part of the partnership entails enabling Splunk users to access LucidWorks’ Solr-powered search capabilities directly from their Splunk Enterprise user interfaces. LucidWorks is the commercial vendor behind the open source Apache Lucene/Solr search project, and LucidWorks Search is its flagship product.

The announcement makes Splunk another one of the big data repositories that LucidWorks works with. LucidWorks already has similar partnerships in place with NoSQL database vendors like DataStax and MongoDB, as well as Hadoop vendors like MapR Technologies and others; Cloudera opted for open source Apache Lucene/Solr. In addition to big data repositories, Solr is a ubiquitous search engine used to crawl and index Web sites, relational databases, and content management systems.

The new offering puts Solr-driven dashboards and other “content” directly in front of Splunk users’ eyeballs. It also brings administrative capabilities that let managers monitor how Splunk users are using search, and wizard-driven utilities to streamline the process of connecting LucidWorks Search with a Splunk Enterprise installation.

But the most promising part of the partnership is the way that Solr can connect Splunk-resident data with data stored elsewhere, enabling users to make contextual connections between them on the spot using natural language search processing, explains LucidWorks chief product officer Will Hayes, who recently joined LucidWorks from Splunk. Both companies call the San Francisco Bay Area home.

“One thing that was always an issue for me there [at Splunk] was, I’ve got all this machine data that gives me so much intelligence around what’s going on, whether it’s website security or operations,” Hayes tells Datanami. “Splunk does a lot of amazing things. I’m very proud of the work we’ve done there. But it’s really limited architecturally to just that machine data, just that text data with time stamps. That’s really what it was optimized for.”

Splunk is great for alerting you to events that may be of interest to you. If you’re a security analyst, for example, Splunk will tell you a large network transfer was initiated across the firewall, but it doesn’t tell you anything else about that transfer, such as the contents of the payload or anything about the user who initiated the transfer. That context is needed to determine if the file transfer is a security breach or an abnormal but harmless event.

“Maybe I’m storing my user profile information, so Splunk tells me the user name equals Scott,” Hayes says. “In Solr, I have all this information about Scott’s location, his department, where he reports in the organization, his salary. There was always this gap between what machine data can tell you and what context was being stored in things like Solr.”

With Solr crawling out into their other systems and keeping an index of the data at the ready, it’s a simple matter of cross-referencing the value and following the pointer stored in Solr. With the additional information provided by Solr, the security analyst can dive into the information about Scott, and decide whether to take further action.
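The cross-referencing step described above can be sketched roughly as follows. This is a minimal illustration, not the Solr Monitor App's actual implementation: the `PROFILE_INDEX` dictionary stands in for a Solr index of user profiles, and the `profiles` core name, the `username` field, the host and port, and the sample profile values are all hypothetical. The only Solr detail assumed is the standard `/select` query endpoint with the `q` and `wt` parameters.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for a Solr index of user profiles, keyed by user name.
PROFILE_INDEX = {
    "scott": {"department": "Research", "location": "San Francisco", "manager": "dana"},
}


def solr_select_url(base_url, field, value):
    """Build a Solr /select query URL for an exact match on one field."""
    params = {"q": f'{field}:"{value}"', "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def enrich(event, index):
    """Attach profile context to a machine event by cross-referencing its user field."""
    profile = index.get(event.get("user", "").lower())
    # "context" is None when the index holds nothing for this user.
    return {**event, "context": profile}


# A machine event of the kind Splunk might surface (field names are illustrative).
event = {"action": "network_transfer", "bytes": 5_000_000_000, "user": "Scott"}

enriched = enrich(event, PROFILE_INDEX)
url = solr_select_url("http://localhost:8983/solr/profiles", "username", "Scott")
```

Here `enriched` carries both the original event and whatever the index knows about the user, which is the kind of context an analyst would need to judge the transfer. In a live setup the lookup would be an HTTP request to a URL like `url` rather than a dictionary access.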

“We built an interface, so from right inside of Splunk, you can find a machine log of interest, and you can do a single click and say ‘Go pull me this data from Solr. Show me this user Scott. Which documents has Scott authored inside SharePoint?'” Hayes says.

“So Solr is telling me this guy Scott is a pharmacist or a researcher, and this email contains information about a particular therapy or a drug. That’s where the contextual intersection between the machine data inside of Splunk, and all the rich document data and personnel data–really everything else–inside of Solr can be joined together at runtime.”

The Solr Monitor App for Splunk Enterprise has already undergone a private beta test, and is available now. Splunk customers still must purchase the LucidWorks product separately.

Hayes is clearly bullish on the possibilities of Solr and the way it can help users pull useful information out of their big repositories of variously structured data.

“People talk about NoSQL. That’s exactly what Solr has been doing,” he says. “It’s been consuming lots of structured, unstructured, and semi-structured data, and allowing you to search it. First and foremost, it allows you to do keyword searching. The capabilities underneath are the same as what makes the NoSQL data stores interesting.”

Related Items

Hadoop Distros Orbit Around Solr

MapR Distributes LucidWorks Search with MapR Platform for Apache Hadoop

LucidWorks Integrates with MongoDB