IBM Taps Zaloni to Ride Herd on Hadoop
One of the bumps on the road to Hadoop Nirvana is the overall lack of controls built into the platform. For a large enterprise, the immaturity of the stack presents major concerns in the areas of productivity and security. One vendor hoping to capitalize on the need for better Hadoop management tools is Zaloni, which got a big boost last week when IBM agreed to OEM its software and sell it as part of InfoSphere BigInsights.
In some ways, Hadoop is a victim of its own success. Organizations have rushed in to take advantage of the new processing paradigm that provides powerful ways to analyze vast stores of data, only to find the supporting cast of tools a little thin and a little green. Security is not strong out of the box, and Hadoop doesn’t provide the same types of monitoring and management tools that large enterprises may be used to with their transactional and analytical systems.
Now that Hadoop has a foothold in the enterprise, it’s only natural that the Hadoop community responds by not only strengthening the Hadoop “kernel,” if you will, but also by fleshing out the entire ecosystem surrounding that core. We’re witnessing that maturation and building process right now, both by the major Hadoop distributors like Cloudera, Hortonworks, and MapR Technologies, and by third-party vendors like Zaloni.
Zaloni’s offering, called the Bedrock Data Management Platform, was designed to serve as a lightweight framework that organizations can use to build big data applications on Hadoop. The Java-based suite of software addresses four key areas–data ingestion, workflow and orchestration, metadata management, and security–that the Hadoop distributions do not provide but that are required by large enterprises, says Zaloni CEO Ben Sharma.
It’s all about speeding deployments, Sharma says. “These big data applications take a really long time to build and deploy, which increase the cost and the risk,” he tells Datanami. “When a CIO is looking at Hadoop as an enterprise data platform, unless this level of maturity is there in the platform, it’s hard to justify building some of these use cases they’re looking at building. They don’t want to think about it as an unfenced file system. They want to have more of a structure and discipline around how people are accessing the data and how the data is brought in.”
Bedrock’s landing zone can be used to prep, tag, and transform data before loading it into Hadoop for processing. The software also integrates closely with MapReduce and Hive to run the transformation components of ETL within Hadoop–what the industry calls ELT. The software often works alongside ETL tools, such as Informatica or Talend. And for those used to writing scripts to execute data movement into Hadoop, Bedrock’s HTML5 interface will streamline the task of designing data workflows.
Sharma talks about Bedrock providing an “end to end data pipeline” that production teams can use to monitor data flows in Hadoop and to send alerts when something goes wrong. “That’s where they can use our end to end data pipeline, to stitch it together, along with the metadata layer,” he says. “These are typical features that you’d need in an enterprise data platform from an operational standpoint, and that’s what we’re bringing to the table.”
The software goes beyond what open source tools like Oozie can provide, Sharma says. Because Oozie only provides data movement capabilities for data in Hadoop, it doesn’t add much value to customers with enterprise data warehouse investments.
On the compliance front, Bedrock can tokenize data flowing into Hadoop, providing an added layer of security. It also can be used to ensure adherence to service level agreements (SLAs) when running a Hadoop service.
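To illustrate what tokenizing data on its way into Hadoop can look like in general, here is a minimal Python sketch. It is not Bedrock's implementation, which the article does not describe; the `TokenVault` class, the `tokenize_record` helper, and the `tok_` prefix are all hypothetical names chosen for illustration, and the in-memory dictionaries stand in for what would have to be a secured store in practice.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault (illustration only)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        # Random token: it carries no information about the original value.
        token = "tok_" + secrets.token_hex(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can recover the original.
        return self._value_by_token[token]


def tokenize_record(record: dict, sensitive_fields: set, vault: TokenVault) -> dict:
    """Return a copy of the record with the sensitive fields replaced by tokens."""
    return {
        key: vault.tokenize(value) if key in sensitive_fields else value
        for key, value in record.items()
    }


vault = TokenVault()
raw = {"customer": "Alice", "ssn": "123-45-6789"}
safe = tokenize_record(raw, {"ssn"}, vault)
```

In this sketch only the `safe` record would be loaded into the cluster, while the mapping back to the real values stays in the vault.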
Bedrock provides several capabilities in the area of metadata. For starters, it timestamps everything as it comes into the environment, which allows Bedrock to track the data as it flows into Hadoop and other downstream systems. It can also help add structure to data that may not have much metadata in it, such as log data and machine data.
“It is significant to have the metadata, so that your data scientists aren’t spending half or more of their time trying to figure out what this data means,” Sharma says. “As a data scientist, if I’m trying to find a certain field, I can first query the metadata and, based on it, I can see which data sets it belongs to, and based on that, run queries and MapReduce and get my results.”
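The workflow Sharma describes, querying the metadata first to find which data sets hold a given field, can be sketched in a few lines of Python. This is only an illustration under assumed names: `MetadataCatalog`, `register`, and `datasets_with_field` are hypothetical and do not reflect Bedrock's actual interfaces, and the data set names are invented.

```python
from datetime import datetime, timezone


class MetadataCatalog:
    """Hypothetical catalog that records fields and an ingestion timestamp."""

    def __init__(self):
        self._datasets = {}

    def register(self, name, fields):
        # Timestamp each data set as it enters the environment.
        self._datasets[name] = {
            "fields": set(fields),
            "ingested_at": datetime.now(timezone.utc),
        }

    def datasets_with_field(self, field):
        # Look the field up in the metadata before touching the data itself.
        return sorted(
            name
            for name, meta in self._datasets.items()
            if field in meta["fields"]
        )

    def ingested_at(self, name):
        return self._datasets[name]["ingested_at"]


catalog = MetadataCatalog()
catalog.register("web_logs", ["ip", "url", "ts"])
catalog.register("call_records", ["caller_id", "duration", "ts"])

matches = catalog.datasets_with_field("url")
```

A data scientist would then run Hive queries or MapReduce jobs only against the data sets listed in `matches`, rather than scanning everything to find the field.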
IBM obviously sees value in the work Zaloni has done with Bedrock. That’s no small feat, considering the army of tools IBM already has with its InfoSphere, Guardium, and MDM product lines. As part of the partnership with IBM, Zaloni will be working with IBM to integrate with some of those products, Sharma says.
Zaloni was founded in 2007, and has been working with Hadoop since 2009, primarily with Fortune 100 firms, Sharma says. The company is based in the Research Triangle Park area of North Carolina, and has 115 employees. It’s self-funded at this point, but is in the middle of a Series A round of financing to grow the business.
“We’re not a five-person company trying to sell an idea,” Sharma says. “We have marquee customers within different verticals–telco and financial services. What we now need to do is scale out and grow.”