The Last Hadoop Data Management Tool You’ll Ever Buy?
The rise of big data has shaken up the data warehousing market, and one of the established vendors still looking to regain its footing is Informatica, which last year was taken private in a $5.3 billion leveraged buyout. With today’s launch of Big Data Management 10.1, the company is betting that Hadoop customers will spring for an end-to-end tool that checks off multiple boxes.
There’s no shortage of tools for managing data in a Hadoop environment. That’s not necessarily a good thing, according to Informatica’s Vice President of Marketing for Enterprise Software Ash Parikh. Instead of picking from a wide variety of open source and vendor-supplied tools, Parikh says, Hadoop users would be better off buying a single unified toolset that does everything they need.
“The new [big data] world is very different. But still, the principles of data management are the same,” Parikh tells Datanami. “You still need to make sure you have data that can be trusted, that you can access to all the data that’s being thrown at you, that you can transform it and do everything you need to do with it.”
Most companies that are using Hadoop accomplish this data management in a manual fashion, often with a lot of hand coding, he says. That approach may work for a while, but it falls apart when the scale of the big data initiatives grows, or the pace of the work picks up.
Automation and repeatability are clearly in Hadoop’s future. The question is whether Informatica can regain its standing as the gold standard for data management in the enterprise. The company is placing its bets on its singular Big Data Management suite, which attempts to hit multiple data birds with a single stone.
Parikh says the Big Data Management offering–which was repositioned last fall as the go-to product in Informatica’s Hadoop strategy–was rebuilt from “the ground up” to run natively on Hadoop. While the software builds on the lessons Informatica learned from its years as a dominant supplier of ETL tools, it is much more than an ETL tool, he says.
“You need something to get data into Hadoop. You need to transform and cleanse it and you need to match it and link it. How are you going to do it?” Parikh says. Currently, the work is done either through hand coding or by cobbling together multiple tools, but both options leave something to be desired.
“How many open source tools are you going to use?” he says. “If your steps are acquiring, ingesting, transforming, securing, matching, and governing [data] how many tools does it take your customer to bring together to actually solve the problem? We offer a single integrated metadata-driven tool which is optimized with excellent speed and performance to help you do this.”
Informatica is hoping that its end-to-end mantra resonates with enterprises struggling to scale up their Hadoop initiatives while keeping IT departments and business managers happy. Analysts say we’re in the midst of an inflection point in the market for data prep tools, and that 60 percent of the market will be served by self-service solutions designed to be used by business people, as opposed to IT professionals. Informatica has a strong reputation among CIOs and IT managers, but it is still building credibility in the market for self-service tooling, where it competes with Trifacta, Paxata, Tamr, SnapLogic, and others.
Parikh says the current crop of self-service tools can only satisfy part of the big data equation. “You already have fragmentation at the data level. Do you want fragmentation at the tool layer?” Parikh asks. “Informatica is building on our rich heritage in data management…to build this technology from the ground up to create this infrastructure for big data management.”
With the new 10.1 release, Informatica is adding support for Apache Spark. Big Data Management customers can now choose among three execution engines: the native YARN engine, called Blaze; the new engine built on Spark; or a MapReduce-based implementation. The Blaze engine is two to three times faster than the Spark engine, but people want choice, Parikh says. The new release also brings support for hosted Hadoop solutions, such as Microsoft’s Azure HDInsight and Amazon’s Elastic MapReduce (EMR).
Hadoop is (or should be) on the radar screen of every major company in the world, thanks to the dramatically reduced costs it brings to the storage equation. Customers that previously were constrained in the amount of data they could store in their data warehouses are finding Hadoop a great solution for either offloading some workload from the data warehouse or creating a new data repository.
As freeing as Hadoop has been for data storage, the big data platform is running into trouble when it comes to management. What some have called Hadoop’s “junk drawer” problem threatens to derail data gathering and analytics initiatives before they really get going. That ultimately is the problem Informatica–and just about every other data management tool vendor–is aiming to solve. It just so happens Informatica is aiming to solve the entire problem in one fell swoop.
“We’re offering it up for all those people who want to balance self-service with IT governance and control,” Parikh says. “We’re offering up something called the intelligent data lake, which takes self-service data prep to a whole new level. It brings the business person and IT person together.”
Big Data Management 10.1 is available via both traditional license and subscription methods. Informatica declined to provide specific pricing.