Hortonworks Previews Future After Massive Funding Haul
Fresh from the game-changing announcement that they have received a $50 million haul from new investors, Hortonworks is in full swagger as Hadoop Summit kicks off today. The company put more cards on the table this morning, revealing the launch of the “community preview” of the Hortonworks Data Platform (HDP) 2.0, bringing new tools to the Hadoop workbench. The company also revealed an extension of their strategic partnership with Teradata, allowing the traditional database vendor to resell HDP.
The fresh and substantial funding for Hortonworks is a significant shot in the arm for both the company and the Apache Hadoop open source community in general. The Hortonworks business model revolves around developing, distributing and supporting the open source Apache Hadoop framework, while harboring no proprietary layers – something they point out that none of their competitors can claim.
Hortonworks CEO Rob Bearden says that the new funding will be used in part for further investment in Hortonworks engineering, which, given Hortonworks’ business model, may prove to be a tide that lifts all the boats in the Hadoop community, where several vendors rely on the core Apache distro to roll their own versions of the burgeoning data platform.
As part of that rising tide, Hortonworks has been working diligently with several contributors in the Apache Hadoop community on a shiny new resource manager, YARN (currently in beta), which is seeing the light of day for the first time today as part of Hortonworks’ HDP 2.0 Community Preview. Where Apache Hadoop previously consisted of two sub-components (the HDFS file system and the MapReduce processing engine), YARN represents an evolution as a third sub-component in the base Hadoop framework (which is expected to be released imminently as part of the Apache Hadoop 2.0 beta).
“When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop to be able to run multiple applications against relevant data sets,” wrote Arun Murthy, architect on YARN with Hortonworks, in a recent article. “And do so in a way where multiple types of applications can operate efficiently and predictably within the same cluster – this is really the reason behind Apache YARN, which is foundational to Hadoop 2.0. By managing the resource requests across a cluster, YARN turns Hadoop from a single application system to a multi-application operating system.”
This approach doesn’t come without criticism, with some saying that projects such as YARN run the risk of reinventing the wheel. “More mature, readily available solutions already exist to resolve the scheduling requirements in the Hadoop ecosystem,” wrote Fritz Ferstl, CTO of Univa, in Datanami recently. “It will take the likes of YARN…years to provide the same features and reach the same level of scalability and maturity.”
“I think the delineation is that this is all within the Hadoop platform,” explained Dave McJannet, Marketing VP at Hortonworks. “It’s not just about resource management, [but also] providing a common access model and a common security model for all of the components that plug into the infrastructure.”
Hortonworks says that the YARN-based Hadoop architecture is already running at scale at Yahoo!, where it has been deployed on 40,000 nodes for more than six months, a proof point that they say makes YARN ready for broader access.
This broader access includes a certification program that aims to grow the YARN community by supporting application developers in building and certifying applications to use the new architecture. Hortonworks says that they have more than 14 partners at launch, including IBM (DataStage), Microsoft, Platfora and others, with Splunk, Elastic Search, Altiscale, and Concurrent already having certified applications.
Hortonworks Pushing Hadoop Adoption with Teradata Appliance
As part of the announcement stack, Hortonworks and Teradata announced that they are pushing the adoption of the Hortonworks Data Platform (HDP) through an extension of their strategic partnership. The partnership enables Teradata to resell and offer support for HDP, which will be sold as part of four different Teradata distributions: two premium Teradata appliances, a new commodity offering (through a partnership with Dell), and a software-only solution supported through Teradata.
The two premium appliances are the Teradata Aster Big Analytics Appliance and the Teradata Appliance for Hadoop. The former is their full service platform with all the Teradata Aster bells and whistles, including their proprietary SQL-MapReduce with Apache Hadoop and over 70 pre-packaged analytic functions. The latter will offer the standard Hortonworks Data Platform on hardware that they say is integrated and optimized for enterprise-class data storage and management. A software stack, including the Informatica platform, Protegrity Big Data Protector, and Revelytix Loom, will also be included.
While the distribution represents another feather in Hortonworks’ cap, it also puts one squarely in the Hadoop cap, further legitimizing it as a core component in mainstream IT as another appliance aimed at the enterprise datacenter comes online.
Both companies refer to an emerging data architecture, by now becoming familiar in Hadoop circles, in which Hadoop is used to land data whose value organizations might not yet have figured out. This especially includes unstructured data, which causes problems in traditional database settings.
Key to this announcement, says Steve Wooledge, VP of Teradata’s Unified Data Architecture, is Teradata’s training and consulting services, aimed at easing the deployment and integration of the notoriously difficult Hadoop.