November 17, 2014

Seagate Announces ClusterStor Hadoop Workflow Accelerator

CUPERTINO, Calif., Nov. 17 — Seagate Technology plc, a world leader in storage solutions, today announced availability of the ClusterStor Hadoop Workflow Accelerator, a new solution providing the tools, services, and support for High Performance Computing (HPC) customers who need the best-performing storage systems for Big Data Analytics. The Hadoop Workflow Accelerator is a set of Hadoop optimization tools, services and support that leverages and enhances the performance of ClusterStor™, the market-leading scale-out storage system designed for Big Data analysis. Computationally intensive High Performance Data Analytics (HPDA) environments will benefit from significant reductions in data transfer time with the Hadoop Workflow Accelerator. This solution also includes the Hadoop on Lustre Connector, which allows both Hadoop and HPC Lustre clusters to use exactly the same data without having to move the data between file systems or storage devices.

“Data-intensive computing has long been a part of HPC, but newer analytical approaches using Hadoop and other methods, such as graph analytics, will help drive strong growth in high performance data analysis, which is the market for Big Data needing HPC. The Hadoop Workflow Accelerator is designed to serve both the technical computing and commercial sides of this converging Big Data-HPC market that IDC forecasts will exceed $4 billion in 2018,” said Steve Conway, IDC Research Vice President, High Performance Computing. “IDC research shows that 29% of HPC sites already use Hadoop. The market will welcome tools that boost Hadoop performance and efficiency.”

The Hadoop Workflow Accelerator supports Hadoop distributions based on Open Source Apache Hadoop. Seagate is working with leading Hadoop distributors to offer best-in-class solutions to HPC customers and will provide tighter integration between the Hadoop Workflow Accelerator and other Hadoop distributions in future releases.

“Organizations not only want to manage the tremendous volume of data that they are collecting from a wide variety of sources, they also want to derive new insights that enable actionable intelligence and improve operational efficiency. Seagate’s award-winning ClusterStor scale-out HPC solutions, now with our Hadoop Workflow Accelerator options, enable organizations to optimize Big Data workflows and centralize data storage for High Performance Data Analytics solutions,” said Ken Claffey, Vice President of ClusterStor, Seagate Cloud Systems and Solutions. “TeraSort benchmark results have the Hadoop Workflow Accelerator outperforming Hadoop on the Hadoop Distributed File System (HDFS) by 38% on the same hardware. The Hadoop Workflow Accelerator meets our customers’ performance demands and optimizes the performance of Hadoop Ecosystem deployments, thus helping customers achieve the fastest time to results for their data-intensive workloads and hardware configuration.”

The Seagate ClusterStor systems’ innovative scale-out HPC architecture enables a central repository, allowing both HPC and Hadoop analytics tools to be run simultaneously on the same data sets in ClusterStor. The Hadoop Workflow Accelerator significantly reduces time to results by enabling immediate Hadoop data processing from the start of each job, eliminating the time-consuming step of bulk copying large amounts of data from a separate data repository. With the Accelerator, Hadoop environments can now scale computing and storage resources independently, increasing flexibility to optimize analysis resources, while supporting centralized high-performance data repositories with hundreds of petabytes (PB) of storage capacity.

Hadoop Workflow Accelerator detail:

  • Tests run with Hadoop Workflow Accelerator on applications such as Mahout, Hive and Pig showed marked improvements for Apache Hadoop 1.0 distributions over standard storage configurations. TeraSort benchmarks show that the Hadoop Workflow Accelerator outperforms Hadoop on HDFS by up to 38%. Details on these and other benchmarks are available from Seagate.
  • The Hadoop Workflow Accelerator includes the Seagate-developed Hadoop on Lustre Connector and an array of ClusterStor performance optimization best practices, system tuning methods, installation and configuration management tools, and professional services.
  • Expanding compatibility, the Seagate-engineered family of Hadoop on Lustre Connectors extends support to several Hadoop ecosystem packages, such as Mahout, Hive and Pig, so that they can take advantage of the parallel read/write performance of the Lustre file system operating over high-speed networks such as 40 Gigabit Ethernet and InfiniBand.
  • The ClusterStor Hadoop Workflow Accelerator is compatible with both Hadoop 1.0 and Hadoop 2.0 (YARN) distributions and requires no code changes or recompiling of either Hadoop or Lustre systems.
  • The Hadoop Workflow Accelerator is compatible with existing HDFS-based Hadoop installations. There’s no need to migrate data to Seagate ClusterStor prior to using the Accelerator, as users can read from or write to ClusterStor and HDFS interchangeably while running Hadoop jobs.