BlueData Announces Bare-Metal Performance for Hadoop on Docker Containers
SANTA CLARA, Calif., March 15, 2017 — BlueData, provider of a Big-Data-as-a-Service (BDaaS) software platform, today announced breakthrough performance results. A new Intel benchmarking study shows that Hadoop running in Docker containers on the BlueData EPIC software platform performs comparably to Hadoop running on bare metal. The study demonstrates that it is possible to deliver the benefits of containerization for Big Data workloads without paying a penalty in performance.
This groundbreaking benchmarking milestone was the result of ongoing collaboration between the Intel and BlueData software engineering teams. The detailed results, methodology, and specific Hadoop software and Intel hardware configurations for the benchmarking are described in this new white paper: Bare-metal performance for Big Data workloads on Docker* containers.
Intel Xeon processor architecture provides a high-performance, security-enabled, and robust foundation for Big Data analytics. Leveraging the power of Docker containers, the BlueData EPIC software platform makes it easier, faster, and more cost-effective to deploy Big Data infrastructure and applications — including Hadoop, Spark, Kafka, Cassandra, and more — whether on-premises, in the public cloud, or in a hybrid architecture.
Working closely with BlueData, Intel ran systematic performance comparisons using the BigBench benchmark kit for on-premises test environments on Intel Xeon processors:
- Apples-to-apples comparison: The Intel team evaluated and benchmarked identical configurations for a bare-metal environment versus a containerized environment using BlueData EPIC. Both test environments used the same hardware and were configured using the same Hadoop software – benchmarked at 10, 20, and 50 Hadoop compute nodes with 10 terabytes of data in HDFS. The Big Data workloads in both test environments were deployed on the Intel Xeon processor E5-2699 v3, which helps reduce network latency, improve infrastructure security, and minimize power inefficiencies. Both test environments also used Intel Solid-State Drives (SSDs) to optimize the execution environment at the system level.
- Groundbreaking performance results: The results from this in-depth benchmarking study show performance for containerized Hadoop on BlueData EPIC to be comparable to the performance for bare-metal Hadoop. In fact, in some cases, the containerized environment achieved superior performance to the bare-metal environment. For example, the BlueData EPIC platform demonstrated an average 2.33%* performance gain versus bare-metal across three test runs for a configuration of 50 Hadoop compute nodes with 10 terabytes of data in HDFS. This performance boost is due to BlueData EPIC’s proprietary IOBoost technology, which enhances input/output (I/O) performance using asynchronous storage I/O and data caching.
- Industry-standard benchmark with real-world use cases: BigBench is an industry-standard benchmark to measure the performance of Big Data analytics frameworks in the Hadoop ecosystem, including MapReduce, Hive, and Spark MLlib. This benchmark provides a realistic measurement and comparison of performance by implementing 30 queries that simulate Big Data processing, analytics, and reporting in real-world use cases. The BigBench data model includes structured data, semi-structured data, and unstructured data; it covers a range of essential functional and business aspects for Big Data use cases.
- No modifications to the Hadoop software: BlueData EPIC allows enterprises to deploy Big Data frameworks and distributions unmodified, running in Docker containers. The use of Docker is completely transparent to the Big Data software, while BlueData customers benefit from the agility, flexibility, and efficiency advantages of containers. For this benchmarking study, both the bare-metal and containerized test environments used the same open source Hadoop distribution with no modifications. Because BlueData runs Hadoop distributions and other Big Data frameworks completely unmodified, these performance results also apply to other Hadoop distributions and to other Big Data frameworks such as Spark standalone.
- Collaboration with Intel: In August 2015, Intel and BlueData entered into a strategic technology and business collaboration agreement. One of the goals was to ensure optimal performance for BlueData EPIC running on Intel Xeon processors. The outstanding results for BlueData EPIC in this benchmarking study were due in part to the ongoing engineering collaboration between Intel and BlueData to investigate, benchmark, test, and continuously improve the software platform. Working together, BlueData and Intel have shown that the performance for Hadoop in a containerized environment (using the BlueData EPIC software platform) is on par with the identical set-up on bare-metal.
“Intel has been great in helping us to optimize and enhance the BlueData EPIC software platform, putting it through its paces to get the best possible performance,” said Kumar Sreekanti, co-founder and CEO at BlueData. “Together, we’ve shown that you can achieve the same performance — or even better — for Big Data workloads running on our container-based platform. The results are a testament to the collaboration between our teams.”
Intel’s collaboration, performance testing, and feedback helped BlueData make ongoing software enhancements to ensure high-performance Big Data deployments. The collaboration resulted in an unprecedented performance milestone for Big Data workloads running in Docker containers. With this breakthrough, BlueData and Intel can enable enterprises to take advantage of containerization to simplify and accelerate their on-premises Big Data implementations — while ensuring the best possible performance. And with BlueData, these customers can run Big Data analytics using the same Docker-based application images for both on-premises and public cloud deployments — leveraging the inherent infrastructure portability of containers.
“BlueData delivers greater simplicity, agility, and cost-efficiency for Big Data deployments,” said Michael Greene, vice president and general manager of System Technologies and Optimization in the Software and Services Group, Intel Corporation. “Now, working together, we’ve demonstrated that you can achieve these benefits while also ensuring performance that’s comparable to bare-metal Big Data implementations.”
BlueData is highlighting these performance results and demonstrating the BlueData EPIC software platform running on Intel Xeon processors at the Strata + Hadoop World event in San Jose this week (at booth #1415). BlueData’s co-founder and chief architect, Thomas Phelan, will be presenting “Benchmarking Performance for Hadoop on Docker Containers versus Bare-Metal” in the Intel booth (#917) at the event on Thursday, March 16, at 10:45 a.m.
Intel white paper: Bare-metal performance for Big Data workloads on Docker* containers
BlueData blog post: The Proof is in the Pudding (I Mean, in the Benchmarking)
Intel blog post: Performance and Agility with Big Data in a Containerized Environment
* The study used the BigBench benchmark kit (https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench) and the queries-per-minute (Qpm@Size) performance metric, where Size is the scale factor of the data. The results showed that the BlueData EPIC platform demonstrated an average 2.33% performance gain versus bare-metal (as measured by Qpm@10TB) over three test runs for a 50 Hadoop compute node configuration with 10 terabytes of data in HDFS.
Note: The BigBench benchmark kit used for these performance tests is not the same as TPCx-BigBench, the TPC Big Data batch analytics benchmark. As such, these results are not directly comparable to published results for TPCx-BigBench.
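The footnote reports the containerized advantage as an average percentage gain in Qpm@10TB over three test runs, but does not say exactly how that average was formed. The sketch below assumes it is the mean of per-run relative gains, each computed as (containerized Qpm minus bare-metal Qpm) divided by bare-metal Qpm. The function names are hypothetical, and no benchmark data from the study is reproduced here.

```python
from statistics import mean


def percent_gain(container_qpm: float, bare_metal_qpm: float) -> float:
    """Relative gain (%) of a containerized Qpm result over the bare-metal result.

    Hypothetical helper: assumes gain is measured relative to bare-metal.
    """
    if bare_metal_qpm <= 0:
        raise ValueError("bare-metal Qpm must be positive")
    return (container_qpm - bare_metal_qpm) / bare_metal_qpm * 100.0


def average_gain(container_runs: list[float], bare_metal_runs: list[float]) -> float:
    """Mean of per-run percentage gains over paired test runs.

    Assumption: runs are paired by index (run 1 vs run 1, and so on).
    """
    if not container_runs or len(container_runs) != len(bare_metal_runs):
        raise ValueError("need the same non-zero number of runs for each environment")
    return mean(percent_gain(c, b) for c, b in zip(container_runs, bare_metal_runs))
```

Averaging the per-run gains weights each run equally; an alternative reading, taking the gain of the averaged Qpm values, can give a slightly different figure when run results vary.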
About BlueData Software, Inc.
BlueData is transforming how enterprises deploy their Big Data applications and infrastructure. The BlueData EPIC software platform uses Docker container technology to make it easier, faster, and more cost-effective for enterprises of all sizes to leverage Big Data — enabling Big-Data-as-a-Service either on-premises or in the cloud. With BlueData, they can spin up virtual Hadoop or Spark clusters within minutes, providing data scientists with on-demand access to the applications, data, and infrastructure they need. Based in Santa Clara, California, BlueData was founded by VMware veterans, and its investors include Amplify Partners, Atlantic Bridge, Ignition Partners, and Intel Capital. To learn more about BlueData, visit www.bluedata.com or follow @bluedata.