January 13, 2014

Toward Comprehensive Big Data Benchmarking

Nicole Hemsoth

It’s difficult enough to keep track of the ebb and flow of new tools that span the big data ecosystem, let alone keep tabs on the latest measures and standards by which to evaluate the influx. While there are standards for a number of specific application areas, architectures, and programmatic approaches, it’s hard to get a more comprehensive view across solutions and system-wide needs.

A team of researchers from the Institute of Computing Technology at the Chinese Academy of Sciences has tackled the problem of benchmarking big data with a new tool called BigDataBench.

The effort, which draws on the team’s own research and on input from outside industrial partners, will examine larger application scenarios as a whole, where there are “diverse and representative datasets.” They are using a core of 19 existing benchmarks that draw on several dimensions: application scenarios, operations and algorithms, data source handling, software stacks and application types. They noted that, running on a standard Xeon (E5645), their benchmarks showed some notable contrasts with more narrowly focused suites (PARSEC, HPCC and SPEC CPU among them). Their results can be found in more detail here.

The benchmarking suite includes six real-world data sets and 19 big data workloads, covering six application scenarios: micro benchmarks, “cloud OLTP,” relational query, search engine, social networks, and e-commerce.

BigDataBench also provides several data generation tools that produce scalable volumes of big data (e.g., PB scale) from small-scale real-world data while preserving its characteristics. A full spectrum of system software stacks for real-time analytics, offline analytics, and online services is being included. The sample data sets include those from Wikipedia (over 4 million articles), a Google Web graph with 875,713 nodes and 5,105,039 edges, massive e-commerce transaction data in structured format and more. This provides a well-rounded view of different data types and puts the results in a more defined context.
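To make the idea of scaling up a small real-world sample while preserving its characteristics more concrete, here is a minimal sketch in Python. It is not BigDataBench's generator and makes no claim about how those tools work; it simply assumes that the one characteristic worth preserving is word frequency, and the function names `fit_word_distribution` and `generate_words` are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter

def fit_word_distribution(seed_text):
    """Count how often each word appears in the small seed corpus."""
    return Counter(seed_text.lower().split())

def generate_words(word_counts, n_words, seed=0):
    """Sample n_words words, each chosen with probability
    proportional to its frequency in the seed corpus."""
    rng = random.Random(seed)
    vocabulary = list(word_counts.keys())
    weights = list(word_counts.values())
    return rng.choices(vocabulary, weights=weights, k=n_words)

# Toy seed corpus standing in for a small real-world data set (an assumption).
seed_text = "big data systems process big data workloads on big clusters"
counts = fit_word_distribution(seed_text)

# Scale the 10-word seed up to 10,000 synthetic words.
synthetic = generate_words(counts, n_words=10_000)
synthetic_counts = Counter(synthetic)

# Compare the share of one word in the seed and in the synthetic output.
seed_share = counts["big"] / sum(counts.values())
synthetic_share = synthetic_counts["big"] / len(synthetic)
print(f"seed share of 'big': {seed_share:.2f}")
print(f"synthetic share of 'big': {synthetic_share:.2f}")
```

The synthetic output is a thousand times larger than the seed, yet each word's share should stay close to its share in the seed. A real generator would need to preserve far richer structure, such as graph degree distributions or table schemas, which this sketch does not attempt.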

As the team notes, “considering the broad use of big data systems, for the sake of fairness, big data benchmarks must include diversity of data and workloads, which is the prerequisite for evaluating big data systems and architectures.” The problem is that most benchmark efforts to date consider only specific applications or software stacks, which makes them too limited for a broader evaluation.

The benchmark is freely available via the open source project but does require some ramp-up time, as it’s not simple to navigate. However, for those looking to move beyond a monolithic benchmarking effort, especially if using Xeon E5-series processors, this could be a handy tool.
