March 28, 2016

BI on Hadoop–What Are Your Options?

Jacques Nadeau


In the era of RDBMS and modern data warehouses, business intelligence was mostly a solved problem. Any reasonably advanced tool would work with any reasonable database, and the only real work was deciding what to collect and how to present it. However, the rise of big data and its associated technologies has forced the market to solve all these old problems all over again, and we’re now left with a proliferation of software that can be difficult to differentiate.

During my Strata + Hadoop World presentation, titled “BI on Hadoop: What are your options?”, we’ll look at three primary categories of solutions to the ‘BI-on-Big-Data’ problem.

The first one is ‘ETL to RDBMS,’ in which pre-packaged or custom software is employed to create a relational database based on information extracted from a big data source. This approach essentially reduces the contemporary problem to the earlier and better-understood problem of ‘BI on RDBMS.’ In this section, we’ll name popular ETL tools and walk through an example flow for creating an RDBMS from big data.
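To make the ‘ETL to RDBMS’ idea concrete, here is a minimal sketch of the extract, transform, and load steps. It is not the flow shown in the talk: the sample records, the `events` table, and every function name are hypothetical, and an in-memory SQLite database stands in for whatever RDBMS a BI tool would actually connect to.

```python
import json
import sqlite3

# Hypothetical raw records, standing in for semi-structured data
# pulled from a big data source (e.g. JSON log lines).
RAW_EVENTS = [
    '{"user": "alice", "action": "view", "ms": 120}',
    '{"user": "bob", "action": "purchase", "ms": 340}',
    '{"user": "alice", "action": "purchase", "ms": 95}',
    'not valid json',
]


def extract(lines):
    """Extract: parse each raw line, skipping lines that are not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(records):
    """Transform: flatten each record into a fixed-width relational row."""
    for record in records:
        yield (record["user"], record["action"], int(record["ms"]))


def load(conn, rows):
    """Load: create the target table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(user_name TEXT, action TEXT, duration_ms INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(lines):
    """Run the three stages against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(lines)))
    return conn


def purchases_per_user(conn):
    """The kind of plain SQL query a BI tool could now issue."""
    cursor = conn.execute(
        "SELECT user_name, COUNT(*) FROM events "
        "WHERE action = 'purchase' "
        "GROUP BY user_name ORDER BY user_name"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = run_etl(RAW_EVENTS)
    print(purchases_per_user(connection))
```

Once the rows are in a relational table, the remaining work is ordinary ‘BI on RDBMS’: any SQL-capable tool can query the table without knowing where the data originally came from.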

The second category is a class of software that could be described as ‘monolithic solutions.’ This software takes an all-in-one approach, handling both the querying and the visualization of big data within a single package. We’ll discuss the architecture of three of these tools (Platfora, Datameer, and Zoomdata) and point out how their design choices influence the experience of using the software.

The final category is SQL-on-Big-Data solutions, which comprises three important sub-categories: native SQL (Drill, Impala, Presto), batch SQL (SQL on Hive and Spark SQL), and OLAP cubes. Fundamentally, these solutions place a query engine layer on top of big data that exposes an interface for SQL-enabled BI tools. We’ll be taking some time to compare and contrast these tools, and attendees curious about SQL-on-Big-Data will leave with a strong sense of what defines each sub-category.

Following the SQL-on-Big-Data section, there will be a brief demo in which Yelp datasets stored on both MongoDB and Hadoop are accessed from Tableau via Drill. The talk will wrap up with a summary of the properties of these solutions and a heuristic for guiding enterprise adopters to the BI solution that might work best for them.

My session takes place Thursday, March 31, from 2:40 to 3:30 in room LL20 D.

About the author: Jacques Nadeau is cofounder and CTO of Dremio, a big data software startup aimed at the development of Apache Arrow, a new data format for columnar in-memory analytics. Prior to Dremio, Jacques led the Apache Drill development efforts at MapR Technologies.

Applications: Enterprise Analytics, Visualization

Technologies: Frameworks, Middleware

Sectors: Financial Services, Healthcare, Retail

Vendors: Dremio, MapR

Tags: Apache Arrow, bi, big data, Data Analytics, database, Dremio, Drill, in-memory analytics, SQL on Hadoop
