October 7, 2013

Kiji Opens the Big Data Kimono with ‘Chirashi’

Alex Woodie

The folks behind the Kiji Project have shipped “Chirashi,” a new release of the open source Kiji Bentobox big data kit that should make it easier to build big data applications on Hadoop and HBase.

The Kiji Project is an open source framework sponsored by WibiData, a San Francisco, California, startup founded by Christophe Bisciglia, who previously co-founded Cloudera, and Aaron Kimball, who created Apache Sqoop. The company was founded in 2010 and in 2012 introduced the open source Kiji Bentobox to speed and simplify the roll-out of big data apps running on HBase.

The Bentobox SDK is designed to bring the same level of automation and ease of use to HBase that the Spring framework brings to Java. The key component of Bentobox is KijiSchema, a Java API based on the Apache Avro data serialization tool that basically adds structure to less-structured data as it flows into HBase so that it can be more effectively analyzed.

This approach allows big data applications to “gracefully” adapt as the structure of the datasets processed by Hadoop and HBase changes over time and as those applications are asked to support complex data types, column keys, and time-series data. WibiData CTO Kimball explains this in more technical depth in this YouTube video:


“We share a data dictionary for each table with anybody who uses this information,” Kimball says in the video. “Compare this to an application built on top of HBase in a more direct fashion. You’d need to read the source code of any applications that read and write from HBase in order to determine where the data is and how it’s used. The KijiSchema layout management system puts all this information in one centralized place that people can query without needing to resort to reading source code.”
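To make the contrast concrete, here is a minimal Python sketch of the general idea of a centralized data dictionary. It is not KijiSchema's API: the `Column` and `LayoutRegistry` names and the sample `users` table are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One documented column in a table layout (hypothetical structure)."""
    family: str
    qualifier: str
    schema: str        # name of the value type stored in the column
    description: str   # human-readable note on what the column holds


class LayoutRegistry:
    """A single place to look up what each table contains."""

    def __init__(self):
        self._tables = {}

    def register(self, table, columns):
        # Store a copy so later changes to the caller's list do not leak in.
        self._tables[table] = list(columns)

    def describe(self, table):
        # Return one readable line per column; no application source needed.
        return [
            f"{c.family}:{c.qualifier} ({c.schema}): {c.description}"
            for c in self._tables[table]
        ]


registry = LayoutRegistry()
registry.register("users", [
    Column("info", "email", "string", "Primary contact address"),
    Column("info", "signup_time", "long", "Signup time in epoch milliseconds"),
])
for line in registry.describe("users"):
    print(line)
```

Anyone who needs to know what lives in the `users` table asks the registry instead of reading the code of every application that writes to it, which is the benefit Kimball describes.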

Other pieces that, up to this point, have been found in the Kiji Bentobox include KijiMR, which enables MapReduce to be used in a real-time manner; KijiHive, for HiveQL access; and KijiExpress, a Scala-based scripting language for building machine-learning applications.

With the new “Chirashi” release of the Bentobox SDK, or version 1.2, the folks at WibiData have updated the various components. They have also added a couple of new ones, including KijiScoring, which helps developers create real-time predictive models and scoring functions. This new component could be used in an e-commerce setting, for example by factoring in a user’s geolocation data when deciding which offers or recommendations to push out.
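As a rough illustration of what a geolocation-aware scoring function might look like, the Python sketch below discounts each offer's base score by its distance from the user. It does not use KijiScoring; the function names, the `half_distance_km` parameter, and the sample offers are all assumptions made for the example. The distance calculation is the standard haversine formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def score_offer(base_score, user_loc, store_loc, half_distance_km=10.0):
    """Halve the base score for every half_distance_km between user and store."""
    distance = haversine_km(*user_loc, *store_loc)
    return base_score * 0.5 ** (distance / half_distance_km)


def best_offer(user_loc, offers):
    """Pick the offer with the highest distance-adjusted score."""
    return max(
        offers,
        key=lambda o: score_offer(o["base_score"], user_loc, o["location"]),
    )


# Hypothetical data: a user in San Francisco and two candidate offers.
user = (37.77, -122.42)
offers = [
    {"name": "local-coffee", "base_score": 0.6, "location": (37.78, -122.41)},
    {"name": "nyc-pizza", "base_score": 0.9, "location": (40.71, -74.01)},
]
print(best_offer(user, offers)["name"])
```

The exponential decay is only one possible design choice; a production model would more likely learn how much location matters from historical response data.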

The Chirashi upgrade brings version 1.3 of KijiSchema, which supports layout validation. This helps developers by guaranteeing that new schema layouts are fully compatible with existing ones, the company says. KijiSchema 1.3 also introduces experimental support for running on secure Kerberized Hadoop and HBase clusters, as well as better performance analysis of MapReduce jobs.
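The kind of check that layout validation implies can be sketched in a few lines of Python. This is a deliberately simplified version of the record-evolution rules used by Avro-style schemas (a new field needs a default so old data can still be read, and an existing field should not change type); it is not KijiSchema's validator, and the `incompatibilities` function and field dictionaries are hypothetical.

```python
def incompatibilities(old_fields, new_fields):
    """List reasons the new record layout cannot read data written with the old one.

    Each argument maps a field name to a dict with a "type" key and an
    optional "default" key. An empty result means no problem was found
    under these simplified rules.
    """
    problems = []
    for name, spec in new_fields.items():
        if name not in old_fields:
            # Old records lack this field, so a reader needs a fallback value.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old_fields[name]["type"] != spec["type"]:
            problems.append(
                f"field '{name}' changed type from "
                f"{old_fields[name]['type']} to {spec['type']}"
            )
    # Fields dropped from the new layout are simply ignored when reading.
    return problems


old = {"name": {"type": "string"}, "age": {"type": "int"}}
ok = {"name": {"type": "string"}, "age": {"type": "int"},
      "email": {"type": "string", "default": ""}}
bad = {"name": {"type": "string"}, "age": {"type": "string"},
       "zip": {"type": "string"}}

print(incompatibilities(old, ok))
print(incompatibilities(old, bad))
```

Rejecting a layout change when this list is non-empty is what lets a framework promise that existing data remains readable after the schema evolves.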

WibiData also bolstered its support for Hive. The company says that users gain support for all primitive and complex types in Apache Hive DDL generation as well as the ability to write back to Kiji tables. Bentobox users can access all the data in Kiji using HiveQL, JDBC/ODBC, or other Hive compatible business intelligence tools, the company says.

Chirashi is compatible with Cloudera’s CDH 4.1, CDH 4.2, and CDH 4.3. The Bentobox SDK also underpins several WibiData products, most notably the WibiData SDK and WibiData Core, which the company sells and supports through traditional subscriptions and licenses. The company also sells several big data solutions for specific industries. Its customers include OPower, an electricity usage analysis firm; Atlassian, a software as a service (SaaS) company; and Mobile Posse, a mobile ad delivery platform.
