September 26, 2016

Yahoo Unleashes HBase Transaction Manager

George Leopold

A transaction manager for the NoSQL database HBase has been approved as an open-source incubator project, according to project sponsor Yahoo.

The HBase transaction manager dubbed “Omid” (“Hope” in Persian) is the latest in a string of Hadoop ecosystem projects backed by Yahoo that also includes Pig, Storm and YARN. Yahoo said Omid addresses the growing need for transactions in many applications using NoSQL data stores as the primary source of data. One example, it said, is in incremental content processing systems.

Yahoo engineers noted that Omid attempts to leverage the scalability of NoSQL data stores such as HBase along with the concurrency and all-or-nothing atomicity provided by transaction processing systems.

They further claimed that Omid is among the few open source transactional frameworks capable of scaling to thousands of clients triggering transactions on application data. That translates into scaling beyond 100,000 transactions per second on “mid-range hardware” with minimal impact on accessing data stores.

Yahoo’s research arm launched Omid in 2011, and it has been used internally since 2014 along with Hadoop technologies to run the company’s incremental content ingestion platform for search and personalization applications. The company estimates Omid is serving millions of transactions per day over HBase data.

“We think [moving Omid to Apache] is the next logical step after having battle-tested the project in production at Yahoo and having open-sourced the code in Yahoo’s public Github in 2012,” the Web giant said in a blog post.

The Omid developer community has been gaining momentum in recent months, Yahoo said. For example, collaboration with contributors to the Apache Hive data warehouse infrastructure at Hortonworks led to the storing of Hive metadata in HBase using Omid. Elsewhere, Omid could be used as a transaction manager in other SQL abstraction layers such as Apache Phoenix while running on top of HBase.

Alternatively, the company added, it could be used as a transaction coordinator in distributed systems such as the Apache DistributedLog project or Pulsar, a distributed “publish and subscribe” messaging platform released by Yahoo to the open source community earlier this month. Yahoo designed Pulsar to scale horizontally on commodity hardware, and to provide messaging as a service to multiple applications. The system can scale to handle millions of independent topics and millions of messages published per second, according to Pulsar’s GitHub page.

The transaction manager utilizes a lock-free approach to support multiple clients and relies on a centralized conflict detection component to resolve write-set collisions among concurrent transactions. Developers added that Omid requires no modifications to the underlying HBase key-value data store.
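To make the idea of centralized write-set conflict detection concrete, here is a minimal sketch in Python. It is not Omid's code or API: the class name `ConflictDetector`, its methods, and the in-memory dictionary are all hypothetical, and the logic is a generic optimistic check of the kind used in snapshot-isolation systems, assumed here for illustration. A transaction takes a start timestamp, writes without holding locks, and at commit time a single component aborts it if any key in its write set was committed by another transaction after it started.

```python
from itertools import count


class ConflictDetector:
    """Hypothetical central component; names are illustrative, not Omid's."""

    def __init__(self):
        self._clock = count(1)    # monotonically increasing timestamps
        self._last_commit = {}    # key -> timestamp of latest committed write

    def begin(self):
        """Hand out a start timestamp; no locks are taken."""
        return next(self._clock)

    def try_commit(self, start_ts, write_set):
        """Return a commit timestamp, or None if the transaction must abort."""
        # Conflict: another transaction committed a write to one of our
        # keys after we started.
        for key in write_set:
            if self._last_commit.get(key, 0) > start_ts:
                return None
        commit_ts = next(self._clock)
        for key in write_set:
            self._last_commit[key] = commit_ts
        return commit_ts


detector = ConflictDetector()
t1 = detector.begin()
t2 = detector.begin()
first = detector.try_commit(t1, {"row1"})    # no earlier writer, so it commits
second = detector.try_commit(t2, {"row1"})   # overlaps with t1 on row1, so it aborts
third = detector.try_commit(detector.begin(), {"row1"})  # started after t1 committed
```

In this sketch the first committer wins and the loser is told to abort and retry, so no client ever blocks waiting on another client's lock.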

It also features a simplified API that mimics transaction manager APIs in relational databases. Client and server configuration processes also were simplified to help both application developers and system administrators.

Recognition of Omid as an Apache incubator project should accelerate the development of more features designed to boost performance and improve latency, company developers predicted.
