May 23, 2012

Yahoo’s Genome Brings Data as a Service

Datanami Staff

When one thinks about companies with big data at their core, Yahoo might come to mind as an afterthought, even though the company has been dabbling in ways to wrangle massive web data since its inception at Stanford in the mid-1990s.


While not necessarily a “big data” vendor (at least until this week), the company has been instrumental in pioneering work on Hadoop, along with other notable projects in web and search mining and machine learning.


This week Yahoo announced it would be turning some of its research efforts outward with the intention of showing it’s capable of competing with established analytics platform providers, specifically in the lucrative online advertising market. An added benefit of the service is that it frees companies from having to buy their own infrastructure and hire experts to run the analytics operations.


While the new “Genome” service is targeted at customers that are finding new ways to target advertising down to the ultra-granular user level, there are a few notable elements that are worth pointing out, especially as they can apply to businesses that are still in search of a reliable, tested and scalable platform for big ad analytics.


The data-as-a-service offering will let advertisers comb through terabytes of Yahoo’s data from its own networks and those of its partners, and mash their own data together with that of Yahoo and its partners in real time.


As Jaikumar Vijayan describes, Genome is based on technology from interclick, a company that Yahoo acquired last December. At its core is a 20-terabyte in-memory database that pulls in and analyses real-time behavioral and advertising-related data from Yahoo’s multi-petabyte scale Hadoop clusters. The company is using a blend of proprietary technology and best-of-breed commercial products from vendors such as Netezza and Microstrategy to do the data analytics on the real-time data.


Yahoo is not the first web giant on the block to create a service that lets users mash their own data with that of a large web services provider in a cloud/data-as-a-service model. Google’s BigQuery, for instance, which launched a couple of weeks ago, allows users to do approximately the same thing, but with Google data, which is arguably more prolific.


Other companies with similar data-as-a-service offerings that let users mash their data in with that of other large-scale sources include Metamarkets, which is also a major player in the quick-time online advertising market.

