Hadoop Helps Maintain Music Genome Project
Keeping track of 67 million monthly users listening to 1.38 billion hours’ worth of music is a data-intensive undertaking for popular music recommendation site Pandora, which employs Hadoop as part of its music delivery platform.
The online company must run a significant big data operation to keep its recommendations relevant, retain its listeners, and, incidentally, turn a profit. Pandora’s Brett Uyeshiro described his company’s big data collection, aggregation, and analysis methods at the SVForum.
“Hadoop is the core of our analytics infrastructure,” Uyeshiro said. Pandora’s Hadoop cluster comprises 200 nodes and holds about 1.3 petabytes. The company made the switch a few years ago when it realized its old system could take in more data but not actually do anything with it. As a result, according to Uyeshiro, they “migrated all of our business analytics and all of the offline processing work that our recommendation team does to Hadoop.”
Pandora is, of course, a business that makes money through ad revenue and premium subscriptions. With as many users as they have, they need to analyze big data to deliver the proper ads to the people likely to respond to them.
However, the interesting use case that Pandora presents is the process through which they have gained so many users. Pandora is popular because it is adept at playing music that users like.
It seems a simple concept, but playing to people’s preferences is not actually that, well, simple. Many people do not really know what they like. For example, a person who enjoys Irish folk music, Dropkick Murphys, and bluegrass possibly has no idea that Mumford and Sons could secretly be their favorite band.
About five years before Pandora became an internet service, the company started the Music Genome Project, which set out to determine which properties of music link listeners’ preferences. As a result of the project, highly trained music analysts have come up with 400 attributes for pop songs and about 450 for classical and jazz pieces.
Pandora currently employs 25 musical analysts who listen to music all day and rate songs based on those attributes, according to Uyeshiro. In essence, those analysts perform old-school analytics by hand, and their ratings are imported into the Pandora analytics system whose results are presented to users.
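To make the idea of attribute-based matching concrete, here is a minimal Python sketch. It is not Pandora's method: the article does not name the attributes or the matching technique, so the attribute names, the scores, and the use of cosine similarity are all assumptions chosen for illustration.

```python
import math

# Hypothetical attribute scores per song (0.0 to 1.0). The real Music Genome
# Project attributes are not listed in the article; these names are made up.
SONGS = {
    "song_a": {"acoustic_guitar": 0.9, "banjo": 0.8, "vocal_harmony": 0.7},
    "song_b": {"acoustic_guitar": 0.8, "banjo": 0.9, "vocal_harmony": 0.6},
    "song_c": {"synth_bass": 0.9, "drum_machine": 0.8},
}


def cosine_similarity(a, b):
    """Cosine similarity of two sparse attribute vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(seed, songs):
    """Return the name of the song whose attributes are closest to the seed."""
    candidates = (name for name in songs if name != seed)
    return max(candidates, key=lambda name: cosine_similarity(songs[seed], songs[name]))
```

Under these made-up scores, a listener who starts from `song_a` would be matched with `song_b`, which shares its acoustic attributes, rather than `song_c`, which shares none.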
From there, the users generate about 25 billion pieces of feedback, according to Uyeshiro. Pandora takes that data and develops playlist metrics that keep listeners happy through a philosophy of “experiment first, standardize later.”
In one such experiment, the programmers toyed with the idea of playing more Beatles songs for users who liked the Beatles. However, when they tried out that adjustment to the algorithm, they noticed that the total listening hours devoted to the Beatles remained the same.
Among the top metrics the team has settled on over years of “experiment first, standardize later” is ‘listener return rate,’ or the rate at which users continue listening to a certain playlist. That metric has supplanted ‘thumb-up percentage,’ which fell out of favor as more people listened to Pandora on phones in their pockets or in their cars, as well as ‘total listening hours,’ which gave too much priority to heavy listeners.
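The trade-offs among these three metrics can be illustrated with a small Python sketch. The session log, its field layout, and the exact metric definitions are assumptions; in particular, "listener return rate" is read here as the share of listeners who come back on more than one day, which is only one possible interpretation of the description above.

```python
from collections import defaultdict

# Hypothetical session log for one playlist:
# (listener, day, hours listened, thumbs up, thumbs down)
SESSIONS = [
    ("ann", 1, 8.0, 0, 0),
    ("ann", 2, 9.0, 0, 0),
    ("bob", 1, 0.5, 2, 0),
    ("bob", 2, 0.5, 1, 1),
    ("cat", 1, 0.5, 0, 1),
]


def total_listening_hours(sessions):
    """Sum of hours across all sessions; dominated by heavy listeners."""
    return sum(hours for _, _, hours, _, _ in sessions)


def thumb_up_percentage(sessions):
    """Percentage of explicit ratings that are thumbs up."""
    ups = sum(up for _, _, _, up, _ in sessions)
    downs = sum(down for _, _, _, _, down in sessions)
    total = ups + downs
    return 100.0 * ups / total if total else 0.0


def listener_return_rate(sessions):
    """Percentage of listeners with sessions on more than one distinct day."""
    days_by_listener = defaultdict(set)
    for listener, day, _, _, _ in sessions:
        days_by_listener[listener].add(day)
    if not days_by_listener:
        return 0.0
    returning = sum(1 for days in days_by_listener.values() if len(days) > 1)
    return 100.0 * returning / len(days_by_listener)
```

In this made-up log, one heavy listener accounts for 17 of the 18.5 total hours yet never rates a song, so total hours mostly reflects her habits and thumb-up percentage ignores her entirely. The return rate counts each listener once regardless of how much they listen or rate.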
Companies frequently look to big data analytics to understand consumers’ preferences. In this case, however, Pandora helps listeners understand what they themselves will enjoy, as those preferences are almost never set in stone and are often unknown.