July 21, 2014

Slicing and Dicing Music Data for Fun and Profit

Alex Woodie

The advent of big data analytics promises to have a profound impact on many aspects of human life, including how we work and play. Big data is even influencing the arts, where the field of music data science is rearranging our relationship with music.

We’re in the midst of a boom in music data science that can be traced back to 1999, when two important events occurred. First, Shawn Fanning unleashed Napster on the world, thereby giving people the power to share music in a way they never could before. That was also the year the Music Genome Project started using algorithms to categorize the world’s music across 400 attributes.

In the ensuing years, music would be among the first “killer apps” to benefit from the incredible technological gifts the Internet had given us in the areas of mobile computing, social media, and cloud storage. As millions worked feverishly to rip their CDs and encode them as MP3s, we suddenly became aware of the importance of metadata, which would become critical when Apple gave us iTunes and the first iPod in 2001.

In 2008, we were given another way to explore new music when Pandora Internet Radio launched its first streaming music app for the iPhone. As the custodian of the Music Genome Project, Pandora leveraged its extensive music database and classification engine to make customized music recommendations, and to stream that music to us over the Internet.

Since Pandora opened its venerable box, we’ve witnessed a flurry of similar offerings that leverage analytics to benefit both music producers and consumers. On the supply side, we see offerings like Next Big Sound, which uses predictive analytic tools to help record companies and producers choose which bands are most likely to be successful. Next Big Sound is currently branching out into the literary world, where it hopes to do for book publishers what it did for record companies.

Pandora remains very popular on the demand side, but not everybody is enthused with its licensing requirements, and it’s limited to customers in the United States, New Zealand, and Australia. In short, the world always favors lower prices and choice, and now startups are looking to chip away at Pandora’s early dominance in the field. The outfit with the best music data science and scientists will help determine the winners and the losers.

One of the up-and-coming music data science companies is Senzari, which launched its MusicGraph in late 2013. MusicGraph uses a combination of graph database technologies and machine learning algorithms to give customers new ways to explore music and artists. Senzari says its MusicGraph has about 1 billion data points from 20 million songs and millions of individual artists across all musical genres. But this dataset goes beyond the typical music metadata and includes proprietary data generated via feature extraction from the music itself, such as tempo, chord progression, and key. There’s even lyrical data, such as “bag-of-words” and concept extractions derived from lyrics.
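For readers unfamiliar with the term, a “bag-of-words” simply counts how often each word appears in a text and ignores word order. The short Python sketch below illustrates the general idea only. It is not Senzari’s method, and the function name bag_of_words and the sample lyric are made up for illustration.

```python
import re
from collections import Counter


def bag_of_words(lyrics: str) -> Counter:
    """Count how often each lowercased word appears, ignoring word order."""
    # Hypothetical tokenizer: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    return Counter(tokens)


# Made-up lyric line, used purely for illustration.
sample = "La la la, sing the song, sing it loud"
counts = bag_of_words(sample)
print(counts.most_common(2))
```

Two songs can then be compared by how much their word counts overlap, which is one simple way lyrics could feed a recommendation engine.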

MusicGraph is the engine behind Senzari’s music streaming and recommendation service, called Wahwah. Today, Senzari unveiled MusicGraph.ai, a new offering designed to help record labels and other music supply-siders build their own sophisticated streaming and recommendation systems. Access to the MusicGraph.ai API starts at about $499. With a few hundred more spent on Amazon Web Services (AWS) storage, an online music company could be up and running quickly.

Buoyed by big data analytics and cheap online storage, the online streaming and recommendation business is booming. One of the established leaders in this relatively new field is The Echo Nest, a “music intelligence” company that provides the backend analytic engine that powers popular streaming channels and sites like iHeartRadio, MOG, Rdio, SiriusXM, and Spotify, which acquired The Echo Nest in March.

The Echo Nest was founded by two MIT Ph.D.s, Tristan Jehan and Brian Whitman, and was backed by the MIT Media Lab before being bought by Spotify, a large Hadoop user that’s a big data success story in its own right. The Echo Nest boasts a music database with more than 1.1 trillion data points about 35 million songs and 2.8 million artists. In addition to powering more than 400 music streaming services, The Echo Nest slices and dices its huge musical database to let users discover interesting bits of information about music.

For example, one of the pieces of data The Echo Nest provides for each song is its “Discovery score,” which attempts to identify songs from relatively unknown artists that are becoming more popular. In March, the company’s chief data alchemist analyzed the top 10,000 songs on the Discovery list, and found a disproportionate number of the most up-and-coming songs are from artists hailing from island countries and from Scandinavia, with Iceland at the top of the list. Apparently, being holed up indoors during cold winters is conducive to making good music, the company theorized.

While firms like The Echo Nest, Senzari, and Gracenote (another data science engineering company) are focused on using music data science to create new business models based on personalized recommendations, music data science projects exist in the open source world too. Take, for example, the Million Song Dataset, a freely available collection of metadata and audio features based on (you guessed it) one million songs.

The Million Song Dataset started in 2011 as a collaborative project between The Echo Nest and LabROSA, and was supported in part by the National Science Foundation, which started the Listening Machine Project in 2003. The goal of the Million Song Dataset is to provide a reference dataset for evaluating research and to enable data scientists to come up with better, more scalable algorithms for Music Information Retrieval (MIR).

Computers lack the creativity necessary to create good music, but they are great at breaking down the phonic variables in the music itself, extracting human sentiment (via the Web) based on the songs we hear, and recommending similar music. While people’s taste in music is always changing, the world of music data science is evolving rapidly at the moment, and that’s good news for those who enjoy exploring new music and artists.
