September 17, 2013

Spotify Jumps on a New Elephant; Switches to Hortonworks

Isaac Lopez

Swedish music service Spotify, which boasts a 690-node Hadoop cluster believed to be Europe’s largest in commercial use, has decided to switch elephants midstream. The company previously used Cloudera’s Hadoop distro; it will now be saddled up with Hortonworks.

For those not familiar with Spotify, it is one of the most popular music streaming services on the Internet, with more than 24 million active users and six million paying subscribers. The company applies data analytics to users’ playlists and preferences to shape the listening experience, including products such as its data-driven Spotify Radio.

Spotify, which launched in 2008 with a 30-node cluster, was among the earliest adopters of Hadoop technology, which made it low-hanging fruit when Cloudera launched the first commercially available Hadoop distribution (Cloudera’s CDH) in March 2009. While Cloudera does a lot of development with and for the open source Apache Hadoop community, its distribution of Hadoop contains a fair amount of proprietary code, which raises hackles among Hadoop users concerned about vendor lock-in.

This issue is the stone on which Hortonworks grinds its axe. Rather than taking the open source Apache Hadoop software and building proprietary code into it, the Hortonworks Data Platform (HDP) contains the open source, vanilla Apache release of Hadoop, with no proprietary add-ins. According to Spotify’s Wouter de Bie, team lead for data infrastructure, Hortonworks’ approach to Hadoop was a big factor in the switch.

“The cultural fit was an important factor in our selection and we have appreciated Hortonworks’ relaxed, helpful and open approach,” said de Bie in a statement. “Their true open source approach and the work they have done to improve the Apache Hive data warehouse system also aligns well with our needs, as we use Hive extensively for ad-hoc queries and for the analysis of large data sets.”

Spotify uses Hadoop as part of its mission-critical infrastructure, both to personalize the service for its users and to run data analytics for bands and record labels. Those analytics include geolocation data showing where an artist has a strong fan base, which helps optimize marketing and concert location decisions.

Spotify will run the Hortonworks Hadoop distro on the Debian operating system, which Hortonworks says will eventually enable it to offer Hadoop to customers running either Debian or Ubuntu.

While the Hadoop wars have been hot for some time, the competition between Cloudera and Hortonworks has heated up in recent months. This summer, Hortonworks turned heads when it announced a $50 million funding haul, giving the company new life and credibility in a market that Cloudera looked to be running away with. While some figures still have Hortonworks significantly behind Cloudera in installations, buzz has started to pick up around Hortonworks as it gets its 100% open source story out.

Datanami