August 10, 2015

Will Scala Take Over the Big Data World?

When it comes to picking a language for writing big data applications, developers have an embarrassment of riches at their disposal. Python and R have proven popular among data scientists, while Java has been the go-to language for those developing apps on Hadoop. With the rise of Scala-based big data frameworks like Apache Spark and Apache Kafka, we’re seeing Scala becoming more prominent among big data practitioners.

Scala is a JVM-compatible functional development language that was originally developed by Martin Odersky about 15 years ago, long before the current big data craze. Proponents of the language often cite its speed and expressiveness as key advantages over other general-purpose languages. The language has been heavily adopted by Web 2.0 and social media companies, such as Foursquare and Twitter, which in 2009 migrated from Ruby to Scala for most of its back-end systems.

Scala’s big data cred got a shot in the arm when Apache Spark first arrived on the scene in 2013. Whereas most first-gen Hadoop applications (i.e. MapReduce) required Java skills, Spark gave the developer a choice of languages for developing apps. Spark was actually written in Scala, and supports Scala, Java, Python, and R.

Apache Kafka, a next-gen messaging bus for big data, was also developed in Scala, as were Apache Samza (a stream processing framework) and Scalding, a Scala API that sits on top of Cascading to ease development of Hadoop applications.

‘Modern and Whole’

One big proponent of Scala for big data development is Gemini Solutions CEO Theo Nissim. Gemini is a provider of custom big data engineering services based in Silicon Valley, with offices in Europe. While the company doesn’t dictate what technologies its customers should use, Gemini’s 130-plus engineers increasingly find themselves pulling Scala out of the toolbox during big data engagements.

“We do a lot of Scala these days,” Nissim tells Datanami. “We do it because, historically, some of our architects enjoyed functional programming. Then we discovered that quite a few people are using Scala out there, and are basically using it as a different flavor of Java, not as much for functional programming, but because it’s modern and whole.”

Gemini is currently working on a project that involves collecting and analyzing a significant amount of data from wearable devices and displaying the results in a mobile app. Most of the work is being done in Scala, which Nissim says is getting the right level of traction and has a bright future ahead of it.

“We like to play with technologies that are evolving and new and have potential,” he says. “We think [the way] the tool is maturing is excellent, and because it couples very well with all kinds of big data infrastructures, the Sparks of the world. It couples very well with them, and it just lends itself very well to the big data manipulations.”

Rise in the Standings

Gemini isn’t the only firm finding itself using Scala for big data projects. The popularity of Scala spiked this spring, according to the TIOBE Index, which tracks the popularity of programming languages. For years, Scala bounced around between positions 30 and 50 on the list, but suddenly it came in at number 25.

Scala’s sudden rise caught the eye of TIOBE managing director Paul Jansen, who said he had expected to see Scala in the top 20 a long time ago.

“There has been a positive vibe on Scala for many years now, but industry was a bit reluctant to adopt Scala because it wasn’t mainstream yet and functional programming languages such as Scala were considered academic toy languages until recently,” Jansen told InfoWorld in an April interview. “Now we see that multinationals are trying out Scala for some of their development.”

One of the most popular uses of Scala is building big data pipelines in Apache Spark. Later this month, Scala enthusiasts will descend upon the San Francisco Bay Area for a pair of conferences, Scala Days 2015 San Francisco and Scala By The Bay.

While Scala would appear poised to make a big move on the back of big data, it has a ways to go before it knocks off the incumbents. For starters, it doesn’t have a huge company with deep pockets backing it. Typesafe, which Odersky founded several years ago to provide support for Scala, has raised several million dollars, but that’s a far cry from what Oracle or Microsoft can provide with Java and .NET, which are still among the most popular development platforms.

And Java keeps getting better. In a Datanami article last year, author Joshua Fox argued that the latest version of Java, version 8, narrows the functionality gap between Scala and Java, and makes it the ideal language for Spark development.

While Scala has dipped just a bit in the standings lately (it has since dropped below 30 on the TIOBE Index), it would appear that it has a bright future ahead of it, especially in big data.

Related Items:

Python Versus R in Apache Spark

Apache Spark and Java 8: The Big Data Team for 2015