October 29, 2019

Scala Gets Its Notebook with Netflix’s Polynote

Data scientists who work in Scala will be interested to know that Netflix has created its own data science notebook with first-class support for the JVM-based language. Called Polynote, it was released as open source last week.

Polynote, as the name suggests, is a polyglot notebook that supports multiple languages, including Scala, Python, and SQL. Developers can write cells in any of these languages and mix them within a single notebook.

Netflix created Polynote after running into shortcomings with existing notebook tools, especially in their support for Scala, according to a blog post published last week by Netflix engineers Jeremy Smith, Jonathan Indig, and Faisal Siddiqi.

“While Python developers are used to working inside an environment constructed using a package manager with a relatively small number of dependencies, Scala developers typically work in a project-based environment with a build tool managing hundreds of (often) conflicting dependencies,” they wrote.

Netflix built Polynote to work more like a full-fledged integrated development environment (IDE) than a typical data science notebook. To that end, the company wrote Polynote’s code interpretation from scratch instead of relying on a read-eval-print loop (REPL), as traditional notebooks do.

Netflix provides first-class Scala support in Polynote

By not using a REPL to evaluate cell code, Polynote avoids the hidden build-up of state that can affect results in other notebook environments, Netflix says.

“By keeping track of the variables defined in each cell, Polynote constructs the input state for a given cell based on the cells that have run above it,” the company says. “It ensures reproducibility by making it far more likely that running the notebook sequentially will work.”
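To make the idea in that quote concrete, here is a toy Python model of position-based cell state. It is a hypothetical sketch, not Polynote's implementation: the `Cell` and `Notebook` classes are invented for illustration, and plain functions stand in for compiled cell code. Each cell records the variables it defines, and a cell's input is rebuilt only from the cells above it. The sketch contrasts this with a REPL-style shared environment, where running cells out of order silently changes results.

```python
from typing import Callable, Dict, List

State = Dict[str, object]

class Cell:
    """Hypothetical cell: source_fn stands in for compiled cell code.

    It receives the variables visible to the cell and returns the
    variables the cell defines.
    """
    def __init__(self, source_fn: Callable[[State], State]):
        self.source_fn = source_fn
        self.outputs: State = {}  # variables this cell defined on its last run

class Notebook:
    def __init__(self, cells: List[Cell]):
        self.cells = cells

    def input_state(self, index: int) -> State:
        # Build the input for cell `index` only from cells above it, in order,
        # so a later definition of a name shadows an earlier one.
        state: State = {}
        for cell in self.cells[:index]:
            state.update(cell.outputs)
        return state

    def run(self, index: int) -> State:
        cell = self.cells[index]
        cell.outputs = dict(cell.source_fn(self.input_state(index)))
        return cell.outputs

def run_repl_style(cells: List[Cell], order: List[int]) -> State:
    # REPL-style evaluation: one shared environment mutated in execution order.
    env: State = {}
    for i in order:
        env.update(cells[i].source_fn(env))
    return env

cells = [
    Cell(lambda s: {"x": 1}),           # cell 0: x = 1
    Cell(lambda s: {"y": s["x"] + 1}),  # cell 1: y = x + 1
    Cell(lambda s: {"x": 10}),          # cell 2: x = 10
]

nb = Notebook(cells)
nb.run(0)
nb.run(2)         # run out of order, as a notebook user might
print(nb.run(1))  # cell 1 sees only the x defined above it
print(run_repl_style(cells, [0, 2, 1])["y"])  # shared state leaks x = 10 into cell 1
```

In the position-based model, cell 1 computes `y` as 2 no matter when cell 2 was run, which matches what a top-to-bottom rerun would produce. The REPL-style version yields 11 because cell 2's later definition of `x` has already overwritten the shared environment.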

The notebook’s rich text editor offers other IDE-like features, including syntax error highlighting, autocomplete, and parameter hints. It also integrates with Apache Spark, which should help mitigate some of the dependency issues Spark developers often experience, the company says.

“With Spark, developers are working in a cluster computing environment where it is imperative that their distributed code runs in a consistent environment no matter which node is being used,” Netflix writes. “Finally, we found that our users were also frustrated with the code editing experience within notebooks, especially those accustomed to using IntelliJ IDEA or Eclipse.”

Polynote hooks into the matplotlib visualization library, which is fairly standard among notebooks. It also offers data exploration capabilities, including a data schema view, a table inspector, a plot constructor, and support for Vega.

Netflix is open sourcing Polynote under an Apache 2.0 license. The project is maintained on GitHub, and more information is available on the project’s website at polynote.org.

Datanami