June 28, 2022

John Snow Labs Releases Spark NLP 4.0

Today, John Snow Labs announced the release of Spark NLP 4.0, the latest version of its NLP library built on Apache Spark ML. Spark NLP 4.0 features new question answering annotators, major performance improvements, optimizations on new hardware platforms, and over 1,000 pre-trained transformer models available in multiple languages.

The new question answering annotators let Spark NLP answer arbitrary natural language questions about a given document. They not only provide answers but can also show where in the document each answer came from. Pre-trained QA models based on BERT, ALBERT, DeBERTa, RoBERTa, DistilBERT, Longformer, and XLM-RoBERTa are fine-tuned and ready for immediate use, which the company says enables support for multiple languages, document types, and performance goals.
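For readers curious how a model can point to where an answer came from, extractive QA models of the BERT family typically score every token in the document as a possible answer start and answer end, then return the best-scoring span. The following is a minimal sketch of that span selection step only. It is not Spark NLP's API or implementation: the function name `best_answer_span`, the whitespace tokenization, and the score values are all hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple


def best_answer_span(
    start_scores: List[float],
    end_scores: List[float],
    max_len: int = 8,
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the highest-scoring span with start <= end.

    Hypothetical helper: a real model would produce the scores; here they are given.
    """
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_scores):
        # Only consider spans of at most max_len tokens that end at or after s.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Question (for illustration): "Who released Spark NLP 4.0?"
context = "Spark NLP 4.0 was released by John Snow Labs in June 2022".split()

# Made-up per-token scores standing in for a model's output (assumption).
start_scores = [0.1, 0.0, 0.2, 0.0, 0.3, 0.5, 4.0, 1.0, 0.5, 0.0, 0.8, 0.2]
end_scores = [0.0, 0.2, 0.3, 0.0, 0.1, 0.0, 0.9, 1.5, 3.8, 0.1, 0.4, 1.0]

start, end, _ = best_answer_span(start_scores, end_scores)
answer = " ".join(context[start : end + 1])

# The token indices are what lets a system show where the answer sits in the text.
print(f"answer={answer!r}, tokens {start}..{end}")
```

Because the answer is a span of the original document rather than generated text, the start and end positions double as a pointer back to the source passage.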

John Snow Labs also announced that Spark NLP 4.0 is optimized for the latest hardware, including support for Apple’s M1 chip and Intel’s oneAPI Deep Neural Network Library (oneDNN). The company says the performance of transformer-based models running on CPUs can improve by 97% when oneDNN is enabled. The release also adds support for the latest runtimes of Databricks, AWS EMR, and Kubernetes.

The Spark NLP 4.0 release also features accuracy improvements to named entity recognition (NER) and coreference resolution. For NER tasks, the company says Spark NLP provides the most accurate model on the CoNLL-2003 benchmark among open source NLP libraries. For coreference resolution, the platform uses BERT-based span classification, which the company asserts is more effective than traditional approaches and libraries.

John Snow Labs says 33% of the world’s enterprises are using Spark NLP, and according to Gradient Flow, 59% of AI practitioners in healthcare and life sciences use the tool.

Benchmark results showing Spark NLP’s accuracy. Source: John Snow Labs

“As the most widely used NLP library in the enterprise, we have a responsibility to deliver accurate, production-grade, state-of-the-art NLP software,” said David Talby, CTO, John Snow Labs. “With the pace of technology and business evolution, last year’s best-of-breed AI tools are already falling behind. Our promise to our customers and the open source community is that we will always keep them state-of-the-art—and this new release delivers on that promise.”

For those wishing to learn more about the platform’s NLP capabilities, John Snow Labs is offering a few upcoming opportunities. Tomorrow at the Databricks Data + AI Summit, Talby is giving a talk titled “State-of-the-Art Natural Language Processing with Apache Spark NLP” at 11:30 a.m. PT. The company is also presenting a demo on Patient Cohort Building with NLP and Knowledge Graphs from 12:20 to 12:40 p.m. PT on the Data + AI Summit’s virtual platform.

Additionally, the company will hold its third annual NLP Summit online from October 4 to 6. The free event will include over 50 technical sessions on topics relevant to open source, healthcare, and finance.
