November 8, 2021

Big Data Investment Pays Off for PerkinElmer


Nearly a decade ago, the folks at PerkinElmer made the decision to rebuild their flagship software offering for managing scientific research. As data was growing and becoming more diverse, the Oracle-based relational technology that underpinned the platform struggled to keep up. The company sought a new technological direction with its cloud-based Signals Research Suite, and found it with emerging open source frameworks, including Spark, Kubernetes, and Elasticsearch.

Based near Boston, Massachusetts, PerkinElmer has built a solid reputation for providing a range of products and services for some of the largest pharmaceutical companies in the world, including Bristol-Myers Squibb, Merck, Pfizer, Johnson & Johnson, and GlaxoSmithKline. The 84-year-old publicly traded company operates in 150 countries and brings in around $4 billion a year in revenue.

In their quest to create the next blockbuster drug or treatment, the research and development arms of these pharmaceutical firms are constantly formulating and testing new compounds to determine which candidates to take to the next stage. (PerkinElmer also serves other manufacturers, such as those that develop paint and other coatings.)

For years, PerkinElmer has developed a software product called an electronic notebook that automates many steps in this process, from gathering raw data from scientific instruments and equipment and crunching the results, to managing the workflow and presenting the results.

PerkinElmer’s previous version of this electronic notebook was an on-prem system based on Oracle’s database. The software was capable of sophisticated analysis, such as enabling scientists to perform searches using chemical equations and fuzzy logic. This capability was important because it allowed scientists to search not only for exact matches among chemical compounds, but also for compounds that are related in some way.

However, as the pharma and manufacturing companies scaled their operations, the Oracle-based offering was starting to hit its architectural limit. Big pharma firms needed to store data about tens of millions of molecules, including tens of billions of test results. Scaling the infrastructure was becoming costly.

According to David Gonsalvez, the director of product portfolio and informatics R&D for PerkinElmer, one of the company’s early customers was spending millions of dollars to scale the software on Oracle Exadata machines.

“They just threw money at it,” Gonsalvez said. “The only way they could get the system to scale is by being the forefront. They were the first ones to use the Exadata technology…They had these massive machines to be able to cope with the demands of this large electronic notebook datasets.”

About eight years ago, PerkinElmer made a strategic decision to shift directions. Instead of trying to scale the product vertically atop relational technology like Oracle’s database, PerkinElmer decided to embrace some of the new open-source databases and technologies that scaled horizontally. At the same time, it also decided to move to the cloud and the software-as-a-service (SaaS) delivery model that it enabled.

The company adopted MongoDB to store research documents, Elasticsearch to search across the documents, and Apache Spark to power parallel data processing. It also uses Kubernetes to orchestrate the containerized software on the AWS cloud, and it uses S3 to store all the data. Finally, it has an OEM agreement with TIBCO to embed Spotfire as an analytics tool across the Signals Research Suite.

The search function in Elasticsearch is particularly useful for helping scientists find related molecules, Gonsalvez said.

“That’s a very advanced search technology that depends on understanding the chemical drawing and being able to match one chemical variant against millions of chemicals from the database,” he told Datanami. “That is a technology that has existed in the industry for about 30 years. It was based and embedded inside of Oracle. So Oracle was the search engine, if you will, the database. And we extended Oracle to be able to do those special chemical searches.”

That capability, which Gonsalvez likens to an FBI fingerprint database, now exists in Elasticsearch as a graph-based entity that connects all the atoms and their bonds. “We have very fast parallel processes that compare the query molecule against every single target molecule,” he said. “We taught Elasticsearch how to search for molecules.”

Apache Spark brings the heft required to crunch massive amounts of data. Some of the company’s customers need to be able to compare test results for 100,000 different compounds, and the capability to create large Spark clusters on demand gives PerkinElmer customers the headroom they need.

“So the combination of infinite storage on the cloud, parallel data processing in Spark, and very advanced scientific indexing with Elasticsearch allows us to build an end to end system that captures the content, that processes the data at scale and …. finding and then analyzing the results,” Gonsalvez said.

The first iteration of this re-architected Signals Research Suite came to market about three years ago, and it has been continually enhanced. In late October, the company announced that it had completed the integration of the suite’s various components, including Signals Notebook, Signals VitroVivo, and Signals Inventa 3.0. The Signals Research Suite now feels like a unified product as opposed to separate products that have been fused together, Gonsalvez said.

While PerkinElmer still has many customers running the on-prem version of the software, the future is definitely the new suite running in the cloud. Customer demand has been strong, according to Gonsalvez, who says he no longer has to pitch customers on the benefits of cloud computing and SaaS. They already know.

In the end, what separates PerkinElmer from its competitors, Gonsalvez said, is the delivery of the core scientific capabilities that customers have come to expect, but at massive scale in the cloud and with no infrastructure worries to boot.

“PerkinElmer really had the foresight seven or eight years ago to really retool,” Gonsalvez said. “We didn’t continue using the relational technologies of the traditional software approaches. We really retooled. It was a huge investment but it has paid off in terms of our success now.”

The transition took seven or eight years and cost tens of millions of dollars. But the ability to bring the decades of institutional knowledge that PerkinElmer possesses for managing high volumes of scientific data into a new cloud-based platform was well worth the investment, Gonsalvez said.

“We’ve kind of leapfrogged the industry in terms of the core architectural capabilities,” he said. “Our competitors are still using Oracle and relational databases. We’ve modernized into the cloud-based systems.”
