March 10, 2022

Databricks Launches Lakehouse for Healthcare and Life Sciences

Databricks has announced the launch of its new lakehouse platform, the Databricks Lakehouse for Healthcare and Life Sciences.

According to the company’s press release, Databricks Lakehouse for Healthcare and Life Sciences is a “single platform for data management, analytics and advanced AI use cases like disease prediction, medical image classification, and biomarker discovery.” GE Healthcare, Regeneron, Thermo Fisher and Walgreens are among the platform’s early adopters.

On its product page, Databricks identifies four main problems with healthcare data: incomplete or fragmented patient care data; the high cost and complexity of managing rapidly growing volumes of healthcare data; slow delivery of real-time insights for critical care decisions; and a lack of strong machine learning capabilities for predictive analytics and data modeling.

The new platform promises to address these issues by unifying structured and unstructured patient data, scaling data in the cloud to deliver population-scale health insights, enabling real-time analytics through rapid ingestion and processing of streaming data, and advancing machine learning for predictive and research analytics.


More specifically, the company says the platform “offers customers tailored data and AI solutions” through analytics accelerators, open source libraries, and a community of partners and organizations. These include Lovelytics for automated streaming data ingestion, John Snow Labs for natural language processing of unstructured text data, and ZS Associates for whole genome processing in biomedical research. Other features of the platform include ML-based disease risk prediction, automated digital pathology classification using deep learning, and tools for data modeling and cohort building.

“The opportunity for healthcare to be transformed with data and AI cannot be overstated. As organizations fully transition to electronic medical records, new data types like genomics evolve, and IoT and wearables take off, the industry is awash in massive amounts of data. But this data is siloed, and teams don’t have the tools to properly use it,” said Michael Hartman, SVP of Regulated Industries at Databricks. “With Lakehouse for Healthcare and Life Sciences, we can drive transformation across the entire healthcare ecosystem and help empower our customers to solve specific industry challenges and, ultimately, drive better outcomes for the future of healthcare.”

This is the third industry-focused lakehouse platform the company has released this year, following the Databricks Lakehouse for Retail and the Databricks Lakehouse for Financial Services. To the uninitiated, the word “lakehouse” might sound like an empty buzzword, but the architecture is gaining traction. Organizations in the healthcare and life sciences industries have typically relied on more traditional data architectures, such as data warehouses and siloed data stores, which are easy to use at first but become costly to scale and maintain as a company’s data and AI/ML workloads grow. Data lakes emerged from the need for high-performance, large-scale platforms capable of supporting substantial workloads with real-time data ingestion, but they can be challenging to build and maintain because of the time, resources, and skilled data engineers they require.


Combine the ease and functionality of a traditional warehouse with the speed and scalability of a data lake, and you have a lakehouse. As Datanami’s Alex Woodie has noted, a lakehouse “provides the flexibility to handle less structured data types, such as text and image files, that are commonly used in data science and machine learning projects, but it also borrows from the data warehouse discipline, particularly in terms of ensuring the quality of the data and making sure that its lineage is tracked and governed.” Lakehouse platforms can automate data ingestion, processing, and optimization within an infrastructure, which can help companies achieve more with their data: in this case, promoting better patient outcomes and facilitating innovation in healthcare research and pharmaceutical manufacturing.

“We recognize the important role that data plays in getting our products into the hands of those that need them the most, and the Databricks Lakehouse for Healthcare and Life Sciences solution helps us achieve that goal,” said Feng Liang, Sr. IT Director, Thermo Fisher Scientific. “This modern platform for data and AI has enabled us to eliminate costly data silos, unlock new opportunities to innovate, and become a more data-driven organization.”
