How Providence Overcame Security Obstacles to Unlock Medical Data in the Cloud
Healthcare is one of the most data-rich industries, but thanks to strict privacy and security laws, data scientists haven’t been able to do much with it. But now, thanks to the confluence of a strong security setup in the cloud and the use of privacy-preserving techniques for analytics, Providence Health is starting to take the handcuffs off its data scientists and unleash innovation on big medical data.
With 52 hospitals, more than 1,000 clinics, and about 120,000 employees, Providence Health and Services is one of the largest healthcare groups in the country. Like most healthcare companies, Providence Health takes steps to maintain the integrity of its patients’ data. After all, nobody wants to fall afoul of HIPAA, which carries fines up to $50,000 per violation.
That security focus was top of mind as the Renton, Washington company set out to renovate its data analytics architecture, starting with a migration of an aging SQL Server data warehouse into its Microsoft Azure cloud. As the company worked with Databricks and others to set up the new data environment in late 2019, it took extra precautions to ensure that tight control was maintained over the data, said Lindsay Mico, Providence’s director of data science.
“Providence sets a uniquely high bar for what a secure cloud looks like,” Mico said. “Every tech company that I work with … has a mindset that this is what a secure cloud looks like. And then they start working with us, and they’re quickly disabused of where that bar is set.”
For starters, plain vanilla network configurations in Azure would not suffice. Public IP addresses may be standard fare for out-of-the-box Azure deployments, but that would not be accepted in Providence’s Azure deployment, Mico says. Ensuring only private IP addresses were used required controlling the Azure Virtual Network (VNet) at a low level, he said.
“We worked hand in hand with Microsoft and with Databricks to craft new deployment architectures that allowed us to properly safeguard patient data,” Mico told Datanami in an interview at Databricks’ Data + AI Summit in San Francisco earlier this summer. “We needed something that was closed off from the public Internet.”
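The article does not describe how Providence verifies its private-only addressing. As a minimal sketch of the idea, assuming a plain list of addresses exported from a deployment inventory (the function name and sample addresses are hypothetical), Python's standard ipaddress module can flag anything that is not in a private range:

```python
import ipaddress

def public_addresses(addresses):
    """Return the addresses that fall outside private/reserved ranges.

    Note: is_private is also True for loopback, link-local and other
    reserved ranges, not only the RFC 1918 blocks.
    """
    return [a for a in addresses if not ipaddress.ip_address(a).is_private]

# Hypothetical inventory, invented for illustration.
inventory = ["10.20.0.4", "172.16.5.9", "52.160.0.1"]
flagged = public_addresses(inventory)
print(flagged)
```

A check like this only audits a list after the fact. Keeping resources off the public Internet in the first place is a matter of network design, which is the work Mico describes.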
Snowflake was also involved in Providence’s new cloud-based data architecture. The initial project was migrating the aging SQL Server warehouse into the Snowflake data warehouse, which would provide a more scalable system for analyzing business and clinical data. (Providence is big enough that it can accommodate users on multiple data warehouses, so Databricks and Snowflake environments exist independently.)
Cost control was a big initial use case for Providence’s new cloud warehouses, which are used for traditional analytics as well as building and running machine learning models. As a not-for-profit Catholic hospital with a stated mission to help the poor and the needy, Providence has a duty to deliver affordable care, and analytics can help in that regard, Mico said.
“We also know healthcare has a cost bubble. Costs keep going up for patients. Healthcare systems are on extremely thin margins. It’s a lose-lose,” he said. “So finding ways to use data and analytics to control cost–it’s existential for healthcare in general.”
Some of the initial use cases involved using machine learning models to forecast patient demand, including things like acuity and length of stay. Those predictions are fed into a staffing model that tells Providence what its staffing demands will likely look like over the ensuing two months.
The unlimited elasticity in the cloud is a big upgrade over what the company used before, Mico said. “When I joined Providence, we ran models on computers under desks,” he says. “I couldn’t fit a mid-size boosting model. That would crash our Cloudera cluster regularly. Those problems are solved.”
While the data warehouse migration started before the COVID-19 pandemic, Mico and his team did much of the work in the middle of the pandemic. The healthcare company leveraged several out-of-the-box machine learning models that came with its electronic medical records (EMR) software from Epic Systems, which worked well, Mico said. Because Providence is the largest Epic user, the success of those models bodes well for Epic’s R&D team.
If there was a silver lining to COVID-19, it was that it accelerated the deployment of next-gen systems, including telehealth, which Providence was already investing in, Mico said.
“I think it was serendipity, in a way,” he said. “We had prepared. We had a really strong infrastructure in place to handle telehealth. We were able to move huge volumes to telehealth. It also gave a boost to predictive analytics. We deployed a number of models around mortality risk, ICU length of stay, and a few others in the early days of the pandemic. Those were applications that were built into Epic. It’s just the first few steps of a really long journey. There’s so much more opportunity for AI to improve clinical care.”
For instance, the company is running some real-time analytics on Health Level Seven (HL7) medical documents that originate in Epic. Providence is using Spark Streaming to process that data before it’s loaded in real time into Delta Tables on the Databricks warehouse, Mico said.
“That work started as an effort to build what we call a mission control center, a real time view of what’s happening in the hospitals,” he said.
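To give a sense of what processing an HL7 feed involves, here is a minimal sketch in plain Python rather than Spark. It assumes the pipe-delimited HL7 v2 message format (the article does not say which HL7 version Providence uses), and the sample message, field choices, and function names are invented for illustration. In a streaming job, a function like this would be applied to each incoming message before the result is written to a table.

```python
from collections import defaultdict

# Hypothetical admission message, invented for illustration (not real data).
SAMPLE_ADT = "\r".join([
    "MSH|^~\\&|EPIC|HOSP_A|MISSION_CTRL|PROV|202201011230||ADT^A01|MSG0001|P|2.5",
    "PID|1||000000^^^HOSP_A||DOE^JANE",
    "PV1|1|I|ICU^101^A",
])

def parse_hl7_v2(message):
    """Split an HL7 v2 message into {segment name: [field lists]}."""
    segments = defaultdict(list)
    # Segments are separated by carriage returns; tolerate newlines too.
    for raw in message.replace("\n", "\r").split("\r"):
        raw = raw.strip()
        if raw:
            fields = raw.split("|")
            segments[fields[0]].append(fields)
    return dict(segments)

def admission_event(message):
    """Extract a few fields a capacity dashboard might want."""
    parsed = parse_hl7_v2(message)
    msh = parsed["MSH"][0]
    pv1 = parsed["PV1"][0]
    # PV1-3 is the assigned location: point of care ^ room ^ bed.
    unit, room, bed = (pv1[3].split("^") + ["", "", ""])[:3]
    return {
        # In MSH the separator itself counts as field 1, so MSH-n is index n-1.
        "facility": msh[3],       # MSH-4, sending facility
        "message_type": msh[8],   # MSH-9, e.g. ADT^A01 (admit)
        "patient_class": pv1[2],  # PV1-2, "I" for inpatient
        "unit": unit,
        "room": room,
        "bed": bed,
    }

event = admission_event(SAMPLE_ADT)
print(event)
```

A production pipeline would normally rely on a dedicated HL7 library and handle escape sequences, repeated fields, and malformed messages, all of which this sketch ignores.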
One of the early use cases for the mission control center is to gain visibility into an individual hospital’s resources to determine whether it can handle an incoming patient. This is a useful tool to help prevent hospital overcrowding, which was a very real threat during the peak of COVID-19.
“[It’s] a good starting point, but you can see that once you have a real time view into what’s happening with a healthcare system – who’s there, what do you need, and relating back to their chart — you can then start predicting what’s coming next,” he said. “You can start optimizing decisions about clinical care or about operations. So I’m pretty jazzed about mission control.”
Such systems are not put into place without a lot of forethought. Mico is one of a handful of people who sit on Providence Health’s predictive analytics steering committee, which makes decisions about what types of predictive systems should be deployed.
The company is looking at leveraging some more powerful AI technologies, including deep learning, to further optimize its operations and improve medical care. Specifically, it’s working with John Snow Labs and its Spark NLP models to be able to extract meaningful data out of doctors’ notes.
Security and privacy are paramount when dealing with this level of sensitive data, so the first order of business with Spark NLP is the de-identification of doctors’ notes about patients. The healthcare company is using pretrained models from John Snow Labs that can spot identifiers like dates, names, addresses, and ZIP Codes.
“It works surprisingly well, even just out of the box,” said Nadaa Taiyab, a senior data scientist with Tegria, a technology and services company owned by Providence.
After tagging the identifiers, Providence then obfuscates them by replacing them with dummy data, thereby de-risking the protected health information (PHI). This process enables Providence to use the aggregated medical data for advanced analytics and training machine learning models.
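The tag-then-replace flow can be illustrated with a deliberately simple stand-in. The sketch below uses regular expressions, not the Spark NLP models Providence uses, and its patterns, dummy values, and sample note are all assumptions made up for illustration. Regexes can catch rigidly formatted identifiers such as dates and ZIP Codes, but not names or free-form addresses, which is where trained models come in.

```python
import re

# Illustrative patterns only; real de-identification needs trained models.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ZIP": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}

# Hypothetical dummy values substituted for each identifier type.
DUMMY = {"DATE": "01/01/2000", "PHONE": "555-555-0100", "ZIP": "00000"}

def tag_identifiers(text):
    """Step 1: find identifier spans as (start, end, label) tuples."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)

def obfuscate(text, spans):
    """Step 2: replace each tagged span with dummy data."""
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + DUMMY[label] + text[end:]
    return text

# Invented sample note; contains no real patient information.
note = "Seen on 03/14/2021. Callback 206-555-0142. Lives in ZIP 98057."
clean = obfuscate(note, tag_identifiers(note))
print(clean)
```

Keeping tagging and replacement as separate steps mirrors the process described above and lets the tagged spans be reviewed before any text is altered.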
While the obfuscation step reduces the risk of PHI falling into the wrong hands, there are times when real patient data is necessary, especially when using machine learning models, Taiyab said.
“If you aggregate it, you can’t use it for machine learning, if you’re trying to predict something at a patient level,” Taiyab said. “If you want to predict it at a group level, that’s one thing. But you need to have patient-level data” for patient-level predictions.
Providence is also able to use its patient data to further medical research through the Institute for Systems Biology (ISB), a Seattle, Washington medical analytics firm founded by Dr. Leroy Hood, one of the researchers in the Human Genome Project. The data security work done by Providence has enabled it to share data with ISB, which Providence acquired in 2016.
According to Mico, ISB’s ability to mine Providence’s large repository of medical data has been important for ISB’s research into health conditions, such as long COVID-19. “That’s just an example of what it means when you’re able to stage and integrate data in a secure cloud environment,” he said.
The investment in building a secure cloud data architecture hopefully will pay dividends as Providence explores additional ways of using advanced analytics and AI to improve its healthcare mission.
“Providence has come up with a blueprint for what a secure cloud for healthcare looks like,” Mico said. “It’s taken a lot of research, a lot of learning, a lot of collaboration with our partners. And we improve it at every step of the way. Our deployment models change as we learn new things. But we think we have a very strong secure blueprint for it.”
At the end of the day, we’re barely scratching the surface of what advanced analytics and AI can do for healthcare. Access to data remains one of the primary roadblocks to progress in this regard. The work that Providence has done to mitigate the security and privacy risks is a good first step, but there is much more work to do.
Mico recounted how he attended an AI medicine conference in San Francisco in 2021. “Every startup said the same thing,” he said. “If we just have the data, we could build the most amazing stuff.”