May 30, 2016

Big Data and the Cloud: Uncover New Insights Hiding in Your Data

Adam Lorant


Modern data science can unlock new innovation in healthcare, bioinformatics, genetic research, and other related fields. New personalized medicine programs, for instance, can identify previously unrecognized disease risk factors by applying analytics to vast amounts of genomic and clinical data. Hospitals can pore through EMR and operational data to pinpoint sources of infection. Public health agencies can use longitudinal population data to more accurately inform policy.

These are just a few examples, but they all depend on one basic premise: giving many more researchers and analysts access to much more data. Many healthcare organizations are still a long way from reaching that goal.

In most organizations, there is no single place where data resides. Rather, data is diffused across file servers and databases in various locations and multiple formats (Word documents, PDFs, spreadsheets, relational databases, EMRs) that can’t be easily consolidated.

Investigators are intimately familiar with the barriers this presents to effective research, and they are trying to deal with it in various ways. Some use open-source, web-based services that let researchers store some data centrally but limit the formats and volume of data that can be collected. Others turn to enterprise data warehouse suites that can centralize information but take months of planning and massive capital investment to get up and running. Still others launch “do-it-yourself” Hadoop projects that leave administrators with the equivalent of a 1,000-piece LEGO kit with no instructions.

Even if researchers can overcome the technical barriers to creating a big data warehouse, they are still held back by data privacy and compliance concerns. No one wants to be responsible for protected health information (PHI) finding its way onto a publicly accessible server and exposing the institution to a HIPAA violation.

If healthcare researchers are going to uncover the wealth of new insights hiding in their data, they need better ways to consolidate silos of information while ensuring that data privacy, security, and governance remain intact. Fortunately, modern cloud-based data warehouses can accomplish both. By harnessing the combined power of big data and the cloud, researchers and analysts can gain insights faster and increase the value of their data, without compromising their institutions or patients.

Growing Appeal of the Cloud

Not long ago, storing sensitive healthcare information in the cloud was a nonstarter; compliance officers were simply not comfortable moving data beyond their control. Yet by 2014, 83 percent of healthcare IT organizations were using cloud services, according to HIMSS Analytics. Why the shift? Two reasons:

First, the advantages of the cloud have become too compelling to ignore. Forward-looking researchers and academic centers see how much faster their research could be moving if data could be shared more easily. Cloud-based data warehouses that can be accessed by approved investigators anywhere, anytime, offer the easiest way to do it.

Genomics data represents some of the most compelling data today for researchers (Gio.tto/Shutterstock)

From a business perspective, healthcare organizations see the same cloud benefits as every other industry: faster deployments, consumption-based pricing and pay-as-you-grow scalability that makes better economic sense than building out internal capacity themselves. As Forrester notes, “On-premises solutions require investments akin to home ownership; when something breaks, it’s up to you to fix it. Cloud and SaaS are more akin to renting, and you’re only paying for the space you use; repairs, ongoing maintenance, unexpected expenses are the responsibility of the landlord.”

The other big change has been in liability for cloud-hosted data. Previously, healthcare organizations bore full responsibility for anything that happened to their data in the cloud. Today, organizations can enter into business associate agreements (BAAs) with healthcare-focused cloud service providers, which then share liability for that data. These providers are certified by organizations such as HITRUST, and they take on part of the responsibility for data protection and compliance in their clouds.

Capitalizing on Big Data in the Cloud

So what do organizations gain when they use cloud-based big data warehouses? First, they can consolidate all their data more easily and automate data collection.

Modern big data solutions can ingest data from many different sources, in many different formats, quickly and easily. That includes complex data types: unstructured and semi-structured data, huge genomics files, imaging studies and EMR data. Healthcare-focused data warehouses are designed with pre-built libraries that accommodate all of these, extracting the information, indexing it and transforming it so it’s readily usable.
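
To make the ingestion step concrete, here is a minimal Python sketch, not any vendor’s actual pipeline. It assumes two hypothetical sources (a CSV export and a nested JSON feed) with made-up field names, and maps both onto one common record layout through a per-format parser table. A real system would add parsers for genomics, imaging and EMR formats and do far more validation.

```python
import csv
import io
import json

# Hypothetical common layout that every source format is mapped onto.
COMMON_FIELDS = ("patient_id", "sex", "age", "diagnosis", "source")


def from_csv(text: str) -> list[dict]:
    """Parse a CSV export that uses its own (hypothetical) column names."""
    return [
        {
            "patient_id": row["PatientID"],
            "sex": row["Sex"],
            "age": int(row["Age"]),
            "diagnosis": row["Dx"],
            "source": "csv",
        }
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text: str) -> list[dict]:
    """Parse a JSON feed of nested patient objects (hypothetical shape)."""
    return [
        {
            "patient_id": item["id"],
            "sex": item["demographics"]["sex"],
            "age": int(item["demographics"]["age"]),
            "diagnosis": item["diagnosis"],
            "source": "json",
        }
        for item in json.loads(text)
    ]


# One parser per supported format; supporting a new format means adding an entry.
PARSERS = {"csv": from_csv, "json": from_json}


def ingest(fmt: str, text: str) -> list[dict]:
    """Route raw input to the parser for its format and return common records."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return parser(text)


if __name__ == "__main__":
    csv_text = "PatientID,Sex,Age,Dx\np1,F,34,melanoma\n"
    json_text = '[{"id": "p2", "demographics": {"sex": "M", "age": 51}, "diagnosis": "asthma"}]'
    for record in ingest("csv", csv_text) + ingest("json", json_text):
        print(record)
```

Keeping the format-specific logic in small parser functions means downstream indexing and cataloging only ever see one record shape.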

Modern cloud-based data warehouses also accelerate searches. They automatically catalog and apply metadata to information as it’s collected to describe exactly what it contains, at a granular level. This means that researchers can search catalogs of metadata, rather than raw data files, and find what they’re looking for much faster.
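
A minimal sketch of cataloging at ingest follows, under assumed names: `CatalogEntry`, `describe` and `search` are hypothetical, and each entry holds only a few illustrative attributes (data type, field names, record count, tags). The point is that `search` consults the metadata entries alone and never reopens the underlying records.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata describing one ingested data asset."""
    asset_id: str
    data_type: str      # e.g. "emr", "genomics", "imaging"
    fields: set[str]    # attribute names found in the asset's records
    record_count: int
    tags: set[str] = field(default_factory=set)


def describe(asset_id: str, data_type: str, records: list[dict], tags=()) -> CatalogEntry:
    """Summarize an asset's records into a catalog entry at ingest time."""
    found: set[str] = set()
    for rec in records:
        found.update(rec.keys())
    return CatalogEntry(asset_id, data_type, found, len(records), set(tags))


def search(catalog: list[CatalogEntry], data_type=None, needs_fields=(), tag=None) -> list[str]:
    """Return ids of assets whose metadata match every given criterion."""
    hits = []
    for entry in catalog:
        if data_type is not None and entry.data_type != data_type:
            continue
        if not set(needs_fields) <= entry.fields:
            continue
        if tag is not None and tag not in entry.tags:
            continue
        hits.append(entry.asset_id)
    return hits


catalog = [
    describe("emr-2015", "emr", [{"age": 30, "dx": "melanoma"}, {"age": 41, "sex": "F"}], tags=["oncology"]),
    describe("wgs-batch-1", "genomics", [{"sample_id": "s1", "vcf_path": "s1.vcf"}]),
]
print(search(catalog, needs_fields=["dx", "sex"]))  # ['emr-2015']
```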

They can see exactly what data is there and short-circuit the all-too-common process of waiting weeks or months for approval to access a data store, only to find that it doesn’t contain the information they need. Instead, they query the catalog (for example, “show me women with melanoma, aged 19 to 45”) and see how many records exist. They can determine immediately whether it’s worth requesting formal approval to access the data, whether there is enough for the study, or whether they need to change their criteria.

Additionally, all PHI within that data is still protected. They’re not seeing the actual data—just a catalog generated from metadata. They can identify the data set they need in minutes, without compromising data security or compliance, in a self-service manner.
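
The feasibility check described above can be sketched in a few lines. It rests on stated assumptions: the catalog holds only coarse per-record attributes (sex, age, diagnosis) with no direct identifiers, the age range is treated as inclusive, and `RecordMetadata` and `cohort_count` are hypothetical names. The function hands back a single number, which is all a researcher needs to decide whether a formal access request is worthwhile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordMetadata:
    """Hypothetical catalog attributes for one record; no names, MRNs or dates."""
    sex: str
    age: int
    diagnosis: str


def cohort_count(catalog: list[RecordMetadata], sex: str, diagnosis: str,
                 min_age: int, max_age: int) -> int:
    """Count matching records (age bounds inclusive); no record is returned."""
    return sum(
        1
        for m in catalog
        if m.sex == sex and m.diagnosis == diagnosis and min_age <= m.age <= max_age
    )


catalog = [
    RecordMetadata("F", 27, "melanoma"),
    RecordMetadata("F", 52, "melanoma"),
    RecordMetadata("M", 33, "melanoma"),
    RecordMetadata("F", 44, "melanoma"),
    RecordMetadata("F", 30, "asthma"),
]

# Women with melanoma, aged 19 to 45.
print(cohort_count(catalog, "F", "melanoma", 19, 45))  # 2
```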

Modern cloud-based big data warehouses are also designed for privacy and governance. This is crucial because much of today’s healthcare research is translational, bridging the traditional separation between researchers and clinicians, which makes data privacy controls essential.

Cloud-based big data solutions can employ several mechanisms to make privacy and governance simpler. First, they can automate de-identification of PHI in line with HIPAA Safe Harbor guidelines and the HITRUST framework. This is a major departure from how de-identification is typically done today: a person at a workstation manually processing records.
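
As a rough illustration of rule-based de-identification, the sketch below applies a few HIPAA Safe Harbor transformations to a single record. It drops direct identifiers, reduces dates to the year, truncates ZIP codes to three digits (or "000" for sparsely populated prefixes), and collapses ages over 89 into one "90+" category. It is deliberately partial. The field names and the restricted ZIP prefix set are placeholders, and it skips other Safe Harbor requirements, such as the remaining identifier categories and aggregating birth years for people over 89.

```python
from datetime import date

# Direct identifiers dropped outright; a small subset of the 18 Safe Harbor
# categories, under hypothetical field names.
DIRECT_IDENTIFIERS = {"name", "mrn", "ssn", "phone", "email", "street_address"}

# Placeholder set of 3-digit ZIP prefixes whose areas hold 20,000 or fewer
# people; the real list is derived from Census data.
RESTRICTED_ZIP3 = {"036", "059"}


def deidentify(record: dict) -> dict:
    """Return a new record with a subset of Safe Harbor rules applied."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # remove the field entirely
        if isinstance(value, date):
            out[key] = value.year  # keep only the year (datetime values match too)
        elif key == "zip":
            zip3 = str(value)[:3]
            out[key] = "000" if zip3 in RESTRICTED_ZIP3 else zip3
        elif key == "age" and value > 89:
            out[key] = "90+"  # aggregate ages over 89 into one category
        else:
            out[key] = value
    return out


patient = {
    "name": "Jane Doe",
    "mrn": "A-10442",
    "zip": "03601",
    "admit_date": date(2015, 6, 3),
    "age": 93,
    "diagnosis": "melanoma",
}
print(deidentify(patient))
# {'zip': '000', 'admit_date': 2015, 'age': '90+', 'diagnosis': 'melanoma'}
```

Because the rules are code rather than a manual checklist, the same transformation is applied identically to every record and can be reviewed and audited.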

Second, modern solutions employ sophisticated policy frameworks that let organizations tightly control who can see what, in which context, even within a single data asset. For example, a clinician at an academic center may be able to see a patient’s full record. A researcher or analyst with the center may be able to access the same record, but will see only de-identified information with no PHI. Modern systems can do this automatically, generating specific data sets appropriate for each requestor in accordance with organizational policy. And these capabilities can be fully audited for compliance.
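
The clinician/researcher example can be sketched as a field-level policy check. The sketch assumes a hypothetical `POLICY` table mapping roles to the fields they may see, with the researcher’s allowed fields limited to ones assumed to carry no PHI, and an in-memory `AUDIT_LOG` standing in for a real audit store. A production policy framework would weigh more context than a role name; this only shows the shape of the idea: one stored record, different views per requester, every access logged.

```python
from datetime import datetime, timezone

# Hypothetical policy: the fields each role may see in the same stored record.
POLICY = {
    "clinician": {"name", "mrn", "birth_date", "birth_year", "diagnosis", "medications"},
    "researcher": {"birth_year", "diagnosis", "medications"},
}

# In-memory stand-in for an audit store.
AUDIT_LOG: list[dict] = []


def view_for(role: str, user: str, record: dict) -> dict:
    """Return only the fields the role may see, and log the access."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing
    view = {k: v for k, v in record.items() if k in allowed}
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "fields": sorted(view),
    })
    return view


record = {
    "name": "Jane Doe",
    "mrn": "A-10442",
    "birth_date": "1980-04-12",
    "birth_year": 1980,
    "diagnosis": "melanoma",
    "medications": ["drug_a"],
}
print(view_for("researcher", "analyst_b", record))
# {'birth_year': 1980, 'diagnosis': 'melanoma', 'medications': ['drug_a']}
```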


Healthcare organizations may have viewed cloud and big data as too complex and risky a few years ago. Today, they represent an enormous opportunity.

With liability concerns addressed, and data consolidation, cataloging and policy largely automated, cloud-based big data warehouses are much less complex than the disparate data silos healthcare organizations are dealing with now. It’s time to bring the speed, scale and economics of the cloud to healthcare data, and give researchers the tools to uncover new insights faster.

About the author: Adam Lorant is VP of product and solutions at PHEMI. Adam co-founded and served as the VP Marketing/Product Management for several successful startups, including PolarBlue Systems, OctigaBay Systems (acquired by Cray Inc. in 2004) and Abatis Systems, which was sold to Redback Networks in 2000 as one of the largest private acquisitions in Canadian history. Throughout his entrepreneurial career, Adam has helped generate over one billion dollars for investors. He has over 25 years of experience in strategic marketing, product management, and business innovation. Adam holds an MBA from the Ivey School of Business at the University of Western Ontario and a B.Sc. in Electrical Engineering from the University of Toronto.