Follow Datanami:
October 8, 2015

Health Care Emerges as Hadoop Use Case

Unstructured data is estimated to account for nearly 80 percent of health care information, a hodge-podge of physicians’ notes, sensor data from medical devices, lab results, X-ray and MRI images along with clinical and financial data. Add to the mix several layers of patient privacy and medical IT regulations and you get a use case ripe for the application of big data analytics.

MapR Technologies is among the Hadoop specialists eying the big data market for health care estimated by market researchers like McKinsey & Co. to be worth as much as $450 million in health care cost savings. Using tools like Hadoop to gain access to unstructured medical data that is growing exponentially is seen as one way to increase efficiency and cut costs while improving patient care.

MapR’s and other distributions of Apache Hadoop are being promoted as among the tools needed to capture patient information as medicine moves to an “evidence-based” approach that collects clinical data and feeds it into clinical and other analytics platforms. Among the advertised outcomes are earlier detection and diagnoses of diseases along with more effective treatment based on information like a patient’s genetic makeup. Clinical analytics also can be used to adjust drug dosages to limit side effects, improve effectiveness and reduce costs for expensive drugs.

For its part, MapR is citing a list of health care and life sciences use cases as ideal for big data technology in general and, specifically, for its distribution of Hadoop. A prime example is genome processing and DNA sequencing in which current big data architectures leverage high-performance computing and either storage area networks or network attached storage.

MapR, San Jose, Calif., argues that these architectures are facing networking bottlenecks, slowing the distributed sorting of medical data. Hence, MapR and others have begun converging storage and computing on a single platform as a way to reduce the cost of storing large volume of sequencing data. Meanwhile, its Hadoop distribution aims to provide real-time access to clinical data stores.

In another use case, MapR said an unidentified health care organization that collects petabytes of treatment and claims data is using its Hadoop distribution to create a new data repository service for it enterprise customers. Customers could use the repository to run an expanded set of analytics on their own data.

MapR promotes its architecture as meeting multi-tenancy requirements through a volume-based isolated environment that also provides secure access for end users.

Combining unstructured data from multiple sources is also being used to improve diagnostic accuracy. So-called “assisted diagnosis” combines medical expert systems with individual data sets and big data algorithms. MapR’s Hadoop distribution allows predictive modeling and machine learning to crunch large sample sizes to help pin down a diagnosis, the company said.

Recent items:

Health Alliance Looks to Leverage, Secure Patient Data

NIH Effort Looks to Compress Big Genomics Data