June 23, 2014

FDA Beefs Up IT Infrastructure for Big Data Push

As the health care market moves quickly – some would say too quickly – to leverage big data technology, a key industry regulator also is moving to beef up its IT infrastructure to handle the huge datasets generated daily by manufacturers, service providers and scientists.

The U.S. Food and Drug Administration, which regulates everything from pharmaceuticals and medical devices to product recalls and pet food ingredients, expects drug and device trials, for example, to generate an avalanche of research data. “These data sets are not only larger than ever before, they are also arriving more frequently than ever and varying enormously in format, and quality,” Taha Kass-Hout, FDA’s chief informatics officer and director of its Office of Informatics and Technology Innovation, noted in a June 19 blog post.

This year alone, for example, FDA expects to receive between 1.5 million and 2 million individual regulatory submissions through its eSubmission Gateway. The system was upgraded on June 1, and Kass-Hout said some submissions are up to a terabyte in size, the “very definition of big data.”

Regulatory guidelines for new drug approvals, for instance, require drug companies to submit vast amounts of trial data so that regulators can look for unintended side effects. The more data, the better the chances of spotting a troubling trend.

Public data also is flowing out. Earlier this month FDA rolled out an openFDA initiative designed to make it easier for web developers, researchers and the public to access datasets collected and maintained by the agency. For example, a pilot program would provide public access to an estimated 3 million reports of adverse drug reactions and medication errors submitted to the FDA Adverse Event Reporting System since 2004.

The initiative will be expanded later to include FDA databases on product recalls and product labeling.

The new database provides access in a structured format and includes a search-based application programming interface (API) for finding both structured and unstructured regulatory data online.
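To give a sense of what a search-based API query might look like, here is a minimal sketch in Python. It assumes the public openFDA drug adverse event endpoint at `https://api.fda.gov/drug/event.json` and its `field:"term"` search syntax with a `limit` parameter; the helper names `build_adverse_event_query` and `fetch_adverse_events` are hypothetical and used only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: openFDA's endpoint for adverse drug event reports.
OPENFDA_DRUG_EVENT = "https://api.fda.gov/drug/event.json"


def build_adverse_event_query(drug_name: str, limit: int = 5) -> str:
    """Hypothetical helper: build a query URL for reports that mention drug_name."""
    params = {
        # Field-scoped search (assumed syntax): field:"term"
        "search": f'patient.drug.medicinalproduct:"{drug_name}"',
        # Maximum number of matching reports to return.
        "limit": limit,
    }
    # urlencode percent-encodes ':' and '"' so the URL is safe to send.
    return f"{OPENFDA_DRUG_EVENT}?{urlencode(params)}"


def fetch_adverse_events(drug_name: str, limit: int = 5) -> list:
    """Hypothetical helper: fetch matching reports (requires network access)."""
    with urlopen(build_adverse_event_query(drug_name, limit)) as resp:
        payload = json.load(resp)
    # Assumption: matching records are returned under the "results" key.
    return payload.get("results", [])


if __name__ == "__main__":
    print(build_adverse_event_query("aspirin", limit=5))
```

Building the URL is kept separate from fetching it, so the query can be inspected or pasted into a browser without making a network call.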

Among other users, the initiative is expected to make FDA data more accessible to data visualization specialists and researchers so they can search, query and pull public information from FDA datasets.

Kass-Hout added that the FDA is shifting its infrastructure to the cloud to “handle vast amounts of data and provide powerful tools to identify and extract the information we need to collect, store and analyze.” FDA’s shift to cloud computing and storage “gives us the ongoing, simultaneous capacity to collect, control and analyze enormous amounts of data,” the FDA’s IT chief stressed.

In another example, he said FDA is partnering with state and local health organizations to identify food-borne pathogen contaminants. The pathogens are sequenced, and the resulting data are stored and analyzed to understand, pinpoint and contain future outbreaks, Kass-Hout noted.

Ultimately, the agency hopes the transition to the cloud and greater use of data analytics will open the door to new data mining techniques and other ways of promoting public health.
