March 23, 2016

Big Data and the White House’s Cancer Moonshot


The White House wants to invest $1 billion in a new Cancer Moonshot aimed at accelerating research into ways to detect, treat, and prevent the debilitating series of diseases. It’s not the first time the government has announced a major effort to eradicate cancer by spending big on medical science. But one key difference this time may be the recent advances the industry has made in data science.

President Obama picked his vice president, Joe Biden, to lead the Cancer Moonshot effort. Biden, whose son Beau died of brain cancer last year at the age of 46, has been outspoken about the need for big changes in the way doctors and the healthcare establishment approach cancer. The disease will kill an estimated 600,000 Americans this year, and 1.6 million more will be newly diagnosed with it.

The goal of the Cancer Moonshot is to double the current pace of progress in finding breakthroughs against cancer. That’s a tall order, considering the $5.2 billion that’s currently allocated to the NIH’s National Cancer Institute (NCI), and the more than $100 billion spent annually around the world on cancer treatment.

But it’s telling that the vice president’s plan doesn’t involve big spending on new cancer treatment centers or new programs dedicated to the fight. Instead, Biden wants to cut through the bureaucracy and bring government, industry, researchers, patient groups, and philanthropies together in a way that produces a result greater than the sum of its parts. A big part of that involves more and better sharing of data.

“From my own personal experience, I’ve learned that research and therapies are on the cusp of incredible breakthroughs,” Biden wrote in a post to Medium in January. “Just in the past four years, we’ve seen amazing advancements. And this is an inflection point….But the science, data, and research results are trapped in silos, preventing faster progress and greater reach to patients.”


Vice President Joe Biden is leading the White House’s Cancer Moonshot initiative (Drop of Light/

Data and technological innovation can play a major role in revolutionizing how medical and research data is shared and used to reach new breakthroughs, Biden says. If the initiative succeeds in freeing up even a fraction of the vast collections of cancer-related data that various parties have gathered and stored over the years, the Moonshot could be a success.

“Almost every cancer center keeps a database of information–genetic history, medical records, and tissue banks–that might hold the key to improving certain cancer therapies,” Biden wrote. “Allowing researchers and oncologists to tap into this treasure trove of information is absolutely vital to speeding up the pace of progress toward a cure. If we ensure this data is interoperable and accessible for scientists, researchers and physicians, the consensus is that we can absolutely speed up research advances, improve patient care and get ourselves closer to a cure.”

One person who’s optimistic about data’s contribution to the Cancer Moonshot is Nidhi Aggarwal. Aggarwal, who is the product and strategy lead for the big data preparation software provider Tamr, says more open and efficient sharing of data—such as genetic histories, medical records and tissue bank data—could have a big impact on cancer research.

“There are thousands of potential sources, public and private, that could potentially be leveraged to assist research–everything ranging from a clinical trial data set that has a few thousand attributes, to a genomic data set that could have millions of attributes,” Aggarwal tells Datanami. “It’s both a volume and variety problem.”

Tamr this week announced that it will give away copies of its big data prep software to all researchers affiliated with the Cancer Moonshot. Tamr’s software uses machine learning algorithms to help accelerate the organization, preparation, and integration of semi-structured and unstructured data for analysis. The company also relies on a pool of crowdsourced human experts to guide and train the algorithms.

Pharmaceutical firms like Novartis, GlaxoSmithKline, and Merck are already using Tamr’s software to help get a handle on unwieldy clinical research data. In the Cancer Moonshot, “big cancer data” is both a challenge to be overcome and a path to success, and Tamr is fully committed to helping researchers make the most of it.


Tamr product and strategy lead Nidhi Aggarwal

Major challenges remain for the Cancer Moonshot besides making sense of unwieldy data, including the lack of unified data standards. Aggarwal notes that some attempts have been made to enforce data standardization in a “top down” manner, such as the FDA’s requirement that, by the end of the year, all electronic data for clinical trial submissions conform to Clinical Data Interchange Standards Consortium (CDISC) standards, and the plan by the Office of the National Coordinator for Health Information Technology (ONC) to build “an interoperable, private and secure nationwide health information system.” Neither the FDA nor the ONC has had much luck in meeting its data standardization goals.

Instead of relying on a federal agency to oversee standardization, Aggarwal recommends that Biden take the SpaceX approach and look for public/private partnerships to move the ball forward. She also says that standardization and unification have to be design principles from the beginning; they can’t be bolted on after the fact.

Lastly, she notes that the scale of big cancer data and the challenges of interoperability are so daunting that humans shouldn’t try to solve them alone. Fortunately, machine learning and artificial intelligence are advancing rapidly, she says, and that is just the sort of work Tamr is doing in the private sector.

“If we want to bring all these data together, in order to build the kinds of statistical models that separate signal from noise in human biology and diseases like cancer, we must transform the data into a common data model,” Aggarwal says. “Data transformation is this critical part of integrating any data for analysis, but especially when integrating data from thousands of potential sources where there may be conflicting data models or no model at all.”

You can read Aggarwal’s open letter to Vice President Biden at the Tamr website.
