April 13, 2012

Wrangling Big Data to Fight Pediatric Cancer

Anatol Blass, Ph.D.

High-performance computing and the cloud are enabling vast improvements in scientists’ ability to simulate and analyze data, and genetic sequencing and research are accessible to more scientists, researchers and medical professionals than ever before.

But a new bottleneck has emerged: we are drowning in data. The trick is how to efficiently manage the volume and complexity of that data while making it secure yet accessible to many.

In order to address the big data bottleneck, Dell is building a unique cloud environment for a pediatric cancer trial in conjunction with the Translational Genomics Research Institute (TGen) and the Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC).

The collaborative effort is creating a model for how to use HPC and cloud computing to simplify information access and sharing and bridge the information gaps between science and medicine. Through the trial, scientists and oncologists are identifying targeted and personalized treatments for children fighting neuroblastoma.

The cloud will provide the additional computing capacity to support the “real time” processing of patient tumors and prediction of the best drug therapy for a specific patient, based on the genetic makeup of that child’s tumor.

This clinical trial involves dozens of scientific and medical partners across the country. Providing information technology to analyze laboratory results and to support collaboration across a secure network of clinical sites is crucial to creating a knowledge base that supports clinical decision-making.

Because TGen’s research is so cutting edge, scientists and doctors require flexibility to follow their research as it evolves. The effort involves studying tumor samples from patients, getting the genomic sequencing data from lab instruments, analyzing that data, reporting the findings to a tumor board and ultimately using the results to make decisions about the best treatment for the patient.

One of the chief challenges was that the newest lab instruments generate raw data at a rate that outpaces anything anticipated by Moore’s Law. The quantity of data produced by a single instrument is doubling about every 12 months, while over the same period the cost to analyze it is falling by half.

The end result is that the total amount of genomic data being generated is doubling nearly every six months. Moreover, the data objects produced are complex files that carry important metadata about the samples they came from and the instruments that produced them. The files can also be extremely large, up to 3TB depending on the instrument. The data associated with a particular patient is currently about 200TB and growing. Because this is an active area of research, the data needs to be kept available to validate and compare analysis algorithms.
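The six-month figure is what you get when the two trends compound. The short sketch below works through that arithmetic under one assumption that is an interpretation rather than something stated here: that a halving of cost lets a fixed budget produce twice as much data each year. The function and variable names are illustrative only.

```python
import math


def doubling_time_months(annual_growth_factor: float) -> float:
    """Months needed to double at a constant annual growth factor."""
    return 12 * math.log(2) / math.log(annual_growth_factor)


# From the article: output of a single instrument doubles about every 12 months.
per_instrument_growth = 2.0

# Assumption (not from the article): cost falling by half each year means
# the same budget covers twice as many runs or instruments.
capacity_growth = 2.0

# The two effects multiply, giving roughly 4x total data per year.
total_annual_growth = per_instrument_growth * capacity_growth
months_to_double = doubling_time_months(total_annual_growth)

print(f"Total data grows {total_annual_growth:.0f}x per year")
print(f"Doubling time: about {months_to_double:.1f} months")
```

Under these assumptions the total grows fourfold per year, which is the same as doubling roughly every six months.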

Additionally, for this clinical trial there are 11 participating sites both generating and analyzing data. A hybrid approach was required to manage the data coming from the instruments and to keep large amounts of data accessible to all of the sites, facilitating collaboration in a secure, cost-effective manner. It was also important to localize data near HPC capacity, both in the cloud and on premises, to speed analysis and validation.

The cloud became the medium of exchange for data as well as analysis capabilities, allowing researchers to share their raw information as well as algorithms for analyzing that data. As a result, TGen and its collaborators can quickly turn data into knowledge, knowledge into diagnostics, diagnostics into therapies, and therapies into better quality of life for patients.

A colleague of mine coined the phrase “cloud-to-ground” to describe the architecture built to address these issues: an environment that could manage data and not just archive it. We needed to create a virtual library of data that could be accessed by researchers and allow data to be checked out and analyzed using HPC capabilities.

We are using Dell’s innovative technology to enable seamless integration between premises-based capabilities (the ground) and virtual capabilities (the cloud). This provides the framework to move the data fluidly through the research lifecycle, protect it, and make it available for future use. Data can be ingested at various sites, moved to the cloud and then made available for analysis either in premises-based HPC environments or in any HPC cloud environment.
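To make the “virtual library” idea concrete, here is a minimal sketch of the ingest, move-to-cloud and check-out flow described above. It is not the system built for the trial: the `DataObject` and `Catalog` classes, their methods, and the site and file names are all hypothetical, and the sketch only tracks where copies live rather than moving any bytes.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A data file plus the metadata that has to travel with it (hypothetical)."""
    object_id: str
    sample_id: str       # which sample the file came from
    instrument: str      # which instrument produced it
    size_tb: float
    locations: set = field(default_factory=set)  # where copies currently live


class Catalog:
    """Hypothetical 'virtual library' that records where each object is staged."""

    CLOUD = "cloud"

    def __init__(self):
        self._objects = {}

    def ingest(self, obj: DataObject, site: str) -> None:
        """Register a new object at the site where it was generated."""
        obj.locations.add(site)
        self._objects[obj.object_id] = obj

    def move_to_cloud(self, object_id: str) -> None:
        """Record a cloud copy so that every site can reach the object."""
        self._objects[object_id].locations.add(self.CLOUD)

    def check_out(self, object_id: str, hpc_location: str) -> DataObject:
        """Stage a copy next to the HPC capacity that will analyze it."""
        obj = self._objects[object_id]
        if self.CLOUD not in obj.locations:
            raise ValueError(f"{object_id} has not been moved to the cloud yet")
        obj.locations.add(hpc_location)
        return obj


# Illustrative identifiers only; none of these names come from the trial.
catalog = Catalog()
catalog.ingest(DataObject("run-001", "sample-A", "sequencer-1", 3.0), site="site-01")
catalog.move_to_cloud("run-001")
staged = catalog.check_out("run-001", hpc_location="site-07-hpc")
print(sorted(staged.locations))
```

Keeping the sample and instrument metadata on the same record as the location list is one simple way to let an object be found and staged near whichever HPC environment, on the ground or in the cloud, will run the analysis.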

The unique challenges of personalized medicine require us to address data volume, complexity and locality, as well as collaboration. By creating integrated hybrid cloud environments, we can harness the power of Big Data and unleash the potential of personalized medicine.

About the Author

Anatol Blass, Ph.D. is a System Consultant with Dell Healthcare and Life Sciences. He has worked with leading academic, research and biotechnology companies to integrate and analyze laboratory data and create knowledge. As the lead architect for Dell’s collaboration with TGen, he is working to address the technology challenges of the world’s first personalized medicine clinical trial for pediatric cancer.
