August 6, 2013

Stem Cell Treatments Get a Big Data Boost

Alex Woodie

People who seek stem cell treatments to fight diseases like leukemia are finding matching cord blood donors around the world much more quickly than before thanks to a new search engine from the German company Cytolon that employs big data technology and techniques.

The new search engine, called CordMatch, allows doctors to use the Internet to find donated umbilical cord blood units that are good matches to the particular genetic makeup of their leukemia patients. Cord blood, which is rich in human stem cells, is heavily sought as a treatment for people afflicted with leukemia, and has several advantages over stem cells isolated from bone marrow.

Previously, it could take hours to comb through the various cord blood databases to find a suitable donor for somebody needing stem cell treatment. The work largely involves ensuring good matches between human leukocyte antigen (HLA) codes, which are controlled by a person’s DNA and are essential elements in the human immune system. Stem cells are a precious commodity, so great lengths are taken to ensure a compatible match of the HLA codes before stem cells are dispatched from donor stores around the world.

Instead of painstakingly matching HLA codes using traditional database tools and techniques, Cytolon sought a more modern, big data approach. It brought in the American company CSC (formerly Computer Sciences Corp.) to help it develop CordMatch.

What the partners came up with is a combination of frameworks, algorithms, and associative database technologies that bears a closer resemblance to Facebook than to traditional relational database technology.

At the heart of CordMatch is the graph database Neo4j, which stores and represents data as graph structures made up of nodes, edges, and properties. According to CSC, CordMatch’s graph database houses all known HLA codes, more than 2 million of them, stored as nodes. The edges between the HLA nodes, of which there are more than 60 million, capture the relationships between the HLA codes as well as the hierarchy among them, CSC says.
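To make the node-and-edge idea concrete, here is a minimal in-memory sketch in Python. It does not use Neo4j or reflect CSC's actual schema: the `HlaGraph` class, the `BELONGS_TO` relationship name, the `level` property, and the sample codes are all assumptions chosen for illustration.

```python
from collections import defaultdict


class HlaGraph:
    """Toy property graph: nodes are HLA codes, edges are typed relationships.

    Hypothetical structure for illustration only, not CordMatch's data model.
    """

    def __init__(self):
        self.nodes = {}                 # code -> dict of node properties
        self.edges = defaultdict(set)   # code -> {(relationship, other_code)}

    def add_code(self, code, **properties):
        self.nodes[code] = properties

    def relate(self, source, relationship, target):
        # Store the edge under both endpoints so it can be followed either way.
        self.edges[source].add((relationship, target))
        self.edges[target].add((relationship, source))

    def related(self, code, relationship):
        # Return all codes connected to `code` by the given relationship type.
        return sorted(t for r, t in self.edges[code] if r == relationship)


graph = HlaGraph()
graph.add_code("A*02", level="group")        # broader code (illustrative)
graph.add_code("A*02:01", level="allele")    # more specific codes (illustrative)
graph.add_code("A*02:05", level="allele")
graph.relate("A*02:01", "BELONGS_TO", "A*02")
graph.relate("A*02:05", "BELONGS_TO", "A*02")

# One hop from the broader code reaches every specific code filed under it.
print(graph.related("A*02", "BELONGS_TO"))
```

The point of the graph layout is that finding every code related to a given code is a direct edge lookup rather than a series of table joins.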

CSC developed a unique algorithm to query the database using single and double cord searches. The algorithm builds on existing algorithms to optimize search speed and accuracy, the company says. A successful cord blood unit match, which is defined as occurring when 6 of 10 HLA values match, can be found in a matter of seconds.
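The 6-of-10 threshold can be illustrated with a short Python sketch. This is not CSC's algorithm, which the article does not describe in detail. It assumes the 10 values are two values at each of five loci and that two values match only when their strings are identical, and the sample profiles are made up for illustration.

```python
from collections import Counter

MATCH_THRESHOLD = 6  # from the article: 6 of 10 HLA values must match


def count_matches(patient, unit):
    """Count HLA values shared by patient and cord blood unit, locus by locus.

    Assumption: each profile maps a locus name to a pair of values, and
    values match only on exact string equality (a simplification).
    """
    total = 0
    for locus, patient_values in patient.items():
        unit_values = unit.get(locus, ())
        # Multiset intersection: order within a locus does not matter.
        overlap = Counter(patient_values) & Counter(unit_values)
        total += sum(overlap.values())
    return total


def is_match(patient, unit, threshold=MATCH_THRESHOLD):
    return count_matches(patient, unit) >= threshold


# Illustrative profiles only, not real patient or donor data.
patient = {
    "A": ("A*02:01", "A*01:01"),
    "B": ("B*07:02", "B*08:01"),
    "C": ("C*07:01", "C*07:02"),
    "DRB1": ("DRB1*15:01", "DRB1*03:01"),
    "DQB1": ("DQB1*06:02", "DQB1*02:01"),
}
unit = {
    "A": ("A*02:01", "A*03:01"),
    "B": ("B*07:02", "B*08:01"),
    "C": ("C*07:02", "C*04:01"),
    "DRB1": ("DRB1*15:01", "DRB1*03:01"),
    "DQB1": ("DQB1*03:01", "DQB1*05:01"),
}

print(count_matches(patient, unit), is_match(patient, unit))
```

A production search would also need to use the relationships and hierarchy between codes that the graph stores, which exact string comparison ignores.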

“The biggest challenge was solving the complexity of the data and the high-level implementation. So we used the most modern frameworks and the most innovative data graph,” says Simon Hanika, a senior application developer for CSC.

Harald Diehl, CSC’s lead architect on the project, compared CordMatch to Facebook in its ability to rapidly search and make connections in a pool of data. “Like Facebook, it is ‘who knows who,’ so we can match all corresponding codes to one code,” he says in a story on the CSC website.


Datanami