April 22, 2014

Startup Says ‘What If’ to Genetic Data Analytics

Alex Woodie

The surge in popularity of genetic testing is creating a tidal wave of data that people are eager to use to improve their health. But as we saw with 23andMe, there is a concern that raw genetic information can be dangerous in the wrong hands. Today BaseHealth emerged from stealth mode with an innovative approach to this problem that blends data from genetic testing, scientific research, and a patient’s own medical history, all under the supervision of a doctor.

Before the FDA asked it to stop offering its services in December, 23andMe was at the forefront of the explosion of personal consumption of genetic data. For about $1,000, the company would sequence a person’s genetic code and send them a report that showed how their genetic predisposition compared to the average for a number of diseases. Because of the complexity of the interplay between genes and other factors, the FDA decided this information should be consumed under the guidance of a trained physician. 23andMe continues to operate, and markets itself as an “ancestry” service.

BaseHealth has emerged as one of a handful of companies looking to be that “doctor-driven 23andMe,” and therefore capitalize on the huge demand for, and immense promise of, personalized, genetic-directed healthcare. While genetics plays an important role in determining a person’s likelihood of contracting a disease, it’s still just one part of a complex and interconnected puzzle, BaseHealth co-founder and CEO Hossein Fakhrai-Rad tells Datanami.

“For example with type 2 diabetes, if [you] look at just two lifestyle factors, like diet or physical activity, but ignore categories of risk like your genetic risk…then you end up not having a complete 360-degree view that can help you and your physician be on top of it,” he says. “With Genophen, we’ve integrated genomic data with lifestyle data and medical data into a health management platform that allows patients and physicians to engage in a collaborative way.”

For each patient, Genophen starts by obtaining the patient’s entire genetic sequence, as well as his medical history and lifestyle data, such as height, weight, history of smoking, alcohol consumption, body mass index, cholesterol, and even sleep position. That information is then compared against known risk factors for 40 common but complex diseases, such as type 2 diabetes, heart disease, depression, age-related macular degeneration, deep venous thrombosis, anxiety, and various forms of cancer.

Genophen takes all this information and generates a report showing the patient’s predisposition to certain diseases. For each of the 40 diseases, the system will show the patient his overall “modifiable” risk, which takes into account potentially bad habits, such as smoking or eating or drinking too much. It will also show the patient his “non-modifiable” risk for each disease, which is the baseline risk that his family medical history (phenotype) and his genetic information (genotype) have in store for him. The software also mashes up all this data into one big report that shows his overall “lifetime” risk for contracting these diseases and an “achievable” risk that is attainable by controlling the modifiable risk factors.

By modifying controllable risk factors in the “what if” analysis portion of Genophen, patients can see how a combination of lifestyle changes may improve their health.

Genophen also lets patients and their doctors tweak the inputs that go into the modifiable risk part of the equation, such as by moving the HDL cholesterol level up or the BMI down, or improving the overall nutrition rating. This “what if” modeling is not unlike the levers you find on one of those retirement calculators on the Web, only in this case you’re looking for agreeable ways to bank a little extra health to enjoy during your retirement, not wealth.
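To make the “what if” idea concrete, here is a minimal Python sketch of how a lifetime risk, a what-if scenario, and an achievable risk could relate to one another. It is not Genophen’s model: the function names, the relative-risk figures, and the baseline are all hypothetical, and the sketch assumes that each factor’s relative risk simply multiplies the baseline, which real models may not do.

```python
from math import prod

# Hypothetical relative risks for a single disease. These numbers are
# illustrative only; they do not come from BaseHealth or from any study.
RELATIVE_RISK = {
    "smoker": 1.5,
    "high_bmi": 2.0,
    "low_activity": 1.25,
}


def risk(baseline, factors):
    """Scale a baseline (non-modifiable) risk by the relative risk of every
    modifiable factor that is present, capping the result at 1.0.
    Assumes the factors act independently and multiplicatively."""
    multiplier = prod(
        RELATIVE_RISK[name] for name, present in factors.items() if present
    )
    return min(1.0, baseline * multiplier)


def what_if(baseline, factors, **changes):
    """Recompute the risk with some factors overridden; the input is not modified."""
    return risk(baseline, {**factors, **changes})


def achievable(baseline, factors):
    """Risk that remains if every modifiable factor is eliminated."""
    return risk(baseline, {name: False for name in factors})


# Hypothetical patient: non-modifiable risk of 8% from genotype and family history.
baseline = 0.08
patient = {"smoker": True, "high_bmi": True, "low_activity": False}

current = risk(baseline, patient)                        # 0.08 * 1.5 * 2.0
quit_smoking = what_if(baseline, patient, smoker=False)  # 0.08 * 2.0
best_case = achievable(baseline, patient)                # 0.08

print(f"current={current:.2f} quit_smoking={quit_smoking:.2f} achievable={best_case:.2f}")
```

In this toy version, pulling a lever is just a call to `what_if` with one factor changed, and the achievable risk is the floor set by the non-modifiable baseline.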

The slick Web UI will surely gain the attention of patients and doctors, but the most fascinating aspect of Genophen from a big data point of view is the data mining and analytics that go into assessing risk factors. BaseHealth employed a team of scientists to pore over about 5,000 peer-reviewed medical studies to identify each of the known risk factors that contribute to those 40 diseases.

“We had to physically go through these papers and identify the risk factors and the related risk ratios,” says BaseHealth CTO Prakash Menon. “There’s a bunch of statistical hurdles that we must jump through before we can include it in the model. Once you jump through those hoops, we actually hand code an R model for that disease, and fit the model to the data we’re seeing in multiple papers.”
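Menon does not describe the statistics behind those hand-coded R models, so the following Python sketch should be read only as an illustration of one common way to combine risk ratios reported in several papers: fixed-effect, inverse-variance pooling on the log scale. The function name `pool_risk_ratios` and the three example studies are made up for this illustration.

```python
import math


def pool_risk_ratios(studies):
    """Fixed-effect inverse-variance pooling of risk ratios.

    Each study is a tuple (risk_ratio, ci_lower, ci_upper), where the
    interval is assumed to be a 95% confidence interval.
    Returns (pooled_rr, pooled_lower, pooled_upper).
    """
    z = 1.96  # normal quantile for a 95% interval
    total_weight = 0.0
    weighted_sum = 0.0
    for rr, lower, upper in studies:
        # Recover the standard error of log(RR) from the interval width.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2  # more precise studies count for more
        total_weight += weight
        weighted_sum += weight * math.log(rr)
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Three made-up studies of the same risk factor (not real data).
studies = [(1.4, 1.1, 1.8), (1.6, 1.2, 2.1), (1.3, 0.9, 1.9)]
pooled, low, high = pool_risk_ratios(studies)
print(f"pooled RR = {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The pooled interval is narrower than that of any single study, which is the practical payoff of fitting one model to the data seen in multiple papers.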

Each disease has specific risk factors associated with it, and those risk factors will often be associated with multiple diseases, although in different ratios. For example, smoking is a risk factor in many diseases, but it’s a stronger contributor to lung cancer than it is to type 2 diabetes. It might not be intuitive, but sleep position is an indicator for several diseases tracked by BaseHealth, such as sleep apnea and migraine headaches. The company then matches these risk factors against a patient’s specific lifestyle data, medical history, and genetic sequence to determine the likelihood of contracting each disease.

This is obviously not the first time somebody has sought to quantify the causes of disease in such a systematic and mathematical manner. But because human health is at stake (and potentially a request for FDA approval), BaseHealth took a very regimented approach to its data entry process. While scientific journals are arguably written in English, natural language processing (NLP) algorithms cannot reliably extract this kind of information from them, so BaseHealth curated all of its scientific knowledge by hand.

Sourcing data was a bit easier on the human genome side of the fence, since practically the entirety of the body of human knowledge on the topic exists already in a largely structured format. BaseHealth was able to use more automated tools on the genetic side to extract the information and weigh the associated risk factors that are expressed in the human alleles.

BaseHealth co-founder and CEO Hossein Fakhrai-Rad

BaseHealth has tested its product with about 50 physicians over the past couple of years, and is confident it’s on its way to becoming a staple in the doctor’s little black bag of tools. “This platform has been designed to be used by physician and patients,” Fakhrai-Rad says. “Educating the physicians was a challenge, but we have been overcoming those challenges. Now we’re enrolling physicians on a rigorous basis into the platform.”

Because it’s a startup, BaseHealth doesn’t suffer from a big data problem at this point. But with a little luck, it will in the future, Menon says.

“At this point, the big data challenges are around processing the data rather than actually analyzing it, because a full genetic sequence is about a quarter of a terabyte,” he says. “What we’re going for is to have a bunch of data about people [that mixes and compares] their clinical and lifestyle and genetic information, because it’s never been done before as a corpus. When we have a couple of million patients, then we could use the corpus to actually do some serious big data analysis.”
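A quick back-of-the-envelope calculation shows why scale changes the picture. The sketch below takes Menon’s two figures at face value and assumes, purely for illustration, that every patient’s raw sequence is kept uncompressed; the variable names are hypothetical.

```python
SEQUENCE_TB = 0.25    # "about a quarter of a terabyte" per full sequence
PATIENTS = 2_000_000  # "a couple of million patients"

total_tb = SEQUENCE_TB * PATIENTS
total_pb = total_tb / 1000  # decimal units: 1 PB = 1,000 TB

print(f"{total_tb:,.0f} TB, or about {total_pb:,.0f} PB of raw sequence data")
# prints "500,000 TB, or about 500 PB of raw sequence data"
```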

If BaseHealth were to identify a previously unknown connection (say, between nutrition and lung cancer, or sleep position and depression), it would need to go through a rigorous, peer-reviewed study to be granted entry into the body of accepted scientific knowledge. Even in the big data age, actual scientific knowledge is hard won.
