April 29, 2014

How Big Data Can Help the Sick and Poor

Alex Woodie

We often hear how companies are using big data to improve customer service, reduce risk, and ultimately make more money. That’s all well and good, but it’s refreshing to hear how humanitarian groups are using the same big data sources, tools, and professionals to assist sick, poor, at-risk, and war-weary people around the world.

One group that’s doing a lot of social good with big data is Sumall Foundation, which launched about a year ago with the ambitious goal of using big data to impact social issues. So far, the New York City organization, which was spun out of the social media analytics firm founded by Dane Atkinson, has brought its expertise to bear on problems ranging from homelessness in New York City and the Syrian war to human trafficking and prescription drug abuse across the United States.

When the group takes on a new challenge, it acts like a project manager to help its client or partner get the most use out of their data. In many cases, they have plenty of data to start analyzing, but they lack the tools and skills to get actionable information out of it, says executive director Stefan Heeke.

“They don’t have the budgets to pay for a data scientist, and with that, there’s a bit of a disadvantage to use data to be impactful,” Heeke tells Datanami. “We help them make the most of the data, and not only show them what they have, but also point them toward how to use it, or where interventions could be planned.”

A typical engagement may start with a proprietary data source, and then move into big public data sources. The group typically uses Python to blend data, R to build algorithms to analyze data, and a collection of JavaScript tools to build visualizations. This week, TIBCO announced that it donated Spotfire licenses to enable the group to perform data discovery on big data sets, which is particularly useful for connecting with stakeholders and generating hypotheses.

For example, through its work with the city of New York, the group has built an application that enables the city to predict three to four months in advance which households are most likely to lose their homes. This particular application starts with a proprietary data source, which is the list of people who have been admitted to emergency shelters. On top of that, the group blends in demographic data, including eviction notices and crime data.

The group’s map of evictions in NYC
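The blending step described above can be illustrated with a minimal Python sketch. The article does not say how the group joins its sources, so the ZIP-code join key, the field names, and the records here are all made-up assumptions, and the sketch stops at combining the data rather than predicting anything.

```python
from collections import Counter

# Toy, made-up records. Field names and the ZIP-code join key are assumptions,
# not details taken from the group's application.
shelter_admissions = [
    {"household": "A", "zip": "10451"},
    {"household": "B", "zip": "10451"},
    {"household": "C", "zip": "11207"},
]
eviction_notices = [
    {"zip": "10451"},
    {"zip": "10451"},
    {"zip": "11207"},
    {"zip": "10003"},
]


def blend_by_zip(admissions, evictions):
    """Count shelter admissions and eviction notices per ZIP code."""
    admitted = Counter(record["zip"] for record in admissions)
    evicted = Counter(record["zip"] for record in evictions)
    # A Counter returns 0 for missing keys, so ZIP codes that appear
    # in only one source still get a complete row.
    return {
        zip_code: {"admissions": admitted[zip_code], "evictions": evicted[zip_code]}
        for zip_code in sorted(set(admitted) | set(evicted))
    }


blended = blend_by_zip(shelter_admissions, eviction_notices)
for zip_code, counts in blended.items():
    print(zip_code, counts)
```

A table like this, with one row per area and one column per source, is the kind of input a predictive model could then be trained on.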

Backed by these data-driven predictions, social workers can take proactive steps to prevent homelessness. “That information is very actionable for social workers,” Heeke says. “There are 900 social workers who do nothing else than trying to save families from becoming homeless. We arm them with information that they didn’t have so they become much more effective and we can save families from becoming homeless.”

As if helping families keep their homes isn’t enough, there’s a business angle to this big data story. When a family becomes homeless, it costs the city an average of $35,000 per year. “It’s quite an effective use of big data,” Heeke adds.

The group is also using its big data and social media skills to help journalists cover the war in Syria. Because western journalists are not welcome in Syria, people in western countries know very little about what’s going on. The way we consume news about the war is “not statistically relevant,” as the data-driven Heeke sees it.

So the group set out to do something that hadn’t been done: Count the dead. So far, through its work on the Humanitarian Tracker website, it has accounted for more than 100,000 dead people during the course of the war. The database contains first names, last names, vocations, and cause of death (beating, stabbing, chemical, artillery, aerial bombardment, etc.). When available, the database has pictures and copies of Facebook pages. It’s not perfect, but it’s the best accounting of the atrocities that are occurring in Syria.

The interactive graphics on the Humanitarian Tracker website are powered by TIBCO

The Humanitarian Tracker database can be used to detect patterns in the war, including the role that chemical attacks have played (very minor overall, Heeke says). One pattern that has emerged is how civilians are drawn into the war through sniper killings. “We can even see that they’re consistently executed by sniper killing, which enables us to say it was a military order, not a spontaneous sniper killing,” he says. “There’s a lot of insight that even could be used in court to prove responsibility.”

Recently, the group has been working with the Clinton Foundation to create a dashboard that demonstrates the potential for prescription drug abuse across the United States. For this project, the group starts with the government statistics, which come out every year. That time interval isn’t quite up to big-data standards, so Heeke’s group augments it with data pulled from social media. To that end, it scans Twitter for words that are indicative of the use of prescription drugs. It also brings in other sources to help detect instances of doctor shopping.
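A keyword scan of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which words the group tracks or how it matches them, so the watch list and the example posts below are hypothetical.

```python
import re

# Hypothetical watch list; the article does not name the terms the group tracks.
DRUG_TERMS = {"oxycodone", "vicodin", "xanax", "adderall"}


def mentions_drug_terms(text, terms=DRUG_TERMS):
    """Return the sorted watch-list terms that appear as whole words in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & terms)


# Made-up example posts, not real tweets.
posts = [
    "Took a xanax before my flight",
    "Great weather in Brooklyn today",
    "Anyone selling Adderall or Vicodin?",
]

# Keep only the posts that mention at least one watch-list term.
flagged = [(post, mentions_drug_terms(post)) for post in posts if mentions_drug_terms(post)]
for post, terms in flagged:
    print(terms, "<-", post)
```

Matching on whole words keeps the scan simple, but it would miss slang and misspellings, which a real pipeline would likely need to handle.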

Despite the somewhat grisly topics, Heeke doesn’t have a big problem attracting volunteers with the requisite big data skills. “It’s quite amazing because if you look at the labor market, it’s almost impossible to hire these people,” he says. “But if there’s a problem that’s interesting to them and there’s a credible kind of impact, then people are quite willing to donate their free time.”

The biggest problem that the group has is getting access to big data tools. The group doesn’t have a large budget for software acquisition, so it depends on the charity of software companies, such as TIBCO, to donate licenses for their big data tools.

Executive director Stefan Heeke

“A lot of commercial providers are not aware that their software can save lives, even though it’s not necessarily what the tool has been made for,” Heeke says. “Most of the tools are either financial risk management or marketing tools, but most of them can also be used to help with big issues. The interesting thing is we have many volunteers coming from those areas. We have a lot of hedge fund and marketing people who want to volunteer because they understand their skills can be used to solve other problems.”

Currently on Heeke’s wish list are access to the Twitter fire hose, to Google’s repository of Web searches, specialized data visualization tools, and a high-end marketing automation tool to help with outreach. “That would be really amazing to have more companies that will donate the tools,” he says. “I know what I would like to use, but we can’t afford it.”
