Big Algorithms to Change the World
The age of information has arrived. Bits and bytes zip through the atmosphere into our homes and buildings, into our televisions and into our mobile devices. Our automobiles, trains, planes, and thoroughfares are Internet-connected. Even our parking spaces have apps. The human race has amassed more data in the past two years than in all of the rest of history combined. However, as a fascinating article in Harvard Magazine points out, it’s not the quantity of data that’s most impressive.
While big data as a discipline has come of age in step with a veritable explosion in data, the truly remarkable part is what we can do with it. “The revolution lies in improved statistical and computational methods, not in the exponential growth of storage or even computational capacity,” writes author Jonathan Shaw, paraphrasing Gary King, the Albert J. Weatherhead III University Professor at Harvard University.
As important as advances under Moore’s law have been, the focus nowadays is on the “big algorithm” – a set of steps that solves a problem much faster than any previous method could. Shaw cites a colleague of King’s who had generated a dataset that would traditionally have required a $2-million computer to analyze. Seeking a less expensive alternative, King and his graduate students developed an algorithm that could do the work in 20 minutes on a laptop.
Linking datasets, visualizing data – these approaches are proving integral to creating knowledge.
A pattern is developing, according to King: statistical researchers are being called in to advise on many different kinds of projects, providing insight and value through modern statistical methods. King gives the example of Kevin Quinn, an assistant professor of government at Harvard, who devised a statistical model and pitted it against the qualitative judgments of 87 law professors to see which could better predict the outcomes of all the Supreme Court cases in a year.
King recollects: “The law professors knew the jurisprudence and what each of the justices had decided in previous cases, they knew the case law and all the arguments. Quinn and his collaborator, Andrew Martin [then an associate professor of political science at Washington University], collected six crude variables on a whole lot of previous cases and did an analysis.”
“I think you know how this is going to end,” he relays. “It was no contest.”
Shaw asserts that, given sufficient information and methods of quantification, modern statistical methods will outperform even the most expert qualitative judgment.
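To get a feel for how a handful of crude variables can drive such predictions, here is a minimal sketch in Python. The six features, the synthetic outcomes, and the plain gradient-descent fit are all invented for illustration – Quinn and Martin’s actual model was considerably more sophisticated.

```python
import math
import random

# Hypothetical sketch: fit a simple logistic classifier on a few "crude"
# numeric features per case. None of this reproduces the real model.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (intercept first) by plain stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            w[0] -= lr * err
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * err * xj
    return w

def predict(w, xi):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if sigmoid(z) >= 0.5 else 0

# Six invented "crude" features per past case, plus a 0/1 outcome
# generated by a synthetic rule the model must rediscover.
random.seed(0)
X = [[random.random() for _ in range(6)] for _ in range(200)]
y = [1 if x[0] + x[1] > 1.0 else 0 for x in X]

w = train_logistic(X, y)
accuracy = sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(X)
print(f"in-sample accuracy: {accuracy:.2f}")
```

The point isn’t the particular algorithm; it’s that once case outcomes are quantified, even a very simple statistical fit can compete with – and beat – expert intuition.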
Using big data tools, King developed a method for analyzing social media texts. With a billion social media posts being generated every two days, it’s impossible for a single person or group of people to sift through that much data without computational and algorithmic assistance. The tool that King and his students developed uncovered startling findings about Chinese government censorship practices. The main takeaway from the study was that China’s censorship is primarily aimed at stopping collective action. Personal opinions may be expressed, but not if they include or may incite a call to mobilize.
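The study’s core comparison can be mimicked in a few lines: tabulate, for each topic category, how often posts disappear after publication. The categories and counts below are synthetic stand-ins – King’s actual method involved automated analysis of millions of real posts.

```python
from collections import Counter

# Synthetic data for illustration only: (category, was_censored) pairs.
posts = [
    ("criticism_of_officials", False),
    ("criticism_of_officials", False),
    ("criticism_of_officials", True),
    ("collective_action", True),
    ("collective_action", True),
    ("collective_action", False),
    ("collective_action", True),
    ("everyday_chatter", False),
]

totals, censored = Counter(), Counter()
for category, was_censored in posts:
    totals[category] += 1
    censored[category] += was_censored

# Censorship rate per category, highest first.
rates = {c: censored[c] / totals[c] for c in totals}
for category, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{category:25s} censorship rate: {rate:.0%}")
```

In the toy numbers, as in the study’s finding, posts about collective action vanish at a far higher rate than personal criticism of officials.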
King also masterminded “what has been called the largest single experimental design to evaluate a social program in the world, ever,” according to Julio Frenk, dean of the Harvard School of Public Health. In 2000, Frenk, newly appointed as Mexico’s minister of health, hired King to evaluate a new public insurance scheme called Seguro Popular. At the time, more than half of the nation’s health expenses were not covered by insurance; as a result, four million families were financially ruined each year by excessive health expenditures.
Frenk led a healthcare reform in which Seguro Popular played a key role. King performed a thorough analysis, comparing demographically similar communities that received the public insurance with those that didn’t. The study showed the plan was highly effective at protecting families from catastrophic loss due to serious illness, but it also revealed room for improvement, for example in public outreach and preventive care.
Data-based initiatives are also benefitting public health in the world’s poorest nations. Nathan Eagle, an adjunct assistant professor at the Harvard School of Public Health, worked with the Kenyan ministry of health to develop and implement a blood-bank monitoring system. The system recruited nurses at rural hospitals to text real-time blood supply levels into a central database, which fed a visualization system showing blood-bank workers the actual supply levels at each hospital. The program worked perfectly the first week; in the second week, the text messages stopped coming in.
Eagle said that although the implementation was essentially bulletproof, the system did not take into account a crucial factor: the cost of a text message. Eagle negotiated with mobile operators in East Africa and they set up a script to credit the rural nurses each time they sent a message, plus an extra penny to thank them for their input. “Virtually every nurse reengaged,” says Eagle.
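The crediting fix lends itself to a short sketch. The message format, the `credit_airtime` helper, and the credit amounts are all hypothetical – the actual operator integration isn’t described in the article.

```python
# Hypothetical sketch of the operator-side crediting logic.
REIMBURSEMENT = 10   # cost of one SMS, in made-up currency units
BONUS = 1            # the extra "thank you" credit

def credit_airtime(ledger, phone, amount):
    """Stand-in for a mobile operator's top-up API."""
    ledger[phone] = ledger.get(phone, 0) + amount

def handle_report(ledger, supplies, message):
    """Parse a report like '+254700111222 O-:4 A+:12', record the
    blood levels, and credit the sender for the cost plus a bonus."""
    phone, *levels = message.split()
    for entry in levels:
        blood_type, units = entry.split(":")
        supplies.setdefault(phone, {})[blood_type] = int(units)
    credit_airtime(ledger, phone, REIMBURSEMENT + BONUS)

ledger, supplies = {}, {}
handle_report(ledger, supplies, "+254700111222 O-:4 A+:12")
print(ledger)
```

The design point is simply that the reporting channel pays for itself from the nurse’s perspective: each message costs nothing net and earns a small bonus.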
The story is illustrative of the need to consider cultural differences and to adjust to unforeseen circumstances.
A more ambitious project, in Rwanda, exemplifies what can happen when correlations surface in large linked datasets without an understanding of causation. Eagle was working with mobile-phone records when he noticed a pattern in which people would suddenly stop moving around as much. He initially took the drop in movement to signal illness. At first the data seemed to predict cholera outbreaks, but in fact it was indicating flooding: when roads wash away, people travel less and disease outbreaks become more likely. The team’s supercomputing analysis of the phone data was detecting floods – information they could have obtained in far easier ways.
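The trap Eagle hit – a confounder (flooding) driving both the signal (reduced movement) and the outcome (outbreaks) – is easy to reproduce on synthetic data:

```python
import random

# All numbers are synthetic. "flood" causes both low movement and a
# higher outbreak risk; movement and outbreaks never influence each other.
random.seed(1)
n = 1000
flood    = [random.random() < 0.2 for _ in range(n)]
movement = [random.gauss(2 if f else 10, 1) for f in flood]        # km/day
outbreak = [random.random() < (0.5 if f else 0.05) for f in flood]

def pearson(xs, ys):
    """Pearson correlation coefficient, pure Python."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(movement, [float(o) for o in outbreak])
print(f"movement vs outbreak: r = {r:.2f}")

# Condition on the confounder: within non-flood periods, the link vanishes.
r_no_flood = pearson([m for m, f in zip(movement, flood) if not f],
                     [float(o) for o, f in zip(outbreak, flood) if not f])
print(f"within non-flood periods: r = {r_no_flood:.2f}")
```

The raw correlation is strongly negative, yet once flooding is held fixed the movement–outbreak association disappears – which is exactly why the mobility data “predicted” cholera without either variable causing the other.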
Even though the story illustrates a setback, it was a good learning experience for Eagle. “It opened my eyes to the fact that big data alone can’t solve this type of problem. We had petabytes of data and yet we were building models that were fundamentally flawed because we didn’t have real insight about what was happening” in remote villages.
Now he uses a platform similar to the model used with the Kenyan nurses: it surveys villagers via text message, giving them airtime credits in exchange for answering questions about their health and health-related habits. This simple method of contributing information to larger datasets has since been commercialized as a platform called Jana, of which Eagle is co-founder and CEO.
There is huge interest in data-related sciences, and early courses in data science are attracting large numbers of students from a wide range of disciplines. Currently, there aren’t enough people with the skills to extract meaning from the ever-increasing reams of data – and then there’s the related problem of managing and securing it all. Data’s not relevant or interesting unless it’s actionable, states Eagle, and that takes math, machines, and people who speak these languages.
You can read the Harvard Magazine article at http://harvardmagazine.com/2014/03/why-big-data-is-a-big-deal