How Data Science Is Fighting the Flint Water Crisis
The Flint water crisis is one of the defining environmental crises of our era. Headlines describing poisoned children across one of America’s classic manufacturing cities seized the country’s attention, forcing policy-makers to confront the issue at a high level (and even featuring in two presidential debates in 2016). At OmniSci’s Virtual Summit 2020, Jared Webb, chief data scientist at BlueConduit, described how his team applied machine learning to dramatically improve outcomes in Flint as the city tackled a seemingly insurmountable problem.
What happened in Flint
Once, Flint, Michigan, was one of the most prosperous cities in the country, serving as a major manufacturing hub in the mid-20th century. But foreign competition reduced factory jobs from 80,000 to just 5,000, and the population shrank steadily after the 1960s. The Great Recession delivered a brutal blow, forcing Flint to declare bankruptcy, after which the state government took control with an emergency manager.
The emergency manager decided that, in order to save money, Flint’s water source would switch from Lake Huron (which Webb called a “wonderful body of water”) to the Flint River. Because the Flint River was chemically different and had not been properly treated before coursing through the water system, it began to corrode lead service pipes, which connected the water mains to individual houses throughout the city. This lead eventually leached into the water, giving the water a foul odor, producing rashes on people who bathed in it, and raising lead levels in residents’ bloodstreams. Some stopgaps were employed – such as supplying bottled water – but the real task was replacing the lead service pipes.
“Now this is a really easy task,” Webb said, “if you know where they are.”
Finding the pipes
Not all of the service pipes in Flint were lead-based, and the cost of replacing a pipe ranged into the thousands of dollars. Since most of the cost was attributable to the digging, a false positive (that is, digging to reach a pipe that didn’t need to be replaced) meant most of that money was wasted. With 30,000 occupied homes and 50,000 housing parcels in the city, that posed a serious financial obstacle.
Finding out where the lead pipes were was much easier said than done. Webb and his colleagues managed to find hand-drawn maps hidden in a basement locker of a city building (“Ironically,” he said, “these maps were all water-damaged”), but even when digitized, the maps had poor legibility and reliability. According to Webb, the maps were “only slightly more accurate than flipping a coin” at identifying lead pipes.
The city had $100 million to work with – a far cry from the $250 million required to check and, if necessary, replace service pipes at every home. “The question,” Webb said, “is how can we prioritize these resources?”
A data-driven approach
The research team had an idea. The city funded them to hire hydrovac trucks, which fire high-pressure water at the ground to create a hole and vacuum the watery sludge away, all at just a couple hundred dollars per job. The hydrovac process allowed them to read the labels on the service pipes.
The researchers used the first round of hydrovac readings to create a sample dataset. From there, they used an XGBoost-based machine learning model to predict the likelihood of lead service pipes at each individual housing parcel. The model achieved strong accuracy (around 70-80%), allowing the team to help Flint prioritize which houses received pipe replacements.
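To make the prioritization step concrete, here is a minimal sketch of how per-parcel predictions could be turned into a dig list under a fixed budget. It is not the team's actual method: the `Parcel` class, the `prioritize` function, the probabilities, and the dollar figures are all hypothetical, and the probabilities simply stand in for the output of a model like the one described above.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    parcel_id: str
    p_lead: float  # hypothetical predicted probability of a lead service pipe


def prioritize(parcels, budget, cost_per_dig):
    """Pick the parcels to excavate, highest predicted probability first,
    stopping when the budget can no longer cover another dig."""
    ranked = sorted(parcels, key=lambda p: p.p_lead, reverse=True)
    max_digs = int(budget // cost_per_dig)
    return ranked[:max_digs]


# Illustrative values only; none of these come from the Flint data.
parcels = [
    Parcel("A", 0.92),
    Parcel("B", 0.15),
    Parcel("C", 0.71),
    Parcel("D", 0.40),
]

selected = prioritize(parcels, budget=10_000, cost_per_dig=5_000)
# The sum of probabilities is the expected number of lead pipes found.
expected_lead_finds = sum(p.p_lead for p in selected)

print([p.parcel_id for p in selected])
print(round(expected_lead_finds, 2))
```

With a budget that covers only two digs, the sketch selects the two parcels with the highest predicted probability and skips the rest, which is the basic idea behind spending limited money where lead is most likely.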
In the presentation, Webb showed a map of the results across three years, with red dots indicating lead pipe finds and green dots indicating copper pipe finds. The approach wasn’t perfect in 2016 or 2017 – there were green dots on each map – but then, Webb said, “something very interesting happens in 2018.”
“All of a sudden you see this explosion of green,” he said. “And you might be wondering: what happened to your models? Did you run out of lead pipes to find? What’s going on? Well, the problem was that our model stopped being used.”
Seemingly out of nowhere, Flint stopped using the researchers’ free services to identify and replace lead service pipes. Instead, the city spent millions of dollars to hire a major engineering firm to locate the pipes. The moment the switch was made, the hit rate plummeted to around 10-12%. “This was an atrocious waste of resources,” Webb said.
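A short sketch shows why the drop in hit rate was so costly. The hit rate is the share of excavations that actually uncover a lead pipe, and dividing the cost of a dig by the hit rate gives the average spend per lead pipe found. The counts and the per-dig cost below are hypothetical, chosen only to fall within the ranges Webb quoted; the function names are illustrative.

```python
def hit_rate(lead_found, excavations):
    """Fraction of excavations that uncovered a lead pipe."""
    if excavations <= 0:
        raise ValueError("excavations must be positive")
    return lead_found / excavations


def cost_per_lead_find(cost_per_dig, rate):
    """Average amount spent on digging per lead pipe actually found."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return cost_per_dig / rate


COST_PER_DIG = 5_000  # hypothetical dollar figure, not from the talk

# Hypothetical counts consistent with the quoted 70-80% and 10-12% ranges.
model_rate = hit_rate(lead_found=75, excavations=100)
firm_rate = hit_rate(lead_found=11, excavations=100)

print(round(cost_per_lead_find(COST_PER_DIG, model_rate)))
print(round(cost_per_lead_find(COST_PER_DIG, firm_rate)))
```

Under these assumed numbers, each lead pipe found costs several times more in digging at the lower hit rate, because most excavations turn up pipes that never needed replacing.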
The Natural Resources Defense Council (NRDC), which had helped bring in the money for the pipe replacement project, was not pleased. Further research revealed that the excavations tracked voting precincts, not the likelihood of lead pipes, and the NRDC took Flint to court. After a legal battle, the NRDC won, with a judge forcing Flint to return to using the data-based approach to identify and replace lead pipes. Once the switch back was made (“There was some friction” with the city, Webb said), lead pipe identifications returned to the higher hit rate.
Reaching out to the community
Beyond their work with the city government, the researchers wanted to make something community-facing that would be “accessible for the average person in Flint.” They started with Google Maps (as Webb explained, they were data scientists, not GIS experts) and quickly overwhelmed that tool. So they turned, instead, to OmniSci, a data science and analytics company with specialties in visualization. Then, Webb said, “we were able to build a better map.”
The map was designed based on five core principles:
- Inspire appropriate action without inciting panic or apathy
- Impart information immediately, avoiding unnecessary clicks
- Don’t overwhelm with too much information at once
- Maintain accessibility to users with various abilities
- Incorporate community feedback
To achieve these design goals, the researchers used a narrative approach. “When people clicked on their house,” Webb said, “we wanted them to have a story.” Instead of listing the information for a given parcel like a database entry, the interface explained to users what was known about their pipes in paragraph form. The researchers also added gradient colors for predicted lead (or lack thereof) and added a colorblindness-friendly mode. The map is currently in soft release in Flint, and the researchers are working with local community activists to ensure that the tool is useful to Flint’s residents.
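As a rough illustration of the narrative approach, the sketch below turns a parcel record into a short paragraph instead of a list of fields. The function name, the wording, and the 50% threshold are all assumptions made for this example and do not reflect the text or logic of the actual map.

```python
def describe_parcel(address, known_material=None, p_lead=None):
    """Render what is known about a parcel's service pipe as plain sentences."""
    if known_material is not None:
        # An inspection result takes priority over any prediction.
        return (
            f"The service pipe at {address} has been inspected "
            f"and is made of {known_material}."
        )
    if p_lead is None:
        return f"We do not yet have information about the service pipe at {address}."
    # Hypothetical cutoff for choosing the wording.
    likelihood = "likely" if p_lead >= 0.5 else "unlikely"
    return (
        f"The service pipe at {address} has not been inspected yet. "
        f"Based on similar homes, it is {likelihood} to contain lead "
        f"(estimated chance: {p_lead:.0%})."
    )


print(describe_parcel("123 Example St", p_lead=0.82))
print(describe_parcel("456 Sample Ave", known_material="copper"))
```

Keeping the output to a sentence or two per parcel fits the stated principles of imparting information immediately without overwhelming the reader.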
What’s next for the researchers? “We might start making money doing this,” Webb said, “because we can’t do it for free forever!” To that end, they recently formed BlueConduit, a water infrastructure analytics consulting company.