July 30, 2018

Water Co. Exploring Use of ML to Detect Quality Issues


Everybody expects to have clean drinking water. But as the lead crisis in Michigan has shown, that’s not always the case. Now American Water, the largest publicly traded water company in the country, is actively researching the use of machine learning and real-time streaming data technology to detect and identify potentially harmful chemical signatures in its surface drinking water supply.

The company is in the early stages of building such a machine learning system. But according to American Water Senior Technologist John Kuchmek, the potential benefits of training machine learning models on real-time water quality data collected by remote sensors are too great to ignore.

“You can imagine if we can figure out that if someone upstream is dumping a chemical, we can actually stop that raw water treatment,” Kuchmek told Datanami during an interview at the Hortonworks DataWorks Summit in San Jose last month. “We all pull from the same river. On the Missouri River, there might be another 10 facilities that are past us, so if we can actually get that signal and understand it, now we can warn them ahead of time.”

American Water is fully embracing modern data technology and exploring how it can increase the quality of its product, streamline work for its 6,500 employees, and improve service for its 3 million customers. The company has based its data transformation strategy on a pair of Hortonworks products: its Hadoop distribution, Hortonworks Data Platform (HDP), and its Apache NiFi-based streaming data solution, Hortonworks DataFlow (HDF).

The machine learning project, which is one of several data projects American Water has under way, could go live soon, Kuchmek said. It involves training a machine learning model to associate sensor measurements for things like organic carbon and dissolved oxygen with the presence of known chemicals in the water. The resulting data signatures could then be used to detect the presence of chemicals in real time.
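The article does not say which model American Water is training, so the following is only a minimal sketch of the general idea of learning "signatures": average labeled sensor readings from controlled injections into one centroid per substance, then label a new reading by its nearest centroid. The feature names, labels, and all numbers are hypothetical and made up for illustration.

```python
from math import dist
from statistics import mean

# Hypothetical training data (made-up values): each reading is
# (total_organic_carbon_mg_per_l, dissolved_oxygen_mg_per_l), labeled with
# whatever was injected during a controlled test.
TRAINING = {
    "clean": [(2.0, 8.5), (2.2, 8.3), (1.9, 8.6)],
    "chemical_a": [(6.5, 5.1), (7.0, 4.8), (6.8, 5.0)],
}


def fit_centroids(training):
    """Average each label's readings, axis by axis, into one 'signature' centroid."""
    return {
        label: tuple(mean(axis) for axis in zip(*readings))
        for label, readings in training.items()
    }


def classify(reading, centroids):
    """Return the label whose signature is closest (Euclidean distance) to the reading."""
    return min(centroids, key=lambda label: dist(reading, centroids[label]))


centroids = fit_centroids(TRAINING)
print(classify((6.6, 5.0), centroids))  # chemical_a
```

A nearest-centroid classifier is used here only because it is easy to read; a production system would likely standardize features measured on different scales and use a more capable model.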

“They’re going to inject chemicals, materials, substances, and actually start to get signatures on the different sensors, and start training machine learning and AI-type models to do predictive analytics, so if something comes in, [we can decide:] should we stop the raw water intake?” he said.

Kuchmek, who works on the data engineering side of this project in support of the data science activities, said the project uses Kepware’s KEPServer technology to collect the sensor data from the programmable logic controllers (PLCs) using the OPC UA communications protocol, and then employs Apache NiFi (via HDF) to stream the chemical-signature data into the Hadoop cluster.


The chemical-signature data is currently analyzed using Apache Hive, but the company plans to shift it to Druid to get better response times. “Druid is a better time-series database. It will actually do the querying much faster,” Kuchmek said.

The goal is not to completely automate the water treatment process, but to give water quality operators another tool to help them make better decisions. Chlorination is one example: the knowledge is largely tribal and exists between the ears of the company’s operators, and it is another complex, multi-variable task that machine learning could help with, Kuchmek said.

“That starts to tell an operator that with this type of event, you put in this much chlorine with this much confidence,” he said. “Those types of things can actually help.”

Liquid Investment

American Water, which is based in New Jersey, has several other data projects in the works for its Hadoop cluster, which is in the process of being upgraded with 4PB of storage. One of those involves pulling data out of its SAP-based ERP system to improve customer service.

Under the existing ERP system, American Water’s field service representatives (FSRs) could be working with data that’s up to 24 hours old when they go out on calls to service customers. If a customer had recently made a payment to avoid having their water shut off, the FSR would have no way of knowing this under the old system.

Instead of paying SAP perhaps millions of dollars to deliver real-time capability in its HANA implementation, Kuchmek and his IT colleagues hacked together a way to use NiFi to continually poll the SAP database for any changes to customer tables. Any changes to customer accounts are now streamed in real-time (or thereabouts) to a GUI dashboard that consolidates all the pertinent information for FSRs.
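The article does not describe exactly how the NiFi flow detects changes. One common way to poll a database for changes is to track a "high-water mark" on a last-modified column and fetch only newer rows on each poll. The sketch below shows that pattern with an in-memory SQLite table; the table `customer_accounts`, its columns, and the integer timestamps are all hypothetical stand-ins for the SAP customer tables.

```python
import sqlite3


def poll_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT account_id, balance, last_modified FROM customer_accounts "
        "WHERE last_modified > ? ORDER BY last_modified",
        (last_seen,),
    ).fetchall()
    # Advance the mark to the newest change seen; keep the old mark if nothing changed.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


# Hypothetical customer table with an integer last-modified timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_accounts (account_id TEXT, balance REAL, last_modified INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_accounts VALUES (?, ?, ?)",
    [("A-1", 120.0, 100), ("A-2", 0.0, 105)],
)

changes, mark = poll_changes(conn, 0)  # first poll sees both rows; mark becomes 105

# A customer pays off a balance; the next poll returns only that change.
conn.execute(
    "UPDATE customer_accounts SET balance = 0.0, last_modified = 110 WHERE account_id = 'A-1'"
)
changes, mark = poll_changes(conn, mark)
print(changes)  # [('A-1', 0.0, 110)]
```

Note that a strict `>` comparison can miss rows committed later with a timestamp equal to the current mark, which is one reason this style of polling is "real-time (or thereabouts)" rather than true change data capture.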

The company built that GUI dashboard using App Orchid’s hosted app development and runtime offering. According to Kuchmek, App Orchid uses Apache Spark to grab data from American Water’s HDP instance and present it to the FSRs. “We’re using [App Orchid] as a point of integration and aggregation, and it gets pushed to the cloud layer where the enrichment happens,” he said. “That’s where calculations are done, where the machine learning and analytics are done.”

American Water is also building a sentiment analysis system that alerts the company to potentially relevant conversations occurring on social media. “We’re in the day and age when somebody would tweet something or put a video online or do something before they actually call customer service,” Kuchmek said. “If I know that a customer is within a certain vicinity and we see a tweet that there’s a water main break, then we can let our call center representative know.”
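The "certain vicinity" check Kuchmek describes can be sketched as a great-circle (haversine) distance test between a geotagged post and a customer's location. In this minimal sketch a hypothetical keyword list stands in for the actual sentiment or NLP model, which the article does not describe, and the 5 km radius is an arbitrary assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical keywords standing in for a real sentiment/NLP classifier.
KEYWORDS = ("main break", "water main", "no water")


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def should_alert(post_text, post_loc, customer_loc, radius_km=5.0):
    """Flag a post that mentions a water problem and was made near a customer."""
    text = post_text.lower()
    if not any(keyword in text for keyword in KEYWORDS):
        return False
    return haversine_km(*post_loc, *customer_loc) <= radius_km


# A post roughly half a kilometre from the customer triggers an alert.
print(should_alert("Huge water main break on my street!", (39.93, -75.12), (39.9259, -75.1196)))
```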

The Amazon of Water?

The company selected Hortonworks in part because of its commitment to open source, which it’s counting on to be a key differentiator. Since there are no shrink-wrapped products that do what it needs to do, American Water has pledged to build them itself.

“We tried to take a square peg and throw it into a triangle hole — it’s just not possible,” Kuchmek said. “We customized the heck out of an out-of-the-box tool, which then increases complexity and problems. So we decided, let’s just build everything ourselves.”

This willingness to embrace technology is unusual for a water company. Public and private utilities are typically very conservative when it comes to technology investments. Progress for water companies is typically measured by the amount of physical pipe they can get in the ground, not by how much data they can move through digital pipes.

For American Water, the digital transformation and big data adoption started just two years ago, when Radha Swaminathan was hired as its CTO and chief innovation officer. “Our CTO is a visionary,” Kuchmek said. “American Water didn’t behave in this manner two years ago.”

Asked which other utilities American Water is emulating, Kuchmek indicated there were none. Instead, it’s taking its data cues from Amazon. “If we don’t get ahead of it, someone else will,” he said. “And if someone else gets ahead of it, then we’ll just be buying their products.”
