How GPU-Powered Analytics Improves Mail Delivery for USPS
When the United States Postal Service (USPS) set out to buy a system that would allow it to track the location of employees, vehicles, and individual pieces of mail in real time, an in-memory relational database was its first choice. But when that technology proved too costly and complex, the 241-year-old service looked to graphics processing units (GPUs) for a big-data speedup.
Just as other large logistics companies have done, the USPS has big plans to tighten up its operations and boost customer service through big data analytics. The main difference here is scale. With more than 600,000 employees and a fleet of 215,000 vehicles, the USPS is the single largest logistics entity in the country, moving more individual items in four hours than UPS, FedEx, and DHL combined move all year.
The Postal Service decided to outfit its entire workforce of USPS carriers (or about 200,000 on any given day) with a device that transmits its exact geographic location every minute. Armed with this location data, the service aims to improve various aspects of its massive operation, including carriers’ route efficiency.
When the service scoped out its location data plans, it was using an in-memory relational database from a large, well-known vendor. Scaling that database to handle the new workload would have been prohibitively expensive, says Amit Vij, the CEO of Kinetica (formerly GIS Federal). “For a fraction of that cost, we were able to go into production in a few months in 2014,” Vij tells Datanami.
GPU Power Boost
Kinetica has experience providing geospatial intelligence to large government clients, including the Department of Defense, for whom it built a real-time terrorist tracking system that uses data collected from drones and other sources in the Middle East and around the world.
The company recently moved its headquarters from Virginia to San Francisco to be closer to the epicenter of the big data movement, but Kinetica still maintains its security clearance and still works closely with the U.S. Army’s Cyber Center of Excellence in Fort Gordon, Georgia.
After a proof of concept, the USPS selected Kinetica’s distributed in-memory database, called GPUdb, to analyze the geolocation data as it streams into the data center.
Today, the USPS runs GPUdb on a large cluster of 150 to 200 nodes. Each node consists of a single x86 blade server from Hewlett Packard Enterprise with half a terabyte to a terabyte of RAM and up to two Nvidia (NASDAQ: NVDA) Tesla K80 GPUs. The system went live in 2014 and was bolstered with high-availability redundancy in November of that year.
This powerful system was originally set up to enable USPS managers and analysts to visualize and query the “breadcrumb” trail left by all of its mail carriers and vehicles. With 200,000 USPS devices emitting their location once every minute, that amounts to more than a quarter billion events captured and analyzed daily, with several times that amount available in a trailing window. The system has also been enhanced to track the flow of individual pieces of mail.
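The daily figure follows directly from the device count and the reporting interval. Below is a minimal sketch of that arithmetic. The function names are illustrative, and the 10-day window is an assumption chosen only to show the scale of a trailing window, since the window length USPS uses is not stated here.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 one-minute reporting intervals per day


def events_per_day(devices: int, pings_per_minute: int = 1) -> int:
    """Location events generated in one day by a fleet of devices."""
    return devices * pings_per_minute * MINUTES_PER_DAY


def events_in_window(devices: int, days: int, pings_per_minute: int = 1) -> int:
    """Location events held in a trailing window of the given length."""
    return events_per_day(devices, pings_per_minute) * days


daily = events_per_day(200_000)            # about 200,000 carriers per day
ten_day = events_in_window(200_000, 10)    # assumed 10-day trailing window

print(f"{daily:,} events per day")         # 288,000,000 events per day
print(f"{ten_day:,} events in 10 days")    # 2,880,000,000 events in 10 days
```

At one report per device per minute, 200,000 devices produce 288 million events a day, which is consistent with the "more than a quarter billion" figure.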
“For the first time in history, USPS is able to see their entire mobile workforce in real time,” Vij says. “USPS never knew how much mail was going into a particular distribution center or post office. It would take them days or weeks to determine that. And they would always be too late because it was such a massive amount of data. With us, we’re able to do that.”
The USPS’ parallel cluster is able to serve up to 15,000 simultaneous sessions, giving the service’s managers and analysts the ability to instantly analyze their areas of responsibility via dashboards and to query the data as if it were stored in a relational database.
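To give a sense of what a relational-style query over breadcrumb data might look like, here is a small sketch that uses Python's built-in sqlite3 module purely as a stand-in. It is not GPUdb and does not reflect the USPS schema: the table name, column names, device IDs, and coordinates are all hypothetical. The query returns the most recent position of each device and keeps only those inside a latitude/longitude bounding box.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pings (device_id TEXT, ts INTEGER, lat REAL, lon REAL)"
)

# Hypothetical breadcrumb rows: (device, seconds since start, lat, lon).
rows = [
    ("carrier-1", 0, 38.90, -77.03),
    ("carrier-1", 60, 38.91, -77.02),
    ("carrier-2", 0, 40.71, -74.00),
    ("carrier-2", 60, 40.72, -74.01),
]
conn.executemany("INSERT INTO pings VALUES (?, ?, ?, ?)", rows)


def latest_in_box(conn, min_lat, max_lat, min_lon, max_lon):
    """Latest ping per device, restricted to a bounding box."""
    sql = """
        SELECT p.device_id, p.ts, p.lat, p.lon
        FROM pings AS p
        JOIN (SELECT device_id, MAX(ts) AS ts
              FROM pings
              GROUP BY device_id) AS last
          ON p.device_id = last.device_id AND p.ts = last.ts
        WHERE p.lat BETWEEN ? AND ?
          AND p.lon BETWEEN ? AND ?
        ORDER BY p.device_id
    """
    return conn.execute(sql, (min_lat, max_lat, min_lon, max_lon)).fetchall()


# Devices whose latest position falls inside an example bounding box.
result = latest_in_box(conn, 38.0, 39.0, -78.0, -77.0)
print(result)  # [('carrier-1', 60, 38.91, -77.02)]
```

A production system at the scale described would spread this kind of filter and aggregation across many nodes; the sketch only shows the shape of the query.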
According to a case study published on Kinetica’s website, the system gives USPS a detailed geospatial view of its operations, which enables it to improve them in a number of ways, including:
- Enabling dispatchers to efficiently plan and make the best use of routes;
- Enabling improved contingency planning when a carrier is absent;
- Finding anomalies in mail distribution, including overlapped coverage, uncovered areas, and bottlenecks;
- Enabling more efficient workforce utilization through the use of aggregate performance data;
- Verifying that mail is actually delivered and collected at the expected time and location.
What’s more, the system allows USPS to send notifications to mailers and customers when their mail is actually delivered, and it can also be used to help direct funding to the areas where it can be used the most.
While hard numbers are difficult to come by, the implementation is expected to save the USPS millions of dollars through lower fuel costs and a need for fewer trucks to deliver mail. It will also bolster customer service by helping to ensure on-time delivery and narrowing delivery windows.
The USPS implementation won Kinetica an HPC Innovation Excellence Award from IDC earlier this month. Kevin Monroe, a senior research analyst at IDC, said the Postal Service’s application of Kinetica “enhances the quality of service that US citizens receive by giving them a better, more predictable experience sending and receiving mail.”
Filling the Real-Time Niche
Moving forward, Kinetica plans to take its technology further into the consumer goods supply chain, where logistics demands get bigger and more complex. This includes analyzing RFID data alongside other logistics data for a major US retailer. In proofs of concept (POCs), Vij says GPUdb is outperforming SAP (NYSE: SAP) HANA by a factor of 50.
The company has identified an opportunity to sell its in-memory, GPU-powered distributed database to organizations that have not been able to address their real-time analytic needs in other ways.
“Hadoop solves the long-term historical reporting and archival, but as far as streaming live data upon arrival, that’s a gap that Impala and Spark and all the other in-memory tools in the open source community have fallen quite short at the moment,” says Marcus Holm, Kinetica’s vice president of global sales. “They find out after 500 or 1,000 or 2,000 nodes that they still can’t achieve it.”
Kinetica is helping to fill a niche in real-time analytics that open source big data tech can’t fill at the moment. But the company views its proprietary technology not as a competitor to Hadoop, Spark, and Kafka but as a complement to them, particularly for analyzing fast-moving data in 10-day or 30-day trailing windows.
While GPUs still carry a 2x price premium over traditional CPUs, they give users computational value that more than offsets it, Holm says. “I do in fact believe that these GPU-accelerated architecture are creeping into mainstream and enterprise as a whole,” says Holm, who formerly worked at Cloudera. “It’s just not feasible for these enterprise to have 2,000 node clusters. They’d rather have a 40-node cluster and achieve what they need.”