Big Data’s Role in Detecting Mail Fraud
The United States Postal Service serves the mailboxes of over 300 million homes and businesses in the United States. Delivering mail from point A to point B becomes a huge logistical problem when those points number in the hundreds of millions.
As such, the USPS is doing as much as possible to incorporate supercomputing and big data analytics into both its logistics and its fraud detection operations, as discussed by USPS Program Manager Scot Atkins.
“We catch some pretty sophisticated stuff,” Atkins said in discussing how their analytics operation works to detect fraud as quickly as possible. “The biggest effect we’ve seen is deterrence. We can measure deterrence within our organization, and a large number of fraud cases dropped off significantly since we introduced a lot of revenue protection capabilities as far back as 2006.”
According to Atkins, 528 million pieces of mail are sent through the USPS every day. That’s roughly 6,100 mailings each second that the agency’s supercomputing facility in Eagan, Minnesota has to keep track of.
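As a quick check on that figure, the per-second rate follows from dividing the daily volume by the number of seconds in a day. The short sketch below assumes mail volume is spread evenly over 24 hours, which is a simplification; the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pieces_per_second(pieces_per_day: int) -> float:
    """Average rate, assuming volume is spread evenly across the day."""
    return pieces_per_day / SECONDS_PER_DAY


# Daily volume quoted by Atkins in the article.
rate = pieces_per_second(528_000_000)
print(f"{rate:,.0f} pieces per second")
```

The result lands slightly above 6,100, consistent with the rounded number Atkins cites.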
Handling those half-billion mailings per day, or six thousand per second, takes 16 terabytes of in-memory computing at the Minnesota site. The data processed there in near real time, according to Atkins, is then compared against a transactional database of approximately 400 billion records. The goal is to detect abnormalities that may be indicative of fraud before the suspicious mailings get to the intended local post office.
“We’re in the last mile by the time mail gets to the post office, and if we don’t intercept fraudulent packages at that point, chances are we won’t get the revenue,” said Atkins. Part of protecting that revenue involves collecting and tracking data on each piece of mail, such as weight, size, and routing information. That data then gets sent through the Postal Routed Network, the USPS’s “own enclave of the internet,” according to Atkins, to the facility in Minnesota, where it is quickly processed and analyzed. “The USPS is very active in protecting revenue for the purpose of rate stabilization and all the benefits associated with it.”
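The article does not describe how the USPS actually scores mail, so the following is only a minimal sketch of the general idea of comparing incoming pieces against historical records. Everything in it is an assumption made for illustration: the `MailPiece` fields, the notion of a declared versus measured weight, and the simple z-score rule are hypothetical and are not taken from the USPS system.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class MailPiece:
    # Hypothetical record layout; the real USPS schema is not public here.
    piece_id: str
    declared_weight_oz: float
    measured_weight_oz: float


def weight_gap(piece: MailPiece) -> float:
    """Difference between what was measured and what was declared."""
    return piece.measured_weight_oz - piece.declared_weight_oz


def flag_suspicious(history, incoming, threshold=3.0):
    """Return incoming pieces whose weight gap is far from the historical norm.

    Uses a plain z-score against past gaps (an assumed, simplified rule).
    """
    gaps = [weight_gap(p) for p in history]
    if len(gaps) < 2:
        return []  # not enough history to estimate a spread
    baseline, spread = mean(gaps), stdev(gaps)
    if spread == 0:
        return [p for p in incoming if weight_gap(p) != baseline]
    return [
        p for p in incoming
        if abs(weight_gap(p) - baseline) / spread > threshold
    ]


# Made-up sample data, for demonstration only.
history = [
    MailPiece("h1", 1.0, 1.0),
    MailPiece("h2", 1.0, 1.1),
    MailPiece("h3", 1.0, 0.9),
    MailPiece("h4", 2.0, 2.05),
    MailPiece("h5", 2.0, 1.95),
]
incoming = [
    MailPiece("A", 1.0, 1.02),  # close to what was declared
    MailPiece("B", 1.0, 5.0),   # far heavier than declared
]
flagged = flag_suspicious(history, incoming)
print([p.piece_id for p in flagged])
```

A production system working against hundreds of billions of records would need far more than a single statistic, but the shape is the same: build a baseline from history, then score each new piece before it reaches the last mile.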
Indeed, with internet bandwidth constantly expanding and allowing people to send virtually anything over email or free large-file hosts such as Dropbox, USPS revenue streams no longer look the way they used to. The agency receives no operational funds from taxes yet employs half a million people, according to Atkins. Keeping prices down, particularly the standard stamp price, which currently holds at 46 cents, is essential to keeping the USPS operationally sustainable.
There may come a time when it seems unfathomable for massive logistical operations such as the one the USPS runs to function without big data optimization and analytical assistance. Luckily, the Postal Service recognized that early on, in 2006, and it will be interesting to see where it takes the agency.