November 13, 2013

The ABC’s of the USPS Big Data System

Isaac Lopez

Volume and velocity are significant challenges for the United States Postal Service (USPS), which, in addition to sorting approximately 160 billion pieces of mail across its 275 routing centers around the country, also needs to perform sophisticated fraud-detection scanning at rates of up to 4 billion items per day.

Recently, FedCentric Technologies announced that it was awarded a $16.7 million contract to expand on work that it’s been doing for the USPS in building a real-time system for these purposes. This week, Datanami has learned more details of the system that the company is working on, which according to Gerry Kolosvary, President of FedCentric Technologies, is built using the SGI UV 2000 system.

“What [the USPS] has tried to do in the past is use traditional approaches,” said Kolosvary, explaining USPS’s odyssey in getting a grip on its big data problem. These traditional approaches, he explained, have been x86-based and proprietary Sun systems, on which they built either scale-up or scale-out platforms that ultimately didn’t work.

“It’s the vast volume,” said Kolosvary, explaining why neither architecture worked for them. USPS was processing 2,500 scans per second and the systems were bottlenecking at the network level, he said. The USPS needed to find a different approach.

Enter FedCentric Technologies, which Kolosvary categorizes as a federal systems integrator that specializes in big data. He says that, while everyone is focused on the “V’s” of big data, FedCentric has come up with its own process-focused alliteration around the ABC’s of big data. (Note: while as a journalist this sort of thing can be annoying, it’s also instructive when you consider the orders-of-magnitude increase in performance the approach delivers, which we’ll talk about shortly.)

The architectural difference starts with what Kolosvary refers to as “Affinity.” “Affinity is about how close your data is to each other,” he explains. “How easy is it for you to make a connection and to see something that isn’t apparent, but becomes apparent once you make that connection? The relative closeness of the data becomes very important.”
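
To make the idea concrete, here is a minimal sketch (in Python) of the kind of connection affinity enables. The dictionary index and the “same item scanned at two distant facilities within an hour” rule below are our own illustrative assumptions, not USPS’s actual fraud logic; the point is simply that when all of the scan history sits in one memory space, making the connection is a local lookup rather than a distributed query.

    from collections import defaultdict

    # In-memory index: barcode -> list of (facility_id, timestamp_seconds).
    scans_by_barcode = defaultdict(list)

    def ingest_scan(barcode, facility_id, ts, max_plausible_gap=3600):
        """Record a scan; return earlier scans it conflicts with (hypothetical rule)."""
        suspicious = [
            (prev_facility, prev_ts)
            for prev_facility, prev_ts in scans_by_barcode[barcode]
            # Illustrative signal: one item cannot appear at two facilities within an hour.
            if prev_facility != facility_id and abs(ts - prev_ts) < max_plausible_gap
        ]
        scans_by_barcode[barcode].append((facility_id, ts))
        return suspicious

    # A second scan of "A123" arrives ten minutes later from a different facility.
    ingest_scan("A123", facility_id=17, ts=1000)
    print(ingest_scan("A123", facility_id=204, ts=1600))  # -> [(17, 1000)]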

“Boundaries” is the next process issue, explained Kolosvary, who added that in the system where the USPS was using a scale-out approach, the boundary was really at the blade level. “The way I define boundary is how much of the time does your data and your application spend on a CPU and in-memory,” he explained. “The boundaries of the scale-out system are defined by the blade.”

“Connectivity” becomes the next important issue to consider: once you cross the boundary, you’re out on the network. “At that point, the network becomes an integral part of the compute process, and at that point you’re really going only as fast as the network can go. The network becomes your weakest link,” he said.

Finally, Kolosvary explained that they’ve added the letter D for “Domains,” referring to the architectural sections of the overall system. “How much time does your data spend in the compute domain vs. when it crosses the boundary into the network or the I/O domain?”
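
One rough way to picture the boundary, connectivity, and domain issues together is to count how often a lookup has to leave its blade. In the toy model below (the blade count and hash partitioning are arbitrary assumptions, not the USPS design), a scale-out cluster spreads the scan history across nodes, so most lookups cross the network boundary, while on a single shared-memory system the same lookups never leave the compute domain.

    NUM_NODES = 64  # hypothetical blade count for the scale-out cluster

    def node_for(barcode):
        """Hash partitioning: which blade owns this barcode's scan history."""
        return hash(barcode) % NUM_NODES

    def lookups_crossing_boundary(barcodes, local_node):
        """Count the lookups this blade would have to send over the network."""
        return sum(1 for b in barcodes if node_for(b) != local_node)

    batch = [f"PKG{i:07d}" for i in range(10_000)]
    remote = lookups_crossing_boundary(batch, local_node=0)
    print(f"scale-out: {remote} of {len(batch)} lookups leave the blade")
    print("scale-up : 0 lookups leave the boundary (all data in one shared memory)")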

Using this framework, Kolosvary said that FedCentric Technologies developed an approach that it put into place for the USPS, leveraging the SGI UV 2000 supercomputer system. “The boundaries of this approach are not at the blade level,” he said. “They’re at the system level. So where I’m limited on an x86 scale-out system to 40 or 48 cores and 1.5 terabytes, with the SGI system I can grow and scale that to 4,096 cores and 16 terabytes of RAM in a boundary before I have to do anything – before I have to go to a network, and before I have to go to disk. That means my affinity can really be close.”

“In fact,” he continued, “I never have to leave memory, so it’s only a couple of hops away using a computer backplane, not a network. So the connectivity is tightly coupled and so we can make these data connections very quickly on the system because there aren’t any boundaries… as far as domains go, we stay in the compute domain almost all the time.”
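
A quick back-of-the-envelope comparison, using only the figures Kolosvary cites, shows how much more capacity fits inside a single boundary before anything has to cross a network or touch disk:

    # Capacity inside one "boundary," per the figures quoted above.
    scale_out = {"cores": 48,   "ram_tb": 1.5}   # top end of the x86 blade figures
    scale_up  = {"cores": 4096, "ram_tb": 16.0}  # SGI UV 2000 figures

    print(f"cores per boundary: {scale_up['cores'] / scale_out['cores']:.0f}x larger")   # ~85x
    print(f"RAM per boundary:   {scale_up['ram_tb'] / scale_out['ram_tb']:.1f}x larger") # ~10.7x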

The results of this approach, he says, have been dramatic. “Using our approach, they are able to get about 3.5 million scans per second – several orders of magnitude increase in performance.” That is roughly a 1,400-fold jump from the 2,500 scans per second noted earlier.
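
For the record, here is the arithmetic behind that figure, using only numbers quoted in this article:

    old_rate = 2_500             # scans/second on the bottlenecked systems
    new_rate = 3_500_000         # scans/second reported on the SGI-based system
    daily_items = 4_000_000_000  # fraud-scan volume cited at the top of the article

    print(f"speedup: {new_rate / old_rate:.0f}x (about three orders of magnitude)")      # 1400x
    print(f"average rate for 4 billion items/day: {daily_items / 86_400:,.0f} scans/s")  # ~46,296

That rate also clears the average of about 46,000 scans per second implied by 4 billion items per day, with considerable headroom.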

The system is working, and FedCentric has been rewarded for the success with a contract to expand the system by building four more of the supercomputer-based systems at the USPS’s Eagan, Minn., facility. And we, of course, get another example of how big data (and big compute) are affecting our lives in hidden ways.

Related items:

Postal Service Gets $16.7 Million Supercomputing Upgrade 

Dat Wants to be the GitHub for Data 

FoundationDB Gets $17M to Push ACID Machines 
