September 28, 2021

In Pittsburgh, Two-Step Machine Learning Process Sorts Rare Stamps

Oliver Peckham

Setting aside the relatively recent rise of electronic signatures, personalized stamps have long been a popular form of identification for formal documents in East Asia. These identifiers – easily forged, but culturally ubiquitous – are the subject of research by Raja Adal, an associate professor of history at the University of Pittsburgh. Studying these stamps at scale, however, would have required a prohibitive amount of specialized human expertise – so Adal turned to supercomputer-powered AI to lend a hand.

“[From] the perspective of the social sciences, what matters is not that these instruments are impossible to forge—they’re not—but that they are part of a process by which documents are produced, certified, circulated and approved,” Adal explained in an interview with Ken Chiacchia of the Pittsburgh Supercomputing Center (PSC). “In order to understand the details of this process, it’s very helpful to have a large database. But until now, it was pretty much impossible to easily index tens of thousands of stamps in an archive of documents, especially when these documents are all in a language like Japanese, which uses thousands of different Chinese characters.”

Specifically, Adal was working with a document archive from the Mitsui Miike Mine, one of the largest repositories of business documents from modern Japan. It spans fifty years and tens of thousands of documents, including 5,056 images of stamps. Cataloging these thousands of diverse stamps by hand would have been a gargantuan task for a team of highly specialized research assistants, so Adal reached out to Paola Buitrago, director of AI and big data at the neighboring PSC.

The Mitsui Miike Mine database posed problems for training a machine learning model, since many of the diverse stamps appeared only a few times – or even just once. So the team instead applied a two-step machine learning process: first, they trained the model to classify general objects; then, they layered a second classification model on top of it to sort the stamps, allowing even the rare stamps to be grouped together.
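The two-step idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the PSC team's code: `extract_features` is a hypothetical stand-in for a network pretrained on general objects (here it only measures ink density in each quadrant of an image), and the second step is a nearest-centroid classifier, a swapped-in technique chosen because it builds a usable prototype for a stamp class even from a single example. The class names and toy images are likewise invented for illustration.

```python
import math
from collections import defaultdict


def extract_features(image):
    """Step 1 stand-in (HYPOTHETICAL): a frozen, 'pretrained' feature extractor.

    A real pipeline would reuse a network trained on general object images.
    Here we only compute mean ink density in each quadrant of a grayscale
    image given as a list of rows with pixel values in [0, 1].
    """
    h, w = len(image), len(image[0])
    quadrants = [(0, h // 2, 0, w // 2), (0, h // 2, w // 2, w),
                 (h // 2, h, 0, w // 2), (h // 2, h, w // 2, w)]
    feats = []
    for r0, r1, c0, c1 in quadrants:
        cells = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        feats.append(sum(cells) / len(cells))
    return feats


class NearestCentroid:
    """Step 2: a lightweight classifier trained on top of the frozen features.

    Each stamp class is summarized by the mean of its feature vectors, so a
    class seen only once still gets a prototype to match against.
    """

    def fit(self, images, labels):
        grouped = defaultdict(list)
        for img, label in zip(images, labels):
            grouped[label].append(extract_features(img))
        # Average each feature dimension within a class.
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, image):
        x = extract_features(image)
        # Pick the class whose prototype is closest in feature space.
        return min(self.centroids,
                   key=lambda label: math.dist(x, self.centroids[label]))


def toy_stamp(quadrant, size=4):
    """HYPOTHETICAL toy 'stamp': ink fills one quadrant (tl, tr, bl, br)."""
    img = [[0.0] * size for _ in range(size)]
    half = size // 2
    r0 = 0 if quadrant in ("tl", "tr") else half
    c0 = 0 if quadrant in ("tl", "bl") else half
    for r in range(r0, r0 + half):
        for c in range(c0, c0 + half):
            img[r][c] = 1.0
    return img


# "rare_c" appears only once in the training data.
train = [toy_stamp("tl"), toy_stamp("tl"), toy_stamp("tr"), toy_stamp("br")]
labels = ["common_a", "common_a", "common_b", "rare_c"]
clf = NearestCentroid().fit(train, labels)
print(clf.predict(toy_stamp("br")))  # rare_c
```

The design point is that only the cheap second step ever sees stamp labels; the expensive feature extractor is learned elsewhere on plentiful general images, which is what makes one-off stamps tractable.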

Number of stamps by class (truncated on right side). Image courtesy of PSC.

Running this machine learning model took computational firepower – which, of course, the Pittsburgh Supercomputing Center had in hand. Adal and Buitrago first used the GPUs on board the Bridges supercomputer and then, when Bridges was retired in February, moved to its successor, Bridges-2, which is equipped with more (and more powerful) GPUs for image analysis. With these resources, the team showed that repeatedly training the model nearly doubled its precision (from 44.7 percent to 84.3 percent). Now, the team is looking at applying the model in other research areas.
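As a quick check on the "nearly doubled" claim, assuming precision carries its standard meaning (the share of the model's predicted matches that are correct, with TP and FP counting true and false positives), the ratio of the two reported figures is roughly 1.89:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\frac{0.843}{0.447} \approx 1.89
```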

To learn more, read the reporting from PSC’s Ken Chiacchia.

Header image: examples of stamps from the repository. Image courtesy of PSC.

Applications: Artificial Intelligence, Research Analytics

Sectors: Academia
