September 17, 2021

FinTech Firm Explores Named Entity Extraction

Oliver Peckham


Founded in 2018, San Francisco-based Digits Financial combines machine learning and analytics to give businesses insights into their transactions, automatically identifying patterns, classifying data, and detecting anomalies as each transaction is added to the database. Now, in a blog post, Hannes Hapke – a machine learning engineer at Digits – has revealed how the company uses natural language processing (NLP) to extract information for its clients and what the team learned from developing its own model.

Digits uses named entity recognition (NER) to extract information from unstructured text and sort it into categories such as dates, identities, and locations. “We had seen outstanding results from NER implementations applied to other industries and we were eager to implement our own banking-related NER model,” Hapke wrote. “Rather than adopting a pre-trained NER model, we envisioned a model built with a minimal number of dependencies. That avenue would allow us to continuously update the model while remaining in control of ‘all moving parts.’”

In the end, Digits decided that no preexisting model would suffice, opting instead to build its own in-house NER model based on TensorFlow 2.x and its accompanying ecosystem library, TensorFlow Text. The team also annotated its own data, using doccano to label banking text with companies, URLs, locations, and more.
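To make the annotation step concrete, here is a minimal sketch of how span annotations might be turned into per-token training labels. It assumes a doccano-style JSONL record with a `text` field and a `label` list of `[start, end, name]` character spans; the sample transaction string, the `COMPANY` and `LOCATION` label names, and the whitespace tokenizer are illustrative assumptions, not details from Digits' pipeline.

```python
import json
import re


def spans_to_bio(text, spans):
    """Turn character-level [start, end, label] spans into per-token BIO tags."""
    # Whitespace tokenization with character offsets (an assumption; a
    # production model would use a subword tokenizer instead).
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the span if it lies fully inside it.
            if tok_start >= start and tok_end <= end:
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return [tok for tok, _, _ in tokens], tags


# Hypothetical annotated record; the text and label names are made up.
line = (
    '{"text": "POS DEBIT BLUE BOTTLE COFFEE SAN FRANCISCO CA", '
    '"label": [[10, 28, "COMPANY"], [29, 45, "LOCATION"]]}'
)
record = json.loads(line)
tokens, tags = spans_to_bio(record["text"], record["label"])

for token, tag in zip(tokens, tags):
    print(f"{token:10s} {tag}")
```

The B- and I- prefixes mark the beginning and the inside of an entity, and O marks tokens outside any entity. This is a common labeling scheme for NER training data, though the article does not say which scheme Digits uses.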

Hapke also explained Digits’ decision to go with the Transformer architecture – specifically, the Bidirectional Encoder Representations from Transformers (BERT) architecture – for its initial NER model.

“Transformers provide a major improvement in NLP when it comes to language understanding,” he said. “Instead of evaluating a sentence token-by-token, the way recurrent networks would perform this task, transformers use an attention mechanism to evaluate the connections between the tokens.” Further, he explained, BERT could evaluate up to 512 tokens simultaneously.
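The attention idea Hapke describes can be illustrated with a stripped-down sketch of scaled dot-product self-attention. It is a simplification under stated assumptions: the token vectors are toy values, and the same vectors serve as queries, keys, and values, whereas BERT applies learned projections and multiple attention heads. The `MAX_TOKENS` constant mirrors the 512-token limit mentioned above; the function names are hypothetical.

```python
import math

MAX_TOKENS = 512  # input limit cited for BERT in the article


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with no learned projections."""
    vectors = vectors[:MAX_TOKENS]  # truncate inputs that are too long
    dim = len(vectors[0])
    weights, outputs = [], []
    for query in vectors:
        # Score this token against every token in the sequence at once,
        # rather than walking the sequence step by step as an RNN would.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted mix of all token vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        )
    return weights, outputs


# Toy 2-dimensional vectors for three tokens (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = self_attention(embeddings)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one token attends to every other token, which is the "connections between the tokens" that the quote refers to.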

After prototyping the model, the team converted it for production and began an initial deployment, optimizing the architecture for high throughput and low latency.

The resulting product provided, at its core, a deceptively simple capability: allowing users to search their transaction records for vendors, websites, locations, and so forth. Digits has also expanded the model to include automatic insights and optimized it further for latency.

An example of how Digits’ model parses financial data into categories. Image courtesy of Digits.

“A more recent pre-trained model (e.g. BART or T5) could have provided higher model accuracy, but it would have also increased the model latency substantially,” Hapke said. “Since we are processing millions of transactions daily, it became clear that model latency is critical for us.”

Given its handling of financial data, Digits is sensitive to concerns over false positives and other errors. As a result, Hapke explained, Digits makes sure that it communicates which results were ML-predicted and allows users to easily overwrite suggestions.
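One way to picture this behavior is a field that remembers both the model's suggestion and any user correction. The sketch below is purely illustrative: the `EntityField` class, its attribute names, and the sample vendor strings are hypothetical and do not describe Digits' actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityField:
    """An extracted value that tracks whether a user has corrected it."""

    predicted: str                    # value suggested by the model
    user_value: Optional[str] = None  # set once a user overwrites the suggestion

    @property
    def value(self) -> str:
        # A user-supplied value always wins over the prediction.
        return self.user_value if self.user_value is not None else self.predicted

    @property
    def is_ml_predicted(self) -> bool:
        # True while the displayed value still comes from the model.
        return self.user_value is None

    def overwrite(self, new_value: str) -> None:
        self.user_value = new_value


# The model suggests a misspelled vendor name; the user then corrects it.
vendor = EntityField(predicted="Blue Bottle Cofee")
shown_before = (vendor.value, vendor.is_ml_predicted)

vendor.overwrite("Blue Bottle Coffee")
shown_after = (vendor.value, vendor.is_ml_predicted)

print(shown_before)
print(shown_after)
```

Keeping the original prediction alongside the correction means an interface can flag model-generated values as such, and the corrections remain available as feedback for later retraining, should a team choose to use them that way.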

Applications: Artificial Intelligence, Data Mining, Enterprise Analytics

Technologies: Middleware

Sectors: Financial Services

Vendors: Digits

Tags: BERT, Digits, financial analytics, financial data, Transformer
