Amazon Textract Recognizes Handwriting and Adds Five New Languages
Nov. 16, 2020 — In a blog post, Andrea Morton-Youmans, Product Marketing Manager on the AI Services team at AWS, discussed two new features of Amazon Textract, a machine learning service that extracts text and data from documents. The features aim to expand and improve data extraction from various formats, including handwriting, tables, and forms. The blog post is included in part below.
Documents are a primary tool for communication, collaboration, record keeping, and transactions across industries, including financial, medical, legal, and real estate. The format of the data can pose an extra challenge for extraction, because content may be typed, handwritten, or embedded in a form or table. Furthermore, manually extracting data from your documents is error-prone, time-consuming, and expensive, and it does not scale. Amazon Textract is a machine learning (ML) service that extracts printed text and other data from documents, including from tables and forms.
We’re pleased to announce two new features for Amazon Textract: support for handwriting in English documents, and expanded language support for extracting printed text from documents typed in Spanish, Portuguese, French, German, and Italian.
Handwriting recognition with Amazon Textract
Many documents, such as medical intake forms or employment applications, contain both handwritten and printed text. Our customers have asked for the ability to extract both. Amazon Textract can now extract printed text and handwriting from documents written in English with high confidence scores, whether it’s free-form text or text embedded in tables and forms. A single document can also contain a mix of typed and handwritten text.
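As a rough illustration of working with mixed documents, the following Python sketch separates handwritten words from printed words in a text detection response and drops words below a confidence threshold. It assumes the response shape returned by the boto3 `detect_document_text` call, where each `WORD` block carries `Text`, `Confidence`, and a `TextType` of `HANDWRITING` or `PRINTED`. The helper names `split_words_by_text_type` and `detect_text`, and the threshold value, are illustrative choices and not part of the original post.

```python
from collections import defaultdict


def split_words_by_text_type(response, min_confidence=0.0):
    """Group WORD blocks of a Textract-style response by their TextType.

    Returns a dict mapping "HANDWRITING" / "PRINTED" to a list of
    (text, confidence) tuples, keeping only words whose confidence
    is at least min_confidence (Textract reports 0-100).
    """
    grouped = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue  # skip PAGE and LINE blocks
        confidence = block.get("Confidence", 0.0)
        if confidence < min_confidence:
            continue
        text_type = block.get("TextType", "UNKNOWN")
        grouped[text_type].append((block.get("Text", ""), confidence))
    return dict(grouped)


def detect_text(image_path):
    """Send a local image to Amazon Textract (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above also works offline

    client = boto3.client("textract")
    with open(image_path, "rb") as f:
        return client.detect_document_text(Document={"Bytes": f.read()})
```

With credentials configured, something like `split_words_by_text_type(detect_text("intake_form.png"), min_confidence=80)` would return the handwritten and printed words separately; the file name here is a placeholder.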
The following image shows an example input document containing a mix of typed and handwritten text, and its converted output document.
You can log in to the Amazon Textract console to test out the handwriting feature, or check out the new demo by Amazon Machine Learning Hero Mike Chambers.
Not only can you upload documents with both printed text and handwriting, you can also use Amazon Augmented AI (Amazon A2I), which makes it easy to build workflows for a human review of the ML predictions. Adding in Amazon A2I can help you get to market faster by having your employees or AWS Marketplace contractors review the Amazon Textract output for sensitive workloads. For more information about implementing a human review, see Using Amazon Textract with Amazon Augmented AI for processing critical documents. If you want to use one of our AWS Partners, take a look at how Quantiphi is using handwriting recognition for their customers.
Additionally, we’re pleased to announce our language expansion. Customers can now extract and process documents in more languages.
New supported languages in Amazon Textract
Amazon Textract now supports processing printed documents in Spanish, German, Italian, French, and Portuguese. You can send documents in these languages, including forms and tables, for data and text extraction, and Amazon Textract automatically detects and extracts the information for you. You can simply upload the documents on the Amazon Textract console or send them using either the AWS Command Line Interface (AWS CLI) or AWS SDKs.
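For the SDK route, a minimal sketch using the AWS SDK for Python (boto3) might look like the following. It assumes the document is already stored in Amazon S3 and requests both table and form extraction through the `analyze_document` call. The function names `analyze_forms_and_tables` and `count_block_types`, and the bucket and key values in the usage note, are hypothetical. No language parameter is passed, in line with the automatic detection described above.

```python
from collections import Counter


def analyze_forms_and_tables(client, bucket, key):
    """Ask Textract to extract text, tables, and forms from an S3 object.

    `client` is expected to be a boto3 Textract client, for example
    boto3.client("textract"); bucket and key are caller-supplied.
    """
    return client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],
    )


def count_block_types(response):
    """Summarize a response by counting blocks of each BlockType."""
    return Counter(block["BlockType"] for block in response.get("Blocks", []))
```

A call such as `count_block_types(analyze_forms_and_tables(boto3.client("textract"), "my-bucket", "forms/solicitud.png"))` would then show how many `TABLE`, `CELL`, `KEY_VALUE_SET`, and `WORD` blocks were returned for a scanned form.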
AWS customer success stories
AWS customers like yourself are always looking for ways to overcome document processing challenges. In this section, we share what our customers are saying about Amazon Textract.
Lumiq is a data analytics company with deep domain and technical expertise in building and implementing AI- and ML-driven products and solutions. Their data products are designed as building blocks and run on AWS, which helps their customers scale the value of their data and drive tangible business outcomes.
“With thousands of documents being generated and received across different stages of the consumer engagement lifecycle every day, one of our customers (a leading insurance service provider in India) had to invest several manual hours for data entry, data QC, and validation. The document sets consisted of proposal forms, supporting documents for identity, financials, and medical reports, among others. These documents were in different, non-standardized formats and some of them were handwritten, resulting in an increased average lag in lead to policy issuance and impacted customer experience.
“We leveraged Amazon’s machine learning-powered Textract to extract information and insights from various types of documents, including handwritten text. Our custom solution built on top of Amazon Textract and other AWS services helped in achieving a 97% reduction in human labor for PII redaction and a projected 70% reduction in work hours for data entry. We are excited to further deep-dive into Textract to enable our customers with an E2E paperless workflow and enhance their end-consumer experience with significant time savings.”
– Mohammad Shoaib, Founder and CEO, Lumiq (Crisp Analytics)
We continually make improvements to our products based on your feedback, and we encourage you to log in to the Amazon Textract console, upload a sample document, and try the available APIs. You can also talk with your account manager about how best to incorporate these new features. Amazon Textract has many resources to help you get started, like blog posts, videos, partners, and getting started guides. Check out the Textract resources page for more information.
You have millions of documents, which means you have a ton of meaningful and critical data within those documents. You can extract and process your data in seconds rather than days, and keep it secure by using Amazon Textract. Get started today.
Source: Andrea Morton-Youmans, AWS