
AWS makes Textract generally available for extracting text from documents


Amazon Web Services on Wednesday announced the general availability of Textract, a fully managed service that uses machine learning to automatically extract text and data from documents, including from tables and forms. Textract was one of multiple AI-powered tools and services unveiled at last year’s AWS re:Invent conference that require no machine learning expertise to use.

AWS has called Textract an “OCR++” service, meaning it goes beyond plain optical character recognition. It can, for instance, see a document with a table and recognize that the data belongs in rows and columns. “It’s able to identify there’s a table and able to lay out for you what that table should look like so you can use and read that data,” AWS CEO Andy Jassy said at re:Invent.
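
For readers who want to see what that table layout looks like in practice, here is a minimal sketch in Python. It assumes the boto3 SDK and the block structure returned by Textract's AnalyzeDocument operation (TABLE, CELL and WORD blocks linked by CHILD relationships). The helper names `tables_from_blocks` and `analyze_file`, the default region and the choice of an image file as input are illustrative assumptions, not anything AWS described in its announcement.

```python
from collections import defaultdict


def tables_from_blocks(blocks):
    """Rebuild each TABLE block as a list of rows, each row a list of cell text.

    `blocks` is the "Blocks" list from an AnalyzeDocument response.
    (Hypothetical helper, written for illustration.)
    """
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        # Collect the Ids listed under every CHILD relationship of a block.
        return [i for rel in block.get("Relationships", [])
                if rel["Type"] == "CHILD" for i in rel["Ids"]]

    tables = []
    for table in (b for b in blocks if b["BlockType"] == "TABLE"):
        grid = defaultdict(dict)  # row index -> {column index -> text}
        for cid in child_ids(table):
            cell = by_id[cid]
            if cell["BlockType"] != "CELL":
                continue
            words = [by_id[w]["Text"] for w in child_ids(cell)
                     if by_id[w]["BlockType"] == "WORD"]
            grid[cell["RowIndex"]][cell["ColumnIndex"]] = " ".join(words)
        n_cols = max((c for row in grid.values() for c in row), default=0)
        # Row and column indexes are 1-based; missing cells become "".
        tables.append([[grid[r].get(c, "") for c in range(1, n_cols + 1)]
                       for r in sorted(grid)])
    return tables


def analyze_file(path, region="us-east-1"):
    """Send one image file to Textract and return its tables.

    Requires AWS credentials; the region default is an assumption.
    """
    import boto3  # imported here so the parser above runs without the SDK

    client = boto3.client("textract", region_name=region)
    with open(path, "rb") as f:
        response = client.analyze_document(
            Document={"Bytes": f.read()},
            FeatureTypes=["TABLES"],
        )
    return tables_from_blocks(response["Blocks"])
```

Keeping the parsing separate from the network call means the row-and-column reconstruction can be tried against a saved response without an AWS account.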

Textract’s API supports multiple image formats including scans, PDFs and photos, and customers can use it with database and analytics services like Amazon Elasticsearch Service, Amazon DynamoDB and Amazon Athena. They can also use it with other machine learning services like Amazon Comprehend, Comprehend Medical, Amazon Translate or Amazon SageMaker.

Customers using the service already include The Globe and Mail, PwC, Healthfirst, UiPath, Teradact, Ripcord, BluePrism and Alfresco.

Textract is now available in the US East (Ohio), US East (N. Virginia), US West (Oregon) and EU (Ireland) regions. AWS will bring it to additional regions in the coming year.
