Azure-docs: Forms Recognizer returns error on some input files, even though they all use the same template

Created on 27 Jun 2019 · 8 Comments · Source: MicrosoftDocs/azure-docs

The error message is "Word-level token extraction failed on document. string index out of range". I have 10 single-page PDFs in a blob storage container that I am trying to train a model on. All of the PDFs use the same template, so I'm confused as to why some hit the error and others don't.

I have tried both cURL and Postman, and both return errors for the same PDFs. How can I isolate what the issue is inside the PDFs that fail when attempting to create and train a model?


Pri2 assigned-to-author cognitive-servicesvc form-recognizesubsvc product-question triaged

All 8 comments

@BeigeBadger Thanks for the feedback. We are investigating the issue and will update you shortly.

@BeigeBadger Could you please confirm whether the error occurs with the Train API or the Analyze API? I have seen the same error with the Analyze API when the document has a different format from the ones used in training.

@PatrickFarley Could you please let us know if there is a way to get the details of the error for a particular document?

@Rohit I get the error when using the Train API.

@NHaiby, this user appears to be having issues with PDF text extraction

I get the same error when I use the Train API.

This is the response I get:

Response status code: 200 Response body: { 'modelId': '9..', 'trainingDocuments': [{ 'documentName': '1.pdf', 'pages': 1, 'errors': ['Page 1: Word-level token extraction failed on document. string index out of range'], 'status': 'failure'

I get this response for all five documents in the blob storage container.
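To see at a glance which documents failed and why, the Train response can be filtered by its per-document `status` and `errors` fields. The following is a minimal sketch, not an official client: the field names come from the response quoted above, while the helper name `failed_documents`, the sample values, and the rule "a document failed if its status is `failure` or its `errors` list is non-empty" are assumptions made for illustration.

```python
import json


def failed_documents(train_response):
    """Map each failed document name to its list of error strings.

    Assumption: a document counts as failed when its status is 'failure'
    or when it reports at least one error.
    """
    failures = {}
    for doc in train_response.get("trainingDocuments", []):
        errors = doc.get("errors", [])
        if doc.get("status") == "failure" or errors:
            failures[doc.get("documentName")] = errors
    return failures


# Illustrative response shaped like the one quoted above.
# The modelId and the second document are made up for the example.
sample = json.loads("""
{
  "modelId": "hypothetical-model-id",
  "trainingDocuments": [
    {"documentName": "1.pdf", "pages": 1,
     "errors": ["Page 1: Word-level token extraction failed on document. string index out of range"],
     "status": "failure"},
    {"documentName": "2.pdf", "pages": 1, "errors": [], "status": "success"}
  ]
}
""")

for name, errors in failed_documents(sample).items():
    print(name, "->", "; ".join(errors))
```

This only narrows the problem down to specific files and pages. The response does not say what inside a PDF triggered the extraction error.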

This "string index out of range" error should be fixed now. Please try running the training again.

@NHaiby
Hello, thanks for your feedback. My first test was successful!

Resolved

please-close
