Azure-docs: Using sample labeling tool for table labeling

Created on 23 Feb 2020 · 11 Comments · Source: MicrosoftDocs/azure-docs

How do I label a table with multiple rows using the sample labeling tool?



All 11 comments

@AleynovSergey
Thanks for the question! We are investigating and will update you shortly.

@AleynovSergey I used the labeling tool to train on a document with tables, using two tags: one for the headers and one for the rows. I created the tags and then selected the matching text once OCR had completed for the document.

image

Using the predict option, the tool recognized a row and returned the field names of these labels along with their text.

image

You can also download the JSON and look up the values that predict returned from this model to build your application.
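As a rough illustration of that lookup, here is a minimal Python sketch. The JSON layout (`analyzeResult` → `documentResults` → `fields`, with a `text` value per label) is an assumption about the downloaded prediction file, and the sample values are made up; check them against your own output before relying on this.

```python
import json

# Hypothetical excerpt of a downloaded prediction result. The nesting
# (analyzeResult -> documentResults -> fields) is an assumption about the
# file layout; "Headers" and "rows" are the two tags used in this thread.
SAMPLE_RESULT = """
{
  "analyzeResult": {
    "documentResults": [
      {
        "fields": {
          "Headers": {"type": "string", "text": "Item Qty Price", "confidence": 0.98},
          "rows": {"type": "string", "text": "Widget 2 9.99", "confidence": 0.91}
        }
      }
    ]
  }
}
"""


def labeled_fields(result):
    """Return {label: text} for every labeled field in the prediction result."""
    values = {}
    for doc in result.get("analyzeResult", {}).get("documentResults", []):
        for label, field in doc.get("fields", {}).items():
            # A label that was not found on the page may come back as null.
            values[label] = field.get("text") if field else None
    return values


fields = labeled_fields(json.loads(SAMPLE_RESULT))
print(fields)
```

Note that each label yields a single text value, so one "rows" tag gives one string rather than a list of rows.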

@RohitMungi-MSFT As you can see in the images attached to your reply, only the header row and the second row were returned. How can I use the labeling tool to train and predict all rows and headers from a specific table on the page?

@NHaiby @PatrickFarley Could you please let us know if we can follow any additional training steps from the tool to increase the prediction accuracy to recognize all rows in a table?

Hi, we are currently working on updating the docs with specific information on tagging tables. I'll get back to you very soon.

Thanks @PatrickFarley.

@AleynovSergey We've since updated the doc with table-labeling instructions. Please see the Sample labeling tool guide.

please-close

Have the same query. Not sure if it was resolved for @AleynovSergey.

In the documentation link above, the only thing I see as guidance for tables with multiple rows is: "Table data should be detected automatically and will be available in the final output JSON file. However, if the model fails to detect all of your table data, you can manually tag these fields as well. Tag each cell in the table with a different label. If your forms have tables with varying numbers of rows, make sure you tag at least one form with the largest possible table."

This is not clear. The images shared above by @RohitMungi-MSFT indicate that the entire header row of the table should be tagged with one tag, "Headers", and that all the cells of the data rows should be tagged with one tag, "rows". The statement from the documentation, however, says "Tag each cell in the table with a different label". In either case, how will we get a simple JSON result that contains each row with the column header and the corresponding value?
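For the "tag each cell with a different label" approach, one possible way to get back to rows is to regroup the labels after prediction. The sketch below assumes a hypothetical naming convention of `<column>_<row number>` for the cell labels (for example `Item_1`, `Qty_1`); nothing in the tool requires that convention, and the field values are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical convention: each cell label is "<column>_<row number>".
CELL_LABEL = re.compile(r"^(?P<column>.+)_(?P<row>\d+)$")


def group_cells_into_rows(fields):
    """Regroup {label: text} pairs into a list of {column: text} rows."""
    rows = defaultdict(dict)
    for label, text in fields.items():
        match = CELL_LABEL.match(label)
        if match is None:
            continue  # not a table cell label, e.g. "InvoiceNumber"
        rows[int(match["row"])][match["column"]] = text
    # Emit rows in ascending row-number order.
    return [rows[i] for i in sorted(rows)]


# Made-up prediction output, already flattened to {label: text}.
predicted = {
    "Item_1": "Widget", "Qty_1": "2",
    "Item_2": "Gadget", "Qty_2": "5",
    "InvoiceNumber": "A-17",
}
table_rows = group_cells_into_rows(predicted)
print(table_rows)
```

This only covers as many rows as were labeled during training, which is why the documentation asks for at least one form with the largest possible table.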

If possible, can you share a sample model that processes invoices with multiple rows in the table?

Please note that this is the same question as #53750; however, the response there wasn't helpful either.
If the form analyzer currently can't process forms with multiple rows in a table, that's fine. Just confirm it so we can look at alternatives.

We have the same issue: we need to extract a table with multiple records from an invoice form.
It seems Form Recognizer currently can't process it properly; the auto-detected table can't be used in most of our cases.
@NephorLabs Do you have an alternative way to do it? I'd really appreciate any suggestions.

Thanks,

Any update on this?

Hi! I have the same query: my tables have a variable number of rows across documents, even when the documents conform to the same overall template.

If the table itself is not recognized, which is what happens most of the time, then there's no way to extract all the rows. Or is there a way that I am unaware of?
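For the cases where the table is detected automatically, the rows can be rebuilt from the cell list in the output JSON. This is a minimal sketch only: the table shape used here (`rows`, `columns`, and `cells` carrying `rowIndex`, `columnIndex`, and `text`) is an assumption about the output file, the sample data is invented, and treating row 0 as the header row is a guess that will not hold for every form.

```python
def table_to_rows(table):
    """Turn one detected table into a list of {header: value} dicts.

    Assumes row 0 holds the column headers; missing cells become "".
    """
    grid = {
        (cell["rowIndex"], cell["columnIndex"]): cell["text"]
        for cell in table["cells"]
    }
    n_rows, n_cols = table["rows"], table["columns"]
    headers = [grid.get((0, c), f"column_{c}") for c in range(n_cols)]
    return [
        {headers[c]: grid.get((r, c), "") for c in range(n_cols)}
        for r in range(1, n_rows)
    ]


# Made-up table in the assumed shape; the last row is missing its Qty cell.
sample_table = {
    "rows": 3,
    "columns": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "text": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "text": "Qty"},
        {"rowIndex": 1, "columnIndex": 0, "text": "Widget"},
        {"rowIndex": 1, "columnIndex": 1, "text": "2"},
        {"rowIndex": 2, "columnIndex": 0, "text": "Gadget"},
    ],
}
detected_rows = table_to_rows(sample_table)
print(detected_rows)
```

This handles any number of rows, but it does not help when the table is not detected at all, which is the open question in this thread.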
