Is there any command-line parameter to set an arbitrary "level" of confidence (in text mode) below which the program should not try to "guess" characters but output a replacement character instead? With that, I could correct bad words by searching for the replacement characters, which would speed up the manual work.
5.0-alpha
Win10 64bit
Please respect the guidelines for posting issues: use the Tesseract user forum for questions and support.
This question is an _issue_
IIRC there is no rejection mechanism for the LSTM models (yet). There used to be plenty of related parameters in the legacy engine (see tesseract --print-parameters | grep rej), but whether any of these will ever be supported again is unclear. (Rejection in the LSTM beam decoder is possible in principle, but would probably need distinct parameters.)
In the meantime, you can emulate this to some degree by looking at the confidences of character outputs yourself:
- ALTO output (alto config): WC on the word level
- hOCR output (hocr config): x_wconf on the word level, x_conf on the character level
- TSV output (tsv config): conf (second-last) column on the word level
- API: ResultIterator.Confidence() (on the word or character level), ChoiceIterator.Confidence() (on the character alternative level)
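For the TSV route, a small post-processing script can emulate the requested rejection. The following is a minimal sketch, not a Tesseract feature. It assumes TSV produced by something like `tesseract page.png out tsv` (which writes `out.tsv`), a hypothetical threshold of 60, and U+FFFD as the replacement character. Because TSV confidences are word-level, the sketch masks whole words. Masking individual characters would require the hOCR `x_conf` values or the API iterators listed above.

```python
from itertools import groupby

# Assumption: U+FFFD is the marker; any character you can search for works.
REPLACEMENT = "\uFFFD"


def reject_low_confidence(tsv_text: str, min_conf: float = 60.0,
                          replacement: str = REPLACEMENT) -> str:
    """Rebuild plain text from Tesseract TSV output, replacing every character
    of each word whose conf is below min_conf with `replacement`."""
    rows = tsv_text.splitlines()
    if not rows:
        return ""
    header = rows[0].split("\t")
    col = {name: i for i, name in enumerate(header)}
    line_keys = ("page_num", "block_num", "par_num", "line_num")

    words = []  # (text-line key, word) pairs in reading order
    for row in rows[1:]:
        fields = row.split("\t")
        # Only level-5 rows are words; page/block/para/line rows carry conf -1.
        if len(fields) < len(header) or fields[col["level"]] != "5":
            continue
        text = fields[col["text"]]
        if not text.strip():
            continue
        if float(fields[col["conf"]]) < min_conf:
            text = replacement * len(text)  # keep the word length, mask the guess
        key = tuple(fields[col[k]] for k in line_keys)
        words.append((key, text))

    # Words of one text line are consecutive in TSV output, so groupby suffices.
    return "\n".join(
        " ".join(word for _, word in group)
        for _, group in groupby(words, key=lambda item: item[0])
    )
```

Calling `reject_low_confidence(open("out.tsv", encoding="utf-8").read(), min_conf=60)` returns plain text in which suspicious words show up as runs of U+FFFD, ready to be searched for. The threshold is a judgment call. It is worth inspecting a few pages to see where Tesseract's confidences separate good words from bad ones.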
@spajak, the question may be an issue for you, but I don't think it meets this repository's guidelines for an issue: https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md
Creating an Issue or Using the Forum
If you think you found a bug in Tesseract, please create an issue.
Use the users mailing-list instead of creating an Issue if ...
- You have problems using Tesseract and need some help.
- You have problems installing the software.
- You are not satisfied with the accuracy of the OCR, and want to ask how you can improve it. Note: You should first read the ImproveQuality documentation.
- You are trying to train Tesseract and you have a problem and/or want to ask a question about the training process. Note: You should first read the official guides [1] or [2] found in the project documentation.
- You have a general question.