Tesseract: --psm param not working with hocr output

Created on 13 Jan 2018 · 8 Comments · Source: tesseract-ocr/tesseract

When I try to save the result as hOCR, it looks like "--psm" is not taken into account.
Here is my result in cmd (look at "page 4"): without --psm the result is empty, and with --psm 6 I get better accuracy. However, --psm 6 combined with hocr gives the same result as the first case (an empty page).

Platform:
Win7U x64
tesseract version:
tesseract 4.00.00alpha
leptonica-1.74.1
libgif 4.1.6(?) :
libjpeg 8d (libjpeg-turbo 1.5.0) :
libpng 1.6.20 :
libtiff 4.0.6 :
zlib 1.2.8 :
libwebp 0.4.3 :
libopenjp2 2.1.0

[Screenshot 1515856154]

OSD bug

All 8 comments

https://github.com/tesseract-ocr/tesseract/blob/master/tessdata/configs/hocr

tessedit_create_hocr 1
tessedit_pageseg_mode 1
hocr_font_info 0

So can I edit this file, or is that not recommended?

You can create another file with your custom settings.

tesseract in.png out myhocr
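A minimal sketch of the custom-config approach, assuming a POSIX shell. The file name `myhocr` comes from the comment above; its contents are an assumption: a copy of the stock `hocr` config with the `tessedit_pageseg_mode` line left out, so the config does not compete with `--psm` on the command line. It also assumes Tesseract falls back to a path relative to the current directory when `myhocr` is not found under `tessdata/configs`. `in.png` is a placeholder image name.

```shell
#!/bin/sh
# Write a custom config in the current directory. It mirrors the stock
# tessdata/configs/hocr file but deliberately omits tessedit_pageseg_mode,
# so the page segmentation mode comes only from --psm (assumption).
cat > myhocr <<'EOF'
tessedit_create_hocr 1
hocr_font_info 0
EOF

# Only run OCR if tesseract is installed and the sample image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f in.png ]; then
  # Options go before the config file name; the result is written to out.hocr.
  tesseract in.png out --psm 6 myhocr
fi
```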

Another option, without file editing:

tesseract in.png out --psm 6 -c tessedit_create_hocr=1

Thanks.

After some testing, I found that tessedit_pageseg_mode 6 does not give the same result as --psm 6.
[Screenshot 1515886076]

Very interesting. Thanks for this question and the answers provided.

Adding the following at least makes the hOCR output similar to the --psm result. Of course, I'm not sure whether the hOCR output exactly matches the text-only --psm output.

tesseract file.tif output --psm 6 -c tessedit_create_hocr=1 -c tessedit_pageseg_mode=6
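To check whether the hOCR output matches the text-only output, one rough approach is to pull the word text out of the `.hocr` file and diff it against the `.txt` file. This is a hypothetical sketch: the `hocr_words` helper and the `plain.words`/`hocr.words` file names are made up for illustration, and it assumes Tesseract writes each word as a `<span class='ocrx_word' ...>` element with single-quoted attributes and no nested font markup (`hocr_font_info 0`). Characters such as `&` are entity-encoded in hOCR, so they will show up as differences.

```shell
#!/bin/sh
# Hypothetical helper: print the text of every ocrx_word span, one per line.
# Assumes single-quoted attributes and no nested markup inside word spans.
hocr_words() {
  grep -o "class='ocrx_word'[^>]*>[^<]*" "$1" | sed 's/^.*>//'
}

# Only run OCR if tesseract is installed and the sample image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f file.tif ]; then
  tesseract file.tif plain --psm 6    # writes plain.txt
  tesseract file.tif output --psm 6 -c tessedit_create_hocr=1 \
    -c tessedit_pageseg_mode=6        # writes output.hocr
  # Normalize the plain text to one word per line, dropping empty lines.
  tr -s '[:space:]' '\n' < plain.txt | sed '/^$/d' > plain.words
  hocr_words output.hocr > hocr.words
  if diff -q plain.words hocr.words >/dev/null; then
    echo "same words"
  else
    echo "words differ"
  fi
fi
```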

@zdenop, please close this issue.

Fixed with PR #1943.
