I'm having trouble running the example command from the wiki page on hOCR output (https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#hocr-output).
Command:
tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng hocr
eurotext-eng.txt is generated with text, and the terminal reports:
read_params_file: Can't open hocr
Expected: eurotext-eng.hocr should be generated.
(Note: this works as expected if I exclude --tessdata-dir ./)
Perhaps this is an argument parse bug, or maybe there's a new syntax and the wiki needs updating. Or maybe I'm missing something.
Thank you!
Thank you.
The example was wrong for your case. You already found the right solution. I fixed the Wiki.
Thanks @stweil. So does this mean the hocr option cannot be combined with the --tessdata-dir option? Is there a workaround to use these two options together?
The Tesseract command line syntax is a bit confusing. hocr is not an option, but a configfile. --tessdata-dir is an option. Both can be used together, but options must come before configfiles. In your case the given directory did not contain the expected tessdata files.
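To illustrate the ordering rule above, here is a minimal sketch that assembles a command line with the options first and the config files last. build_tesseract_cmd is a hypothetical helper written for this example, not part of Tesseract, and it only prints the command rather than running it.

```shell
# Sketch only: build_tesseract_cmd is a hypothetical helper, not a Tesseract tool.
# It prints a command line in which options precede config files.
build_tesseract_cmd() {
  image=$1
  outputbase=$2
  tessdata_dir=$3
  lang=$4
  shift 4
  # Options (--tessdata-dir, -l) are emitted before any config file name.
  printf 'tesseract --tessdata-dir %s %s %s -l %s' \
    "$tessdata_dir" "$image" "$outputbase" "$lang"
  # Whatever arguments remain are treated as config file names, e.g. hocr.
  for config in "$@"; do
    printf ' %s' "$config"
  done
  printf '\n'
}

build_tesseract_cmd ./testing/eurotext.png ./testing/eurotext-eng ./ eng hocr
```

The printed line has the same shape as the command in the original report, so the ordering there was already valid; the missing piece was the content of the tessdata directory.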
> The Tesseract command line syntax is a bit confusing
It's terrible...
Ah I see. The wiki does call it a config. So I just need to copy tessdata/configs/hocr into my tessdata directory and it should work. I'll give this a try.
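A quick way to check a directory before passing it to --tessdata-dir might look like the sketch below. check_tessdata_dir is a hypothetical helper, and the layout it tests (the traineddata file directly in the directory, config files under configs/) is an assumption based on this thread, not something confirmed here for every Tesseract version.

```shell
# Sketch only: check_tessdata_dir is a hypothetical helper, not a Tesseract tool.
# Assumed layout: <dir>/<lang>.traineddata and <dir>/configs/<config>.
check_tessdata_dir() {
  dir=${1%/}      # drop one trailing slash so paths join cleanly
  lang=$2
  config=$3
  status=0
  if [ ! -f "$dir/$lang.traineddata" ]; then
    echo "missing: $dir/$lang.traineddata" >&2
    status=1
  fi
  if [ ! -f "$dir/configs/$config" ]; then
    echo "missing: $dir/configs/$config" >&2
    status=1
  fi
  # Returns 0 only when both files are present.
  return "$status"
}
```

For example, check_tessdata_dir ./ eng hocr would report which of the two files is missing from the current directory.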
Hmm didn't work. Got this error: read_params_file: parameter not found: enable_new_segsearch
It worked. That's only a warning. You copied an old hocr file. Remove enable_new_segsearch from that file.
Actually, the hocr config file does not contain enable_new_segsearch. And no files are generated, neither txt nor hocr/html.
Maybe someone else can reproduce what I'm experiencing:
tesseract --tessdata-dir ./ ./test-image.png ./extract-image-output hocr
The hocr file contains:
tessedit_create_hocr 1
hocr_font_info 0
The output from this is:
read_params_file: parameter not found: enable_new_segsearch
and no new files are created in the cwd.
If you really get the error read_params_file: parameter not found: enable_new_segsearch there are 2 possibilities:
The config parameter enable_new_segsearch must be specified somewhere (e.g. in the traineddata or a config file).
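One way to locate where such a parameter is specified is to search every file under the tessdata directory for its name. The sketch below uses only find and grep; find_param_files is a hypothetical helper, and it assumes the parameter name appears as plain text inside the traineddata file.

```shell
# Sketch only: find_param_files is a hypothetical helper, not a Tesseract tool.
# Prints the name of every file under <dir> that contains <param>.
find_param_files() {
  param=$1
  dir=$2
  # grep -l prints only file names and also reports matches in binary files.
  find "$dir" -type f -exec grep -l -- "$param" {} +
}
```

For example, find_param_files enable_new_segsearch ./ would list the files in the current directory tree that mention the parameter.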
BTW, please use a recent Tesseract version (and data) when you report an issue.
Thanks @zdenop for the help! I was able to get things working normally by doing two things:
The result: the hocr file was generated, with no warning message.
Thanks again for the help.