Hi,
The command line help output shows 11 PSM modes (0 through 10).
pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
I was trying each one and getting mixed results. However, I accidentally ran '-psm 11' and suddenly got perfect accuracy, far better than any other PSM mode, including the default. PSM 12 gave the same perfect accuracy, but PSM 13 gives nothing.
The image is just about 10 words over 2 lines, spread about the page. All the other segmentation modes, including the default, garble the text, but PSM 11 and 12 worked great, splitting the text perfectly.
Is it correct that there's a PSM 11 and 12 mode? What do they do, and why do they give such good accuracy? And should they be in the help/Wiki?
Thanks!
> Is it correct that there's a PSM 11 and 12 mode?
Yes, and PSM 13 too.
You revealed our secret! :)
https://github.com/tesseract-ocr/tesseract/blob/8d6dbb133b41/api/tesseractmain.cpp#L115
I'm sorry, I do not have good answers to your other questions.
With Tesseract 4.0, PSM 11, 12, and 13 appear in the help message. PSM 13 is used with the new LSTM engine to OCR a single text-line image.
The master (4.00) and 3.05 branches of the repository produce a help message that includes PSM 11-13.
For example, this build:
tesseract --version
tesseract 4.1.0
leptonica-1.78.0
libgif 5.2.1 : libjpeg 9c : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
Found AVX2
Found AVX
Found SSE
supports PSM 11-13:
tesseract --help-psm
Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR. (not implemented)
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line,
bypassing hacks that are Tesseract-specific.
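If you want to compare the sparse-text modes against the default on your own image, a small script can save some typing. The following is only a rough sketch in Python, not anything shipped with Tesseract. The helper names (build_command, run_psm) are made up for illustration. It assumes the tesseract binary is on your PATH, that your build accepts "stdout" as the output base, and that 4.x builds take --psm while 3.x builds take -psm.

```python
import subprocess

# Descriptions copied from the `tesseract --help-psm` output quoted above.
SPARSE_MODES = {
    11: "Sparse text. Find as much text as possible in no particular order.",
    12: "Sparse text with OSD.",
    13: "Raw line. Treat the image as a single text line.",
}


def build_command(image_path, psm, legacy_flag=False):
    """Build the argv list for one tesseract run (hypothetical helper).

    legacy_flag=True uses the 3.x spelling `-psm`; otherwise the
    4.x spelling `--psm` is used. This split is an assumption about
    your installed version, so check `tesseract --help` first.
    """
    if not 0 <= psm <= 13:
        raise ValueError("PSM must be between 0 and 13, got %r" % (psm,))
    flag = "-psm" if legacy_flag else "--psm"
    # "stdout" as the output base asks tesseract to print the text
    # instead of writing a .txt file (assumed to be supported).
    return ["tesseract", image_path, "stdout", flag, str(psm)]


def run_psm(image_path, psm, legacy_flag=False):
    """Run tesseract once and return whatever text it printed."""
    cmd = build_command(image_path, psm, legacy_flag=legacy_flag)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


def compare_modes(image_path, modes=(3, 11, 12), legacy_flag=False):
    """Return {psm: recognised text} so the modes can be eyeballed side by side."""
    return {psm: run_psm(image_path, psm, legacy_flag) for psm in modes}
```

Calling compare_modes("page.png") would run the default mode (3) plus 11 and 12 on a hypothetical page.png and return the recognised text for each, which makes it easier to see whether sparse-text mode really helps on a given image.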