Tesseract: Are there more PSM modes than are listed in the help/wiki - 11 and 12?

Created on 27 Sep 2016 · 4 Comments · Source: tesseract-ocr/tesseract

Hi,

The command line help output shows 11 PSM modes (0 through 10).

  pagesegmode values are:
  0 = Orientation and script detection (OSD) only.
  1 = Automatic page segmentation with OSD.
  2 = Automatic page segmentation, but no OSD, or OCR
  3 = Fully automatic page segmentation, but no OSD. (Default)
  4 = Assume a single column of text of variable sizes.
  5 = Assume a single uniform block of vertically aligned text.
  6 = Assume a single uniform block of text.
  7 = Treat the image as a single text line.
  8 = Treat the image as a single word.
  9 = Treat the image as a single word in a circle.
  10 = Treat the image as a single character.

I tried each one and got mixed results. However, I accidentally ran '-psm 11' and suddenly got perfect accuracy, far better than any other PSM mode, including the default. PSM 12 gave the same perfect accuracy, but PSM 13 gave nothing.

The image contains only about 10 words over 2 lines, spread across the page. All the other segmentation modes, including the default, garble the text, but PSM 11 and 12 worked great, splitting the text perfectly.

Is it correct that there's a PSM 11 and 12 mode? What do they do, and why do they give such good accuracy? And should they be in the help/Wiki?

Thanks!

samiles
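
For anyone who wants to repeat the experiment described in the question, here is a minimal Python sketch that runs one image through every PSM value from 0 to 13 and collects the output. It is only an illustration under some assumptions: `sample.png` is a hypothetical file name, the `stdout` output base is used so the text is printed rather than written to a file, and the flag is spelled `--psm` by default. Older releases such as the one in the question appear to use the single-dash `-psm` form, so the spelling is left as a parameter.

```python
import shutil
import subprocess

# PSM values 0-13; 11-13 are the ones missing from the older help text.
ALL_PSM_MODES = range(14)


def build_command(image_path, psm, psm_flag="--psm"):
    """Return the argv list for one Tesseract run that prints text to stdout."""
    return ["tesseract", image_path, "stdout", psm_flag, str(psm)]


def run_all_modes(image_path, modes=ALL_PSM_MODES, psm_flag="--psm"):
    """Run Tesseract once per PSM and return {psm: recognized text or None}."""
    results = {}
    for psm in modes:
        cmd = build_command(image_path, psm, psm_flag)
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # A non-zero exit code (e.g. an unsupported mode) is recorded as None.
        results[psm] = proc.stdout if proc.returncode == 0 else None
    return results


if __name__ == "__main__":
    # "sample.png" is a hypothetical file name; replace it with a real image.
    if shutil.which("tesseract"):
        for psm, text in run_all_modes("sample.png").items():
            print(psm, repr(text))
```

Comparing the collected strings side by side makes it easier to see which modes garble sparse text and which ones split it cleanly.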

All 4 comments

> Is it correct that there's a PSM 11 and 12 mode?

Yes, and PSM 13 too.
You revealed our secret! :)
https://github.com/tesseract-ocr/tesseract/blob/8d6dbb133b41/api/tesseractmain.cpp#L115

I'm sorry, I do not have good answers to your other questions.

amitdo on 27 Sep 2016

With Tesseract 4.0, PSM 11, 12, and 13 appear in the help message. PSM 13 is used with the new LSTM engine to OCR an image of a single text line.

amitdo on 22 Nov 2016

Both master (4.00) and 3.05 in the repository now print PSM 11-13 in the help message.

zdenop on 7 Dec 2016
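
Going by the comments above, whether the help text mentions PSM 11-13 depends on the installed version. The following Python sketch shows one way to check this from the first line of `tesseract --version`. The 3.05 threshold is an assumption taken from this thread, not from the official changelog, and both function names are made up for illustration.

```python
import re


def parse_tesseract_version(version_output):
    """Extract (major, minor, patch) from the first line of `tesseract --version`."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    match = re.match(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?", lines[0])
    if match is None:
        raise ValueError(f"unrecognized version line: {lines[0]!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def help_lists_psm_11_to_13(version):
    """Return True if the help text should list PSM 11-13.

    Assumption taken from this thread: 3.05 and later print these modes.
    """
    return version >= (3, 5, 0)


if __name__ == "__main__":
    sample = "tesseract 4.1.0\n leptonica-1.78.0\n"
    version = parse_tesseract_version(sample)
    print(version, help_lists_psm_11_to_13(version))
```

Note that the modes themselves may still be accepted by older builds, as the original question shows; the check only concerns what the help message prints.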

For this version:

 tesseract --version
tesseract 4.1.0
 leptonica-1.78.0
  libgif 5.2.1 : libjpeg 9c : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found SSE

PSM 11 to 13 are supported:

 tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.

crifan on 3 Dec 2019
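
If you want to find out programmatically which modes a given build advertises, the `--help-psm` listing above can be turned into a lookup table. This is a rough Python sketch that assumes the layout shown in the comment (a number, some spaces, a description, and indented continuation lines). `HELP_PSM_SAMPLE` is a shortened copy of that output and `parse_help_psm` is a hypothetical helper name.

```python
import re

# Shortened copy of the `tesseract --help-psm` output quoted above.
HELP_PSM_SAMPLE = """\
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
"""


def parse_help_psm(help_text):
    """Map each PSM number to its description, joining wrapped lines."""
    modes = {}
    current = None
    for line in help_text.splitlines():
        entry = re.match(r"\s*(\d+)\s+(.*\S)\s*$", line)
        if entry:
            current = int(entry.group(1))
            modes[current] = entry.group(2)
        elif current is not None and line.strip():
            # Continuation of the previous description (as in mode 13).
            modes[current] += " " + line.strip()
    return modes


if __name__ == "__main__":
    for number, description in sorted(parse_help_psm(HELP_PSM_SAMPLE).items()):
        print(number, description)
```

Checking `11 in parse_help_psm(text)` then tells you whether the sparse-text mode is documented by the build that produced `text`.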
