Tesseract: Document snum language on Wiki

Created on 29 Apr 2019 · 6 Comments · Source: tesseract-ocr/tesseract

The wiki currently lists supported languages but it does not include an entry for snum.

Could that be added and documented? I am having difficulty finding out what snum stands for.

All 6 comments

Where did you find 'snum'?

Tesseract installed from Homebrew on macOS 10.14.4.

There is a reference to it here: https://github.com/Homebrew/homebrew-core/pull/36786

$ tesseract --version
tesseract 4.0.0
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found SSE

$ tesseract --list-langs
List of available languages (3):
eng
osd
snum
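The listing above can also be checked programmatically. What follows is a minimal sketch, not part of Tesseract itself: it parses the text printed by `tesseract --list-langs` and reports any entries outside a set you expect. The function names and the `expected` set are hypothetical, chosen only for illustration.

```python
def parse_list_langs(output):
    """Return the language codes from `tesseract --list-langs` text output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def unexpected_langs(langs, expected):
    """Return installed entries that are not in the expected set (order kept)."""
    return [lang for lang in langs if lang not in expected]


# Sample text copied from the output shown in this thread.
sample = """List of available languages (3):
eng
osd
snum
"""

langs = parse_list_langs(sample)
# Assumption: only eng and osd were expected on this machine.
extra = unexpected_langs(langs, expected={"eng", "osd"})
print(extra)
```

An entry that shows up here and is not on the wiki's language list most likely came from the packager rather than from the Tesseract project.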

You should contact its packager for an explanation. This issue tracker is only for projects and files provided by the Tesseract team (i.e. those under https://github.com/tesseract-ocr).

Thanks, just knowing that it wasn't a real language was enough to help me figure out what was going on. I was able to find out that snum = Serial Number. It looks like a third-party traineddata file for serial number identification.

https://github.com/varenc/homebrew-core/blob/251f7b8d16ee286d80de02e19882a350439a59d0/Formula/tesseract.rb#L39

https://memex.jpl.nasa.gov/GHCI16.pdf

  resource "snum" do
    url "https://github.com/USCDataScience/counterfeit-electronics-tesseract/raw/319a6eeacff181dad5c02f3e7a3aff804eaadeca/Training%20Tesseract/snum.traineddata"
    sha256 "36f772980ff17c66a767f584a0d80bf2302a1afa585c01a226c1863afcea1392"
  end

Hopefully someone else will find this issue in the future and it will help them.

@johnthagen Thanks for posting the info.

snum seems to be trained for the legacy (base) Tesseract engine, so with Tesseract 4 it might need to be used with --oem 0.
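As a rough illustration of that suggestion, here is a minimal sketch that assembles a Tesseract 4 command line selecting the snum data with the legacy engine. The helper name and the image file name `serial.png` are hypothetical. Whether `--oem 0` actually works depends on the traineddata containing legacy components and on the installed build supporting the legacy engine, which has not been verified here.

```python
import shlex


def build_tesseract_cmd(image_path, lang="snum", oem=0, output_base="stdout"):
    """Build an argument list for the tesseract CLI.

    -l selects the traineddata to use; --oem 0 requests the legacy engine.
    An output base of "stdout" makes tesseract print the recognized text.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "--oem", str(oem)]


# "serial.png" is a placeholder image name used only for this example.
cmd = build_tesseract_cmd("serial.png")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where Tesseract and the snum data are installed.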

Thanks for the info @Shreeshrii!
