If I run tesseract page356.png page356 -l eng+osd+ell pdf
it recognizes only the English characters, but reports no errors about the other languages.
If I run tesseract page356.png page356greek -l ell
it recognizes the Greek fine, but now there is no English.
If I run tesseract page356.png greekandenglish356 -l ell+eng+osd pdf I get this PDF, in which only English is recognized:
greekandenglish356.pdf
I ran apt-get install tesseract-ocr-all, and I'm experiencing this issue on multiple Linux distros.
Here is a sample image

AFAIK, multiple-language support is only available in version 4+, which is not necessarily available in the distro's default repo. Version 3 only extracts one language at a time.
Multiple languages are supported on v3.x
Share a sample image please.
Added a sample image and PDF. I have tried 3.x on openSUSE and 4.x on Ubuntu-based distros.
It gives very poor results, but Tesseract 4 is producing a mix of Greek & English. This is also true in PDF output. Problem is not reproducing for me.
$ tesseract -l ell+eng example.png - -
[...]
6- 5- 4- 3- 2- α- β- N- Entry Name
[...]
And I even see some Greek in your attached PDF. There is a β in there.
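A quick way to check whether OCR output contains more than one script is to classify its letters by Unicode character name. The following is only a minimal sketch: the helper name scripts_in_text is hypothetical, and it assumes the OCR text has already been read into a Python string (for example from the .txt file tesseract writes).

```python
import unicodedata


def scripts_in_text(text):
    """Return the set of script prefixes (e.g. 'LATIN', 'GREEK') of the letters in text."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace and combining marks
        name = unicodedata.name(ch, "")
        if name:
            # Unicode names start with the script, e.g. 'GREEK SMALL LETTER ALPHA'
            scripts.add(name.split()[0])
    return scripts


# Line taken from the output quoted above
sample = "6- 5- 4- 3- 2- α- β- N- Entry Name"
print(sorted(scripts_in_text(sample)))
```

If the set contains only LATIN for a page that visibly has Greek on it, the second language was effectively ignored.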
Also, try with the script trained data
https://github.com/tesseract-ocr/tessdata_best/blob/master/script/Greek.traineddata
It should have both Greek and English.
Could this issue be closed?
There is no reaction from original reporter for several months....
zdenop, 30/09/2018 17:19:
There is no reaction from original reporter for several months....
What information is needed? I have the same experience with tesseract
3.05.02, but I don't understand from the responses above whether it's
considered expected result until version 4.
The reporter claims that there are only English characters, but jbreiden sees Greek letters there...
Shreeshrii provided several suggestions for testing... No reply...
If the reporter does not care, why should we? We are not paid for this. If the reporter does not value our time and cooperate, then I will eliminate such a report (if it does not have some value for this project).
Same problem reported in the forum today, but for Thai and English.
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/YLvnrS-01kI/x0PUNGsGBAAJ

tesseract 4.0.0-beta.4-179-g57a6
leptonica-1.76.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
As mentioned by the OP in the forum, if the input image has both languages (eng+tha) in the same line, it will read that line in only one language, but when a line contains a single language it is read in the correct language.
script/Thai.traineddata seems to give correct result.
ubuntu@tesseract-ocr:~/TEST$ bash ./en_th.sh
***** ./en_th.jpg LANG tha+eng TESSDATA tessdata OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
1āđāļĨ!10 āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is a test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
***** ./en_th.jpg LANG tha+eng TESSDATA tessdata_best OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
1āđāļĨ!1āđ0 āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is a test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
***** ./en_th.jpg LANG tha+eng TESSDATA tessdata_fast OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
1āđāļĨ!10 āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is a test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
***** ./en_th.jpg LANG eng+tha TESSDATA tessdata OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
1āđāļĨ!10 āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is a test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
***** ./en_th.jpg LANG eng+tha TESSDATA tessdata_best OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello aaa
This is a test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
***** ./en_th.jpg LANG eng+tha TESSDATA tessdata_fast OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello ayaa
This is a test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
***** ./en_th.jpg SCRIPT Thai TESSDATA tessdata_best OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is a test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
***** ./en_th.jpg SCRIPT Thai TESSDATA tessdata_fast OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello āļŠāļ§āļąāļŠāļāļīāļāđāļē
This is a test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
DONE
Last(?) change from Ray regarding multi-language mode seems to be
https://github.com/tesseract-ocr/tesseract/commit/b453f74e0194f2cf08e9251b1846a0132657c4f8
Any updates on this issue? I am using Tesseract v4 to detect the text "Bœuf Stroganoff" using German and French traineddata. Text detection doesn't work when I use traineddata for multiple languages together.
You can try Latin.traineddata from best/fast.
@amitdo Tried that. It doesn't work as well. The command I used :
tesseract <input image> <output file> -l lat
-l lat is Latin language.
Use script/Latin which is Latin script and has been trained using all
languages using that script.
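The difference between -l lat (the Latin language) and -l script/Latin (the Latin script model) is easy to miss on the command line. As a rough sketch, a small helper can build the argument list so the model choice is explicit; the function name tesseract_args and the file names are hypothetical, and the --tessdata-dir and --psm options are assumed to be those of the Tesseract 4.x command line.

```python
def tesseract_args(image, output_base, langs, tessdata_dir=None, psm=None):
    """Build an argument list for the tesseract CLI (assumes 4.x option names)."""
    # Several models are joined with '+', e.g. ['fra', 'deu'] -> 'fra+deu'
    args = ["tesseract", image, output_base, "-l", "+".join(langs)]
    if tessdata_dir is not None:
        args += ["--tessdata-dir", tessdata_dir]
    if psm is not None:
        args += ["--psm", str(psm)]
    return args


# 'script/Latin' selects the script model, 'lat' the Latin language model
print(tesseract_args("euro.png", "out", ["script/Latin"], psm=6))
print(tesseract_args("euro.png", "out", ["lat"], psm=6))
```

The resulting list can be passed to subprocess.run on a machine where tesseract and the corresponding traineddata files are installed.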
@Shreeshrii Even the predictions with tesseract <input image> <output file> -l script\Latin are disappointing. It wrongly recognizes the original text as "_Bouf Stroganoff_". Am I missing something? Is there no way to make multiple languages work so that it does not recognize only the first listed language? I previously also tried several combinations like tesseract <input image> <output file> -l deu+fra, where it would not recognize the French properly. However, the same image works properly if the order of the languages is reversed, i.e. tesseract <input image> <output file> -l fra+deu.
Please provide a test image. We need to test whether this is a regression .
_This image shows the character set I am targeting_
_This image shows the text I was experimenting with as mentioned in the previous comments_
Any input on these would be highly appreciated.
https://github.com/tesseract-ocr/langdata/issues/83#issuecomment-375027879
theraysmith commented on Mar 21
I did have an idea for a better multi-language implementation that would cleanly use models from multiple languages at once, but that depends on getting rid of the old code, and moving the multi-language functionality into the beam search. Until the old code is gone, that would be very messy. …
I can replicate this.
It seems to me that œ is only trained for French. However, it is not being recognized when French is listed second.
***** ./euro.png OEM 1 PSM 6 LANG deu+fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+deu TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG eng+fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+eng TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG script/Latin TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bouf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
Results with all three repos, tessdata, tessdata_fast and tessdata_best
***** ./euro.png OEM 1 PSM 6 LANG deu+fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG deu+fra TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG deu+fra TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+deu TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+deu TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+deu TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG eng+fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG eng+fra TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG eng+fra TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+eng TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+eng TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+eng TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG script/Latin TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bouf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG script/Latin TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bouf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG script/Latin TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
You already summarized the behavior when lstm is activated.
As mentioned by the OP in the forum, if the input image has both languages (eng+tha) in the same line, it will read that line in only one language, but when a line contains a single language it is read in the correct language.
'in 1 language' -> 'in the first given language'
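Because the first language given appears to win on mixed lines, it can be worth trying every ordering of the -l value when testing an image. A minimal sketch, assuming nothing beyond the "+"-separated syntax used throughout this thread (the helper name language_orderings is hypothetical):

```python
from itertools import permutations


def language_orderings(langs):
    """Return every '+'-joined ordering of the given language codes."""
    return ["+".join(order) for order in permutations(langs)]


# Each string can be passed to tesseract via -l to compare the results
print(language_orderings(["tha", "eng"]))
```

With three or more languages the number of orderings grows factorially, so this is only practical for small sets.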
Also, try with the script trained data
https://github.com/tesseract-ocr/tessdata_best/blob/master/script/Greek.traineddata
It should have both Greek and English.
@Shreeshrii : My OCR even became faster by using Devanagari.traineddata. Is there any reason for this? Also, hin+eng was converting a lot of the Hindi text to English.
@sirius0503 Devanagari was trained with hin+san+mar+nep+eng so it is better at recognition, plus only one traineddata file is used rather than two diff ones.
@Shreeshrii : Surprisingly, it is faster than hin.traineddata. Any ideas why this may be so?
That's due to a difference in the net-spec between the two, which makes Devanagari's network smaller than hin's network.
In addition, like Shree said, in the case of hin+eng you add to hin another network, eng. Devanagari has just one network for both languages.
That's due to a difference in the net-spec between the two, which makes Devanagari's network smaller than hin's network.
@amitdo : When I use only -l Devanagari instead of -l hin (not even hin+eng), I get better speed. Can you explain more about the net-spec difference?
https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs
https://github.com/tesseract-ocr/tesseract/issues/1404#issuecomment-374680492
Lang | Repo | Height | Lfys | Lfx,Lrx | Lrx
:----------: | :----------: | :------------: | :---------: | :------------: | :----------
Devanagari | best | 48 | 64 | 64 | 512
hin | best | 48 | 64 | 96 | 512
Devanagari | fast | 36 | 48 | 96 | 192
hin | fast | 48 | 64 | 96 | 384
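To make the comparison in the table easier to read, the rows can be put into a small data structure and compared column by column. This is only a sketch: the column labels are copied verbatim from the table header, the helper name smaller_columns is hypothetical, and reading a smaller value as a smaller network follows the explanation given above rather than any measurement.

```python
COLUMNS = ("Height", "Lfys", "Lfx,Lrx", "Lrx")

# Values copied from the table above, keyed by (model, repo)
NET_SPECS = {
    ("Devanagari", "best"): (48, 64, 64, 512),
    ("hin", "best"): (48, 64, 96, 512),
    ("Devanagari", "fast"): (36, 48, 96, 192),
    ("hin", "fast"): (48, 64, 96, 384),
}


def smaller_columns(model, other, repo):
    """List the columns in which `model` has a smaller value than `other`."""
    a = NET_SPECS[(model, repo)]
    b = NET_SPECS[(other, repo)]
    return [col for col, x, y in zip(COLUMNS, a, b) if x < y]


for repo in ("best", "fast"):
    print(repo, smaller_columns("Devanagari", "hin", repo))
```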
I am struggling to get it to talk to laser. It states cannot find path? Which part am I missing, please?
Shree, can you retest the eng+tha / tha+eng, best/fast cases with code from the master branch?
Using the same image used for test in https://github.com/tesseract-ocr/tesseract/issues/1579#issuecomment-426351989 and master code built with disable-legacy:
ubuntu@tesseract-ocr:~/TEST$ bash en_th.sh
tesseract 5.0.0-alpha-473-g6d171
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.4.4 : libopenjp2 2.3.0
***** ./en_th.jpg LANG tha+eng TESSDATA tessdata OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is ล test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
1.66user 0.02system 0:01.70elapsed 99%CPU (0avgtext+0avgdata 81408maxresident)k
0inputs+0outputs (0major+1568minor)pagefaults 0swaps
***** ./en_th.jpg LANG tha+eng TESSDATA tessdata_best OEM 1 PSM 3 ****
Warning: Parameter not found: segsearch_max_futile_classifications
Warning: Parameter not found: language_model_ngram_on
Warning: Parameter not found: language_model_ngram_space_delimited_language
Warning: Parameter not found: chop_enable
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is ล test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
2.00user 0.03system 0:02.06elapsed 98%CPU (0avgtext+0avgdata 64768maxresident)k
0inputs+0outputs (0major+1936minor)pagefaults 0swaps
***** ./en_th.jpg LANG tha+eng TESSDATA tessdata_fast OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is ล test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
1.23user 0.00system 0:01.24elapsed 99%CPU (0avgtext+0avgdata 25408maxresident)k
0inputs+0outputs (0major+666minor)pagefaults 0swaps
***** ./en_th.jpg LANG eng+tha TESSDATA tessdata OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is ล test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
1.64user 0.02system 0:01.67elapsed 99%CPU (0avgtext+0avgdata 78144maxresident)k
0inputs+0outputs (0major+1527minor)pagefaults 0swaps
***** ./en_th.jpg LANG eng+tha TESSDATA tessdata_best OEM 1 PSM 3 ****
Warning: Parameter not found: segsearch_max_futile_classifications
Warning: Parameter not found: language_model_ngram_on
Warning: Parameter not found: language_model_ngram_space_delimited_language
Warning: Parameter not found: chop_enable
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is ล test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
1.95user 0.02system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 59136maxresident)k
0inputs+0outputs (0major+1667minor)pagefaults 0swaps
***** ./en_th.jpg LANG eng+tha TESSDATA tessdata_fast OEM 1 PSM 3 ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 376
Hello āļŠāļ§āļąāļŠāļāļĩāļāđāļē
This is ล test.
āļāļĩāđāļāļ·āļāļāļēāļĢāļāļāļŠāļāļ
1.30user 0.02system 0:01.33elapsed 99%CPU (0avgtext+0avgdata 25472maxresident)k
0inputs+0outputs (0major+625minor)pagefaults 0swaps
DONE
ubuntu@tesseract-ocr:~/TEST$
Hello is now being recognized correctly.
The "a" in "This is a test" is now being recognized as a Thai character.
Interesting, thanks.
Results for test case in https://github.com/tesseract-ocr/tesseract/issues/1579#issuecomment-428432287 are also changed.
ubuntu@tesseract-ocr:~/TEST$ bash euro.sh
***** ./euro.png OEM 1 PSM 6 LANG deu+fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG deu+fra TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG deu+fra TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+deu TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+deu TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+deu TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG eng+fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG eng+fra TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG eng+fra TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+eng TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+eng TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+eng TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG script/Latin TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bouf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG script/Latin TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bouf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG script/Latin TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra TESSDATA tessdata_best ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra TESSDATA tessdata_fast ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
DONE
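Logs like the one above are easier to compare once they are grouped by language order and tessdata repo. The following sketch parses the header lines printed by the euro.sh test script; the regular expression is an assumption based only on the header layout visible in this log (the en_th.sh headers order their fields differently and would need their own pattern), and the function name parse_runs is hypothetical.

```python
import re

# Matches e.g. '***** ./euro.png OEM 1 PSM 6 LANG deu+fra TESSDATA tessdata ****'
HEADER = re.compile(r"^\*+ (\S+) OEM (\d+) PSM (\d+) LANG (\S+) TESSDATA (\S+) \*+$")


def parse_runs(log):
    """Map (lang, tessdata) to the recognized text of each run in the log."""
    results = {}
    key = None
    for raw in log.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            key = (match.group(4), match.group(5))
            results[key] = []
        elif key is not None and line and line != "DONE" and not line.startswith("Warning:"):
            results[key].append(line)  # keep OCR output, drop warnings
    return {k: "\n".join(v) for k, v in results.items()}


log = """\
***** ./euro.png OEM 1 PSM 6 LANG deu+fra TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Boeuf Stroganoff
***** ./euro.png OEM 1 PSM 6 LANG fra+deu TESSDATA tessdata ****
Warning: Invalid resolution 0 dpi. Using 70 instead.
Bœuf Stroganoff
DONE
"""
for (lang, repo), text in parse_runs(log).items():
    print(lang, repo, repr(text))
```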
https://raw.githubusercontent.com/tesseract-ocr/langdata_lstm/master/fra/fra.wordlist
has these words:
Boeuf, boeuf, BOEUF, bœuf, but not Bœuf.
https://raw.githubusercontent.com/tesseract-ocr/langdata_lstm/master/eng/eng.wordlist
has boeuf, but not Boeuf.
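The same kind of check can be scripted for any word. This is a minimal sketch that works on an in-memory list; the two lists below contain only the entries quoted above, not the full wordlist files, and the helper name missing_case_variants is hypothetical.

```python
def missing_case_variants(word, wordlist):
    """Return the lowercase and capitalized forms of word that are absent from wordlist."""
    present = set(wordlist)
    variants = [word.lower(), word.capitalize()]
    return [v for v in variants if v not in present]


# Excerpts only: the entries mentioned above, not the complete wordlists
fra_words = ["Boeuf", "boeuf", "BOEUF", "bœuf"]
eng_words = ["boeuf"]

print(missing_case_variants("bœuf", fra_words))
print(missing_case_variants("boeuf", eng_words))
```

To run it against a real wordlist, the file would first have to be downloaded and split into lines.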