Tesseract: Slower Performance in Latest Tesseract

Created on 17 Oct 2017  ·  14 Comments  ·  Source: tesseract-ocr/tesseract

Hi,

I have installed Tesseract 4.00.00dev-690-g1b0379c with Leptonica 1.74.4. Detection works fine, but I have noticed that it is slower than before (compared with the Tesseract build from 5 months ago, with Leptonica 1.74.1).

A run used to take around 4 or 5 seconds, but lately it takes almost double that. The command I'm using is the normal Tesseract command: tesseract image results -l lang --tessdata-dir ./tessdata --oem 1. Am I missing something, or is there a parameter that I should add after the updates to Tesseract or Leptonica? Or is there any other way to improve the speed (for both the single-thread and multi-thread cases)?

Thank you

performance

All 14 comments

Slower Performance in Latest Tesseract

It's not clear if you're comparing a newer 4.00 to older 4.00 or 4.00 to 3.05.

Also, do you use the newest traineddata for 4.0?

Use the traineddata files from the tessdata_fast repository for faster recognition.

ShreeDevi

or any other way to enhance the performance speed? (for both single thread case or multi thread case)

If you use multi-threading, try disabling OpenMP:
OMP_THREAD_LIMIT=1 tesseract in.png out --oem 1

@amitdo Actually, I'm comparing the latest version (4.00.00dev-690-g1b0379c with Leptonica 1.74.4) with the older version (4.00.00dev-549-g2b854e3 with Leptonica 1.74.1).

@Shreeshrii "tessdata_fast" is news to me. I'm already using the official traineddata, but I don't know about this one. Can you please give me the link to it? Also, I have already created a fine-tuned LSTM model; can I combine it with the new tessdata_fast as well?

Thank you both

The latest traineddata files are at https://github.com/tesseract-ocr/tessdata_best and https://github.com/tesseract-ocr/tessdata_fast. But if you want to compare the performance of an older Tesseract 4.00 with the latest version, you will have to use the same traineddata for both, usually from https://github.com/tesseract-ocr/tessdata. I'd disable multithreading for the test (set environment variable OMP_THREAD_LIMIT=1).
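
A minimal sketch of such a timing test, assuming both Tesseract builds are reachable as a `tesseract` executable in turn and that the same tessdata directory is used for each. The helper names (`build_command`, `single_thread_env`, `time_run`) and the file paths are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import time


def build_command(image, out_base, lang, tessdata_dir, oem=1):
    """Assemble the tesseract command line used in this thread."""
    return [
        "tesseract", image, out_base,
        "-l", lang,
        "--tessdata-dir", tessdata_dir,
        "--oem", str(oem),
    ]


def single_thread_env(base_env=None):
    """Copy the environment and set OMP_THREAD_LIMIT=1, as suggested above."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def time_run(cmd, env, repeats=3):
    """Run the command several times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if tesseract exits with an error
        subprocess.run(cmd, env=env, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Calling `time_run(build_command("image.png", "results", "eng", "./tessdata"), single_thread_env())` once per installed build should give two comparable numbers. Taking the fastest of a few repeats is one way to reduce noise from disk caching; it is a design choice here, not something the thread prescribes.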

Please see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line

If you have the data for your fine-tuning, you can create the 'faster' integer type of traineddata by using lstmtraining's convert_to_int option together with stop_training.
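
As a rough sketch of that step, the snippet below only assembles an lstmtraining command line. It assumes the `--stop_training`, `--convert_to_int`, `--continue_from`, `--traineddata` and `--model_output` options described on the linked wiki page; since this is development code, it is worth checking `lstmtraining --help` for your build. The function name and all file paths are hypothetical.

```python
import shlex


def build_convert_to_int_command(checkpoint, traineddata, model_output):
    """Assemble an lstmtraining call that writes an integer model from a checkpoint."""
    return [
        "lstmtraining",
        "--stop_training",        # finalize instead of continuing to train
        "--convert_to_int",       # write the 'faster' integer model
        "--continue_from", checkpoint,
        "--traineddata", traineddata,
        "--model_output", model_output,
    ]


# Hypothetical paths, for illustration only.
command = build_convert_to_int_command(
    "output/tuned_checkpoint",
    "tessdata_best/eng.traineddata",
    "output/tuned_fast.traineddata",
)
print(shlex.join(command))
```

Printing the command rather than running it lets you inspect it before passing it to a shell or to `subprocess.run`.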

@Shreeshrii So I assume that if I fine-tuned an LSTM file (made with the older version's tools), it won't combine with the new traineddata (for example, a traineddata from https://github.com/tesseract-ocr/tessdata_best)?
Also, by "data for your fine tuning", do you mean the following?
1
And are the steps in the link that you shared meant to improve accuracy, detection speed, or both?

@stweil Is the difference between "tessdata_best" and "tessdata_fast" accuracy vs. speed? Meaning "tessdata_fast" will be faster in detection but won't be as accurate as "tessdata_best"?

Thanks for the answers

Is the difference between "tessdata_best" and "tessdata_fast" accuracy vs. speed? Meaning "tessdata_fast" will be faster in detection but won't be as accurate as "tessdata_best"?

tessdata_fast is faster than tessdata_best, yes.
tessdata_best is generally better, but not always. I also noticed cases where tessdata_fast is better. And there are even cases where the old Tesseract gives the best recognition rates of all current tessdata.

For training, you have to start with the tessdata_best models. You can create your traineddata in the faster integer format.

You will have to test with your language and data.
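
One way to run such a test is to compare each model's output against a hand-checked transcription. The sketch below computes a character error rate from plain edit distance. This metric is an assumption picked for illustration and the thread does not prescribe one; the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute each cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Running the same images through tessdata_fast and tessdata_best and comparing `character_error_rate(ground_truth, ocr_text)` for each gives a per-model number for your own language and data; lower is better.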

If I wanted to fine-tune using the "lstmtraining" tool with the latest Tesseract (4.00.00dev-690-g1b0379c), can I use .lstmf files (generated by tesstrain.sh) that were created by an older Tesseract version, such as 4.00.00dev-549-g2b854e3?
In other words, are .lstmf files compatible between Tesseract versions?

You can give it a try. There have been significant changes that break compatibility between commits, since this is development code in alpha stage. If you get an error, you will have to recreate the lstmf files.

I do not know about the specific commit numbers you refer to. You may want to check the GitHub commit history.

thanks
