Tesseract: Segmentation fault when using integer models for LSTM training

Created on 12 May 2018 · 11 Comments · Source: tesseract-ocr/tesseract

I am following the LSTM fine-tuning tutorial at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact

The training works when I follow the tutorial instructions and fine-tune from the .lstm extracted from tessdata/best/eng.traineddata. However, the training fails when I instead extract the .lstm from tessdata/eng.traineddata.

Environment

  • Tesseract Version: tesseract 4.0.0-beta.1-232-g45a6

  • Platform:

The code I am trying to execute:
training/lstmtraining --model_output ~/tesstutorial/impact_from_full/impact --continue_from ~/tesstutorial/impact_from_full/eng.lstm --traineddata tessdata/eng.traineddata --train_listfile ~/tesstutorial/engeval/eng.training_files.txt --max_iterations 400

The eng.lstm is extracted by "training/combine_tessdata -e tessdata/eng.traineddata ~/tesstutorial/impact_from_full/eng.lstm"

The code will work if I use the tessdata/best/eng.traineddata

The error that I got:
Loaded file /home/dlai/tesstutorial/impact_from_full/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/dlai/tesstutorial/impact_from_full/eng.lstm
Loaded 72/72 pages (1-72) of document /home/dlai/tesstutorial/engeval/eng.FreeSans.exp0.lstmf
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Segmentation fault (core dumped)

Thanks very much

Dihui

bug

All 11 comments

However the training failed when I try to extract .lstm from tessdata/eng.traineddata

Both tessdata and tessdata_fast contain integer models, which cannot be used for lstmtraining.
Only the float models in tessdata_best can be used for it.

Of course, it should give an appropriate error message and not crash.

@stweil Is it possible to add an error msg for 4.0.0?
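
The working path described above (extract the float LSTM from the tessdata_best file, then fine-tune from it) can be sketched as a small helper that only prints the two commands for review. This is a minimal sketch, not part of Tesseract: `print_finetune_cmds` and its three arguments are hypothetical names, and the flags are the ones quoted in the original report.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the extract and fine-tune commands.
# The command lines mirror the ones quoted in the report.
print_finetune_cmds() {
    src=$1       # traineddata holding a float LSTM, e.g. tessdata/best/eng.traineddata
    workdir=$2   # directory for the extracted eng.lstm and the checkpoints
    listfile=$3  # file listing the .lstmf training pages

    # Step 1: extract the LSTM component from the float traineddata.
    echo "training/combine_tessdata -e $src $workdir/eng.lstm"

    # Step 2: fine-tune, continuing from the extracted float model.
    echo "training/lstmtraining --model_output $workdir/impact --continue_from $workdir/eng.lstm --traineddata $src --train_listfile $listfile --max_iterations 400"
}
```

For example, `print_finetune_cmds tessdata/best/eng.traineddata ~/tesstutorial/impact_from_full ~/tesstutorial/engeval/eng.training_files.txt` prints the two commands, which can then be checked and run by hand. Passing tessdata/eng.traineddata as the first argument would reproduce the failing setup, since that file holds an integer model.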

Thanks for your response, Shreeshrii,

I did read some comments about integerizing in your documentation and should have guessed this.

Still, is there a way to integerize the fine-tuned model from tessdata_best? The tessdata_best model is too slow for our application.

Dihui

The best files can be converted to integer with the following command:

Usage for compacting LSTM component to int:
  combine_tessdata -c traineddata_file

The tessdata repo has the integer version of best models plus the old legacy model also.
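
Applied to a fine-tuned model, the conversion could look like the sketch below, which again only prints the commands. Several things here are assumptions to verify against your build: the checkpoint name, the use of `lstmtraining --stop_training` to turn a checkpoint into a traineddata file (as described in the training wiki), and the precaution of running `combine_tessdata -c` on a copy in case it rewrites the file in place. `print_integerize_cmds` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands that turn a
# fine-tuned float checkpoint into an integer traineddata file.
print_integerize_cmds() {
    checkpoint=$1   # checkpoint written by lstmtraining (name is an assumption)
    best=$2         # float traineddata that was used for fine-tuning
    out=$3          # float traineddata to write, ending in .traineddata
    int_out="${out%.traineddata}_int.traineddata"

    # Step 1: write a float traineddata from the checkpoint.
    echo "training/lstmtraining --stop_training --continue_from $checkpoint --traineddata $best --model_output $out"

    # Step 2: keep the float file and convert a copy (assumes -c may modify its argument).
    echo "cp $out $int_out"
    echo "training/combine_tessdata -c $int_out"
}
```

Keeping the float file around matters because, as noted earlier in the thread, only float models can be used for further lstmtraining.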

@DihuiLai Please change issue title to

Segmentation fault when using integer models for LSTM training

Segmentation fault when using integer models for LSTM trining

s/trining/training/

@stweil Is it possible to add an error msg for 4.0.0?

Yes, I think so. I added the issue to the planning list.

@zdenop, please add the "bug" label to this issue.

@stweil Thanks for fixing the typo :-) Good to know that it can be fixed for 4.0.0.

Changed @Shreeshrii

The problem is solved and I am closing the issue.

AFAIK this issue was not solved.

It was only clarified that it was caused by training based on an integer model which is not allowed.
So that's an error which can be easily avoided. Of course the error handling needs to be improved here. @zdenop or @DihuiLai, please reopen this issue.
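
Until the error handling is improved in Tesseract itself, a wrapper script can at least translate the assert into a readable hint. The sketch below is a workaround outside Tesseract, not the fix discussed here: `explain_training_failure` is a hypothetical helper that searches a captured log for the assert message quoted in the report.

```shell
#!/bin/sh
# Hypothetical helper: looks for the int_mode_ assert in a saved lstmtraining
# log and prints a hint. Returns 0 if the assert was found, 1 otherwise.
explain_training_failure() {
    logfile=$1   # file containing the captured lstmtraining output

    # Single quotes keep the leading '!' literal.
    if grep -q '!int_mode_:Error:Assert failed' "$logfile"; then
        echo "hint: --continue_from appears to point at an integer model."
        echo "hint: extract the .lstm from a tessdata_best (float) traineddata instead."
        return 0
    fi
    return 1
}
```

One way to use it is to capture the training output with `training/lstmtraining ... 2>&1 | tee train.log` and then call `explain_training_failure train.log` when the run fails.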

Although this is a bug, I think it can be fixed after 4.0.0, as training won't be done by most users of Tesseract.

@stweil: can you send a PR so we can fix this for the 4.0 release?
