Hello,
I am trying to train Tesseract, but my training is failing. I ran into a problem when running shapeclustering on the training data.
Here is what I got:
Reading ./Data/eng.fonf.exp365.tr ...
Bad properties for index 3, char I: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 4, char c: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 5, char a: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 6, char n: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 7, char t: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 8, char d: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 9, char o: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 10, char i: 0,255 0,255 0,0 0,0 0,0
Bad properties for index 11, char .: 0,255 0,255 0,0 0,0 0,0
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0 1 2 3 4 5 6 7 8
Stopped with 0 merged, min dist 0.280000
Master shape_table:Number of shapes = 9 max unichars = 1 number with multiple unichars = 0
Then I searched online and found that shapeclustering is only useful for Indic languages, so I skipped it and ran mftraining instead. Here is what I got:
Warning: No shape table file present: shapetable
Reading ./Data/eng.fonf.exp365.tr ...
Flat shape table summary: Number of shapes = 9 max unichars = 1 number with multiple unichars = 0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!
I have been researching this for a week, and I now think the problem is in unicharset_extractor.
I went into the Tesseract source and placed some _print_ statements just before and just after the _if_ that checks whether wctype is available on the system, then recompiled. When I ran it, all the prints before the check appeared, but none of the ones after it did. So I think my system does not support wctype. I don't understand why, though: I am working on Ubuntu 15.10 in VirtualBox, and the wiki says wctype is unsupported only on older systems. I had someone else try on another machine, and he got the same error.
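The guarded branch in unicharset_extractor presumably relies on wctype functions such as iswalpha, iswlower and iswupper to classify each character; if that branch is skipped, those properties stay unset. The Python sketch below only illustrates that kind of per-character classification packed into a bitmask; it is not Tesseract's code. The bit values for alpha, lower, upper, digit and punctuation are an assumption about the Tesseract 3.x unicharset format, and punctuation is approximated with the Unicode "P" categories rather than iswpunct.

```python
import unicodedata

# Assumed bit values for the properties field of a Tesseract 3.x unicharset.
# Verify against unicharset.cpp for your version before relying on them.
ISALPHA = 0x1
ISLOWER = 0x2
ISUPPER = 0x4
ISDIGIT = 0x8
ISPUNCT = 0x10


def char_properties(ch: str) -> int:
    """Classify one character, roughly like iswalpha/iswlower/iswupper/... would."""
    props = 0
    if ch.isalpha():
        props |= ISALPHA
        if ch.islower():
            props |= ISLOWER
        if ch.isupper():
            props |= ISUPPER
    if ch.isdigit():
        props |= ISDIGIT
    # Approximation: Unicode general categories starting with "P" are punctuation.
    if unicodedata.category(ch).startswith("P"):
        props |= ISPUNCT
    return props


if __name__ == "__main__":
    # The characters from the training text in this thread.
    for ch in "Icantdoi.":
        print(f"{ch} {char_properties(ch):x}")
```

Under these assumed values, `I` gets mask 5 (alpha and upper), the lowercase letters get 3, and `.` gets 10 (hex).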
@amitdo no worries. :+1:
See issue #318
@amitdo Okay, I read it! Thank you very much. I will test the solution and get back to you as soon as I can, then close this issue if it solves the problem or turns out to be the same as the other issue. I'll keep it open for now, just in case! :P
Warning: No shape table file present: shapetable
...
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
These specific warnings are confusing, but everyone gets them, so they should be ignored.
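One way to act on this advice is to filter a training log so that only warnings outside the known-harmless set stand out. A minimal Python sketch, where the benign list is just the three warnings quoted above (an assumption based on this thread, not an official Tesseract list):

```python
# Warnings quoted in this thread as safe to ignore. This list is an
# assumption drawn from the thread, not an official Tesseract list.
BENIGN_WARNINGS = frozenset({
    "Warning: No shape table file present: shapetable",
    "Warning: no protos/configs for Joined in CreateIntTemplates()",
    "Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()",
})


def unexpected_warnings(log_text: str) -> list[str]:
    """Return warning lines from a training log that are not known to be benign."""
    found = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Warning") and line not in BENIGN_WARNINGS:
            found.append(line)
    return found


if __name__ == "__main__":
    sample_log = """\
Warning: No shape table file present: shapetable
Reading ./Data/eng.fonf.exp365.tr ...
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!"""
    print(unexpected_warnings(sample_log))  # prints []
```

Any other warning line, such as a "properties incomplete" message, would be returned for a closer look.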
Okay, I think it did not fail this time...
Here is what I did!
My data seems fine now, though I'm not sure...
Maybe it's super bad!!! D:
Anyway, here are the commands and their output, in case someone wants to check them out one day... O_O
$ set_unicharset_properties --F font_properties --script_dir=Latin.unicharset -U unicharset -O output_unicharsetLoaded
Loaded unicharset of size 12 from file unicharset
Setting unichar properties
Other case C of c is not in unicharset
Other case A of a is not in unicharset
Other case N of n is not in unicharset
Other case T of t is not in unicharset
Other case D of d is not in unicharset
Other case O of o is not in unicharset
Warning: properties incomplete for index 3 = I
Warning: properties incomplete for index 4 = c
Warning: properties incomplete for index 5 = a
Warning: properties incomplete for index 6 = n
Warning: properties incomplete for index 7 = t
Warning: properties incomplete for index 8 = d
Warning: properties incomplete for index 9 = o
Warning: properties incomplete for index 10 = i
Warning: properties incomplete for index 11 = .
Writing unicharset to file output_unicharsetLoad
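The "Other case" lines above appear to come from a simple check: for each letter, take its opposite-case form and see whether that form is also in the unicharset. A minimal Python sketch of the same idea (not the tool's actual implementation), fed the nine characters at indices 3 to 11 in this log, produces the same six messages, because I and i are both present and "." has no case.

```python
def missing_other_cases(unichars: list[str]) -> list[str]:
    """Report letters whose opposite-case form is absent from the set."""
    present = set(unichars)
    messages = []
    for ch in unichars:
        other = ch.swapcase()
        # Skip characters with no distinct single-character case partner.
        if other == ch or len(other) != 1:
            continue
        if other not in present:
            messages.append(f"Other case {other} of {ch} is not in unicharset")
    return messages


if __name__ == "__main__":
    # The characters at indices 3..11 in the log above, in index order.
    training_chars = ["I", "c", "a", "n", "t", "d", "o", "i", "."]
    for line in missing_other_cases(training_chars):
        print(line)
```

These messages are informational: they only mean the training text never contained the uppercase forms.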
$ mftraining -F font_properties -U output_unicharsetLoaded -O eng.unicharset ./Data/eng.fonf.exp365.tr
Warning: No shape table file present: shapetable
Reading ./Data/eng.fonf.exp365.tr ...
Flat shape table summary: Number of shapes = 9 max unichars = 1 number with multiple unichars = 0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
Done!
$ cntraining ./Data/eng.fonf.exp365.tr
Reading ./Data/eng.fonf.exp365.tr ...
Clustering ...
Writing normproto ...
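The three steps above can be chained so that each one runs only if the previous one succeeded. This Python sketch builds the same commands as argument lists and runs them with `subprocess.run(check=True)`, which raises `CalledProcessError` at the first non-zero exit. File names are placeholders taken from this thread, and `langdata` is a hypothetical directory: the training documentation appears to describe `--script_dir` as a directory holding per-script files such as Latin.unicharset, not the file itself. The `--F font_properties` argument from the command above is left out because it does not seem to be among the documented set_unicharset_properties options. Treat both points as assumptions to check against your Tesseract version.

```python
import subprocess
from collections.abc import Callable, Sequence


def build_commands(tr_file: str, script_dir: str) -> list[list[str]]:
    """Return the three training steps used in this thread as argv lists."""
    return [
        # Fill in character properties missing from the extracted unicharset.
        ["set_unicharset_properties", "-U", "unicharset",
         "-O", "output_unicharset", f"--script_dir={script_dir}"],
        # Train the shape prototypes against the updated unicharset.
        ["mftraining", "-F", "font_properties", "-U", "output_unicharset",
         "-O", "eng.unicharset", tr_file],
        # Train the character normalization prototypes (writes normproto).
        ["cntraining", tr_file],
    ]


def run_pipeline(commands: Sequence[Sequence[str]],
                 runner: Callable[..., object] = subprocess.run) -> None:
    """Run each step in order; with subprocess.run, check=True raises on failure."""
    for argv in commands:
        print("$", " ".join(argv))
        runner(list(argv), check=True)


if __name__ == "__main__":
    # Hypothetical paths: adjust to your own .tr file and langdata directory.
    plan = build_commands("./Data/eng.fonf.exp365.tr", "langdata")
    for argv in plan:
        print("$", " ".join(argv))
    # To actually execute the steps: run_pipeline(plan)
```

Running the file only prints the plan; passing a custom `runner` makes it easy to dry-run or log the steps before touching real training data.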
So, you can now change the text to 'can do it!'... :)
@amitdo YES!! IT WORKED! MONTHS OF STUDYING AND WORKING HARD, AND NOW IT'S WORKING! IF I COULD KISS YOU, I WOULD!!! THANK YOU VERY VERY MUCH!!!
@amitdo lol, sorry, it was not only this problem. I'm a student doing an internship, and this was very hard because I had never used C++ and had no knowledge of AI or deep learning. This was the last step before I could see the results of all the hard work I put into training Tesseract.
Good luck with the internship!
@amitdo & @Alipharo
I am using Tesseract 3.05.1 on 64-bit Windows. I want to train a new font; I tried, but the results are bad.
Can you send me the complete sequence of commands to train a font from a BMP image?
Thanks in advance.