Tesseract: Port tesstrain.sh to Python or C++

Created on 22 Apr 2018 · 6 Comments · Source: tesseract-ocr/tesseract

Since the training tools are available on Windows, it would be nice to have the training scripts ported too.
There are two ways:

  1. Python script - which is probably the best option, and
  2. C++ program/script - my personal preference, because C++ dependencies are easier to integrate with CPPAN - initial work (WIP) can be found at https://github.com/egorpugin/tesseract/commits/cpp_tesstrain

Contributions are appreciated.

feature request training

All 6 comments

I like the Python option :)

Me too, but it adds a dependency for users who do not need Python. Or we could try to freeze it ;-)
Should we create an extra repository for "tests" like this (e.g. tools-experimental)?

Should we create an extra repository for "tests" like this (e.g. tools-experimental)?

I think this is not needed. We can put more scripts in the training dir as they become ready.
We can also use the main repo to develop such tools, which gives the best collaborative access.
I did not commit my changes to the main repo only because I wanted to notify you first.
Porting that script is pretty straightforward: prepare some data, run commands.
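
To illustrate the "prepare some data, run commands" shape, here is a minimal Python sketch. It is not the actual port: the helper names (`run_command`, `run_pipeline`, `demo`) and the step names are made up for illustration, and the Python interpreter stands in for the real training tools so the example runs anywhere, including Windows.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_command(args, cwd=None):
    """Run one external tool and return its stdout; raise if it fails."""
    result = subprocess.run(
        [str(a) for a in args],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return result.stdout


def run_pipeline(steps, cwd=None):
    """Run (name, args) steps in order; the first failure aborts the run."""
    outputs = {}
    for name, args in steps:
        outputs[name] = run_command(args, cwd=cwd)
    return outputs


def demo():
    # Hypothetical stand-in steps: a real port would list the training
    # tools here instead of calling the Python interpreter.
    with tempfile.TemporaryDirectory() as workdir:
        # "Prepare some data": write a small text file into a work dir.
        text_file = Path(workdir) / "training_text.txt"
        text_file.write_text("hello tesseract\n", encoding="utf-8")

        count_script = (
            "import sys; "
            "print(len(open(sys.argv[1], encoding='utf-8').read()))"
        )
        # "Run commands": each step is an argument list, never a shell string.
        steps = [
            ("count_chars", [sys.executable, "-c", count_script, text_file]),
            ("echo", [sys.executable, "-c", "print('done')"]),
        ]
        return run_pipeline(steps, cwd=workdir)


if __name__ == "__main__":
    for name, output in demo().items():
        print(name, "->", output.strip())
```

Passing argument lists to `subprocess.run` instead of shell strings avoids quoting differences between bash and cmd.exe, which is most of what makes the shell version hard to run on Windows.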

If we use a separate repo, we could run into problems integrating the tess code, which is just an extra burden.

I vote for Python, with YAML for the config file. C++ and shell are a pain in the ass.

Python is also used by some related software, for example hocr-tools. I think it would be a good choice.

I think this issue is 'solved' so it should be closed now.
