I cloned the tesseract-ocr repositories with git on Ubuntu 14.04, giving the following structure:
tesseract-ocr
tesseract-ocr/tesseract
tesseract-ocr/tessdata
tesseract-ocr/langdata
The build process (autogen, make, sudo make install, sudo ldconfig) installed the tessdata files, including the configs and tessconfigs subdirectories and pdf.ttf, in /usr/local/share/tessdata.
This leaves tessdata-related files in two locations:
tesseract-ocr/tessdata
and
/usr/local/share/tessdata
(in addition to the source in tesseract-ocr/tesseract/tessdata)
As a regular user, I cannot copy the tessdata files to /usr/local/share/tessdata:
$ cp ./tessdata/san.traineddata /usr/local/share/tessdata
cp: cannot create regular file ‘/usr/local/share/tessdata/san.traineddata’: Permission denied
$ export TESSDATA_PREFIX=/home/shree/tesseract-ocr
$ echo $TESSDATA_PREFIX
/home/shree/tesseract-ocr
If I use the above TESSDATA_PREFIX, tesseract does not find the config files:
$ tesseract testing/phototest.jpg testing/phototest-jpg -l eng -psm 3
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
$ tesseract testing/phototest.jpg testing/phototest-jpg -l eng -psm 3 pdf
read_params_file: Can't open pdf
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
$ tesseract testing/phototest.jpg testing/phototest-jpg -l eng -psm 3 tsv
read_params_file: Can't open tsv
$ export TESSDATA_PREFIX=/usr/local/share/tessdata
$ echo $TESSDATA_PREFIX
/usr/local/share/tessdata
If I use the above, tesseract does not find the traineddata files, even when --tessdata-dir points to the correct location:
$ tesseract --tessdata-dir=../ testing/phototest.jpg testing/phototest-jpg -l eng -psm 3
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
$ tesseract --tessdata-dir=/home/shree/tesseract-ocr testing/phototest.jpg testing/phototest-jpg -l eng -psm 3
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
I can get around it by copying configs, tessconfigs and pdf.ttf to the /home/shree/tesseract-ocr/tessdata directory.
But this should not be required. What am I missing in the process?
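To see which pieces a given tessdata directory is lacking, a small check along these lines may help. This is only a sketch: the function name check_tessdata is hypothetical, and the list of expected items (the language's traineddata file, the configs and tessconfigs subdirectories, and pdf.ttf) is taken from what the build installed as described above, not from any official specification.

```shell
# check_tessdata DIR LANG   (hypothetical helper, not part of tesseract)
# Prints each expected item that is missing from DIR.
# Returns non-zero if anything is missing.
check_tessdata() {
  dir=$1
  lang=$2
  missing=0
  # Language data, e.g. eng.traineddata
  if [ ! -f "$dir/$lang.traineddata" ]; then
    echo "missing: $dir/$lang.traineddata"
    missing=1
  fi
  # Subdirectories that the build installed next to the language data
  for sub in configs tessconfigs; do
    if [ ! -d "$dir/$sub" ]; then
      echo "missing: $dir/$sub/"
      missing=1
    fi
  done
  # Font file that the build installed next to the configs
  if [ ! -f "$dir/pdf.ttf" ]; then
    echo "missing: $dir/pdf.ttf"
    missing=1
  fi
  return $missing
}
```

For example, running check_tessdata /usr/local/share/tessdata eng and check_tessdata /home/shree/tesseract-ocr/tessdata eng would show which of the two locations lacks the language data and which lacks the config files.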
The Git repository is for developers who know how to manage their system.
Regular users should use their system's packaging tools.
Before you run those commands, you should put the *.traineddata files in the tessdata subdirectory of the source tree.
The *.traineddata files must be placed in the tessdata directory together with the config files.
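The staging step suggested above could be sketched as follows. The function name stage_traineddata is hypothetical, and the sketch copies only LANG.traineddata files; any other per-language files would need to be copied separately.

```shell
# stage_traineddata SRC_DIR DEST_DIR LANG...   (hypothetical helper)
# Copies LANG.traineddata for each listed language from SRC_DIR to DEST_DIR.
stage_traineddata() {
  src=$1
  dest=$2
  shift 2
  for lang in "$@"; do
    # Stop at the first language file that cannot be copied
    cp "$src/$lang.traineddata" "$dest/" || return 1
  done
}
```

With the layout from the question, this would be called as stage_traineddata ~/tesseract-ocr/tessdata ~/tesseract-ocr/tesseract/tessdata eng osd before running the build and install commands, assuming the install step picks up the files from the source tree as described.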
Thanks, @amitdo
At a minimum, the following need to be copied:
osd.traineddata
eng.*
As a regular user, I cannot copy the tessdata files to /usr/local/share/tessdata:
$ cp ./tessdata/san.traineddata /usr/local/share/tessdata
cp: cannot create regular file ‘/usr/local/share/tessdata/san.traineddata’: Permission denied
Well, /usr is a system directory, so you can't write to it without admin rights.
To run a command as the root user, use sudo.
Be careful when running commands with sudo, you might damage your system if you type the wrong command!
https://help.ubuntu.com/community/RootSudo
http://manpages.ubuntu.com/manpages/karmic/man8/sudo.8.html
https://www.raspberrypi.org/documentation/usage/terminal/
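As a minimal sketch of the sudo advice above, the copy can be wrapped so that sudo is used only when the destination is not writable by the current user. The function name install_traineddata is hypothetical.

```shell
# install_traineddata FILE DEST_DIR   (hypothetical helper)
# Copies FILE into DEST_DIR, using sudo only if DEST_DIR is not writable.
install_traineddata() {
  file=$1
  dest=$2
  if [ -w "$dest" ]; then
    cp "$file" "$dest/"
  else
    # System directories such as /usr/local/share need root privileges
    sudo cp "$file" "$dest/"
  fi
}
```

For the case in the question, this would be install_traineddata ./tessdata/san.traineddata /usr/local/share/tessdata, which prompts for the sudo password because a regular user cannot write to that directory.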