Tesseract: unclear "leptonica not found" message

Created on 9 Feb 2016 · 18 Comments · Source: tesseract-ocr/tesseract

$ sudo apt-get install liblept4
Reading package lists... Done
Building dependency tree       
Reading state information... Done
liblept4 is already the newest version.
liblept4 set to manually installed.
The following packages were automatically installed and are no longer required:
  linux-image-4.2.0-23-generic linux-image-extra-4.2.0-23-generic
  linux-signed-image-4.2.0-23-generic
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
teo@xxx1:~/temp/tesseract$ ./configure 
checking for g++... g++
...
checking for mbstate_t... yes
checking for leptonica... configure: error: leptonica not found

Or, at the very least, the error message should say more precisely how to get leptonica.


All 18 comments

PEBCAK

You need the development package, which seems to be libleptonica-dev
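To check which Leptonica packages are actually installed on a Debian-based system, a small sketch like the following may help. It assumes dpkg-query is available; liblept4 is the runtime library from the original report, and libleptonica-dev is the headers package suggested above. The helper name pkg_installed is hypothetical.

```shell
#!/bin/sh
# Succeeds if the given Debian package is installed, according to dpkg-query.
# Fails if it is not installed, unknown to dpkg, or if dpkg-query is unavailable.
pkg_installed() {
    _pi_status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
    [ "$_pi_status" = "install ok installed" ]
}

# liblept4: runtime library only; libleptonica-dev: headers needed to compile.
for pkg in liblept4 libleptonica-dev; do
    if pkg_installed "$pkg"; then
        echo "$pkg: installed"
    else
        echo "$pkg: not installed"
    fi
done
```

If only the runtime package shows up as installed, configure can still fail at the header check.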

Yep, but the message says "leptonica not found", it should say "libleptonica-dev not found".
How am I supposed to guess that "leptonica" means libleptonica-dev?

PEBCAK, but whose chair and keyboard?

@teo1978: this is a standard autotools error message. You will get it for all libraries. ./configure is a tool, not a teacher. The name of the needed package can differ depending on the distribution.

If you are not familiar with compiling software on your operating system, it is not a tesseract problem.
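To make the distribution-dependence concrete, here is a sketch that maps the ID field of /etc/os-release to the package that, to the best of my knowledge, provides the Leptonica headers. The table is an assumption covering only a few distributions, and lept_dev_package is a hypothetical helper name; check your package manager for the authoritative name.

```shell
#!/bin/sh
# Map a distribution ID (the ID= field of /etc/os-release) to the package
# believed to ship the Leptonica headers. Illustrative and incomplete.
lept_dev_package() {
    case "$1" in
        debian|ubuntu) echo "libleptonica-dev" ;;
        fedora)        echo "leptonica-devel" ;;
        arch)          echo "leptonica" ;;   # headers ship in the main package
        *)             echo "unknown"; return 1 ;;
    esac
}

# Read ID from /etc/os-release in a subshell, if the file exists.
distro_id=""
if [ -r /etc/os-release ]; then
    distro_id=$(. /etc/os-release && echo "$ID")
fi
echo "Leptonica headers package for '${distro_id:-unknown}': $(lept_dev_package "$distro_id")"
```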

Now on another host:

configure: error: leptonica library missing

# apt-get install libleptonica-dev 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libleptonica-dev is already the newest version.

Also,

If you are not familiar with compiling software on your operating system, it is not a tesseract problem.

It is a documentation problem, then. I am indeed not familiar with compiling software, but every time I have done it with other software, I followed the steps in the documentation, they worked out of the box, and the requirements were clearly specified.

Also, I wouldn't have to compile it at all if there were a decent Debian package that was not so ridiculously obsolete that it doesn't even recognize the -v option.

Not even with this:

$ export LIBLEPT_HEADERSDIR=/usr/include
$ ./configure --with-extra-libraries=/usr/lib

which are the paths where the headers and the library respectively are.
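For context, my understanding is that the configure script of that era searched for allheaders.h in each directory listed in LIBLEPT_HEADERSDIR (with a default list when unset), either directly or in a leptonica/ or liblept/ subdirectory, and printed "leptonica not found" if none matched. The sketch below only approximates that search; it is not the real configure code, and find_lept_headers is a hypothetical name.

```shell
#!/bin/sh
# Approximation (not the actual configure code) of a header search:
# look for allheaders.h in each directory of $LIBLEPT_HEADERSDIR,
# either directly or under a leptonica/ or liblept/ subdirectory.
find_lept_headers() {
    _fh_dirs=${LIBLEPT_HEADERSDIR:-"/usr/local/include /usr/include"}
    for _fh_incd in $_fh_dirs; do          # unquoted on purpose: split the list
        for _fh_sub in . leptonica liblept; do
            if [ -r "$_fh_incd/$_fh_sub/allheaders.h" ]; then
                echo "$_fh_incd/$_fh_sub"
                return 0
            fi
        done
    done
    return 1
}

if hdr_dir=$(find_lept_headers); then
    echo "allheaders.h found in $hdr_dir (configure would add -I$hdr_dir)"
else
    echo "allheaders.h not found: the development package is probably missing" >&2
fi
```

If this finds the header but configure still fails, the problem is more likely at a later link check than at the header search.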

I'd prefer a less inflammatory title for this bug report, please. Also, may I ask why you aren't installing Tesseract via apt-get? Debian Stable ships with version 3.03, see https://packages.qa.debian.org/t/tesseract.html

$ sudo apt-get install tesseract-ocr

$ tesseract -v
tesseract 3.03
 leptonica-1.70
  libgif 4.1.6(?) : libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : webp 0.4.0

Also, may I ask why you aren't installing Tesseract via apt-get?

Because that's how it had been installed in the first place, and tesseract -v gave an error because it did not recognize the -v option. This is on Debian 6.

@teo1978: this is a standard autotools error message. [...] The name of the needed package can differ depending on the distribution.

That's no excuse. It should still specify that it needs the headers, i.e. the _-dev_ package. There's no way the person installing it can know whether the binary package or the headers package is required, regardless of the actual names.

Besides, I don't know what "autotools error message" means, but the error message is hard-coded in the configure file, so it could be easily edited.

Also note that the error message _"leptonica library missing"_, issued when the pixCreate check fails (whatever that means), is very confusing too. "Leptonica library missing" seems to indicate that the leptonica library is missing. Instead, you get this message when the leptonica library is actually found and something else, and it is not clear what, is missing. That error message is clearly useful to those who wrote and/or maintain tesseract, but it is completely obscure to anyone else.
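As far as I understand, the pixCreate check is a standard autoconf link test: configure compiles a tiny program that calls pixCreate (a real Leptonica function) and tries to link it with -llept, without including any Leptonica header. If linking fails, it reports the library as missing, even when the headers were found. The sketch below approximates such a test by hand; check_lept_link is a hypothetical name, and CC and LDFLAGS are the usual environment variables (e.g. LDFLAGS=-L/some/prefix/lib for a custom install).

```shell
#!/bin/sh
# Approximation of an autoconf-style link check (similar in spirit to AC_CHECK_LIB):
# declare pixCreate without the header, call it, and try to link with -llept.
# Only the presence of the symbol in the library matters, not the prototype.
CC=${CC:-cc}

check_lept_link() {
    _cl_tmp=$(mktemp -d) || return 1
    cat > "$_cl_tmp/conftest.c" <<'EOF'
/* Deliberately generic prototype: only the symbol name is checked. */
char pixCreate (void);
int main (void) { return pixCreate (); }
EOF
    # $CC and $LDFLAGS are unquoted on purpose so they may contain extra flags.
    if $CC "$_cl_tmp/conftest.c" -o "$_cl_tmp/conftest" $LDFLAGS -llept >/dev/null 2>&1; then
        _cl_result=yes
    else
        _cl_result=no
    fi
    rm -rf "$_cl_tmp"
    echo "$_cl_result"
}

echo "checking for pixCreate in -llept... $(check_lept_link)"
```

A "no" here with the headers present usually points at the linker not finding liblept (for example a missing unversioned .so symlink or a non-standard library directory), though other link errors produce the same result.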

This issue should be reopened.

This issue should not be re-opened, and let me explain why. Tesseract uses a venerable and very common build system called autotools. (And I'll admit, autotools has a well-deserved reputation for being complicated.) This build system is designed to work with many different operating systems, not just Linux. Over the years it has supported operating systems like Solaris, Ultrix, AIX, Cygwin, OS X, and many, many others. Each of these operating systems can have wildly different ideas about the organization and packaging of software. The autotools system created the configure file, and its error messages are therefore not tailored to any particular operating system.

You are running into issues and error messages that are really for developers or system integrators to deal with. As a user, the recommended procedure is to not worry about integrating Tesseract into your computer starting from source code, but rather take advantage of the packaging work done by others. Update your Linux distribution to the modern era, install tesseract with the standard tools, and it will work. The alternative is doing a ton of system administration work (not just with Tesseract, but with its entire chain of dependencies) that is not particularly fun or easy.

If you really want to learn about the nuances of autotools, there are entire books written about it, such as https://www.sourceware.org/autobook/autobook/autobook_toc.html. However, I respectfully suggest spending the time and brain cells on something else. I expect that most software will eventually move to simpler and better build systems in future decades.

This issue should not be re-opened, and let me explain why.

You have only explained the cause of the issue.

I think you misunderstood the point I made in comment https://github.com/tesseract-ocr/tesseract/issues/215#issuecomment-181944624
What I meant is not that the error message should give you the exact name of the package (I do understand that it can change from OS to OS), but that it could easily be more informative by telling you that the _headers_ are missing, not the actual library.

The autotools system created the configure file

That doesn't mean you can't edit it.

You are running into issues and error messages that are really for developers or system integrators to deal with. As a user, the recommended procedure is to not worry about integrating Tesseract into your computer starting from source code

That's the whole essence of this issue (and, I guess, many others): that approach is plainly wrong.
Granted, the _recommended_ procedure is to take advantage of packages, but you should take into account that the recommended procedure is not always an option.

A user may well have to compile the software from source code, either because the distribution he needs to install it on is not "from the modern era" (by the way, Debian 6 is just 5 years old; would you tell somebody trying to install tesseract on Windows Vista, released in 2007, to upgrade their OS to the modern era?) or because it's a distribution for which there is no package at all.

I know compiling is not supposed to be particularly easy for the end user, but there's no need to make it more painful than necessary. I have had to compile software from source quite a few times, though I am not a developer (of the software I had to compile) or a system integrator, and usually it comes with an unambiguous list of requirements and a set of steps to follow (typically run a bash script that ships with the source code, then configure, then make and make install, like in this case), and it usually works out of the box. And when it doesn't, you know right away what you're missing from a glance at the error message. If neither of these is the case, then I'm sorry, but it's poorly maintained software.

In this issue I am reporting that the error messages issued when the leptonica _headers_ are missing, or when something related to a pixCreate check fails (what is that? a function in the library whose unavailability means the library version doesn't meet the requirements?), are poor. This could easily be improved, and hence should be. You explained perfectly (it was already clear) why the situation is what it is, but that's no valid reason for not improving it.

You confuse an explanation of why the problem exists with an argumentation that it is not a problem (which unfortunately is, I have to admit, a pretty common mistake).

I'm having the same problem, and I can't figure out how to solve it.

Please use Tesseract users forum and ask this question (and other questions you might have) there.

Please use Tesseract users forum and ask this question (and other questions you might have) there.

What about fixing the bug instead, so that the error message is clear enough and there's nothing to figure out?

any solution to this issue?

any solution to this issue?

Yes.

  • Most users should install the version that is available through their distro's package manager.

  • Advanced users can try to install a newer version from source. They should follow the instructions in the wiki.

  • If a user still has problems installing the software, he/she should use the forum for support.

Like it or not, that's our solution.

Try running "make install" after "make" inside your leptonica-1.xx folder.

I know this is an old closed issue, but I found a solution to this issue and will share my solution, since it was not obvious.

The short summary: I tried building latest tesseract (and leptonica) from source using cmake for both libs. The tesseract cmake step could not find leptonica. My solution was to set PKG_CONFIG_PATH before cmake. My exact steps to address the problem are below.

I will assume you have installed the other prerequisites (like giflib) as specified by both libraries.

After installing the prerequisites, I installed Leptonica to a custom location:

export INSTALL_PREFIX=~/.sources
git clone https://github.com/DanBloomberg/leptonica.git
mkdir leptonica/build ; cd leptonica/build
cmake -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DBUILD_PROG=1 ..
make
make install
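Before moving on to Tesseract, it may be worth confirming that the Leptonica install produced its pkg-config file, since the PKG_CONFIG_PATH approach in the next step relies on it. The sketch assumes the file is named lept.pc and lands in either lib/pkgconfig or lib64/pkgconfig under the prefix; find_lept_pc_dir is a hypothetical helper.

```shell
#!/bin/sh
# Look for Leptonica's pkg-config file (lept.pc) under an install prefix.
# Depending on the platform, it may land in lib/ or lib64/, so check both.
find_lept_pc_dir() {
    for _fp_dir in "$1/lib/pkgconfig" "$1/lib64/pkgconfig"; do
        if [ -f "$_fp_dir/lept.pc" ]; then
            echo "$_fp_dir"
            return 0
        fi
    done
    return 1
}

INSTALL_PREFIX=${INSTALL_PREFIX:-$HOME/.sources}
if pc_dir=$(find_lept_pc_dir "$INSTALL_PREFIX"); then
    echo "lept.pc found; use PKG_CONFIG_PATH=$pc_dir"
else
    echo "lept.pc not found under $INSTALL_PREFIX" >&2
fi
```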

Then, I installed Tesseract to the same custom location. Note the use of PKG_CONFIG_PATH.

export INSTALL_PREFIX=~/.sources
git clone https://github.com/tesseract-ocr/tesseract.git
mkdir tesseract/build ; cd tesseract/build
PKG_CONFIG_PATH=$INSTALL_PREFIX/lib/pkgconfig cmake -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX ..
make
make install
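Setting PKG_CONFIG_PATH this way overrides any value you already had for that command and only covers lib/pkgconfig. A slightly more defensive variant, sketched below under the assumption that the .pc files may end up in lib/ or lib64/, prepends whichever of those directories exist and keeps the existing value. pkg_config_path_for is a hypothetical name.

```shell
#!/bin/sh
# Print a PKG_CONFIG_PATH value listing the prefix's pkgconfig directories
# (only those that exist) first, followed by any value already set.
pkg_config_path_for() {
    _pc_result=""
    for _pc_dir in "$1/lib/pkgconfig" "$1/lib64/pkgconfig"; do
        if [ -d "$_pc_dir" ]; then
            _pc_result="${_pc_result:+$_pc_result:}$_pc_dir"
        fi
    done
    if [ -n "$PKG_CONFIG_PATH" ]; then
        _pc_result="${_pc_result:+$_pc_result:}$PKG_CONFIG_PATH"
    fi
    echo "$_pc_result"
}

INSTALL_PREFIX=${INSTALL_PREFIX:-$HOME/.sources}
# Example use from the Tesseract build directory:
#   PKG_CONFIG_PATH=$(pkg_config_path_for "$INSTALL_PREFIX") \
#       cmake -DCMAKE_INSTALL_PREFIX="$INSTALL_PREFIX" ..
echo "PKG_CONFIG_PATH=$(pkg_config_path_for "$INSTALL_PREFIX")"
```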

Finally, I use tesseract by exporting these variables in my bash profile. The only one you probably need is PATH...

export LD_LIBRARY_PATH=$HOME/.sources/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=$HOME/.sources/lib:$HOME/.sources/lib64:$LIBRARY_PATH
export INSTALL_PREFIX=$HOME/.sources
export PATH=$HOME/.sources/bin:$PATH
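One caveat with exports of this form: if LD_LIBRARY_PATH was empty, the trailing colon leaves an empty entry, which the dynamic loader treats as the current directory, and re-sourcing the profile keeps prepending duplicates. A small helper avoids both. prepend_path is a hypothetical name, and it uses eval so that it can take a variable name as its first argument.

```shell
#!/bin/sh
# Prepend a directory to a colon-separated variable (PATH, LD_LIBRARY_PATH, ...)
# only if it is not already present, and without leaving an empty entry
# when the variable was empty. $1 is the variable NAME, $2 the directory.
prepend_path() {
    _pp_var=$1
    _pp_dir=$2
    eval "_pp_cur=\${$_pp_var}"
    case ":$_pp_cur:" in
        *":$_pp_dir:"*) ;;                                        # already present
        *) eval "$_pp_var=\"\$_pp_dir\${_pp_cur:+:\$_pp_cur}\"" ;;
    esac
}

# Possible use in a shell profile:
prepend_path PATH "$HOME/.sources/bin"
prepend_path LD_LIBRARY_PATH "$HOME/.sources/lib"
export PATH LD_LIBRARY_PATH
```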

And an example run:

git clone https://github.com/tesseract-ocr/tessdata.git
tesseract --tessdata-dir ./tessdata -l heb ./tesseract/testing/hebrew.png out
cat out.txt 

A single file containing all above code is here: https://gist.github.com/adgaudio/f772004444dc808e900c057d45f8b52e

FYI: If you prefer to use the ./autogen.sh approach, you should be able to replace ./configure with PKG_CONFIG_PATH=$INSTALL_PREFIX/lib/pkgconfig ./configure --prefix=$INSTALL_PREFIX --with-extra-libraries=$INSTALL_PREFIX/lib. (I just tried it and it worked).

I hope you find this helpful :)

Thanks a lot, adgaudio!
I was getting the error message "configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package".
Your solution worked like a charm!!!
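That message comes from a minimum-version check on Leptonica. Presumably configure delegates this to pkg-config, which offers pkg-config --atleast-version=1.74 lept for exactly this purpose; the sketch below does the same comparison by hand, which can help confirm which Leptonica version pkg-config is actually picking up. It assumes GNU sort -V is available, and version_at_least is a hypothetical name.

```shell
#!/bin/sh
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) as provided by GNU coreutils.
version_at_least() {
    _va_lowest=$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)
    [ "$_va_lowest" = "$2" ]
}

# Compare the Leptonica version that pkg-config sees against 1.74.
if command -v pkg-config >/dev/null 2>&1 &&
   lept_version=$(pkg-config --modversion lept 2>/dev/null); then
    if version_at_least "$lept_version" 1.74; then
        echo "Leptonica $lept_version found: new enough"
    else
        echo "Leptonica $lept_version found: older than 1.74"
    fi
else
    echo "pkg-config cannot find lept.pc; check PKG_CONFIG_PATH" >&2
fi
```

If this reports an old version while a newer Leptonica is installed under a custom prefix, pkg-config is most likely finding the system lept.pc first.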
