Tesseract: Commit da03e4e breaks tesserocr compilation with header files

Created on 19 Jul 2017  ·  35Comments  ·  Source: tesseract-ocr/tesseract

Related issue on Tesserocr: here

Environment

  • Tesseract Version: 4.00alpha
  • Commit Number: latest as of 2017-07-19
  • Platform: Ubuntu 16.04

Current Behavior:

Tesseract header files since commit da03e4e added the use of string in the unichar.h header file that is used during compilation by tesserocr. This causes the compilation of tesserocr to throw an error because the std library is not imported.

In file included from tesserocr.cpp:484:0:
/usr/local/include/tesseract/unichar.h:164:10: error: ‘string’ does not name a type
static string UTF32ToUTF8(const std::vector<char32>& str32);
          ^

Changing static string UTF32ToUTF8(const std::vector<char32>& str32); to static std::string UTF32ToUTF8(const std::vector<char32>& str32); in /usr/local/include/tesseract/unichar.h:164 fixes the problem for me.

I am aware that you do not provide support for product that aren't the command-line Tesseract but I thought you might want to know since it worked before the commit.

Expected Behavior:

Hopefully make it work again, if that's not an option, confirm that this is expected behavior.

Reproducing the error

  • Compile Leptonica from source
  • Compile Tesseract from source (I used --disable-openmp if that matters)
  • Install tesserocr with pip3 install tesserocr (obviously you need pip3 installed)

Most helpful comment

-DUSE_STD_NAMESPACE works 👍

All 35 comments

Try this code before including Tesseract header files:

#include <string>
using std::string;

According to @theraysmith, the Google coding standards require string instead of std::string.

Can we do something in our side that Ray will be okay with?

That's my question too, while I do understand the desire to respect Google's coding guidelines, this probably won't bite only Tesserocr users...

The problem is that -DUSING_STD_NAMESPACE is now needed for all of the tesseract library, not just for training. I fixed it in the Makefile.am files, but corresponding fixes are needed for windows and other non-autotools platforms.

@theraysmith Is it? Because I had that problem about 3 hours ago on Ubuntu. Tesseract does build, it's the modules built with its header files that doesn't.

I'll try adding -DUSING_STD_NAMESPACE and report back.

The same define will be required for anything that uses the library too, as
the define causes a using std::string to be declared in platform.h, which
is exactly what you are missing.

On Wed, Jul 19, 2017 at 3:14 PM, Edouard Belval notifications@github.com
wrote:

@theraysmith https://github.com/theraysmith Is it? Because I had that
problem about 3 hours ago on Ubuntu. Tesseract does build, it's the modules
built with its header files that doesn't.

I'll try adding -DUSING_STD_NAMESPACE and report back.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/tesseract-ocr/tesseract/issues/1045#issuecomment-316534228,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AL056Wm5F3tHXCEBRfmSncrNj3xLaO1Jks5sPn-tgaJpZM4OdRN0
.

--
Ray.

The same define will be required for anything that uses the library too,

We need to document that requirement somewhere.

It looks like USING_STD_NAMESPACE must always be defined. I wonder why the Tesseract code does not add using std::string unconditionally (making USING_STD_NAMESPACE unnecessary).

Ray already fixed the issue with USING_STD_NAMESPACE. @Belval, please try the latest version from git.

Indeed the issue is fixed (at least for tesserocr). Thank you!

Problem is back, clean pull and build from master causes the same error described in the issue.

There is probably an include of platform.h needed somewhere.
Can you give us the compiler error?

@Belval, are you using cmake and do you see the problem since commit c67c2e9f416e0b41b5e055e67ba5281e04dd5e6c (but not before that commit)?

@Belval? Please answer to the questions from Ray and Stefan.

@amitdo Sorry, issue fell off my radar because of another problem.

@theraysmith It's the exact same compiler error that was in the original issue.

@stweil According to this issue the last "good" version was this commit. I did not try anything else than the latest version. Will report back with the results.

Quoting my comment from here:

I installed the latest pull from the master branch. These changes were added in unichar.h on 2017-07-14 (tesseract-ocr/tesseract@da03e4e9105b6262706d40ef2b4436eae4ebe19f) and unicharset.h on 2017-07-24 (tesseract-ocr/tesseract@b0ead95d64a3667339775b2f99ac37e97e90c2a0).

Setting USING_STD_NAMESPACE didn't make a difference (same error message).

I had to manually replace string with std::string in both unichar.h and unicharset.h to get it to compile properly.

The correct macro which must be defined is USE_STD_NAMESPACE.

@stweil I'll try it. If it works you might want to edit your comment above to make sure that no one else will make that mistake.

-DUSE_STD_NAMESPACE works 👍

@sirfz Can confirm, I built it successfully too.

I am finding same issue in tesseract fb359fc

#include <string>
using std::string;

before tesseract's includes works again for current master ebbfc3ae8df8 (thanks to @stweil)

I had to patch string -> std::string in one of the headers for our Debian package.

https://github.com/tesseract-ocr/tesseract/commit/aee910a7bfcbe0366806b68dc7f20535764c06d4#r23118118

https://github.com/tesseract-ocr/tesseract/blob/aee910a7bfcbe0366806b68dc7f20535764c06d4/ccutil/platform.h#L57

// Fix to map between google use of string without std and everywhere else.
#ifdef USE_STD_NAMESPACE
#include <string>
using std::string;
#endif

@jbreiden,

What about changing this to:

// Fix to map between google use of string without std and everywhere else.
#ifndef GOOGLE
#include <string>
using std::string;
#endif

I'd prefer #ifndef DO_NOT_USE_STD_NAMESPACE because it is more specific, but yes, that modification would solve the problem for most people. I have no idea whether it would fit into the Google environment.

I have no idea whether it would fit into the Google environment.

but @jbreiden should know...

I am not a language lawyer, and don't know off the top of my head. But it should be very easy for me to test and find out. Unfortunately due to holidays I won’t be able to do that until January 7. If super urgent Dan might be able to help but this type of thing is not his strong suit either. One thing is clear, we need to do something. Because of the OpenCV dependency, Tesseract is now critical infrastructure in Linux distributions. We are not going to force OpenCV to use oddball compilation flags.

It's not urgent. Enjoy the holidays!

https://github.com/tesseract-ocr/tesseract/commit/3f7735492fa117

So my updated suggestion is:

// Fix to map between google use of string without std and everywhere else.
#ifndef GOOGLE_TESSERACT
#include <string>
using std::string;
#endif

Checking right now to see exactly goes wrong with using std::string

Okay, I've looked into this. I've read commit messages. I've done test compiles. I've run lint tools. And I simply don't see what the problem is with std::string. Recommend we switch unichar.cpp and unichar.h to std::string. Leaving things as-is is not a reasonable approach. If I am wrong and missed something, there is always the #ifdef approach to fall back on. @rays, please speak up about this if you wish.

rays, please speak up about this if you wish

The fact the he is 'rays' in Google, does not mean he 'rays' in GitHub.
You 'called' the wrong Ray. He is @theraysmith here.

He said that using string without std:: is a strong convention in Google's C++ code.

I spoke with Ray about this. He is happy for Tesseract to use std::string, as long as it compiles fine and talks with Google internal code that uses a different type of string implementation. Apparently there was trouble with this some time ago, but I I tested it recently and it appears to work fine. @theraysmith and I I are now in agreement; go ahead and use std::string in unichar.cpp and unichar.h. Thank you.

P.S. This change has already in patched into the Debian & Ubuntu packages.

@jbreiden, Thanks.

Thanks also to @theraysmith, of course!

Was this page helpful?
0 / 5 - 0 ratings