Tesseract: Running .net core console application on AWS linux 2.

Created on 15 Nov 2018 · 25 Comments · Source: charlesw/tesseract

Charles,
We last spoke regarding mono a long time ago. The .NET Core support is exciting, so I'm running full speed at it, and I feel like I'm close, but I'm having an issue. Since I don't see a lot of actual Linux testing, I thought I'd share what I've got and see where that leads.

I'm running the 3.2.0 alpha 4 build in a .net core console project. I get the following error while trying to load pix from memory.

Method not found: 'System.Reflection.Emit.AssemblyBuilder System.AppDomain.DefineDynamicAssembly(System.Reflection.AssemblyName, System.Reflection.Emit.AssemblyBuilderAccess)'. !StackTrace: at InteropDotNet.InteropRuntimeImplementer.CreateInstanceT
at Tesseract.Interop.LeptonicaApi.Initialize()
at Tesseract.Interop.LeptonicaApi.get_Native()
at Tesseract.Pix.LoadTiffFromMemory(Byte[] bytes)

So my question is... am I even using the right build, or do I need to try and migrate to the tesseract 4.0 development fork? Is there anything else I'm missing?

Thanks!

All 25 comments

Hi, I'd recommend building the project from source; at the moment the NuGet
packages are quite out of date. The main develop branch (tesseract 3.05)
supports .NET Core, as does the Tesseract 4 branch. I personally haven't
tested these on Linux, but others have reported some success.

It should be noted that some of the tests are failing for tesseract 4
(mainly detecting page orientation) which I'm still looking into so use the
develop branch instead if you need this functionality.

Good luck

Ok, so I've built and tried both the 4.0 branch, and the develop branch. Interestingly enough, I'm getting the same error I got in linux, in my windows development environment now.

Method not found: 'System.Reflection.Emit.AssemblyBuilder System.AppDomain.DefineDynamicAssembly(System.Reflection.AssemblyName, System.Reflection.Emit.AssemblyBuilderAccess)'. !StackTrace: at InteropDotNet.InteropRuntimeImplementer.CreateInstanceT
at Tesseract.Interop.LeptonicaApi.Initialize()
at Tesseract.Interop.LeptonicaApi.get_Native()
at Tesseract.Pix.LoadTiffFromMemory(Byte[] bytes)

Am I doing something wrong? I've only got the eng.traineddata file in my tessdata folder, where before I had cube and bigrams. Other than the DLL versions, that's the only difference I can see.

Thanks

Ok,
so I added the tesseract-test package for 4 to my project, and I got a little further.
I ran into an issue with a missing libdl.so, which was corrected by creating a link to libdl.so.2 named libdl.so. (sudo ln -s /lib64/libdl.so.2 /lib64/libdl.so)
I am currently getting "failed to find library liblept1760.so for platform x64 using logic UnixLibraryLoaderLogic", which I think points me right back to issue #433. And while I'm eager to make it work, I'm not as comfortable with the resolution steps those users mentioned.

Are you planning a resolution to #433 or a "linux build" any time soon?
Thanks

My main issue with resolving #433 is I haven't got a linux setup atm. Will have to set that up on a virtual machine first when I have time.

I have had the library running with some minor changes on .NET Core on Linux and on a Mac, with system-provided and custom-linked tesseract builds. See my comments on #433 too. I'm willing to experiment more to contribute to this effort, but unfortunately I don't have the time at the moment. I hope to have some time for this in the next couple of weeks.

Can anyone check whether this is resolved for AWS Linux in the latest release (3.3)? I've implemented the fixes discussed in #433. Note that you will need to bundle the tesseract 3.05.02 and leptonica 1.75.3 binaries for Linux yourself (i.e. the .so files). I believe you will need to place these, with the expected names, in an x86 or x64 folder (depending on your targeted architecture). It won't be sufficient to just use a package manager to install the libraries (sorry!).

Finally I got to testing. Unfortunately I don't have a tesseract 3.0.5 setup on my hands. So I have tested on a vanilla Ubuntu 18.04 LTS server with tesseract 4 as default. So basically I have applied your commit for #433 to tesseract 4 feature branch and built it.

Symlinks (or copies as you suggested) are needed for the .so files at their expected positions like so:
x64/liblept1760.so -> /usr/lib/x86_64-linux-gnu/liblept.so.5
x64/libtesseract400.so -> /usr/lib/x86_64-linux-gnu/libtesseract.so.4

And it works :)
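
The symlink layout above can be scripted. This is a minimal sketch, assuming the Ubuntu 18.04 library paths quoted in this comment; the function name `link_native_libs` and its arguments are hypothetical, and the link names must match whatever versions your build of the wrapper expects.

```shell
#!/bin/sh
# Hypothetical helper: create the x64 symlinks the wrapper looks for.
# $1 = application output directory
# $2 = directory holding the system libraries (assumed default: Ubuntu x86_64 path)
link_native_libs() {
    app_dir=$1
    lib_dir=${2:-/usr/lib/x86_64-linux-gnu}
    mkdir -p "$app_dir/x64" || return 1
    # -f replaces an existing link instead of failing on a second run
    ln -sf "$lib_dir/liblept.so.5" "$app_dir/x64/liblept1760.so" &&
    ln -sf "$lib_dir/libtesseract.so.4" "$app_dir/x64/libtesseract400.so"
}
```

Calling `link_native_libs /path/to/publish` after each publish keeps the links in place; the function does not check that the system libraries are actually installed.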

Some remarks and details for the record:

  • I tested with .net core SDK 2.2
  • It complained with the following error message:

System.DllNotFoundException: Unable to load shared library 'libdl.so' or one of its dependencies.

So I went and installed gcc, and then it worked. No idea if there is any smaller package that would fulfill this dependency. (I'd be interested if anybody knows)
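
An alternative to installing a compiler toolchain is to link the versioned libdl.so.2 to the unversioned name, as an earlier comment did by hand. This is a minimal sketch, not a tested fix for every distribution; the function name `ensure_libdl` and its argument order are hypothetical, and the directories in the example call are only the ones reported in this thread.

```shell
#!/bin/sh
# Hypothetical helper: make an unversioned libdl.so visible to the loader.
# $1 = directory in which to create libdl.so
# remaining arguments = directories to search for libdl.so.2
ensure_libdl() {
    dest_dir=$1
    shift
    # Nothing to do if the unversioned name already exists
    if [ -e "$dest_dir/libdl.so" ]; then
        return 0
    fi
    for dir in "$@"; do
        if [ -e "$dir/libdl.so.2" ]; then
            ln -s "$dir/libdl.so.2" "$dest_dir/libdl.so"
            return $?
        fi
    done
    echo "libdl.so.2 not found in: $*" >&2
    return 1
}

# Example call (needs root), using directories reported in this thread:
# ensure_libdl /usr/lib/x86_64-linux-gnu /lib/x86_64-linux-gnu /lib64
```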

@charlesw I can confirm that the v3.3 NuGet package works on AWS Lambda (CentOS, I believe) with the .so files you mentioned: liblept1753.so and libtesseract3052.so. I placed the .so files in a folder like <MyProject>/x64.

I still do not like how it requires library files with that exact naming scheme as above (the version in the filename), but it does work.

I'm trying to run tesseract in a Docker container. At first I got the error that "libdl.so" was not found. I fixed that by creating a symlink from /lib/x86_64-linux-gnu/libdl.so.2 to /usr/lib/x86_64-linux-gnu/libdl.so

Now I get the error
System.DllNotFoundException: Failed to find library "liblept1753.so" for platform x64

I have already copied my liblept1753.so into the /app/x64 folder but it still can't find the library.
On native Ubuntu everything works fine with the same liblept1753.so in the x64 folder.

@HelgeL could you please provide step by step how you made symlinks on liblept1760.so and libtesseract400.so? Currently I have both these files under project directory inside x64 folder.

I'm getting an error "Failed to find library "liblept1760.so" for platform x64 using logic UnixLibraryLoaderLogic."

I tried to copy these files into folder /usr/lib/x86_64-linux-gnu/ but still not working.

Thanks

@charlesw could you implement those changes also into version 4 branch?

Create a folder "x64" in your project directory (where your .csproj file is located) and copy liblept1760.so and libtesseract400.so into it. You should now be able to see both files in Visual Studio. You will have to set "Copy to Output Directory" (is that what the property is called in English?) to "always" for both files.

Make sure you have the following packages installed inside your container
apt-get install -y libgif7 libjpeg62 libopenjp2-7 libpng16-16 libtiff5 libwebp6

Otherwise the dlopen call for liblept will fail and you will get the error message you mentioned.

If you don't have the liblept package installed inside your container and only copied the .so file into the x64 directory, the open command for libtesseract will fail.

To fix this, you have to create a sym link to your liblept shared object.
Just run inside your container / Dockerfile
ln -s /app/x64/liblept1760.so /usr/lib/x86_64-linux-gnu/liblept.so.5
Make sure to use the correct source path. For default asp.net core docker images and my described way /app/x64/liblept1760.so should be working.

Feel free to ask if something doesn't work :)

@chixlol thanks, unfortunately I'm still getting the same error.

I got these packages libgif7 libjpeg62 libopenjp2-7 libpng16-16 libtiff5 libwebp6 inside container. I also created sym link from /app/x64/liblept1760.so to /usr/lib/x86_64-linux-gnu/liblept.so.5. No idea why it still can't find it.

@chixlol thanks for providing the info. It's been a while since I fiddled with this, but I found that I didn't need to copy the .so files to x64 folder but only created symbolic links:

ln -s /usr/lib/x86_64-linux-gnu/libtesseract.so.4 libtesseract400.so
ln -s /usr/lib/x86_64-linux-gnu/liblept.so.5 liblept1760.so

and result looks like:

root@testdotnet:/home/test/test/ocr/publish/x64# ls -l
total 6272
-rwxr--r-- 1 test test 3742208 Jul  5  2018 liblept1760.dll
lrwxrwxrwx 1 root     root          38 Dez 30 17:35 liblept1760.so -> /usr/lib/x86_64-linux-gnu/liblept.so.5
-rwxr--r-- 1 test test 2677248 Jul  5  2018 libtesseract400.dll
lrwxrwxrwx 1 root     root          43 Dez 30 17:32 libtesseract400.so -> /usr/lib/x86_64-linux-gnu/libtesseract.so.4

and it worked (for reference: Ubuntu 18.04.1 LTS, .net core 2.2. dll files are from windows and are ignored on that box)

If you enable traces doesn't it say where it checks and whether or not it found it?
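
Independently of the wrapper's own tracing, the dynamic linker can report which dependencies of a .so file it cannot resolve, which is a common reason for a "failed to find library" error when the file itself is present. This is a minimal sketch, assuming a glibc-based image where `ldd` is available; the function name `missing_deps` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: filter ldd output down to unresolved dependencies.
# Reads ldd output on stdin and prints one missing library name per line.
missing_deps() {
    awk '/=> not found/ { print $1 }'
}

# Typical use inside the container (path as used in this thread):
# ldd /app/x64/liblept1760.so | missing_deps
```

An empty result means the linker resolved every direct dependency. On glibc systems, running the application with `LD_DEBUG=libs` set in the environment should additionally log the paths the loader searches.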

Thanks everyone, it looks like the problem was on my side, since I'm fairly new to Docker. I didn't know I needed to commit changes to the Docker image.

Chixlol's symlinks don't work for me, but HelgeL's do, at least for liblept, when I install tesseract in the container. They don't work for tesseract because the container is Debian, where the current stable version is 3 and not 4. I tried to copy libtesseract400.so as libtesseract.so.4, but for some reason this doesn't work.

Edit: after installing version 4 from backports it seems to work (I have got rid of that tesseract error, but got new one from other library). Thanks for help.

Does anyone have (or can create) a full guide for getting this working from start to finish?

I'm looking to get this running on the latest Raspbian, if possible.

@BrentMcFerrin I need the liblept1753.so and libtesseract3052.so files. Can you provide these or explain how I can get them otherwise?

I'm trying to make it work for the current version on NuGet (3.3.0) which depends on Tesseract 3.0.5 inside a Docker container.

I'm currently stuck with System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.DllNotFoundException: Failed to find library "libtesseract3052.so" for platform x64.

Here's my Dockerfile

...

FROM microsoft/dotnet:2.2.2-aspnetcore-runtime-bionic
WORKDIR /app

RUN apt-get update
RUN apt-get install -y libgif7 libjpeg62 libopenjp2-7 libpng16-16 libtiff5 libwebp6 liblept5

COPY --from=builder /dockerout .
COPY --from=webuni/tesseract:3 /usr/lib/libtesseract.so.3.0.5 ./x64/libtesseract3052.so

RUN ln -s /usr/lib/x86_64-linux-gnu/liblept.so.5 x64/liblept1753.so
RUN ln -s /lib/x86_64-linux-gnu/libdl.so.2 /usr/lib/x86_64-linux-gnu/libdl.so

I'm installing the liblept5 dependency via apt-get and creating a symlink like @HelgeL showed. Unfortunately, Tesseract 3 is not available via apt-get so I'm copying it from the webuni/tesseract:3 docker image.

@cypressious I had the same problem. You also need the "libgdiplus" package.

@FerronN I've added libgdiplus to the apt-get install line but same result. Can you share your Dockerfile? If you're not using Docker, can you share where you got the tesseract .so file from?

@cypressious this is my Dockerfile. It's for a web application and not a console application, so there might be a few differences.

FROM microsoft/dotnet:2.1-aspnetcore-runtime AS base
# Install packages required for tesseract 3.3.0
RUN apt update && apt install libgif7 libjpeg62 libopenjp2-7 libpng16-16 libtiff5 libwebp6 libc6-dev libgdiplus -y && apt clean
WORKDIR "/app"
EXPOSE 80

ENV ASPNETCORE_URLS=http://*:80/

FROM microsoft/dotnet:2.1-sdk AS build
WORKDIR "/src"
COPY ["Scriptum/Scriptum.csproj", "Scriptum/"]
WORKDIR "/src/Scriptum"

RUN dotnet restore -nowarn:msb3202,nu1503
WORKDIR "/src"
COPY . .
WORKDIR "/src/Scriptum"
RUN dotnet build "Scriptum.csproj" -c Release -o /app

FROM build AS publish
RUN dotnet publish "Scriptum.csproj" -c Release -o /app

FROM base AS final
COPY --from=publish /app .
RUN sed -i 's/false/true/g' web.config
ENTRYPOINT ["dotnet", "Scriptum.dll"]

I've added the .so files to the x64 folder in my Visual Studio workspace, so I don't need to copy them. I built these files from source as described here. I compiled the .so files in a container that is exactly the same as the one in my Dockerfile. I've attached a zip file which contains the files within my x64 folder.

x64.zip

@FerronN Awesome! I think it's working. Now off to figuring out how to make OpenCV work...

Hello, thank you for your comment. I have several questions about using tesseract under Linux or in a Docker container. Did you change contant.cs with libtesseractxxx.so and leptonicaxxxx.so? Did you try with tesseract v4?

Hi @chixlol, I am working on dockerizing a .NET Core 2.2 app that uses Genesis.Tesseract4. After building the Docker image I am unable to run OCR on an image; it gives the inner exception: Failed to find library "liblept1760.so" for platform x64. Before Docker I was working in a Windows environment, where tesseract needed liblept1760.dll, which is available (and tesseract works there), but I can't find "liblept1760.so" and "libtesseract400.so" anywhere. I also tried your solution by adding the tesseract packages to the Dockerfile and then creating a symlink, but the same error appears again. Below is my Dockerfile; please see if I am doing something wrong.

FROM mcr.microsoft.com/dotnet/core/sdk:2.2 AS build-env
WORKDIR /app

COPY *.csproj ./
RUN dotnet restore

COPY . ./

RUN dotnet publish -c Release -o out

FROM mcr.microsoft.com/dotnet/core/aspnet:2.2
WORKDIR /app
COPY --from=build-env /app/out .

RUN apt update && apt install libgif7 libjpeg62 libopenjp2-7 libpng16-16 libtiff5 libwebp6 libc6-dev libgdiplus -y && apt clean

RUN ln -s /app/x64/liblept1760.so /usr/lib/x86_64-linux-gnu/liblept.so.5
RUN ln -s /app/x64/libtesseract400.so /usr/lib/x86_64-linux-gnu/libtesseract.so.4

RUN apt install -y ghostscript

ENTRYPOINT ["dotnet", "My.dll"]

@aqibshabbir did you find a solution for this error?

@cypressious how did you get this to work?
