Tesseract: Tesseract 3.05 support

Created on 13 Apr 2017  ·  25 Comments  ·  Source: charlesw/tesseract

Tesseract 3.05 has been available for a couple months now. Will you release a compatible version for it? Thanks.

enhancement

All 25 comments

Me too, I would highly appreciate if this would happen !!

Hehe I didn't even realise it was out!

Tesseract 3.05 added a new method to the C API: TessBaseAPIDetectOrientationScript.

I wrote a little program to detect the orientation of the page. In its current form it only tries to find out whether the page is upside down, but you could easily change the code to detect other orientations. It OCRs part of the text, rotates the page 180 degrees, and then does the same again. The better of the two results decides the orientation of the page.

https://github.com/Sicos1977/PageOrientationEngine

Tesseract 3.05.01 has been released. There were minor changes to the c-api interface.

https://github.com/tesseract-ocr/tesseract/releases

Thanks for letting me know. According to https://github.com/tesseract-ocr/tesseract/wiki/Compiling#windows it looks like Tesseract has improved its build process, though I'll still need to reinstall VS2015 etc. to avoid changing the target runtime. I'll see if I can upgrade the project this weekend.

Got caught up with other things, will have to wait till next weekend.

Merged in #355, thanks @nguyenq

I've also issued a new nuget release 3.2.0-alpha3 and will look at doing a full release over the weekend assuming no issues are found.

Hi. Looking at the NuGet site, I see that 3.0.2 is still the latest stable release. Will a Tesseract 3.0.5 version become available, or will it be a move straight to 3.2.0?

Hey guys, I currently use the DetectBestOrientation method to ensure each page is rotated to the correct orientation before OCR, and it's worked very well. In 3.2.0-alpha3 this has been commented out due to the TessBaseAPIDetectOrientationScript change.

Is there any plan to implement a "new" DetectBestOrientation that uses TessBaseAPIDetectOrientationScript?

The DetectBestOrientation method was commented out because the Tesseract API method it uses, TessBaseAPIDetectOS, was considered unsafe and hence removed by the Tesseract developers. The new TessBaseAPIDetectOrientationScript method was created in its place. It is included in the .NET version but not yet exposed in the Page class. A PR has just been submitted for this.

Thanks @nguyenq I've merged the PR, will try and give it a test run tomorrow.

If anyone could have a look at DetectBestOrientationAndScript and the related methods and let me know whether they meet your requirements, that would be great. If so, I'll create a new NuGet package when I find a bit of time.

Thanks.

Thanks guys. Just had a look and definitely fits the bill 🙂 Looking forward to giving it a test when I'm back in the office!

If you can put a Nuget together in the next little while, I'll have opportunity to do some testing .. sorry, not trying to be pushy, just want to help out.

Okay, I'll see what I can do.

Sorry for the delay, the nuget package, 3.2.0-alpha4 should be up now.

No problem at all - thanks very much. I'll let you know how I go.

I've been able to do some testing with 3.2.0-alpha4, and it's looking good.. no issues jumping out.

I'm trying to call the new method PixArray.Add(Pix pix, int copyflag), but it keeps throwing NonComVisibleBaseClass exceptions with the code below. Can someone take a look?

var pix = Pix.LoadFromFile(filename);
PixArray pixA = PixArray.Create(0);
pixA.Add(pix, 0); // L_NOCOPY

NonComVisibleBaseClass occurred
Message: Managed Debugging Assistant 'NonComVisibleBaseClass' has detected a problem in 'C:\PROGRAM FILES (X86)\MICROSOFT VISUAL STUDIO 14.0\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\TESTWINDOW\vstest.executionengine.x86.exe'.
Additional information: A QueryInterface call was made requesting the class interface of COM visible managed class 'Tesseract.Pix'. However since this class derives from non COM visible class 'Tesseract.DisposableBase', the QueryInterface call will fail. This is done to prevent the non COM visible base class from being constrained by the COM versioning rules.

@nguyenq I am having the same error - is there any reason to use PixArray vs ArrayList made up of Pix? I have got the latter to work and currently that is fine for me.

I'll see if I can have a look over the weekend, time permitting.

From memory, PixArray is really just used to support loading multi-page
TIFFs. If you can use another data structure, like a List, then I'd
suggest you do so. However, you must ensure the Pix instances are disposed
of when you're done.
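
A rough sketch of that List-based approach, for anyone who wants a starting point. The class and method names (PixListSketch, LoadAndProcess) are hypothetical and are not part of the wrapper. The helper is written against IDisposable rather than Pix directly, so it assumes nothing about the wrapper beyond Pix being disposable, which the DisposableBase base class named in the exception message above suggests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Tesseract wrapper.
public static class PixListSketch
{
    // Loads one item per file name, processes each item, and always
    // disposes everything that was successfully loaded.
    public static void LoadAndProcess<T>(
        IEnumerable<string> fileNames,
        Func<string, T> load,
        Action<T> process) where T : IDisposable
    {
        var pages = new List<T>();
        try
        {
            foreach (var fileName in fileNames)
            {
                pages.Add(load(fileName));
            }

            foreach (var page in pages)
            {
                process(page);
            }
        }
        finally
        {
            // Runs even if loading or processing throws, so items that
            // were already loaded are not leaked.
            foreach (var page in pages)
            {
                page.Dispose();
            }
        }
    }
}
```

With the wrapper, the loader would presumably be Pix.LoadFromFile (as in the snippet earlier in this thread), e.g. PixListSketch.LoadAndProcess<Pix>(files, Pix.LoadFromFile, pix => { /* OCR the page here */ }). The try/finally is the important part: a plain List will not release the native images for you.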

PixArray is a Leptonica native type. I hoped I could use it for holding non-TIFF images, with usage similar to the code outlined in the ProcessMultipageTiff test case in the ResultRendererTests class (for the purpose and benefits of code reuse); otherwise, I can use code similar to ProcessFile instead. No problem really.

I've had a look and identified some issues with PixArray.Add, which I've fixed, but unfortunately I found some other memory-management issues along the way. In short, PixArray should be safe enough to use for multi-page TIFFs; however, using it as just an array of Pix won't work at the moment, so use a vector/ArrayList for now. I'll see if I can resolve the other issues and push a fix when I can find the time.

Closing as Tesseract 3.05 should now be supported. If you find any bugs with the wrapper please file new issues or better yet lodge a pull request with the fix :)
