Tesseract: I tried to OCR a PDF file with version 4 on Windows 10, but it returned:...

Created on 14 Apr 2018 · 2 Comments · Source: tesseract-ocr/tesseract

I tried to OCR the file "Kamus_Arab-Indonesia.pdf" (in English: "Arabic-Indonesian Dictionary"),
so I ran the following from the Tesseract install directory:

tesseract.exe D:\DOC\ARABIC\Kamus_Arab-Indonesia.pdf  z:\t\Kamus.pdf  -l ara+ind --psm 1

Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica
Error in pixReadStream: Pdf reading is not supported
Error in pixRead: pix not read
Error during processing.

How can I solve this? Many thanks in advance.

abdulbadii

All 2 comments

Tesseract does not support reading PDF files.

You can try other software, for example OCRmyPDF.

stweil on 14 Apr 2018

👍4 ❤1
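
For anyone landing here with the same error, a minimal sketch of what the OCRmyPDF suggestion could look like for the file in the question. It assumes OCRmyPDF is installed and on PATH, along with the `ara` and `ind` Tesseract language data. The helper names `build_ocrmypdf_command` and `run_ocrmypdf` are made up for illustration and are not part of OCRmyPDF.

```python
import subprocess


def build_ocrmypdf_command(src, dst, languages):
    """Return the argument list for an OCRmyPDF run (hypothetical helper)."""
    if not languages:
        raise ValueError("at least one language code is required")
    # OCRmyPDF accepts Tesseract language codes joined with '+', e.g. ara+ind.
    return ["ocrmypdf", "-l", "+".join(languages), str(src), str(dst)]


def run_ocrmypdf(src, dst, languages):
    """Run OCRmyPDF; assumes the ocrmypdf executable is on PATH."""
    return subprocess.run(build_ocrmypdf_command(src, dst, languages), check=True)


# Paths copied from the question; adjust them to your own files.
command = build_ocrmypdf_command(
    r"D:\DOC\ARABIC\Kamus_Arab-Indonesia.pdf", r"z:\t\Kamus.pdf", ["ara", "ind"]
)
print(" ".join(command))
```

Calling `run_ocrmypdf(...)` with the same arguments would actually start the conversion. The sketch only prints the command so it can be checked first.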

Apparently OCRmyPDF uses Tesseract under the hood, so I think that's important to note.

ylluminarious on 9 Oct 2018

👍1
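
Since Tesseract itself only reads images, the same idea can also be done by hand: rasterize the PDF pages first, then pass each page image to Tesseract. Below is a rough sketch of the two commands involved, assuming Poppler's `pdftoppm` is available for the rasterizing step. This is not a description of how OCRmyPDF works internally. The helper names and the `page-1.png` file name are placeholders (the numbering `pdftoppm` produces depends on the page count).

```python
def build_rasterize_command(pdf_path, image_prefix, dpi=300):
    """Command that renders each PDF page to a PNG (assumes Poppler's pdftoppm)."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def build_tesseract_command(image_path, output_base, languages, psm=1):
    """Command that OCRs one page image; the trailing 'pdf' config asks
    Tesseract for searchable PDF output instead of plain text."""
    return [
        "tesseract", str(image_path), str(output_base),
        "-l", "+".join(languages),
        "--psm", str(psm),
        "pdf",
    ]


# Hypothetical file names, for illustration only.
rasterize = build_rasterize_command("Kamus_Arab-Indonesia.pdf", "page")
ocr = build_tesseract_command("page-1.png", "Kamus-1", ["ara", "ind"])
print(" ".join(rasterize))
print(" ".join(ocr))
```

Each page ends up as its own output file this way, so the per-page results would still need to be merged afterwards. That bookkeeping is one reason a wrapper tool is the more convenient route.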
