Tesseract: I tried to OCR a PDF file with version 4 on Windows 10 but it returned:...

Created on 14 Apr 2018 · 2 Comments · Source: tesseract-ocr/tesseract

I tried to OCR the file "Kamus_Arab-Indonesia.pdf" (in English: "Arabic-Indonesian Dictionary"),
so from the Tesseract install directory I typed:

tesseract.exe D:\DOC\ARABIC\Kamus_Arab-Indonesia.pdf  z:\t\Kamus.pdf  -l ara+ind --psm 1

Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica
Error in pixReadStream: Pdf reading is not supported
Error in pixRead: pix not read
Error during processing.

How can I solve the error above? Many thanks in advance.

All 2 comments

Tesseract does not support reading PDF files.

You can try other software, for example OCRmyPDF.
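As a rough sketch of that suggestion: OCRmyPDF takes an input PDF and an output PDF, and its `-l` option accepts Tesseract language codes joined with `+`, the same form used in the question. The Python wrapper below only assembles and runs that command. It assumes `ocrmypdf` is installed and on PATH together with Tesseract's `ara` and `ind` language data; the function names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def ocrmypdf_command(src: Path, dst: Path, languages: list[str]) -> list[str]:
    """Assemble an OCRmyPDF call; Tesseract language codes are joined with '+'."""
    return ["ocrmypdf", "-l", "+".join(languages), str(src), str(dst)]


def run_ocrmypdf(src: Path, dst: Path, languages: list[str]) -> None:
    """Run OCRmyPDF, raising CalledProcessError if it exits with a non-zero status."""
    # Assumes ocrmypdf and the needed Tesseract traineddata are installed.
    subprocess.run(ocrmypdf_command(src, dst, languages), check=True)
```

For the file in the question this would be something like `run_ocrmypdf(Path(r"D:\DOC\ARABIC\Kamus_Arab-Indonesia.pdf"), Path(r"z:\t\Kamus.pdf"), ["ara", "ind"])`, which is equivalent to typing `ocrmypdf -l ara+ind <input.pdf> <output.pdf>` at the prompt.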

Apparently OCRmyPDF uses Tesseract under the hood, so I think that's important to note.
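Because the failure comes from Tesseract's image reader (Leptonica reports "Pdf reading is not supported"), the same idea can also be reproduced by hand: rasterize each PDF page to an image, then give Tesseract the images. Below is a minimal Python sketch, assuming Poppler's `pdftoppm` and Tesseract are both on PATH; the 300 DPI value and all helper names (`ocr_pdf`, `page_number`, and so on) are illustrative choices, not part of either tool. It relies on Tesseract reading a non-image input file as a list of image paths, and on the `pdf` config writing a searchable PDF to `<outputbase>.pdf`. Note that Tesseract appends the extension itself, so the output argument should be a base name without `.pdf`.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def pdftoppm_command(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # Poppler's pdftoppm writes one PNG per page named <prefix>-<page>.png.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def page_number(path: Path) -> int:
    # "page-07.png" -> 7; works whether or not the page number is zero-padded.
    return int(path.stem.rsplit("-", 1)[1])


def tesseract_command(image_list: Path, out_base: Path, langs: str, psm: int) -> list[str]:
    # A non-image input file is read as a list of image paths, one per line.
    # out_base has no extension: the trailing "pdf" config makes Tesseract
    # write <out_base>.pdf (passing "Kamus.pdf" would yield "Kamus.pdf.pdf").
    return ["tesseract", str(image_list), str(out_base),
            "-l", langs, "--psm", str(psm), "pdf"]


def ocr_pdf(pdf: Path, out_base: Path, langs: str = "ara+ind", psm: int = 1) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        # 1. Rasterize every page, since Tesseract (via Leptonica) cannot read PDF.
        subprocess.run(pdftoppm_command(pdf, work / "page"), check=True)
        pages = sorted(work.glob("page-*.png"), key=page_number)
        # 2. Write the page images, in page order, to a list file.
        image_list = work / "pages.txt"
        image_list.write_text("".join(f"{p}\n" for p in pages), encoding="utf-8")
        # 3. OCR all pages into one multipage searchable PDF.
        subprocess.run(tesseract_command(image_list, out_base, langs, psm), check=True)
```

Called as `ocr_pdf(Path(r"D:\DOC\ARABIC\Kamus_Arab-Indonesia.pdf"), Path(r"z:\t\Kamus"))`, this should produce `z:\t\Kamus.pdf`. Sorting by the numeric page suffix, rather than by file name, keeps `page-10.png` after `page-2.png` regardless of how `pdftoppm` pads the numbers.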
