Hi, the Slovak Tesseract OCR dictionaries cannot be downloaded via Subtitle Edit, and when I try to download one from the Tesseract git repository, it makes Subtitle Edit crash every time... Also, their "slk.traineddata" is way bigger than yours.
Yes, indeed, slk.traineddata is not available for download from SE. You didn't download the 3.02 version, which is why it crashes every time... You need this one: https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.slk.tar.gz/download
Checked - works.
@niksedk: this one was missed too, please add it.
When I say crash, I mean this: http://imgur.com/txyQgpI
@lucybook: Could you test latest beta version: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.2/SubtitleEditBeta.zip
Better?
SE still uses Tesseract 3.02, as later versions are not nearly as good... A few things are better, but it's mostly just worse and slower.
By the way, does SE train Tesseract, or just use the existing language files?
SE just uses the existing language files
@darnn Nope, it doesn't. You need to use the existing files, but you can build (train) your own. Admittedly, that is a troublesome process...
@niksedk I tried the beta and the result is this: http://imgur.com/uO5SGjk I downloaded the Tesseract Slovak dictionary with SE and also the Slovak spell-checking dictionary. All the orange lines are empty, by the way...
@lucybook You need to add "eng.traineddata" too; then it will work.
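For anyone else getting empty lines with Slovak OCR, the eng.traineddata tip above can be checked with a small script. This is a minimal sketch, not part of SE: the folder path in TESSDATA_DIR is a hypothetical placeholder (point it at wherever your SE install keeps its tessdata folder), and the script only checks that the files exist. It cannot tell whether a .traineddata file was built for Tesseract 3.02 or a later version.

```python
from pathlib import Path

# Hypothetical path: adjust to the tessdata folder used by your SE installation.
TESSDATA_DIR = Path("Tesseract302/tessdata")

# Per this thread, Slovak OCR also needed the English data file alongside it.
REQUIRED = ["slk.traineddata", "eng.traineddata"]


def missing_traineddata(tessdata_dir, required=REQUIRED):
    """Return the required .traineddata file names not present in tessdata_dir."""
    folder = Path(tessdata_dir)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_traineddata(TESSDATA_DIR)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required language files found.")
```

If it reports eng.traineddata as missing, add that file to the same folder as slk.traineddata.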
New beta up: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.5.2/SubtitleEditBeta.zip
The Slovak 3.02 dictionary seems unfinished, so SE now links to the Slovak 3.04 version...
It's working now, thank you!