Subtitleedit: OCR issue with command line

Created on 28 Mar 2019 · 7 Comments · Source: SubtitleEdit/subtitleedit

Hello!

It is not possible to specify the OCR engine mode when running SubtitleEdit from the command line.
SubtitleEdit uses "Tesseract only" instead of my default value, "Tesseract + LSTM".
As a result, many subtitles are useless because the text is not recognized correctly.

Is there a way to fix this, or is there an additional parameter I can pass on the command line to use LSTM for OCR?

All 7 comments

It looks like Subtitle Edit uses Configuration.Settings.VobSubOcr (see src/Logic/CommandLineConvert.cs).
Those are the OCR settings that were used previously, as stored in the <VobSubOcr> section
of your Settings.xml file (in your %AppData%\Subtitle Edit folder).

What does that mean?

I'm facing the same problem.

I want to convert 100 .sup files to .srt in two separate batches, one with the Tesseract method and one with the LSTM method.

How can I achieve this?

@andiandi13
I suppose the most convenient way would be to:

  • exit Subtitle Edit
  • open the %AppData%\Subtitle Edit folder in Explorer
  • copy Settings.xml to Settings.GUI.xml
  • start Subtitle Edit
  • select "File > Import/OCR Blu-ray" to open a sup file
  • set up the OCR parameters (Tesseract etc.)
  • click OK to close the Import/OCR dialogue
  • exit Subtitle Edit (decline when asked to save the empty subtitles)
  • rename the changed Settings.xml to Settings.OCR-tess-LSTM.xml
  • rename Settings.GUI.xml to Settings.xml to restore the original settings

Alternatively, you could edit (a copy of) the Settings.xml file manually in a text editor, like vim or notepad. In this case you would find the OCR parameters in the <VobSubOcr> section.
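
To see which entries in the <VobSubOcr> section actually change when a different engine is selected, the saved profiles can be compared. The following is a minimal sketch in Python (not PowerShell), assuming only what is stated above: that Settings.xml is an XML file containing a <VobSubOcr> element. The function names read_ocr_settings and diff_ocr_settings are hypothetical helpers, and no particular child element names are assumed.

```python
import xml.etree.ElementTree as ET


def read_ocr_settings(settings_path):
    """Return the direct children of <VobSubOcr> as a {tag: text} dict."""
    root = ET.parse(settings_path).getroot()
    # Search below the root at any depth; the exact nesting is not assumed.
    section = root.find(".//VobSubOcr")
    if section is None:
        raise KeyError("no <VobSubOcr> section in " + str(settings_path))
    return {child.tag: (child.text or "").strip() for child in section}


def diff_ocr_settings(path_a, path_b):
    """Return {tag: (value_in_a, value_in_b)} for entries that differ."""
    a = read_ocr_settings(path_a)
    b = read_ocr_settings(path_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Calling diff_ocr_settings on Settings.GUI.xml and Settings.OCR-tess-LSTM.xml should list only the OCR entries that the Import/OCR dialogue changed, which also shows what to edit by hand.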

Then run the command line conversion from within a script that copies Settings.OCR-tess-LSTM.xml to Settings.xml before invoking SubtitleEdit.exe. In a PowerShell script, for example:

# Back up the current settings, then activate the OCR profile.
Copy-Item -LiteralPath "${env:APPDATA}\Subtitle Edit\Settings.xml" -Destination "${env:APPDATA}\Subtitle Edit\Settings.backup.xml"
Copy-Item -LiteralPath "${env:APPDATA}\Subtitle Edit\Settings.OCR-tess-LSTM.xml" -Destination "${env:APPDATA}\Subtitle Edit\Settings.xml"
# Run the conversion, passing the script's arguments through.
SubtitleEdit.exe /convert @args | Write-Output
# Restore the original settings.
Copy-Item -LiteralPath "${env:APPDATA}\Subtitle Edit\Settings.backup.xml" -Destination "${env:APPDATA}\Subtitle Edit\Settings.xml"
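
One weakness of a plain copy/run/copy sequence is that the original settings are not restored if the conversion step is interrupted. As a hedged sketch of the same idea with a guaranteed restore, here is a Python variant (a substitute for the PowerShell script, not part of Subtitle Edit). The names swapped_settings and convert_with_profile are hypothetical; the file names follow the steps above.

```python
import shutil
import subprocess
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def swapped_settings(settings_dir, profile_name):
    """Temporarily replace Settings.xml with the named profile file."""
    settings_dir = Path(settings_dir)
    active = settings_dir / "Settings.xml"
    backup = settings_dir / "Settings.backup.xml"
    shutil.copyfile(active, backup)  # keep the current (GUI) settings
    try:
        shutil.copyfile(settings_dir / profile_name, active)  # activate profile
        yield active
    finally:
        shutil.copyfile(backup, active)  # restore even if the body fails


def convert_with_profile(settings_dir, profile_name, command):
    """Run command (a list: program plus arguments) with the profile active.

    Returns the exit code of the command.
    """
    with swapped_settings(settings_dir, profile_name):
        return subprocess.run(command, check=False).returncode
```

For two separate batches, convert_with_profile could be called once per profile file, with command set to SubtitleEdit.exe, /convert, and whatever further arguments the built-in help lists.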

Or obtain the desired settings.xml file as described here, and just extract the portable version of SE to a different folder and use that settings file with it.

I have the portable version of Subtitle Edit, so I can replace Settings.xml with one for my preferred OCR method.

But now, how do I run a batch conversion from the command line (no GUI)?

When I run a batch conversion in the GUI, it never seems to finish even one file, so I get impatient and close it (but when I OCR a single .sup file, there's no problem)...

Never mind, I updated Subtitle Edit to the latest version and the batch now works.

But I still wonder: can we run a command line batch in Windows (.bat file)?

Run SubtitleEdit.exe /? or SubtitleEdit.exe /help to show the supported command line parameters.
