Wav2letter: 'Unable to open dictionary file 'tokens.txt''

Created on 25 Feb 2019 · 5 Comments · Source: flashlight/wav2letter

I don't understand why I am getting an error opening tokens.txt when I have specified the file path correctly in train.cfg. Could it be caused by my encoding (UTF-8)?

My tokens.txt is encoded in UTF-8, and its content is a set of Korean tokens.
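
One way to test the encoding hypothesis before touching the training config is to check that the file really decodes as UTF-8. Below is a minimal sketch in Python, assuming one token per line; `load_utf8_tokens` is a hypothetical helper, not part of wav2letter.

```python
from pathlib import Path


def load_utf8_tokens(path):
    """Return the non-empty lines of a tokens file, decoded strictly as UTF-8.

    Hypothetical helper: assumes one token per line. A file that is not valid
    UTF-8 (for example one saved as CP949/EUC-KR) raises UnicodeDecodeError
    instead of being silently mis-decoded.
    """
    raw = Path(path).read_bytes()
    text = raw.decode("utf-8")  # strict error handling is the default
    return [line for line in text.splitlines() if line.strip()]
```

Calling `load_utf8_tokens("/home/wav2letter/data/processed_data/wav2letter/korean/tokens.txt")` either returns the token list or fails with a decode error pointing at the first bad byte. Note that an encoding problem usually shows up as wrong tokens rather than as a file that cannot be opened, so a clean result here shifts suspicion toward the path.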

path to my tokens.txt:
/home/wav2letter/data/processed_data/wav2letter/korean/tokens.txt

How I specified path to tokens.txt in my train.cfg:

--tokensdir=/home/wav2letter/data/processed_data/wav2letter/korean
--tokens=tokens.txt
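
The log line below reports only the bare name 'tokens.txt', so it is worth confirming which path actually ends up being opened. This sketch assumes the trainer combines `--tokensdir` and `--tokens` like a directory plus a file name (an assumption, not confirmed in this thread) and compares that with what the bare name would resolve to from the current working directory. `resolve_tokens_path` is hypothetical.

```python
import os


def resolve_tokens_path(tokensdir, tokens):
    """Hypothetical helper: report where a tokens file would be looked up.

    Assumption: the directory and file name are joined. With an empty
    tokensdir, the bare name is resolved against the current working
    directory, which is what a relative path would do at runtime.
    """
    joined = os.path.join(tokensdir, tokens) if tokensdir else tokens
    absolute = os.path.abspath(joined)
    return {
        "path": absolute,
        "exists": os.path.isfile(absolute),
        "readable": os.access(absolute, os.R_OK),
    }
```

Running `resolve_tokens_path("/home/wav2letter/data/processed_data/wav2letter/korean", "tokens.txt")` and `resolve_tokens_path("", "tokens.txt")` from the directory where training is launched shows whether the error matches the second case, i.e. the directory flag not being applied.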

My stack trace is:

F0225 03:04:26.933456   499 Utils.cpp:237] Unable to open dictionary file 'tokens.txt'
*** Check failure stack trace: ***
    @     0x7fa8f766e5cd  google::LogMessage::Fail()
    @     0x7fa8f7670433  google::LogMessage::SendToLog()
    @     0x7fa8f766e15b  google::LogMessage::Flush()
    @     0x7fa8f7670e1e  google::LogMessageFatal::~LogMessageFatal()
    @           0x52e5a6  w2l::createTokenDict()
    @           0x52e689  w2l::createTokenDict()
    @           0x417d38  main
    @     0x7fa8a2d0b830  __libc_start_main
    @           0x4656f9  _start
    @              (nil)  (unknown)
Aborted (core dumped)


All 5 comments

I had a similar error before when dealing with French. I'm not sure it's the same case, but my problem was that I had created the tokens.txt file in a Python script without specifying UTF-8 as the encoding. Fixing that alone did not resolve the error; the other issue was that I ran the script inside the provided Docker image, which does not handle UTF-8 encoding correctly. I solved it by installing the locales in the Docker image (https://stackoverflow.com/questions/28405902/how-to-set-the-locale-inside-a-ubuntu-docker-container) and then regenerating all the text files. Everything worked smoothly after that.
Hope this helps.
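
The first pitfall in that comment, writing the file from Python without naming an encoding, arises because `open()` in text mode then falls back to a locale-dependent default, which in a container without a configured locale may not be UTF-8 (depending on the Python version). A minimal sketch of writing the tokens file with the encoding stated explicitly, assuming one token per line; `write_tokens` is an illustrative name only:

```python
def write_tokens(path, tokens):
    """Write one token per line as UTF-8, independent of the system locale.

    Illustrative helper: passing encoding= explicitly means the result does
    not change between a host shell and a Docker container with a different
    (or missing) locale. newline="\n" keeps Unix line endings everywhere.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for token in tokens:
            f.write(token + "\n")
```

The same `encoding="utf-8"` argument applies to every other text file the preprocessing script writes, which is why regenerating all of them was needed.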

Thanks a lot!
So I need to do two things: 1. change the encoding setting to UTF-8, and 2. don't use Docker.

How do I do (1)?

Once the file is properly encoded in UTF-8 and the system where you run the training handles UTF-8 well (Docker or otherwise), it should work; it worked well for me with French characters. Also check that the file is in the right directory; maybe you just mis-specified the tokens.txt location in the config file.
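
To check whether a given shell or container "handles UTF-8 well", one rough test is to look at the locale variables and at Python's preferred encoding in that same environment. The sketch below is a hypothetical diagnostic: it applies the POSIX precedence for character handling (`LC_ALL`, then `LC_CTYPE`, then `LANG`) and only reflects what this Python process sees, not how wav2letter opens files. The actual fix, such as generating a UTF-8 locale as in the Stack Overflow answer from the earlier comment, depends on the image.

```python
import locale
import os


def locale_report(environ=None):
    """Hypothetical diagnostic: summarize the locale settings in effect.

    The first non-empty variable among LC_ALL, LC_CTYPE and LANG determines
    character handling under POSIX rules; unset means the C/POSIX locale.
    """
    env = os.environ if environ is None else environ
    effective = next(
        (env[name] for name in ("LC_ALL", "LC_CTYPE", "LANG") if env.get(name)),
        "",
    )
    normalized = effective.lower().replace("-", "")
    return {
        "effective_locale": effective or "(unset, i.e. the C/POSIX locale)",
        "looks_utf8": "utf8" in normalized,
        "python_preferred_encoding": locale.getpreferredencoding(False),
    }
```

Running `print(locale_report())` on the host and inside the container makes it easy to spot a difference such as `ko_KR.UTF-8` versus an unset locale.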

Thanks a lot :) I'll try tomorrow and get back to you.

Training with a UTF-8-encoded tokens file is and has been supported. Closing for now.
