Wav2letter: streaming_convnets error: terminate called after throwing an instance of 'std::invalid_argument'

Created on 17 Jan 2020 · 13 Comments · Source: flashlight/wav2letter

terminate called after throwing an instance of 'std::invalid_argument'
what(): mismatched # of elements in moddims
*** Aborted at 1579265403 (unix time) try "date -d @1579265403" if you are using GNU date ***
PC: @ 0x7f45cbe95428 gsignal
SIGABRT (@0x1ec6000086c0) received by PID 34496 (TID 0x7f462281d7c0) from PID 34496; stack trace:
@ 0x7f46216f6390 (unknown)
@ 0x7f45cbe95428 gsignal
@ 0x7f45cbe9702a abort
@ 0x7f45cc9fa84d __gnu_cxx::__verbose_terminate_handler()
@ 0x7f45cc9f86b6 (unknown)
@ 0x7f45cc9f8701 std::terminate()
@ 0x7f45cc9f8919 __cxa_throw
@ 0x6ad552 fl::moddims()
@ 0x6e6f66 fl::View::forward()
@ 0x6dedff fl::UnaryModule::forward()
@ 0x6ceb42 fl::Sequential::forward()
@ 0x492fdf _ZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEES_INS0_19FirstOrderOptimizerEES9_ddbiE3_clES2_S5_S7_S9_S9_ddbi.constprop.12666
@ 0x41bf80 main
@ 0x7f45cbe80830 __libc_start_main
@ 0x48de89 _start
@ 0x0 (unknown)
Makefile:2: recipe for target 'train' failed
make: *** [train] Aborted (core dumped)

All 13 comments

Will you please add steps for reproducing the error?

I run /home/work/wav2letter/build/Train train --flagsfile train.cfg

train.cfg is:
--runname=exp1
--rundir=./
--tokensdir=./am
--archdir=./
--train=./lists/train.lst
--valid=./lists/dev.lst
--lexicon=./am/lexicon.txt
--arch=am_500ms_future_context.arch
--tokens=tokens.txt
--criterion=ctc
--batchsize=1
--lr=0.4
--momentum=0.0
--maxgradnorm=0.5
--reportiters=1000
--nthread=6
--surround=|
--mfsc=true
# --usewordpiece=true
# --wordseparator=_
--filterbanks=80
--minisz=200
--mintsz=2
--maxisz=33000
--enable_distributed=true
--pcttraineval=1
--minloglevel=0
--logtostderr
--onorm=target
--input=wav
--sqnorm
--localnrmlleftctx=300

@xgp0602,

Could you try removing the commented lines in the config? And why did you comment out these lines?

Hi,
There was a small bug in the architecture file. Could you change the last line of the architecture file to V NLABEL 0 -1 1 instead? Sorry about that!

Also, I'm curious why you use --surround=| instead of --usewordpiece=true --wordseparator=_. Are you trying to use letters instead of word pieces?
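
The stack trace shows fl::View::forward() calling fl::moddims(), which throws when the requested shape does not hold the same number of elements as the input. Below is a minimal sketch of that kind of check. It assumes that a 0 in a V line means "keep the input dimension" and a -1 means "infer this dimension". Shape and resolveView are hypothetical names for illustration, not flashlight's API, and the error text is only borrowed from the trace above.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Illustrative 4-d shape; NOT a flashlight type.
using Shape = std::array<long long, 4>;

long long elements(const Shape& s) {
  return s[0] * s[1] * s[2] * s[3];
}

// Resolve a target view shape against an input shape.
// Assumption: 0 copies the input dimension, -1 is inferred from the rest.
Shape resolveView(const Shape& input, Shape target) {
  const long long total = elements(input);
  assert(total > 0 && "input shape must be non-empty");

  int inferAt = -1;
  long long known = 1;
  for (int i = 0; i < 4; ++i) {
    if (target[i] == 0) {
      target[i] = input[i];  // keep the input dimension
    }
    if (target[i] == -1) {
      if (inferAt != -1) {
        throw std::invalid_argument("only one dimension can be -1");
      }
      inferAt = i;
    } else {
      known *= target[i];
    }
  }

  if (inferAt >= 0) {
    if (known == 0 || total % known != 0) {
      throw std::invalid_argument("mismatched # of elements in moddims");
    }
    target[inferAt] = total / known;
  }
  if (elements(target) != total) {
    throw std::invalid_argument("mismatched # of elements in moddims");
  }
  return target;
}
```

For example, with a hypothetical input of shape {30, 50, 1, 4}, the target {30, 0, -1, 1} resolves to {30, 50, 4, 1}, while {30, 0, 1, 1} throws because it only accounts for 1500 of the 6000 elements.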

I want to use the streaming_convnets model for Chinese. In Chinese, you can split a sentence into characters separated by spaces, like "你好世界" -> "你 好 世 界".
How should I set up the modeling (that is, how do I prepare train.lst, lexicon.txt, and tokens.txt) in this case?

Hi @xgp0602,

What happens during training:

  • the lexicon is used to map each word in the transcription to a sequence of tokens
  • for each frame, the AM predicts a probability for each token
  • surround is used to map the token sequence back to words to compute WER during training

Since the language has no word boundaries, you can do the following:

  • your token set should be all the possible characters you have (just as we use letters in English)
  • then the simplest way to prepare the lexicon and train.lst is:

    • create a (in some sense fake) lexicon that maps each character to itself: "你" -> "你".

    • train.lst should be in the format "id path size(in ms) transcription", where the transcription is your sentence with each character separated by a space; instead of "你好世界" you put "你 好 世 界".

    • set surround to empty: --surround=''

    • during training, look at LER (WER will be wrong, as it is not defined in your case)
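
The preparation steps above can be sketched as follows. This is only an illustration. It assumes the input sentences are valid UTF-8 with no spaces, and that a lexicon line is the word followed by its space-separated tokens. The helper names (splitUtf8, toTranscription, lexiconLine, listLine) are hypothetical and not part of wav2letter.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Split a UTF-8 string into one std::string per code point.
std::vector<std::string> splitUtf8(const std::string& s) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < s.size();) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    if (c < 0x80) {
      len = 1;
    } else if ((c >> 5) == 0x6) {
      len = 2;
    } else if ((c >> 4) == 0xE) {
      len = 3;
    } else if ((c >> 3) == 0x1E) {
      len = 4;
    } else {
      throw std::invalid_argument("invalid UTF-8 lead byte");
    }
    if (i + len > s.size()) {
      throw std::invalid_argument("truncated UTF-8 sequence");
    }
    out.push_back(s.substr(i, len));
    i += len;
  }
  return out;
}

// "你好世界" -> "你 好 世 界"
std::string toTranscription(const std::string& sentence) {
  std::string out;
  for (const std::string& ch : splitUtf8(sentence)) {
    if (!out.empty()) {
      out += ' ';
    }
    out += ch;
  }
  return out;
}

// Identity lexicon entry: the character maps to itself, e.g. "你 你".
std::string lexiconLine(const std::string& ch) {
  assert(splitUtf8(ch).size() == 1 && "expected exactly one character");
  return ch + " " + ch;
}

// One train.lst line: "id path size(in ms) transcription".
std::string listLine(const std::string& id, const std::string& path,
                     long long sizeMs, const std::string& sentence) {
  return id + " " + path + " " + std::to_string(sizeMs) + " " +
         toTranscription(sentence);
}
```

The set of distinct strings returned by splitUtf8 over all training sentences would give the entries for tokens.txt (one per line) and the identity lines for lexicon.txt.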

@vineelpratap, please correct me if I am wrong or if a better way exists.

@tlikhomanenko thanks for your advice, I followed it. When I run "build/Decoder" to decode my Chinese audio, it always reports the error what(): Unknown entry in dictionary: ''. I looked at Decode.cpp and found the line int silIdx = tokenDict.getIndex(FLAGS_wordseparator); which causes the error because '' is not in my tokens.txt. Can you give some advice on fixing this error? Thanks.

@luweishuang

Could you try setting FLAGS_wordseparator=_, for example, so that this token doesn't exist in your token dict and silIdx will be set to -1? Or you can fix it in the code:

int silIdx = -1;
if (FLAGS_wordseparator != "") {
  silIdx = tokenDict.getIndex(FLAGS_wordseparator);
}

Let me know if this works, I will send the fix in the code.
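
For readers who want to try the guard in isolation, here is a small self-contained sketch. MockTokenDict is a hypothetical stand-in for the decoder's token dictionary, not the wav2letter class. Its exception type is an assumption; only the message text is taken from the error reported above.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the token dictionary; NOT the wav2letter class.
class MockTokenDict {
 public:
  void add(const std::string& token) {
    assert(!token.empty() && "tokens must be non-empty");
    const int idx = static_cast<int>(index_.size());
    index_.emplace(token, idx);  // duplicates are ignored
  }

  // Throws for unknown entries, mirroring the reported error message.
  int getIndex(const std::string& token) const {
    const auto it = index_.find(token);
    if (it == index_.end()) {
      throw std::invalid_argument(
          "Unknown entry in dictionary: '" + token + "'");
    }
    return it->second;
  }

 private:
  std::map<std::string, int> index_;
};

// Only look the separator up when one is configured; -1 means "no silence token".
int silenceIndex(const MockTokenDict& dict, const std::string& wordSeparator) {
  if (wordSeparator.empty()) {
    return -1;
  }
  return dict.getIndex(wordSeparator);
}
```

With an empty separator the lookup is skipped entirely, so a token set that contains only characters no longer triggers the exception.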

@tlikhomanenko yes, replacing silIdx = tokenDict.getIndex(FLAGS_wordseparator); with int silIdx = -1; if (FLAGS_wordseparator != "") { silIdx = tokenDict.getIndex(FLAGS_wordseparator); } gives the right results directly.

Closing for now, @wwxm0523 @luweishuang feel free to reopen if the task is not solved for you!

I hit the same problem. How can I fix it? I have attached my config and architecture as screenshots.

Screenshot from 2021-01-14 08-03-15

Screenshot from 2021-01-14 08-03-33

Screenshot from 2021-01-14 08-04-45

Could you provide the whole arch file here?

I found the reason: I had moved the --arch line in train.cfg to right after --archdir, but it should come after --lexicon. When I put this line back where it is in the recipe's train.cfg, training ran normally!
