Wav2letter: better predictions get a worse WER

Created on 21 Aug 2020 · 1 Comment · Source: flashlight/wav2letter

Question

Hello, I have a question about the WER values reported in the decoder log.

I ran _/wav2letter/build/Decoder --flagsfile=config.cfg_ and the log looks something like this:

|T|: people pasted window papercuts to celebrate chinese new year
|P|: people can can in the papercuts to celebrate chinese new year
|t|: p e o p l e _ p a s t e d _ w i n d o w _ p a p e r c u t s _ t o _ c e l e b r a t e _ c h i n e s e _ n e w _ y e a r
|p|: p e o p l e _ c a n _ c a n _ i n _ t h e _ p a p e r c u t s _ t o _ c e l e b r a t e _ c h i n e s e _ n e w _ y e a r
[sample: 449d9026d7398467a16d405cc0e92e, WER: 44.4444%, LER: 18.3333%, slice WER: 20.3955%, slice LER: 15.6813%, decoded samples (thread 2): 597]
|T|: these are red lanterns the red lanterns should be hung on the tree
|P|: it's a red lanterns to red lanterns should be on the tree
|t|: t h e s e _ a r e _ r e d _ l a n t e r n s _ t h e _ r e d _ l a n t e r n s _ s h o u l d _ b e _ h u n g _ o n _ t h e _ t r e e
|p|: i t ' s _ a _ r e d _ l a n t e r n s _ t o _ r e d _ l a n t e r n s _ s h o u l d _ b e _ o n _ t h e _ t r e e
[sample: fda0c350cfcc2e2626fc16373d1541, WER: 30.7692%, LER: 19.697%, slice WER: 21.6992%, slice LER: 16.4165%, decoded samples (thread 4): 589]

|T|: i like salad
|P|: i like salad
|t|: i _ l i k e _ s a l a d
|p|: i _ l i k e _ s a l a d
[sample: ca1ea83a8a943978729820ed37ef68, WER: 0%, LER: 0%, slice WER: 23.8167%, slice LER: 18.4226%, decoded samples (thread 5): 615]
|T|: you should stop eating fatty meat
|P|: you should stop eating fatty meat
|t|: y o u _ s h o u l d _ s t o p _ e a t i n g _ f a t t y _ m e a t
|p|: y o u _ s h o u l d _ s t o p _ e a t i n g _ f a t t y _ m e a t
[sample: 93bb6a67fa39d7ebe492937647c56, WER: 0%, LER: 0%, slice WER: 22.8326%, slice LER: 17.1254%, decoded samples (thread 1): 612]

The first and second samples are predicted incorrectly, yet the log shows a slice WER of about 20% and 21% for them. The third and fourth samples are predicted correctly, yet their slice WER is higher, about 23% and 22%.

How is the WER score calculated? Is it (N_replace + N_delete + N_insert) / N_all?
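For reference, the formula in the question can be sketched in a few lines of Python. This is only an illustration, not the decoder's actual code: the helper names `edit_distance` and `wer` are made up here, and N_all is taken to be the number of words in the reference transcript `|T|`. The running total in the loop assumes that "slice WER" is an aggregate over all samples a thread has decoded so far, not a per-sample value. That would explain the numbers in the log, but it is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref_text, hyp_text):
    """Word error rate in percent: word-level edits / reference word count."""
    ref = ref_text.split()
    hyp = hyp_text.split()
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Two (|T|, |P|) pairs copied from the log above.
samples = [
    ("people pasted window papercuts to celebrate chinese new year",
     "people can can in the papercuts to celebrate chinese new year"),
    ("i like salad",
     "i like salad"),
]

# Assumption: a "slice" value accumulates edits and words over many samples.
total_edits = 0
total_words = 0
for ref_text, hyp_text in samples:
    ref, hyp = ref_text.split(), hyp_text.split()
    total_edits += edit_distance(ref, hyp)
    total_words += len(ref)
    running = 100.0 * total_edits / total_words
    print(f"WER: {wer(ref_text, hyp_text):.4f}%, running WER: {running:.4f}%")
```

For the first pair this gives 4 edits over 9 reference words, i.e. 44.4444%, which matches the per-sample `WER` field in the log. Under the assumption above, the per-sample `WER` and `LER` fields are the ones to compare against each individual prediction, while the `slice` fields would move only slightly with any single sample.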

Additional Context

Here is my decoder config:

--am=/mnt/data/speech/qkids-data/sconv/001_model_#mnt#data#speech#qkids-data#lists#dev.lst.bin
--tokensdir=/mnt/data/speech/qkids-data/am
--tokens=qkids-train-all-unigram-2882.tokens
--lexicon=/mnt/data/speech/qkids-data/am/qkids-train+dev-unigram-2882-nbest5.lexicon
--datadir=/mnt/data/speech/qkids-data/lists
--test=dev.lst
--lm=/mnt/data/speech/qkids-data/lm/3-gram.arpa
--lmweight=0.5515838301157
--wordscore=0.52526055643809
--uselexicon=true
--decodertype=wrd
--lmtype=kenlm
--silscore=0
--beamsize=500
--beamsizetoken=100
--beamthreshold=100
--nthread_decoder=8
--smearing=max
--show
--showletters