Currently the decoder API returns a std::vector of beam_size Output structures sorted by descending probability, each struct representing a candidate transcription. Exposing this information in the API would involve the following steps:
1. […] decoder_decode, since currently it's doing a lot of useless work computing the Output structures for all beams.
2. […] *WithMetadata methods could be changed to return an array of Metadata structs rather than just one.
3. […] top_paths parameter in the changed/new API.

Oh, yeah, and probably the most time consuming part:
I'm happy to mentor anyone interested in taking this on. Also, if anyone is interested in doing only parts 1-3, I will make sure the bindings work.
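As a rough illustration of what "return an array of Metadata structs rather than just one" could look like at a C-compatible boundary, here is a minimal sketch. The names `TranscriptSketch`, `ResultListSketch`, `make_result_list` and `free_result_list` are hypothetical and are not taken from the DeepSpeech code base. The sketch assumes the candidates arrive already sorted, best first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical C-compatible structs; illustrative only, not the project's API.
struct TranscriptSketch {
  char* text;         // heap-allocated, NUL-terminated transcription
  double confidence;  // score of this candidate (higher is better)
};

struct ResultListSketch {
  TranscriptSketch* transcripts;  // array of num_transcripts entries, best first
  unsigned int num_transcripts;
};

// Copy already-sorted (text, score) candidates into a plain array that
// language bindings can walk with an index and a count.
ResultListSketch* make_result_list(
    const std::vector<std::pair<std::string, double>>& sorted) {
  ResultListSketch* out = new ResultListSketch;
  out->num_transcripts = static_cast<unsigned int>(sorted.size());
  out->transcripts = new TranscriptSketch[sorted.size()];
  for (std::size_t i = 0; i < sorted.size(); ++i) {
    // The input is expected to be in descending score order.
    assert(i == 0 || sorted[i - 1].second >= sorted[i].second);
    const std::string& s = sorted[i].first;
    char* copy = new char[s.size() + 1];
    std::memcpy(copy, s.c_str(), s.size() + 1);  // includes the trailing NUL
    out->transcripts[i].text = copy;
    out->transcripts[i].confidence = sorted[i].second;
  }
  return out;
}

// A single free function releases the strings, the array and the list itself.
void free_result_list(ResultListSketch* list) {
  if (list == nullptr) return;
  for (unsigned int i = 0; i < list->num_transcripts; ++i) {
    delete[] list->transcripts[i].text;
  }
  delete[] list->transcripts;
  delete list;
}
```

A pointer plus an explicit count is used here only because plain arrays tend to be easier to wrap for several languages than C++ containers; whether that holds for the actual bindings is an assumption.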
I'd be happy to look at parts 1-3, @reuben. I'll look through the code and get back to you if I have any questions.
@dabinat someone reached out to me expressing interest in contributing to this feature, so if you're also interested in working on #1678, I'd suggest taking that one first.
Hey @reuben, does this still need to be done? And is the implementation you listed above still valid with all the changes made recently to master?
@dabinat hi! Yeah, that is still valid. I fixed one minor point which was that we were computing and returning beam_size paths just to throw away all but one. That is fixed by separating beam_width and top_paths in decoder_decode. But the overall structure of the work should still be the same.
Returning a list of several paths might be tricky to get working with SWIG plus all our different bindings, so if you get stuck on that part, just open the PR as is and I'll help get the bindings working.
Thanks for the interest!
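The distinction between beam_width and top_paths described above can be sketched as follows: the search keeps up to beam_width candidates alive, but only the best top_paths of them are ordered and returned. This is a minimal sketch, not the actual decoder_decode implementation. `BeamSketch` and `select_top_paths` are hypothetical names, and ranking by log probability is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a beam that survived the search.
struct BeamSketch {
  std::string text;  // candidate transcription
  double log_prob;   // assumed ranking score (higher is better)
};

// Return the top_paths best beams, best first. Only the first top_paths
// positions are put in order; the rest of the beams are discarded unsorted.
std::vector<BeamSketch> select_top_paths(std::vector<BeamSketch> beams,
                                         std::size_t top_paths) {
  const std::size_t n = std::min(top_paths, beams.size());
  const auto middle = beams.begin() + static_cast<std::ptrdiff_t>(n);
  std::partial_sort(beams.begin(), middle, beams.end(),
                    [](const BeamSketch& a, const BeamSketch& b) {
                      return a.log_prob > b.log_prob;
                    });
  beams.erase(middle, beams.end());
  return beams;
}
```

With a beam_width of, say, 500 and top_paths of 1, a selection like this avoids building and sorting results for the remaining candidates that would be thrown away anyway.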
@dabinat / @reuben - has there been any progress on this feature and do you think it might make it in in time for the 0.6.0 release? Would be really handy!
@nmstoker I got this half finished, then I got tied up with other things. But since then a lot of significant structural changes have been made on master, so it seems like maybe it's actually simpler to just redo it from scratch.
I should have time within the next week or two to look at it again.
No pressure here @dabinat, but have you made any progress on that?
@lissyx Sorry, lots of projects on the go right now. I've made a note to look into this as soon as I can.
Hello @dabinat, @lissyx, @reuben (or others :-) )
I am quite interested in getting and using this feature.
I have two years of Python programming experience but no DeepSpeech experience. If I were to implement this feature myself:
Do you have any idea how many hours this could (roughly) take me?
I know this is a very difficult question, but perhaps you can give just a _rough order of magnitude_ estimate.
Thanks in advance.
Fixed by #2792
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.