Deepspeech: Expose multiple candidate transcriptions in API (top_paths/top_n != 1)

Created on 10 Mar 2017 · 14 Comments · Source: mozilla/DeepSpeech

kdavis-mozilla

Most helpful comment

@lissyx Sorry, lots of projects on the go right now. I’ve made a note to look into this as soon as I can.

dabinat on 2 Feb 2020

👍3

All 14 comments

Currently the decoder API returns a std::vector of beam_size Output structures sorted by descending probability, each struct representing a candidate transcription. Exposing this information in the API would involve the following steps:

1. Add a separate top_paths/top_n parameter to the decoder API that allows applications to specify different values for beam_width and top_paths. Right now they're tied together but it's very reasonable that one would want to decode using a beam width of say, 500, but only look at the top 5 results. This would also speed up decoder_decode since currently it's doing a lot of useless work computing the Output structures for all beams.
2. Modify or extend API to expose multiple outputs per call. For example, the *WithMetadata methods could be changed to return an array of Metadata structs rather than just one.
3. Expose the new top_paths parameter in the changed/new API.
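The first step above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's decoder code: the `Output` struct here is a simplified stand-in, and `select_top_paths` is a hypothetical helper name. The idea is that the search may keep beam_width beams, but only the top_paths best of them need to be ordered and turned into results.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the decoder's Output structure.
struct Output {
    std::string transcript;
    double log_prob;  // higher means more likely
};

// Returns the `top_paths` most probable candidates, best first.
// `beams` holds every beam kept by the search (up to beam_width entries).
std::vector<Output> select_top_paths(std::vector<Output> beams,
                                     std::size_t top_paths) {
    const std::size_t n = std::min(top_paths, beams.size());
    // Order only the first n entries; the remaining beams stay unsorted.
    std::partial_sort(beams.begin(), beams.begin() + n, beams.end(),
                      [](const Output& a, const Output& b) {
                          return a.log_prob > b.log_prob;
                      });
    // Drop everything past the requested number of paths.
    beams.erase(beams.begin() + n, beams.end());
    return beams;
}
```

With a beam width of 500 and top_paths of 5, a call such as `select_top_paths(beams, 5)` would order and return five candidates instead of all 500.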

reuben on 29 May 2019

👍1

Oh, yeah, and probably the most time consuming part:

4. Make sure changed/new API is properly exposed to our bindings.

reuben on 29 May 2019

I'm happy to mentor anyone interested in taking this on. Also, if anyone is interested in doing only parts 1-3, I will make sure the bindings work.

reuben on 29 May 2019

I’d be happy to look at parts 1-3, @reuben. I’ll look through the code and get back to you if I have any questions.

dabinat on 29 May 2019

@dabinat someone reached out to me expressing interest in contributing to this feature, so if you're also interested in working on #1678, I'd suggest taking that one first.

reuben on 29 May 2019

Hey @reuben, does this still need to be done? And is the implementation you listed above still valid with all the changes made recently to master?

dabinat on 28 Aug 2019

👍1

@dabinat hi! Yeah, that is still valid. I fixed one minor point which was that we were computing and returning beam_size paths just to throw away all but one. That is fixed by separating beam_width and top_paths in decoder_decode. But the overall structure of the work should still be the same.

Returning a list of several paths might be tricky to get working with SWIG plus all our different bindings, so if you get stuck on that part, just open the PR as is and I'll help get the bindings working.
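For illustration only, one conventional shape for handing several paths across a C boundary is a counted array behind plain structs, paired with an explicit free function. Everything in this sketch is hypothetical (`Candidate`, `CandidateList`, `make_candidate_list` and `free_candidate_list` are made-up names, not the project's API), and it says nothing about how any particular binding generator would wrap it.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-C layout: a counted array that callers can walk.
struct Candidate {
    char* text;         // NUL-terminated, owned by the list
    double confidence;
};

struct CandidateList {
    Candidate* items;
    unsigned int count;
};

// Copies C++-side results into the C layout.
// The caller releases the result with free_candidate_list.
CandidateList* make_candidate_list(
        const std::vector<std::pair<std::string, double>>& results) {
    CandidateList* list = new CandidateList;
    list->count = static_cast<unsigned int>(results.size());
    list->items = new Candidate[results.size()];
    for (std::size_t i = 0; i < results.size(); ++i) {
        const std::string& text = results[i].first;
        list->items[i].text = new char[text.size() + 1];
        // Copy the characters plus the terminating NUL.
        std::memcpy(list->items[i].text, text.c_str(), text.size() + 1);
        list->items[i].confidence = results[i].second;
    }
    return list;
}

// Releases every string, then the array, then the list itself.
void free_candidate_list(CandidateList* list) {
    if (list == nullptr) return;
    for (unsigned int i = 0; i < list->count; ++i) {
        delete[] list->items[i].text;
    }
    delete[] list->items;
    delete list;
}
```

Keeping allocation and release on the same side of the boundary means each binding only has to read `count` entries and call the free function once.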

Thanks for the interest!

reuben on 28 Aug 2019

👍2

@dabinat / @reuben - has there been any progress on this feature, and do you think it might make it into the 0.6.0 release? Would be really handy!

nmstoker on 7 Oct 2019

@nmstoker I got this half finished, then I got tied up with other things. But since then a lot of significant structural changes have been made on master, so it seems like maybe it’s actually simpler to just redo it from scratch.

I should have time within the next week or two to look at it again.

dabinat on 10 Oct 2019

👍3

> @nmstoker I got this half finished, then I got tied up with other things. But since then a lot of significant structural changes have been made on master, so it seems like maybe it’s actually simpler to just redo it from scratch.
>
> I should have time within the next week or two to look at it again.

No pressure here @dabinat, but have you made progress on that?