Deepspeech: Expose multiple candidate transcriptions in API (top_paths/top_n != 1)

Created on 10 Mar 2017  ·  14 Comments  ·  Source: mozilla/DeepSpeech

P2

All 14 comments

Currently the decoder API returns a std::vector of beam_size Output structures sorted by descending probability, each struct representing a candidate transcription. Exposing this information in the API would involve the following steps:

  1. Add a separate top_paths/top_n parameter to the decoder API that allows applications to specify different values for beam_width and top_paths. Right now they're tied together, but it's quite reasonable to want to decode with a beam width of, say, 500 and only look at the top 5 results. This would also speed up decoder_decode, since it currently does a lot of useless work computing the Output structures for all beams.
  2. Modify or extend the API to expose multiple outputs per call. For example, the *WithMetadata methods could be changed to return an array of Metadata structs rather than just one.
  3. Expose the new top_paths parameter in the changed/new API.

Oh, yeah, and probably the most time-consuming part:

  4. Make sure the changed/new API is properly exposed to our bindings.

I'm happy to mentor anyone interested in taking this on. Also, if anyone is interested in doing only parts 1-3, I will make sure the bindings work.
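
To illustrate step 1, here is a minimal Python sketch of decoupling the two parameters. It is not DeepSpeech code: `Beam`, `Output`, `build_output` and `finalize` are hypothetical names invented for this example, and the real decoder is C++. The idea is simply to pick the best `top_paths` beams first and build result structures only for those, instead of for every beam.

```python
import heapq
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Beam:
    """Hypothetical stand-in for one live beam inside the decoder."""
    tokens: Sequence[int]
    log_prob: float


@dataclass
class Output:
    """Hypothetical stand-in for one finished candidate transcription."""
    text: str
    log_prob: float


def build_output(beam: Beam, alphabet: str) -> Output:
    # Stand-in for the comparatively costly step that turns a beam into a result.
    return Output("".join(alphabet[t] for t in beam.tokens), beam.log_prob)


def finalize(beams: List[Beam], alphabet: str, top_paths: int) -> List[Output]:
    """Return at most top_paths outputs, sorted by descending probability."""
    if top_paths < 1:
        raise ValueError("top_paths must be at least 1")
    # Select the best beams first, then build outputs only for those,
    # so the work scales with top_paths rather than with the beam width.
    best = heapq.nlargest(top_paths, beams, key=lambda b: b.log_prob)
    return [build_output(b, alphabet) for b in best]
```

With this shape, a caller could search with a wide beam and still ask for only a handful of results, e.g. `finalize(beams, alphabet, top_paths=5)` on a list of 500 beams.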

I'd be happy to look at parts 1-3, @reuben. I'll look through the code and get back to you if I have any questions.

@dabinat someone reached out to me expressing interest in contributing to this feature, so if you're also interested in working on #1678, I'd suggest taking that one first.

Hey @reuben, does this still need to be done? And is the implementation you listed above still valid with all the changes made recently to master?

@dabinat hi! Yeah, that is still valid. I fixed one minor point which was that we were computing and returning beam_size paths just to throw away all but one. That is fixed by separating beam_width and top_paths in decoder_decode. But the overall structure of the work should still be the same.

Returning a list of several paths might be tricky to get working with SWIG plus all our different bindings, so if you get stuck on that part, just open the PR as is and I'll help get the bindings working.
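
One possible way to ease the bindings problem, sketched here in Python purely as an assumption, is to return a single container object that holds all candidates rather than a bare list. `Candidate`, `DecodeResult` and `wrap_candidates` are hypothetical names for illustration; they are not taken from the DeepSpeech API or from the eventual fix.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical: one candidate transcription and its score."""
    text: str
    confidence: float


@dataclass
class DecodeResult:
    """Hypothetical: a single return value wrapping every candidate."""
    candidates: List[Candidate] = field(default_factory=list)

    @property
    def best(self) -> Candidate:
        # Raises IndexError if the decoder produced no candidates.
        return self.candidates[0]


def wrap_candidates(scored: List[Tuple[str, float]]) -> DecodeResult:
    # Keep the descending-probability order described for the decoder output.
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return DecodeResult([Candidate(text, conf) for text, conf in ordered])
```

Callers that only want one transcription can keep using `result.best`, while callers that want alternatives can iterate over `result.candidates`.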

Thanks for the interest!

@dabinat / @reuben - has there been any progress on this feature and do you think it might make it in in time for the 0.6.0 release? Would be really handy!

@nmstoker I got this half finished, then I got tied up with other things. But since then a lot of significant structural changes have been made on master, so it seems like maybe it's actually simpler to just redo it from scratch.

I should have time within the next week or two to look at it again.

No pressure here @dabinat, but have you made progress on that?

@lissyx Sorry, lots of projects on the go right now. I've made a note to look into this as soon as I can.

Hello @dabinat, @lissyx, @reuben (or others :-) )

I am quite interested in getting/using this feature.

I have two years of Python programming experience but no DeepSpeech experience. If I were to implement this feature myself:

Do you have an idea of roughly how many hours this could take me?

I know this is a very difficult question, but perhaps you could give just a _rough order-of-magnitude_ estimate.

Thanks in advance.

Fixed by #2792

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
