Mumble: Any suggestions in implementing stereo audio?

Created on 20 May 2020 · 17 Comments · Source: mumble-voip/mumble

I have been following the progress of stereo audio support for months. However, #2829 has been open for years, and little progress has been made since it was filed. The feature, though requested by a large number of people, is not on the Mumble developers' priority list.

In my community, people say "you can you up", which means: stop complaining and get it done yourself. I think now is the time to at least consider implementing this feature ourselves, as the people who want it.

However, as mentioned in #2829, the audio code is convoluted and no fun to work with. So I'd like to use this issue to discuss with the developers how to implement this feature.

In #2829, @davidebeatrici provided some guidance:

> - Ensure that all audio backends support stereo input.
> - Create stereo decoder: https://github.com/mumble-voip/mumble/blob/7dd9d50efd6b8d755193ccd0612606777cd9ec45/src/mumble/AudioOutputSpeech.cpp#L52
> - Change code in AudioOutputSpeech.cpp so that multiple channels are supported.

_Originally posted by @davidebeatrici in https://github.com/mumble-voip/mumble/issues/2829#issuecomment-554787486_

However, I'm still concerned about backward compatibility. If we just hard-code these changes, old clients may become unusable, and all third-party packages and apps would have to be updated. I noticed that in #2829 @mkrautz mentioned adding a codec. Could that solve this problem?

I'm also wondering what we have to do about positional audio and echo cancellation, since those need to be sorted out as well.

I'd like the Mumble developers to provide some guidance and suggestions so we can build a roadmap here and get our hands dirty ASAP. Thanks!

stale support

TerryGeng

Most helpful comment

First off, I think that if this is tackled, it should be tackled in a way that allows an arbitrary/variable stream count. For that, the audio must contain a header that states how many channels are in the audio stream.

And to be completely honest: I think rewriting the audio code completely might be a viable alternative. As it is so central to Mumble, it's just poor that it is in this ridiculous state, where nobody really wants to work on it.
This also goes hand-in-hand with the recent plans of rewriting Mumble as a whole in order to get a modern and well organized code-base that is easy to work with.

Krzmbrzl on 20 May 2020

❤3

All 17 comments

I would consider making it optional, because many use cases need as low a latency as possible, and don't need stereo audio. The ideal thing would be for it to be an option per-user.

felix91gr on 20 May 2020

Also, maybe this is the architecture you could use for backwards compatibility:

- Server accepts stereo and mono.
- Clients who are able to send stereo, send stereo if the user asks to send stereo sound.
- Clients that are able to accept stereo, tell the Server that they can. To those, the server sends the audio streams as it receives them, either stereo or mono.
- Clients that are not able to accept stereo work as they do today. To those, the server sends the Mono audio streams as it does today, and combines the Stereo streams before sending, and sends them as Mono.
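
The routing rule in this proposal can be sketched roughly as follows. This is only an illustration, not Mumble code: `Frame`, `Client`, `accepts_stereo`, `downmix_to_mono` and `frame_for` are hypothetical names, and audio is modelled as raw interleaved PCM for simplicity. A server that only relays encoded packets would have to decode and re-encode the audio to combine channels itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    channels: int        # 1 = mono, 2 = stereo
    samples: List[int]   # PCM samples; interleaved L, R, L, R, ... for stereo


@dataclass
class Client:
    name: str
    accepts_stereo: bool  # hypothetical capability announced to the server


def downmix_to_mono(frame: Frame) -> Frame:
    """Combine an interleaved stereo frame into mono by averaging L and R."""
    if frame.channels == 1:
        return frame
    left = frame.samples[0::2]
    right = frame.samples[1::2]
    mono = [(l + r) // 2 for l, r in zip(left, right)]
    return Frame(channels=1, samples=mono)


def frame_for(recipient: Client, frame: Frame) -> Frame:
    """Pick what the server would send to one recipient."""
    if recipient.accepts_stereo:
        return frame                # forward as received, stereo or mono
    return downmix_to_mono(frame)   # legacy client: always mono


stereo = Frame(channels=2, samples=[100, 200, -100, -300])
new_client = Client("new", accepts_stereo=True)
old_client = Client("old", accepts_stereo=False)
to_new = frame_for(new_client, stereo)
to_old = frame_for(old_client, stereo)
```

Here `to_new` is the unchanged stereo frame, while `to_old` is a mono frame with one sample per left/right pair.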

felix91gr on 20 May 2020

👍1

See https://github.com/mumble-voip/mumble/issues/2829#issuecomment-279192882

If I understand it correctly every client should be able to decode a stereo stream. The Opus decoder would do the downmix to mono automatically. Works fine for SIP, e.g. in SDP it's always negotiated as opus/48000/2 even for a mono session. See https://datatracker.ietf.org/doc/rfc7587

Most of the mentioned issues are non-issues.

I have no idea how stereo and positional audio would work together. I would just disable positional audio for stereo streams. Is there a use case for positional audio with stereo streams? I could imagine a 2-channel (or multi-channel) stream with a position for each channel, but that wouldn't be compatible with previous clients.

streaps on 20 May 2020

👍1

Krzmbrzl on 20 May 2020

❤3

@Krzmbrzl Thanks for the reply!

A special header indicating the number of channels in the packet would be great, since it opens up many more possibilities than simply creating a so-called "stereo_opus" codec. I also read the advice from @streaps, which suggests that we can just initialize Opus in stereo mode and it will handle mono and stereo at the same time. I will test that, and if it works, we are good! Just out of curiosity, wouldn't a change in the protocol cause compatibility issues for older versions? I think I can leave support for arbitrary channel counts to the refactoring plan you mentioned :).

Speaking of the refactoring plan, I have read about it in #2829 and other issues many times, and I absolutely believe that it is the ultimate solution! However, from what I have observed, although the Mumble community is supportive and active, the developers work on this project part-time, and tasks on the roadmap, like the video feature, take years to even be considered. I assume @davidebeatrici has piles of other important tasks with higher priority than this one. That's why I started planning for this, just to satisfy this urgent need of many folks :D.

I would also like to participate in the development of Mumble, so this could be a good starting point for me to get to know its core.

TerryGeng on 20 May 2020

Backwards compatibility is of course an issue with this and right now there's no header for this in the audio stream. I'm not sure yet how this could be worked around (other than shipping a 2.0 release that breaks backwards compatibility).

And yes it is true that we have a very long list of items that have to be done, but a rewrite would facilitate future changes so at this point I think it's definitely worth it :)

For the short term, though, I think it'd be great to have stereo transmission if it can be implemented reasonably simply (I don't really know whether that's possible, as I don't know the audio code).

Great to hear that you want to contribute code to the project. We can definitely use helping hands here :D

Krzmbrzl on 20 May 2020

> This also goes hand-in-hand with the recent plans of rewriting Mumble as a whole in order to get a modern and well organized code-base that is easy to work with.

Is there an issue about this topic? I wanna read it :D
How much rewriting are we talking about here? I ask because, given how Rust has developed over recent years, a rewrite in Rust that is just as performant (if not more so) is possible today.
If we're talking instead about a piecemeal rewrite (that is, rewriting each module), are the modules decoupled enough? Maybe we could start by defining an internal interface, so that the different pieces of Mumble can be made independent of each other and thus rewritten piecemeal without breaking everything else :3

felix91gr on 20 May 2020

👍1

> Is there an issue about this topic? I wanna read it :D

Not yet no. We only discussed it internally. But I guess it wouldn't hurt to create one :)

See #4195

Krzmbrzl on 21 May 2020

❤1 👍1

> Backwards compatibility is of course an issue with this and right now there's no header for this in the audio stream.

I don't understand the need for an (additional) header. See https://tools.ietf.org/html/rfc6716#section-2.1.2

streaps on 21 May 2020

Ah so the Opus format actually already contains a header for this? Well in that case we obviously don't have to duplicate that data ^^

Krzmbrzl on 22 May 2020

Okay, so I have now successfully compiled Mumble on my Mac and started looking into the audio part.

I'd like to check again that the server just relays the audio packet, it will not decode the packet. Is it right? If so, I don't have to change anything on the server and we are good. But if this is not the case, perhaps things will be more complicated...

TerryGeng on 24 May 2020

> I'd like to check again that the server just relays the audio packet, it will not decode the packet. Is it right?

This is indeed correct (afaik anyways xD)

Krzmbrzl on 24 May 2020

I have built a prototype and found that an Opus decoder initialized with two channels does support both mono and stereo Opus streams.

So I think I can simply assume that all streams from Opus users are stereo, and the Opus decoder will do the rest of the job.
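
As a rough model of why treating every stream as stereo is enough: if the output side always works on interleaved two-channel buffers, a mono stream simply shows up with identical left and right samples. The sketch below is plain Python, not libopus or Mumble code, and `to_interleaved_stereo` is a hypothetical helper. In a real client the two-channel Opus decoder produces this output itself.

```python
from typing import List


def to_interleaved_stereo(samples: List[int], channels: int) -> List[int]:
    """Return an interleaved L, R, L, R, ... buffer for a mono or stereo input.

    Models the output of a decoder that was initialized with two channels:
    stereo input passes through, mono input is duplicated into both channels.
    """
    if channels == 2:
        return list(samples)
    if channels == 1:
        stereo: List[int] = []
        for s in samples:
            stereo.extend((s, s))  # same value for left and right
        return stereo
    raise ValueError("only mono and stereo are modelled here")


from_mono = to_interleaved_stereo([1, 2, 3], channels=1)
from_stereo = to_interleaved_stereo([10, 20, 30, 40], channels=2)
```

With this model, the mixing code after the decoder only ever has to deal with one buffer layout.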

Thanks to @streaps for providing this useful information.

TerryGeng on 26 May 2020

❤1 👍1

I did some more reading. Mono or stereo is indicated in every Opus packet and it can change from packet to packet (as can the bitrate, mode and frame size). Every decoder should expect and support mono and stereo packets.

https://tools.ietf.org/html/rfc6716#section-3.1

3.1.  The TOC Byte

A well-formed Opus packet MUST contain at least one byte [R1].  This
byte forms a table-of-contents (TOC) header that signals which of the
various modes and configurations a given packet uses.  It is composed
of a configuration number, "config", a stereo flag, "s", and a frame
count code, "c", arranged as illustrated in Figure 1.  A description
of each of these fields follows.

  0
  0 1 2 3 4 5 6 7
  +-+-+-+-+-+-+-+-+
  | config  |s| c |
  +-+-+-+-+-+-+-+-+
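
To make the layout above concrete, here is a small sketch that splits the first byte of an Opus packet into the three fields from the diagram (bit 0 in the RFC figure is the most significant bit). The names `Toc` and `parse_toc` are made up for this example and are not part of libopus or Mumble.

```python
from typing import NamedTuple


class Toc(NamedTuple):
    config: int            # 5-bit configuration number (0..31)
    stereo: bool           # the "s" flag: True = stereo, False = mono
    frame_count_code: int  # 2-bit frame count code "c" (0..3)


def parse_toc(packet: bytes) -> Toc:
    """Split the TOC byte of an Opus packet into config, s and c."""
    if len(packet) < 1:
        raise ValueError("a well-formed Opus packet has at least one byte")
    toc = packet[0]
    return Toc(
        config=toc >> 3,               # top five bits
        stereo=bool((toc >> 2) & 1),   # next bit
        frame_count_code=toc & 0x03,   # lowest two bits
    )


stereo_toc = parse_toc(bytes([0xFC]))  # bits 11111 1 00
mono_toc = parse_toc(bytes([0x7B]))    # bits 01111 0 11
```

Because the flag is carried in every packet, a receiver could inspect it this way for each packet without any additional header in the surrounding protocol.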

streaps on 26 May 2020

👍2

This support-issue has been automatically marked as stale because it has not had recent activity. If no further activity occurs, the issue will be automatically closed as we'll assume your problem to be fixed.

stale[bot] on 30 May 2020

Not sure why this is labeled as a support issue?

streaps on 2 Jun 2020

Because Terry asked for help - which is what support is about. Now that the implementation is ready, this issue is not needed anymore and was thus closed.

Krzmbrzl on 2 Jun 2020
