I have been following the progress on stereo audio support for months. However, #2829 has been open for years and little progress has been made since it was filed. Although many people have requested this feature, it is not on the Mumble developers' priority list.
In my community, people say "you can you up", meaning: stop complaining and get things done yourself. I think it is time for those of us who want this feature to at least consider implementing it ourselves.
However, as mentioned in #2829, the audio code is convoluted and not fun to work with. So I'd like to open this issue to discuss with the developers how this feature could be implemented.
In #2829, @davidebeatrici provided some guides:
AudioOutputSpeech.cpp so that multiple channels are supported.
(Originally posted by @davidebeatrici in https://github.com/mumble-voip/mumble/issues/2829#issuecomment-554787486)
However, I'm still concerned about backward compatibility: if we simply hard-code these changes, older clients may become unusable and all third-party packages and apps would have to be updated. I noticed that in #2829 @mkrautz mentioned adding a new codec. Could that solve this problem?
I'm also wondering what we would have to do about positional audio and echo cancellation, since those also need to be sorted out.
I'd appreciate some guidance and suggestions from the Mumble developers so we can build a roadmap here and get our hands dirty as soon as possible. Thanks!!!
I would consider making it optional, because many use cases need as low a latency as possible and don't need stereo audio. Ideally, it would be a per-user option.
Also, maybe this is the architecture you could use for backwards compatibility:
See https://github.com/mumble-voip/mumble/issues/2829#issuecomment-279192882
If I understand correctly, every client should be able to decode a stereo stream; the Opus decoder downmixes it to mono automatically. This works fine for SIP: in SDP, Opus is always negotiated as opus/48000/2, even for a mono session. See https://datatracker.ietf.org/doc/rfc7587
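To make the downmix point concrete: libopus performs this conversion itself when the decoder's channel count differs from the packet's, so a client relying on the decoder should not need to do it by hand. The sketch below only illustrates, on interleaved 16-bit PCM, what a stereo-to-mono downmix (averaging each L/R pair) and a mono-to-stereo upmix (duplicating each sample) amount to. The helper names `downmixToMono` and `upmixToStereo` are hypothetical, and this is not libopus's internal algorithm.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Conceptual illustration only, not libopus code: maps interleaved
// stereo PCM (L, R, L, R, ...) to mono and mono back to stereo.

// Average each interleaved L/R pair into a single mono sample.
std::vector<std::int16_t> downmixToMono(const std::vector<std::int16_t>& interleavedStereo) {
    std::vector<std::int16_t> mono;
    mono.reserve(interleavedStereo.size() / 2);
    for (std::size_t i = 0; i + 1 < interleavedStereo.size(); i += 2) {
        // Sum in 32 bits so two loud samples cannot overflow int16_t.
        std::int32_t sum = static_cast<std::int32_t>(interleavedStereo[i]) + interleavedStereo[i + 1];
        mono.push_back(static_cast<std::int16_t>(sum / 2));
    }
    return mono;
}

// Duplicate each mono sample into both the left and right channel.
std::vector<std::int16_t> upmixToStereo(const std::vector<std::int16_t>& mono) {
    std::vector<std::int16_t> stereo;
    stereo.reserve(mono.size() * 2);
    for (std::int16_t sample : mono) {
        stereo.push_back(sample); // left
        stereo.push_back(sample); // right
    }
    return stereo;
}
```

Averaging rather than summing keeps the mono signal within the original 16-bit range.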
Most of the mentioned issues are non-issues.
I have no idea how stereo and positional audio would work together. I would just disable positional audio for stereo streams. Is there a use case for positional audio with stereo streams? I could imagine a 2-channel (or multi-channel) stream with a position for each channel, but that wouldn't be compatible with previous clients.
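If positional audio were simply switched off for stereo streams, the per-packet decision could look roughly like the minimal sketch below. It assumes the receiver reads the stereo flag from the first (TOC) byte of each Opus packet, as RFC 6716 Section 3.1 specifies; `Spatialization`, `isStereoOpusPacket` and `chooseSpatialization` are hypothetical names, not Mumble code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical policy: positional audio applies only to mono packets;
// stereo packets are played back as-is.

// True if the TOC byte's stereo flag "s" is set (RFC 6716, Section 3.1).
bool isStereoOpusPacket(const std::uint8_t* packet, std::size_t length) {
    if (packet == nullptr || length < 1) {
        return false; // malformed: a valid Opus packet has at least one byte
    }
    return (packet[0] & 0x04) != 0; // bit 5 of the TOC byte, counting from the MSB as bit 0
}

enum class Spatialization { Positional, Passthrough };

// Hypothetical: decide how a received packet's decoded audio is rendered.
Spatialization chooseSpatialization(const std::uint8_t* packet, std::size_t length,
                                    bool positionalAudioEnabled) {
    if (positionalAudioEnabled && !isStereoOpusPacket(packet, length)) {
        return Spatialization::Positional;
    }
    return Spatialization::Passthrough;
}
```

The check is done per packet because the stereo flag may change from one packet to the next.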
First off, I think that if this is tackled, it should be done in a way that allows an arbitrary/variable channel count. For that, the audio must contain a header holding the number of channels in the audio stream.
And to be completely honest: I think rewriting the audio code completely might be a viable alternative. It is so central to Mumble that it's a shame it is in such a bad state that nobody really wants to work on it.
This also goes hand-in-hand with the recent plans of rewriting Mumble as a whole in order to get a modern, well-organized codebase that is easy to work with.
@Krzmbrzl Thanks for the reply!
A special header indicating the number of channels in the packet would be great, since it offers many more possibilities than simply creating a so-called "stereo_opus" codec. I also read the advice from @streaps: it seems we can simply initialize Opus in stereo mode and it will handle both mono and stereo. I will test that, and if it works, we are good! Just out of curiosity, wouldn't a change to the protocol cause compatibility issues with older versions? Maybe I can leave support for arbitrary channel counts to the refactoring plan you mentioned :).
Speaking of the refactoring plan, I have read about it many times in #2829 and other issues, and I absolutely believe it is the ultimate solution! However, from what I have observed, although the Mumble community is supportive and active, the developers work on this project part-time, and roadmap items such as the video feature take years before anyone even starts on them. I'm sure @davidebeatrici has piles of other important tasks with higher priority than this one. That's why I started planning this: to satisfy the urgent need of many folks :D.
I would also like to participate in Mumble's development, so this could be a good starting point for getting to know Mumble's core.
Backwards compatibility is of course an issue here, and right now there's no header for this in the audio stream. I'm not sure yet how this could be worked around (other than shipping a 2.0 release that drops backwards compatibility).
And yes, it is true that we have a very long list of items to do, but a rewrite would make future changes easier, so at this point I think it's definitely worth it :)
In the short term, though, I think it'd be great to have stereo transmission if it can be implemented reasonably simply (I don't really know whether that's possible, as I don't know the audio code).
Great to hear that you want to contribute code to the project. We can definitely use helping hands here :D
This also goes hand-in-hand with the recent plans of rewriting Mumble as a whole in order to get a modern, well-organized codebase that is easy to work with.
Is there an issue about this topic? I wanna read it :D
Not yet, no. We only discussed it internally. But I guess it wouldn't hurt to create one :)
See #4195
Backwards compatibility is of course an issue with this, and right now there's no header for this in the audio stream.
I don't understand the need for an (additional) header. See https://tools.ietf.org/html/rfc6716#section-2.1.2
Ah, so the Opus format actually already contains a header for this? Well, in that case we obviously don't have to duplicate that data ^^
Okay, I have now successfully compiled Mumble on my Mac and started looking into the audio code.
I'd like to double-check: the server just relays audio packets and does not decode them, right? If so, I don't have to change anything on the server and we are good. If not, things will probably be more complicated...
I'd like to check again that the server just relays the audio packet, it will not decode the packet. Is it right?
This is indeed correct (afaik anyways xD)
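Assuming the server really does only relay audio, its role can be modelled as copying an opaque byte payload to each recipient, which is why the stereo flag inside the Opus data would reach other clients unchanged. This is a hypothetical model: `AudioPacket`, `Recipient` and `relay` are illustrative names, and the real server also handles channels, whisper targets and its own packet header, all of which are ignored here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a relaying server: the Opus payload is treated as
// an opaque byte blob and never decoded, so its mono/stereo flag passes
// through untouched.
struct AudioPacket {
    std::uint32_t senderSession;           // hypothetical sender identifier
    std::vector<std::uint8_t> opusPayload; // encoded audio, never inspected here
};

struct Recipient {
    std::uint32_t session;
    std::vector<AudioPacket> outbox; // packets queued for sending to this client
};

void relay(const AudioPacket& packet, std::vector<Recipient>& recipients) {
    for (Recipient& r : recipients) {
        if (r.session == packet.senderSession) {
            continue; // don't send the speaker's audio back to them
        }
        r.outbox.push_back(packet); // byte-for-byte copy of the payload
    }
}
```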
I have made a prototype and confirmed that an Opus decoder initialized with two channels supports both mono and stereo Opus streams.
So as long as I treat all streams from Opus users as stereo, the Opus decoder can do the rest of the job.
Thanks to @streaps for this useful information.
I did some more reading. Mono or stereo is indicated in every Opus packet and it can change from packet to packet (as can the bitrate, mode and frame size). Every decoder should expect and support mono and stereo packets.
https://tools.ietf.org/html/rfc6716#section-3.1
3.1. The TOC Byte
A well-formed Opus packet MUST contain at least one byte [R1]. This
byte forms a table-of-contents (TOC) header that signals which of the
various modes and configurations a given packet uses. It is composed
of a configuration number, "config", a stereo flag, "s", and a frame
count code, "c", arranged as illustrated in Figure 1. A description
of each of these fields follows.
 0
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
| config  |s| c |
+-+-+-+-+-+-+-+-+
Figure 1: The TOC Byte
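Following the RFC layout above, the three TOC fields can be extracted with shifts and masks. The sketch below is a minimal C++ illustration; `OpusToc` and `parseToc` are hypothetical names, and only the TOC byte is examined (any bytes that follow it are not parsed).

```cpp
#include <cstddef>
#include <cstdint>

// Fields of the Opus TOC byte as laid out in RFC 6716, Section 3.1:
// bits 0-4 (most significant) = config, bit 5 = s, bits 6-7 = c.
struct OpusToc {
    unsigned config;    // 0..31: selects mode, audio bandwidth and frame size
    unsigned channels;  // 1 if s == 0 (mono), 2 if s == 1 (stereo)
    unsigned frameCode; // c: 0 = 1 frame, 1 = 2 equal-size frames,
                        //    2 = 2 frames of different size, 3 = arbitrary count
};

// Parses the first byte of an Opus packet into `out`.
// Returns false for an empty packet, which RFC 6716 does not allow [R1].
bool parseToc(const std::uint8_t* packet, std::size_t length, OpusToc& out) {
    if (packet == nullptr || length < 1) {
        return false;
    }
    const std::uint8_t toc = packet[0];
    out.config = toc >> 3;                 // top five bits
    out.channels = (toc & 0x04) ? 2u : 1u; // stereo flag "s"
    out.frameCode = toc & 0x03;            // bottom two bits "c"
    return true;
}
```

In practice libopus already exposes the channel check as `opus_packet_get_nb_channels()`, so a client would normally call that rather than parse the byte by hand.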
This support-issue has been automatically marked as stale because it has not had recent activity. If no further activity occurs, the issue will be automatically closed as we'll assume your problem to be fixed.
Not sure why this is labeled as a support issue?
Because Terry asked for help - which is what support is about. Now that the implementation is ready, this issue is not needed anymore and was thus closed.