Mumble: Test and improve RNNoise and add some info in UI & wiki

Created on 17 May 2020 · 38 Comments · Source: mumble-voip/mumble

Context:
RNNoise, Audiofilters

Description:
As pointed out by @fedetft in https://github.com/mumble-voip/mumble/issues/4127#issuecomment-629653914

  • libspeexdsp already includes a noise-canceller that is enabled by default
  • rnnoise is maybe not implemented correctly and should be tested once more
  • rnnoise might interfere with libspeexdsp (it should be tested whether that's the case)

Potential to-do:

  • [ ] Test RNNoise (again) alongside all other Audio-Filters (Noise-Cancel by libspeex & Echo-Cancel by libspeex)
  • [ ] Check whether noise-cancel by libspeex can (or should) be deactivated, if RNNoise is used
  • [x] (fixed in #4212) Improve the implementation of RNNoise (especially the order of the filters)
  • [ ] (optional) inform users that noise-cancel is already active (by libspeex) and that RNNoise is an additional filter (later this could be modified to cover other potential use cases, listed above)
documentation

All 38 comments

libspeexdsp already includes a noise-canceller that is enabled by default
rnnoise is maybe not implemented correctly and should be tested once more
rnnoise might interfere with libspeexdsp

That's not information that concerns the end-user. Therefore it shouldn't be in the UI. If you want to, you can create a wiki page for it, so everyone interested can have a look.

Explain RNNoise better.

What kind of explanation would you like to have in the UI?

Let's wait until the RNNoise / speexdsp stuff is fixed / improved. I would like to have an option to switch between SpeexDSP, RNNoise and "off". Then we could add some short explanation about the differences.

I don't think it is useful to add a list of reasons why you shouldn't use RNNoise. Maybe remove it until it's fixed?

Maybe remove it until it's fixed?

Uhm I guess that'd be an option. On the other hand if we remove it now, it'll seem like a step backwards to the end-user, so maybe given that it has been in this state for a while I don't think that it'll hurt to leave it that way until it's fixed...

Uhm I guess that'd be an option. On the other hand if we remove it now, it'll seem like a step backwards to the end-user, so maybe given that it has been in this state for a while I don't think that it'll hurt to leave it that way until it's fixed...

Well it might not function in the way the user desires, this could be seen as a bug and bugs should be fixed.
I would still recommend to disable it or notify the user not to use it for now.

A compromise would be: how about labeling it as experimental feature?

Regarding the info: The user should at least be informed that noise cancelling is already happening (through libspeex).
Of course he doesn't need to know all the technical details, but he should know that it is not necessary to enable RNNoise.

My personal observation after testing over the last few days is that RNNoise is the only thing that lets me use voice activity mode with my microphone without triggering voice activity when I type (due to the position of my mechanical keyboard and microphone).

Maybe remove it until it's fixed?

I would still recommend to disable it or notify the user not to use it for now.

A compromise would be: how about labeling it as experimental feature?

I strongly disagree with labeling RNNoise as experimental and especially so with removing it. RNNoise is one of the prominent features of Mumble and everyone that I know who has used it remarks that it is the best noise suppression out of every voice chat program that they have ever used. Removal or discouraging use of RNNoise would force me to maintain a fork of and provide builds of Mumble retaining the feature. To those who are advocating against it: Have you actually used it?

I guess the removal discussion is settled then. It's obviously good enough that it's useful for some users.

@TredwellGit
If it works fine, I won't object against it being kept activated :slightly_smiling_face: .

I only wanted to give some information that came up, thanks to someone looking at the code.

Nonetheless I think the general purposes of this issue remain:

  • RNNoise should be tested once more and the implementation improved if necessary
  • Users should be informed that noise-cancel is already active (with libspeex) and that RNNoise is just an additional filter

I changed the title for that.

Users should be informed that noise-cancel is already active (with libspeex) and that RNNoise is just an additional filter

We shouldn't add warnings directed to the user for problems in the code. These problems should be fixed.

RNNoise is the only thing that lets me use voice activity mode with my microphone without triggering voice activity when I type (due to the position of my mechanical keyboard and microphone).

Shouldn't VAD be able to distinguish between voice and keyboard sounds (with or without RNNoise enabled)? I wonder if Voice Activity works more like a noise gate in Mumble.

Shouldn't VAD be able to distinguish between voice and keyboard sounds (with or without RNNoise enabled)? I wonder if Voice Activity works more like a noise gate in Mumble.

Might be a case for another issue report then :wink:.

We shouldn't add warnings directed to the user for problems in the code. These problems should be fixed.

Well, I think a notice about already active noise filters is not a warning; it just clarifies that you only need rnnoise if you still have noise problems and want to solve them.

VAD in Mumble is currently a bit basic. You have two options:

  • Amplitude: threshold based. Anything that hasn't been removed by the echo/noise canceller and has a high enough amplitude triggers the voice activation
  • Signal to noise: using libspeexdsp, "replaced by a hack pending a complete rewrite" by the libspeexdsp developers. No idea what the hack is in detail. Is anyone using this option?
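The amplitude option described above can be illustrated with a minimal sketch. This is not Mumble's code: the function names, the 0.1 threshold and the use of peak amplitude (rather than some averaged level) are assumptions made for illustration. It shows why a short, loud transient such as a key press opens a pure amplitude gate just as speech would.

```python
import math

def frame_peak(samples):
    """Peak absolute amplitude of one frame of floats in [-1.0, 1.0]."""
    return max((abs(s) for s in samples), default=0.0)

def amplitude_vad(samples, threshold=0.1):
    """Hypothetical amplitude gate: reports 'voice' whenever the peak
    reaches the threshold, regardless of what produced the sound."""
    return frame_peak(samples) >= threshold

# One 10 ms frame at 48 kHz is 480 samples.
# A quiet 200 Hz hum with amplitude 0.01 stays below the threshold.
quiet = [0.01 * math.sin(2 * math.pi * 200 * n / 48000) for n in range(480)]

# A frame that is silent except for one loud spike, standing in for a key click.
key_click = [0.0] * 480
key_click[100] = 0.6

print(amplitude_vad(quiet))      # False
print(amplitude_vad(key_click))  # True
```

Because the gate only looks at level, anything that suppresses the click before the gate (such as a noise suppressor) changes what triggers transmission.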

Also consider that before https://github.com/mumble-voip/mumble/pull/4167 any "smart" voice activation that would try to detect an actual voice was basically impossible in Mumble, as it would have been triggered by the echo...

I was using signal to noise, but it tended to pass "voice" if I was silent for too long. I guess that ratio is relative to the previous audio frames or something like that...

In any case, my comment from having read the code is that rnnoise seems to have been bodged in rather than being well integrated, or at least the curious design choices that have been made have not been documented in the source code.

In particular:

  • Rnnoise is applied before the echo canceller, while libspeexdsp's noise canceller is applied after, why?
  • When rnnoise is enabled, libspeexdsp's noise canceller is not disabled, leading to two noise cancellers being run, and with the echo canceller in the middle, why?

These choices may also have unintended consequences:

  • Rnnoise may impact the efficacy of the echo canceller
  • We may be wasting CPU for nothing by running two noise cancellers in a row

I think that it may make sense to try a more conventional configuration, such as disabling libspeexdsp's noise canceller when rnnoise is active and putting it after the echo canceller. Can someone like @TredwellGit try such a configuration and see whether rnnoise works just as well?
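The "more conventional configuration" proposed above can be written down as a small ordering rule. The sketch below is only a model of that proposal; the stage names and the `build_pipeline` function are hypothetical and do not correspond to Mumble's source code or to any libspeexdsp or RNNoise API.

```python
def build_pipeline(echo_cancel, rnnoise, speex_denoise=True):
    """Return hypothetical stage names in processing order.

    Rule being modelled: echo cancellation runs first, then exactly one
    noise suppressor. If RNNoise is enabled it replaces the speexdsp
    denoiser instead of running in addition to it.
    """
    stages = []
    if echo_cancel:
        stages.append("speex_echo_cancel")
    if rnnoise:
        stages.append("rnnoise")
    elif speex_denoise:
        stages.append("speex_denoise")
    return stages

print(build_pipeline(echo_cancel=True, rnnoise=True))
print(build_pipeline(echo_cancel=True, rnnoise=False))
```

With this rule the echo canceller always sees the unsuppressed microphone signal, and at most one noise suppressor runs per frame.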

Once rnnoise is properly integrated, Mumble devs should also advertise it more, both in the input configuration and in the wiki, otherwise potentially interested users will have a hard time discovering it.

RNNoise works well at filtering out typing sounds and eating/chewing food sounds. Whatever replaces it, if anything, shouldn't be any worse in these aspects.

RNNoise also provides VAD, but we currently rely on libspeexdsp for that.

We should always use RNNoise's VAD when available and provide an option to completely disable the Speex preprocessor (#3323).

As for improving our RNNoise implementation: last year the library's API was changed so that it allows saving/loading the machine learning progress. Ideally, we should save the progress either in the configuration file or in the SQLite database.

If rnnoise provides VAD, then that's another strong point in favor of putting it after echo cancellation, so it doesn't consider echo as a voice.

After https://github.com/mumble-voip/mumble/pull/4167 is merged I might find the time to propose a patch for the rnnoise users to try out.

I don't see any advantage in using noise suppression before the echo canceller, especially not with a strong noise suppressor like RNNoise. I think putting RNNoise behind the echo canceller is a win-win for both.

I don't have much experience with that stuff and I didn't do any tests, but my basic understanding of audio processing and experience tells me that putting a noise suppressor before the echo canceller makes no sense at all. Please correct me, if I'm wrong.

My one cent for this discussion: just like @TredwellGit, I use RNNoise extensively, and it's vital to my friends not wanting to kill me for my noisy setup.

My setup has the following flaws:

  1. My house is very old, and the electrical wiring is so badly done that the earthing got oxidised and is no longer working. This, plus the already existing noise in the wires, means that a lot of noise gets through to my computer chassis whenever I'm not on battery power.
  2. My headset is okay, but not great. It has slight electronic noises stemming from its poor connections, and those become very apparent whenever I use the volume controls and the mute switch.

Yet all of these, I repeat, _all of these_, get removed by RNNoise. I am astounded by how good it is. It really is a marvel. These are good times to be alive.

@jj777 @trudnorx @felix91gr as you seem to be power users of the RNNoise feature in Mumble, could you please test the changes made in #4212 ?
We'd need some feedback to verify that it still works as expected :point_up:

I can do some testing today/tomorrow - just to confirm, I should be building using mumble-releng's 1.3 scripts and the branch above? I saw a comment somewhere in the post about the echo canceller fixes saying it wouldn't work until 1.4 - but I can't get the Windows build of that to compile

All new features are built against 1.4.0 (current master).
You can simply download the windows installer from the CI though, so you don't have to compile it yourself :)

So just did a test with another user on my server.

This is the first time we've done a test with both people on continuous transmission mode and both on speakers - Windows 10 x64 both using the same CI build from yesterday (30/05).

Some notes:

  1. First off, we noticed echo cancellation wasn't working very well - however, when we ignored the Mumble text instructions around maximising the mic boost in Windows and both reduced the mic level in Windows from 100 to 75-80ish, the echo canceller seemed to work pretty well (it may make sense to adjust this guidance in the help text before 1.4 goes live).
  2. We had a test conversation with me playing music on speakers - and the music was pretty much cancelled out properly (i.e. the most we heard when listening back to the recording was a tiny quarter-second bit that you'd need to be looking for).
  3. RNNoise seems to work great still in this setup - we did a recording test and it was almost impossible to hear me typing away on my Cherry Blue keys. When I disabled it, it appeared that my typing was very noticeable ("please stop").

EDIT: Though I just noted that the changes in #4212 haven't been merged in yet, so we may have just tested the new echo canceller and the old RNNoise implementation instead, based on that build. Let me know if there's a way of getting a binary of the 4212 code to test.

Did you use the installer from the CI? If so you had the changes of that PR included

First off, we noticed echo cancellation wasn't working very well - however, when we ignored the Mumble text instructions around maximising the mic boost in Windows

Maximising the mic boost shouldn't be advisable. This is not an analog tape recorder or a guitar amp. Ideally you want plenty of headroom to avoid any digital clipping (or major analog distortion). If the voice is not loud enough, it can be amplified after the ADC in the digital domain. It really doesn't matter much if you lose 12 dB or 24 dB of dynamic range because the mic input is not levelled to maximum.
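To put rough numbers on the headroom argument above, here is a short arithmetic sketch. It assumes 16-bit linear PCM capture, which the thread does not state, and the helper names are made up for illustration.

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM with the given bit depth."""
    return 20 * math.log10(2 ** bits)

def bits_lost(headroom_db):
    """How many bits of resolution a given amount of unused headroom costs."""
    return headroom_db / (20 * math.log10(2))

def db_to_gain(db):
    """Linear amplitude factor needed to make up a level difference in dB."""
    return 10 ** (db / 20)

total = dynamic_range_db(16)            # about 96.3 dB for 16-bit audio
for headroom in (12, 24):
    print(f"{headroom} dB headroom: "
          f"{bits_lost(headroom):.1f} bits lost, "
          f"{total - headroom:.1f} dB left, "
          f"make-up gain x{db_to_gain(headroom):.1f}")
```

Under that assumption, leaving 12 dB of headroom costs about two bits and 24 dB about four, which still leaves over 70 dB of range before any digital make-up gain is applied.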

Did you use the installer from the CI? If so you had the changes of that PR included

Yep, from the link I included - should be right then! We were pretty happy with what we tested.

First off, we noticed echo cancellation wasn't working very well - however, when we ignored the Mumble text instructions around maximising the mic boost in Windows

Maximising the mic boost shouldn't be advisable. This is not an analog tape recorder or a guitar amp. Ideally you want plenty of headroom to avoid any digital clipping (or major analog distortion). If the voice is not loud enough, it can be amplified after the ADC in the digital domain. It really doesn't matter much if you lose 12 dB or 24 dB of dynamic range because the mic input is not levelled to maximum.

Yep, agree - I think this is the bit where the wording should be potentially reconsidered:
image

That's OT to this issue. Feel free to create a PR with the respective changes though :)
(The file in question would be https://github.com/mumble-voip/mumble/blob/master/src/mumble/AudioWizard.ui - can be edited with Qt Designer)

The audio wizard is useless anyway.

@streaps wasn't for me!

Slightly related question: does Mumble do any audio processing, filtering, modification, or similar, on the input or output side, that cannot be disabled? Equalization, loudness, compression, anything?

For example, if you have RNNoise off, Suppression off, and Amplification set to 1, is the audio completely direct? Or is there still stuff going on?

If the audio stream is not at 48 kHz, Mumble resamples it to that sample rate.

Then, the following speex filters are unconditionally enabled:

  • auto gain control
  • dereverb
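For readers unfamiliar with the resampling step mentioned above, here is a deliberately naive sketch that converts a block of samples to 48 kHz by linear interpolation. It only illustrates what "resampling" means; it is not how Mumble does it, and production resamplers use band-limited filtering rather than straight-line interpolation. The function name and signature are hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate=48000):
    """Naive linear-interpolation resampler (illustration only)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate        # position in the source signal
        j = int(pos)                         # sample to the left of pos
        frac = pos - j                       # distance past that sample
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out

# 10 ms of audio at 44.1 kHz is 441 samples; at 48 kHz it is 480 samples.
block_44k = [0.0] * 441
print(len(resample_linear(block_44k, 44100)))  # 480
```

Input that is already at 48 kHz is passed through unchanged by this sketch.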

@fedetft When you say not at 48 kHz, do you mean higher or lower?
And how come we have no ability to change this?

When you say not at 48 kHz, do you mean higher or lower?

@grravity this usually means lower. Audio nowadays is sampled either at 44.1 kHz or at 48 kHz.

And how come we have no ability to change this?

I don't think it's very sensible to add that ability. Lemme explain why I think that:

  1. The human ear can hear up to about 20 kHz. By the Nyquist-Shannon sampling theorem, you can replicate our experience by sampling audio at a little more than double that frequency, i.e., just over 40 kHz. 48 kHz leaves good headroom on top of that, and is otherwise more than enough for the human ear.

  2. 48 kHz and 44.1 kHz are the two main standards for audio sampling, 48 being the most used of the two. Putting in a different sampling rate from those two would mean that support would be harder to achieve at a lower level of the stack, because they are standards.

Basically, I feel that the costs outweigh any benefits that you could get from a different sampling rate.
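The sampling-theorem argument in point 1 above can be checked with a few lines of arithmetic. This is a generic sketch of the Nyquist limit and of where an out-of-range tone would alias to; the function names are made up and nothing here is specific to Mumble.

```python
def nyquist_limit(sample_rate):
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate / 2

def alias_frequency(freq, sample_rate):
    """Frequency at which a pure tone appears after sampling."""
    folded = freq % sample_rate
    return min(folded, sample_rate - folded)

print(nyquist_limit(48000))           # 24000.0
print(nyquist_limit(44100))           # 22050.0
print(alias_frequency(20000, 48000))  # 20000 (within range, unchanged)
print(alias_frequency(30000, 48000))  # 18000 (folds back into the audible band)
```

Both common rates put the limit above the roughly 20 kHz upper bound of hearing, which is the headroom the comment refers to.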

OK, this is understandable if it means lower, but if it was higher then I was slightly confused. I fail to see any reason not to set a standard for the rate, so this makes sense. Understood. I am a little curious as to why the Speex filters being added would benefit this case? Dereverb and auto gain control? How do those play into the question?

Just out of curiosity, what occurs if it is 48 kHz or higher? Would this mean that no processing, modulation, or filtering occurs? Which goes back to my original question.

The Opus codec supports 48 kHz sample rate only.

Also, RNNoise only supports 48 kHz, so there's no easy way for Mumble to support higher sample rates other than rewriting from scratch the DSP stuff it relies upon.