It seems that Mumble requests a very low latency from PulseAudio, resulting in very high CPU load. The PA process eats around 15% and Mumble 10% (+/- 3%). That's at idle and without pressing the PushToTalk key. The CPU is an i5 2500K running at 4GHz.
Also, only Mumble is using the PA daemon. No other app is open that uses audio, so no resampling or mixing is being done by PA (the audio card is opened at the sample rate that Mumble requested: 44100Hz.)
The Mumble configuration is the one you get when you select the best quality in the Wizard (10ms).
Using this command:
pacmd list-sinks | grep latency
shows that Mumble does not actually request a 10ms latency from PA, but 2ms:
current latency: 1.92 ms
configured latency: 2.00 ms; range is 0.50 .. 2000.00 ms
This is insane. 2ms isn't even needed by professional DAWs. This needs way too much CPU. I would have expected Mumble to request 10ms, not 2ms. When I go to the Mumble Configuration and increase the Audio Output -> Output Delay slider from 10ms to 20ms, CPU usage drops quite a bit (but is still too high: 11% for PA and 8% for Mumble) and the output of the command is now:
current latency: 4.73 ms
configured latency: 5.00 ms; range is 0.50 .. 2000.00 ms
That's better, but CPU usage is still too high. I would have expected 20ms, not 5ms. Increasing the Output Delay slider further doesn't change anything; Mumble still requests 5ms from PulseAudio, which is just weird.
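For a rough sense of why a 2ms request is so much more expensive than a 10ms one, here is a back-of-the-envelope sketch. It assumes one wakeup per buffer period, which simplifies whatever PulseAudio actually does internally, and the function names are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Approximate number of buffer periods per second at a given latency.
// ASSUMPTION: one wakeup per period; real scheduling is more complicated.
inline double periodsPerSecond(double latencyMs) {
    assert(latencyMs > 0.0);
    return 1000.0 / latencyMs;
}

// Number of audio frames that fit into one period at the given sample rate.
inline double framesPerPeriod(double latencyMs, std::uint32_t sampleRateHz) {
    assert(latencyMs > 0.0);
    return sampleRateHz * latencyMs / 1000.0;
}
```

Under these assumptions, 2ms works out to 500 periods per second, against 100 at 10ms and 50 at 20ms.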
I'm using PulseAudio 4.0, Mumble 1.2.4, on Gentoo Linux AMD64.
Interesting find. I'm curious how it looks on Windows.
You cannot go under 10ms on the Windows interface.
Don't have the time to look into it right now. For later reference:
Parts that looked relevant: PulseAudio.cpp in "else if (do_start) { ... }" locations.
PA stuff that looked relevant:
http://freedesktop.org/software/pulseaudio/doxygen/structpa__buffer__attr.html
http://freedesktop.org/software/pulseaudio/doxygen/stream_8h.html#ab9544f6677af133fbe81bf8a21eb489c
Anyone else feel free to take a look.
OK, it seems it's the input that's at fault. If I choose ALSA in the input settings, but keep PulseAudio in the output settings, then CPU usage of both Mumble and PulseAudio goes down to a very acceptable 1.3% with a configured latency of 30ms for "output delay" and "jitter buffer".
This probably means that the pa_buffer_attr struct passed to pa_stream_connect_record() is bogus and specifies too small buffers. The pa_buffer_attr.tlength value used for pa_stream_connect_playback() (for output) seems OK:
buff.tlength = iBlockLen * (g.s.iOutputDelay+1);
But the pa_buffer_attr.fragsize value used for pa_stream_connect_record() does not have any multiplier applied to it:
buff.fragsize = iBlockLen;
I don't know what the multiplier should be, but it looks like it does need one.
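To make the asymmetry concrete, here is a minimal sketch of the two assignments side by side. It is not Mumble's code: BufferAttr is a local stand-in that mirrors the field names of pa_buffer_attr so the example compiles without the PulseAudio headers, and inputDelay is a hypothetical setting that only illustrates what a record-side multiplier could look like.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in mirroring the fields of PulseAudio's pa_buffer_attr.
// The real struct comes from the PulseAudio headers.
struct BufferAttr {
    std::uint32_t maxlength;
    std::uint32_t tlength;
    std::uint32_t prebuf;
    std::uint32_t minreq;
    std::uint32_t fragsize;
};

// PulseAudio documents (uint32_t) -1 as "let the server pick a default".
const std::uint32_t kServerDefault = static_cast<std::uint32_t>(-1);

inline BufferAttr defaultAttr() {
    BufferAttr buff = {kServerDefault, kServerDefault, kServerDefault,
                       kServerDefault, kServerDefault};
    return buff;
}

// Playback side: same expression as the line quoted from the thread.
inline BufferAttr playbackAttr(std::uint32_t blockLen, std::uint32_t outputDelay) {
    BufferAttr buff = defaultAttr();
    buff.tlength = blockLen * (outputDelay + 1);
    return buff;
}

// Record side with a HYPOTHETICAL inputDelay multiplier.
// inputDelay == 0 reproduces the current "fragsize = iBlockLen" behaviour.
inline BufferAttr recordAttr(std::uint32_t blockLen, std::uint32_t inputDelay) {
    assert(blockLen > 0);
    BufferAttr buff = defaultAttr();
    buff.fragsize = blockLen * (inputDelay + 1);
    return buff;
}
```

With inputDelay set to 2, for example, each record fragment would cover three blocks instead of one. Whether that is the right knob is exactly the open question.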
A number of us had the same problem with Mumble: high CPU usage of both Mumble and PulseAudio on Linux. We changed to ALSA in the Audio Input setting (make sure the Advanced checkbox in the lower left-hand corner of the configuration window is ticked) and then, in the Device drop-down, worked our way through the list (avoiding the PulseAudio entries) until one worked without throwing an error in the main Mumble window. BINGO, low CPU usage. I personally had to restart Mumble to get the PulseAudio CPU usage to drop, but it did work. Very happy now.
Just thought I'd add the specifics in case anyone else needs a workaround. Sorry it's not actually fixing the original issue.
Thanks for the workaround @gmorph. I can run mumble now without burning my cpu, thank god. Is this issue unfixable? It has been around for many years, but for most of those years pulseaudio was a mess and nobody used it much so maybe this issue wasn't as visible. Now that pulseaudio is in higher use I feel like this needs to be fixed or many people are going to pass over mumble as an acceptable application to have running.
@realnc You seem to have dug into the issue pretty well. What was the holdup on submitting a patch? Is it something that must be fixed in pulseaudio? Or is it something that can be patched into mumble?
I ask because I am very willing to dig this bug out and fix it. But I'd like to have a better idea where to start, in pulseaudio's or mumble's src.
I didn't submit any patches since I don't know the details of PulseAudio. I posted my findings just in case that spot in the code rings a bell with the Mumble team.
That makes a lot of sense.
The workaround of using ALSA instead of PulseAudio for the input was still giving me high CPU load (an entire core at 100%). So in preparation for patching and testing I have installed version 1.3.0 built from git. To my surprise, Mumble has been open for around an hour now and the CPU load bug hasn't shown its face. I think this may have been fixed in git since the 1.2.8 release. Both the input and output are set to the PulseAudio defaults.
Edit: CPU Load Bug still exists in version 1.3.0 built from git using this PKGBUILD from the AUR.
No interest?
There is interest, but the project as a whole has been moving pretty slowly for quite a while, which is sad. We try to do our best.
Very few of us developers "dogfood" the Linux build anymore, so it doesn't get as many eyes as it used to. Also unfortunate.
@realnc The multiplier for output is expected, that's the "output delay" slider for output. We don't have the same concept for manipulating buffer-sizes/delays for AudioInput. I don't think that is why you're getting the latency numbers you're getting...
I have not yet looked into _why_ Pulse would be running at sub-10 ms, though.
I have assigned this bug to 1.3.0 and myself... I hope to take a look while I tackle a few other audio-related issues.
Sorry for letting this linger for so long.
I'd just like to add, regarding the
buff.tlength = iBlockLen * (g.s.iOutputDelay+1);
and
buff.fragsize = iBlockLen;
code... Obviously, if iBlockLen is very small, we would be trying to run PulseAudio at its limits (which it seems to obey). Perhaps the issue is that we try to pick as low a buffer as we can -- and use the smallest one the sound system provides. If that's the case, we should probably use 10 ms as our floor.
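A minimal sketch of what such a floor could look like, assuming the buffer size is a byte count of interleaved PCM. The helper names and the idea of clamping at this particular point are assumptions for illustration, not what Mumble does:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Bytes needed to hold `ms` milliseconds of interleaved PCM.
// Integer division truncates when rateHz * ms is not a multiple of 1000.
inline std::uint32_t bytesForMs(std::uint32_t ms, std::uint32_t rateHz,
                                std::uint32_t channels, std::uint32_t bytesPerSample) {
    assert(rateHz > 0 && channels > 0 && bytesPerSample > 0);
    const std::uint64_t frames = static_cast<std::uint64_t>(rateHz) * ms / 1000;
    return static_cast<std::uint32_t>(frames * channels * bytesPerSample);
}

// Never request less than 10 ms worth of audio, whatever was asked for.
inline std::uint32_t clampToFloor(std::uint32_t requestedBytes, std::uint32_t rateHz,
                                  std::uint32_t channels, std::uint32_t bytesPerSample) {
    const std::uint32_t floorBytes = bytesForMs(10, rateHz, channels, bytesPerSample);
    return std::max(requestedBytes, floorBytes);
}
```

For 48000Hz mono 16-bit audio the floor comes out at 960 bytes, so a smaller request would be raised to 960 while a larger one passes through unchanged.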
In case anyone else is bothered by this: each of the following PulseAudio settings (daemon.conf) reduced the CPU usage a little, both for Mumble and Pulse. YMMV.
default-sample-rate=48000
high-priority = no
realtime-scheduling = no
resample-method = trivial
On my PC, this made the CPU usage for Mumble and Pulse together drop from about 15% to 6%.
I've added a WIP PR to control "input delay" at https://github.com/mumble-voip/mumble/pull/2834/files if anyone wants to play with it.
I've not been able to reproduce the latency numbers that @realnc has provided. Mine hovers around 25 ms via pacmd list-sinks | grep latency. But I agree the CPU% of Mumble+Pulse is excessive.
I have this issue too. I tried ALSA for input and output; it helped a bit, but the CPU usage still averages around 13% on both cores (I have 2 cores). I am using 1.3.0 on Debian Buster.
Is there a chance to optimize this somehow under Windows 10, or to give options to people who don't need super-low latency? Also, it seems the entire Mumble client is really badly optimized. Why does it still process audio and use CPU even if you deafen yourself?
Running Mumble with nobody else connected heats up my 8th-gen Intel laptop. This isn't acceptable. Mumble alone, with no people and no traffic, adds 2-4 W of CPU power draw.
We're aware of the issue, we will fix it as soon as possible because the requirements are indeed unacceptable.
Then it is even worse and buggy, because deactivating those under settings has no impact on the high CPU usage of Mumble: https://i.imgur.com/doXTYmf.png
The SpeexDSP preprocessor is always active: #3323
Yeah, like I said or assumed, really badly optimized. Not sure if this will ever change. I hope so.
Of course it will, there are many changes planned for 1.4.
I assume you mean the as-yet-unreleased 1.3?
1.3 can be considered frozen.
Once it's released (hopefully in a few weeks) we can focus on optimization because 1.3 is supposed to be the last release supporting legacy stuff (e.g. Speex (not DSP) and CELT).