Tdesktop: Echo cancellation for voice calls

Created on 23 Jun 2018 · 27 Comments · Source: telegramdesktop/tdesktop

As mentioned in auto-closed issue #3455, tdesktop has no echo cancellation. For users with laptops or desktops with loudspeakers, this makes tdesktop significantly substandard for voice calls at present, a shame as the voice feedback is fairly intolerable. It'd be great to see first-class telegram call support on tdesktop to the same quality as mobile.

enhancement

qgates

👍4

Most helpful comment

@grishka some feedback. Good news - testing on windows, echo cancellation is now working on tdesktop (with mic and speakers) after what seems like a 'training period' of 10-15 seconds. For the first 10-15 seconds of a call between phone and desktop, the phone user's voice can still be heard loudly echoing back as before. But after 10-15 seconds, the echo cancellation kicks in and works perfectly - phone user's voice can no longer be heard echoing back. A big improvement - although I'm wondering whether it can be improved further to work perfectly from the beginning of the call?

Possibly related, from the tdesktop end there is an extremely low echo/feedback coming from the other end. It's very quiet, not a big deal, but noticeable. Not sure if that was there before or whether it was masked by the echo coming in the other direction.

qgates on 16 Jul 2018

👍2

All 27 comments

We do actually have echo cancellation I pulled out of WebRTC. It worked reasonably well in my tests, but it might need some tweaking to be better. Also, on OS X I make use of the built-in echo cancellation via VoiceProcessingIO which certainly works (presumably it's the exact same thing Apple uses in their own FaceTime). I know there's a similar thing on Windows since Vista and I plan on trying it too and enabling it if it works well.

grishka on 25 Jun 2018

@grishka just to be clear, are you suggesting that tdesktop already has echo cancellation? I doubt it because here, if I speak to anyone from a desktop PC with a mic & loudspeakers (using tdesktop), callers at the other end hear their own voice feeding back badly - even if I turn the volume down to a quiet level. I don't have the same issue using Skype or Google hangouts.

If the remote user is also on tdesktop with a laptop (built in mic and speakers), then the feedback loops are intolerable. Telegram calls mobile-mobile are perfect, even when on speakerphone. I assume the mobile version uses the built in hardware echo cancellation provided by the handset.

This is badly needed. Presently the calling feature is practically unusable with tdesktop unless the user is on a headset or headphones.

qgates on 25 Jun 2018

I am not _suggesting_ it, I am _stating_ it. It does have echo cancellation and it works on my Windows virtual machine. The only caveat here is that I didn't test on real Windows-compatible hardware because I don't own any, and apparently that's the problem. :(

Yes, there's built-in hardware AEC we use on most mobile devices as well as Mac computers, but there's also a fallback implementation that is needed for some older and/or low-end Android devices. On Windows there's this voice capture DSP thing and I was going to give it a shot anyway. I know nothing about Skype, but WebRTC uses it.

Thank you, I'll have a look into this issue.

grishka on 25 Jun 2018

I apologise for misunderstanding, but I'm simply reporting behaviour.

I run tdesktop mainly on a Windows 7 desktop (with a Logitech C920 mic and loudspeakers), a Linux laptop (Lenovo T430, inbuilt mic and loudspeakers) and an iMac circa 2013. On all machines the remote caller experiences bad feedback of their voice. Also, if I call friends running tdesktop on their laptops from my mobile, I experience it quite badly too. Only mobile to mobile works nicely, whether on speakerphone or not.

Hope the additional info helps and thanks for looking into it :-)

On Mon, 25 Jun 2018, 18:07 Gregory K, notifications@github.com wrote:

I am not suggesting it, I am stating it. It does
https://github.com/grishka/libtgvoip/blob/public/EchoCanceller.cpp have
echo cancellation and it works on my Windows virtual machine. The only
caveat here is that I didn't test on real Windows-compatible hardware
because I don't own any, and apparently that's the problem. :(

Yes, there's built-in hardware AEC we use on most mobile devices as well
as Mac computers, but there's also a fallback implementation that is needed
for some older and/or low-end Android devices. On Windows there's this
voice capture DSP thing
https://msdn.microsoft.com/en-us/library/windows/desktop/ff819492(v=vs.85).aspx
and I was going to give it a shot anyway. I know nothing about Skype, but WebRTC
uses it
https://github.com/ReadyTalk/webrtc/blob/master/webrtc/modules/audio_device/win/audio_device_core_win.cc
.

Thank you, I'll have a look into this issue.

qgates on 25 Jun 2018

I can easily understand that it doesn't work on Windows and Linux since those use software AEC that, as we've already established, currently sucks, but the iMac is the most interesting. Which OS X version is it running?

grishka on 25 Jun 2018

It's 10.11 El Capitan I think, but I'll double check later :-)

qgates on 25 Jun 2018

Ok then it's double weird. VoiceProcessingIO was introduced in 10.7 IIRC, so it has to be available there. If it weren't available, it would've just shown an error when starting a call. I have a suspicion that it requires a different sampling rate or data format for it to work, but I don't have any way to test this. VoiceProcessingIO is very poorly documented 😭

grishka on 25 Jun 2018

In fairness the iMac is located in a fairly echoey room, but friends are reporting it so I thought I'd mention it. However, comparing with a mobile on speakerphone in the same room, the iMac is still a lot worse.

Sounds like something is amiss across the different platforms. From here it seems that Windows and Linux are the worst.

qgates on 25 Jun 2018

So I started poking around with the voice capture DSP in Windows. The bad news is that it only supports sampling rates up to 22050 Hz, which isn't exactly what I was hoping for. :(

grishka on 25 Jun 2018

I wonder if that's by design, being that voice codecs are generally optimised for lower data rates. That said, it would be nice to work at higher quality as telegram is already capable of scaling according to available bandwidth.

Btw I didn't realise you were part of the core team on the VoIP side, sorry for any perceived disrespect. Though I'm a programmer (asm through C/C++ through web) who regularly wears a sound engineering hat, I haven't dug into the Telegram VoIP code at all. Perhaps if time permits and it's not nailed in the meantime I will - I'd love to see Telegram offer a best-in-class VoIP experience across all the clients.

Thanks again for taking the time to look into this. And I can confirm that the iMac is running 10.11 El Capitan. Let me know if there's any way I can assist.

qgates on 26 Jun 2018

On the Windows/Linux part, I think it would be better to stick with the WebRTC AEC but this time test it thoroughly, haha. Especially since Linux doesn't offer anything built-in at all. There's also something new called "AEC3", might be better than both "AEC" (desktop) and "AECM" (mobile, integer-only).

On the OS X part, I remembered that a colleague of mine reported an echo issue on a 2011 iMac running 10.13.3, but he has since updated it to 10.13.4; we had a call today and the issue seems to be gone. It might have been an Apple bug but I can't be so sure. I copy-pasted together a small test app that outputs the native stream formats used by the VoiceProcessingIO unit; it would be great if you could run it and post its output here. Also, how well do other VoIP apps work on that computer?
Архив.zip (binary + source)

grishka on 26 Jun 2018

Here's the output:

Output stream format:
Sample Rate:              44100 
Format ID:                 lpcm 
Format Flags:                29 
Bytes per Packet:             4 
Frames per Packet:            1 
Bytes per Frame:              4 
Channels per Frame:           2 
Bits per Channel:            32 
kAudioFormatFlagIsFloat 
kAudioFormatFlagIsPacked 
kAudioFormatFlagIsNonInterleaved 
kLinearPCMFormatFlagIsFloat 
kLinearPCMFormatFlagIsPacked 
kLinearPCMFormatFlagIsNonInterleaved 
kLinearPCMFormatFlagsSampleFractionShift 
kAppleLosslessFormatFlag_16BitSourceData 
kAppleLosslessFormatFlag_24BitSourceData 

Input stream format:
Sample Rate:              44100 
Format ID:                 lpcm 
Format Flags:                 9 
Bytes per Packet:             4 
Frames per Packet:            1 
Bytes per Frame:              4 
Channels per Frame:           1 
Bits per Channel:            32 
kAudioFormatFlagIsFloat 
kAudioFormatFlagIsPacked 
kLinearPCMFormatFlagIsFloat 
kLinearPCMFormatFlagIsPacked 
kLinearPCMFormatFlagsSampleFractionShift 
kAppleLosslessFormatFlag_16BitSourceData 
kAppleLosslessFormatFlag_24BitSourceData
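
A note on reading the dump above: assuming the test app prints `Format Flags` in hexadecimal (an assumption, since the values are shown without a `0x` prefix), the numbers decode to the first flag names listed under each format. The sketch below redeclares the generic CoreAudio flag bits so it builds without the macOS SDK; `decode_lpcm_flags` and the `kFlag...` names are hypothetical and not part of the test app.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Bit positions of CoreAudio's generic AudioFormatFlags
// (kAudioFormatFlagIsFloat and friends), redeclared for portability.
constexpr std::uint32_t kFlagIsFloat          = 1u << 0;
constexpr std::uint32_t kFlagIsBigEndian      = 1u << 1;
constexpr std::uint32_t kFlagIsSignedInteger  = 1u << 2;
constexpr std::uint32_t kFlagIsPacked         = 1u << 3;
constexpr std::uint32_t kFlagIsAlignedHigh    = 1u << 4;
constexpr std::uint32_t kFlagIsNonInterleaved = 1u << 5;

// Hypothetical helper: returns the names of the flags set in `flags`,
// ordered from the lowest bit to the highest.
std::vector<std::string> decode_lpcm_flags(std::uint32_t flags) {
    struct Entry { std::uint32_t bit; const char* name; };
    static const Entry kTable[] = {
        {kFlagIsFloat,          "IsFloat"},
        {kFlagIsBigEndian,      "IsBigEndian"},
        {kFlagIsSignedInteger,  "IsSignedInteger"},
        {kFlagIsPacked,         "IsPacked"},
        {kFlagIsAlignedHigh,    "IsAlignedHigh"},
        {kFlagIsNonInterleaved, "IsNonInterleaved"},
    };
    std::vector<std::string> names;
    for (const Entry& entry : kTable) {
        if (flags & entry.bit) {
            names.emplace_back(entry.name);
        }
    }
    return names;
}
```

Read this way, the output format's `29` is float, packed and non-interleaved, and the input format's `9` is float and packed, which agrees with the `kAudioFormatFlag...` names printed for each.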

qgates on 26 Jun 2018

Thank you. This is exactly the same as what I get on my macbook where the echo cancellation works perfectly. I'm out of ideas now.

grishka on 26 Jun 2018

I'm not too concerned about the performance on the imac, because it's in an echoey space, using the inbuilt mic (which sucks) and with the output connected to amplified speakers. So it's unlikely the echo cancellation could work well in such an environment.

Mostly I'm on linux, but have the imac and windows desktops as well. Most friends/family/colleagues are using windows, some linux, usually on laptops (with inbuilt mic and speakers) but sometimes with speakers and a dedicated mic. In all cases the feedback is pretty dire, except when the call is mobile to mobile.

Eg. earlier today I spoke with a friend from my mobile (Samsung S7) to a windows laptop (Lenovo T430S) running tdesktop. At my end (on the phone) I'm hearing his voice and mine repeating with ringing feedback and other feedback-related noise. At his end, he hears me just fine, clear with no feedback.

qgates on 26 Jun 2018

Also, how well do other VoIP apps work on that computer?

On the iMac in that environment? Probably a little better than tdesktop but far from perfect. Which is why I wouldn't sweat it on the mac performance.

By the way, if you want to hear what's going on, you can call me via telegram and see how it sounds to you :-)

qgates on 26 Jun 2018

I've just run a series of tests on the echo canceller I currently use, with the same parameters as in libtgvoip. I made a program (on OS X) that plays an audio file and records simultaneously, effectively simulating the call scenario. So far, it works well enough (the audio that it plays isn't heard in the output and everything else picked up by the microphone is heard) in all these cases: using the built-in laptop speakers, using an external speaker plugged into the onboard headphone jack, and using the same external speaker plugged into a USB soundcard while still using the built-in mic for recording. So it does definitely work.

The one thing I did in that program that libtgvoip lacks is that I explicitly set the I/O buffer size for both input and output to 480 samples which is the frame size the AEC algorithm works with so I could simplify my code by eliminating any additional buffering. I have a suspicion that the failing echo cancellation on Linux/Windows might have something to do with the default buffer sizes used by drivers. Those might be anything, really. But there's no way I'm testing this without real hardware, so I guess I'll have to put this on hold until I get my hands on some.
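
To make the "additional buffering" concrete: when the driver hands over buffers of arbitrary size, something has to regroup the samples into the 480-sample frames the AEC expects. The following is a minimal sketch of such an adapter, not libtgvoip's actual code; the class name `FrameChunker` and the 16-bit sample type are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical adapter: accepts capture buffers of any size and returns
// fixed-size frames, carrying leftover samples over to the next call.
class FrameChunker {
public:
    // frame_size must be greater than zero (480 for the AEC discussed here).
    explicit FrameChunker(std::size_t frame_size) : frame_size_(frame_size) {}

    // Appends `count` samples and returns every complete frame now available.
    std::vector<std::vector<std::int16_t>> push(const std::int16_t* samples,
                                                std::size_t count) {
        std::vector<std::vector<std::int16_t>> frames;
        if (frame_size_ == 0) {
            return frames;  // Guard against a misconfigured frame size.
        }
        pending_.insert(pending_.end(), samples, samples + count);
        std::size_t offset = 0;
        while (pending_.size() - offset >= frame_size_) {
            auto first = pending_.begin() + static_cast<std::ptrdiff_t>(offset);
            auto last = first + static_cast<std::ptrdiff_t>(frame_size_);
            frames.emplace_back(first, last);
            offset += frame_size_;
        }
        // Drop the samples that were just emitted; keep the remainder.
        pending_.erase(pending_.begin(),
                       pending_.begin() + static_cast<std::ptrdiff_t>(offset));
        return frames;
    }

    // Number of samples waiting for the next frame to fill up.
    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t frame_size_;
    std::vector<std::int16_t> pending_;
};
```

For example, feeding two 441-sample driver buffers into `FrameChunker(480)` yields no frame after the first call and one 480-sample frame after the second, with 402 samples left pending. Setting the I/O buffer size to exactly 480 samples, as described above, makes this layer unnecessary.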

So, Linux. Are you using ALSA or PulseAudio?

grishka on 26 Jun 2018

Both. Usually PulseAudio (most distros default to it these days and it simplifies config) but one or two older machines use ALSA.

qgates on 26 Jun 2018

I have an update for you. I've finally managed to reproduce this on Windows. Echo cancellation does work for the first ~30 seconds of the call but then stops working by itself. Also the buffer size I get with the built-in Realtek soundcard is consistently 10 ms (while I request 60) regardless of the driver (I tried both the one that comes with Windows and the one from Realtek) and the callback timing is very accurate so it's clear that this problem isn't caused by that (on Windows; I haven't tested on Linux yet, but there are some issues with PulseAudio).

grishka on 29 Jun 2018

Thanks for the update, really appreciate the effort looking into this - hopefully we can get to a place where tdesktop has best-in-class voip on all the platforms.

Speaking of general Linux issues with VoIP: apart from the echo cancellation issues, problems have been sporadic. But then I use ALSA on my main system. I'll double check a few systems running PulseAudio and post back any findings in #4219

qgates on 29 Jun 2018

👍1

Looks like uncommenting this line helps. At least it seems to - the echo canceller no longer breaks 30 seconds into a call. I'll also test on Linux now.

grishka on 30 Jun 2018

Any possibility I can test this? Not sure how libtgvoip fixes find their way into tdesktop :thinking:

qgates on 4 Jul 2018

TDesktop 1.3.10 is out now with libtgvoip 2.1.1 which includes fixed AEC and PulseAudio.

grishka on 14 Jul 2018

@grishka

Especially since Linux doesn't offer anything built-in at all.

That's not exactly true. PulseAudio has module-echo-cancel with "speex", "webrtc" and "adrian" AEC engines.

It could be explicitly requested by the application, or you can just set the PULSE_PROP="filter.want=echo-cancel" environment variable.
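
As a rough sketch of the environment-variable route done from inside an application: the process can set `PULSE_PROP` itself before it opens any PulseAudio connection. This assumes a POSIX system and a PulseAudio server with its filter modules loaded; the helper name `request_pulse_echo_cancel` is hypothetical and this is not how tdesktop or libtgvoip does it.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: requests the echo-cancel filter through PULSE_PROP.
// Must run before the first PulseAudio connection is created.
// Returns the value PulseAudio will see afterwards (empty if setting failed).
std::string request_pulse_echo_cancel() {
    // overwrite = 0 keeps any value the user has already exported.
    setenv("PULSE_PROP", "filter.want=echo-cancel", /*overwrite=*/0);
    const char* value = std::getenv("PULSE_PROP");
    return value ? std::string(value) : std::string();
}
```

Passing `0` as the overwrite argument is a deliberate choice here: a user who has exported their own `PULSE_PROP` keeps it.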

ValdikSS on 16 Jul 2018

qgates on 16 Jul 2018

👍2

@grishka as mentioned above tdesktop AEC is working on Windows after the training period :+1:

However, on Mac OS X El Capitan (10.11) it does not appear to be working even with the latest build. Using the inbuilt mic and external speakers, remote callers using Telegram on Android are hearing their voice echo back loudly even after several minutes. I wonder whether the software AEC fallback could be brought in on OS X where the hardware AEC is failing?

Linux testing still to do...

qgates on 20 Jul 2018

@grishka some feedback. Good news - testing on windows, echo cancellation is now working on tdesktop (with mic and speakers) after what seems like a 'training period' of 10-15 seconds. For the first 10-15 seconds of a call between phone and desktop, the phone user's voice can still be heard loudly echoing back as before. But after 10-15 seconds, the echo cancellation kicks in and works perfectly - phone user's voice can no longer be heard echoing back. A big improvement - although I'm wondering whether it can be improved further to work perfectly from the beginning of the call?

Possibly related, from the tdesktop end there is an extremely low echo/feedback coming from the other end. It's very quiet, not a big deal, but noticeable. Not sure if that was there before or whether it was masked by the echo coming in the other direction.

To fix this, you need to enable the "extended filter": webrtc::WebRtcAec_enable_extended_filter(webrtc::WebRtcAec_aec_core(aecInst), webrtc::kAecTrue);

ud84 on 10 May 2019

It's been a long time. I've since changed it to use the WebRTC APM (audio processing module), which does all sorts of magic internally. The 10-15 second problem is there because I haven't found a way to get an accurate delay measurement out of WASAPI. In contrast, PulseAudio on Linux is able to report the delay to within a sample, and there the same echo cancellation algorithm works flawlessly from the start.
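
To illustrate why the delay figure matters: the canceller has to line up the far-end (speaker) signal with the copy of it that arrives at the microphone. When the audio API does not report that offset, one generic way to estimate it is a brute-force cross-correlation, sketched below. This is an illustration only and not what the WebRTC APM does internally; `estimate_delay` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Returns the lag (in samples, 0..max_lag) at which `captured` best matches
// `reference`, using plain time-domain cross-correlation.
// Returns 0 when no lag gives a positive correlation.
std::size_t estimate_delay(const std::vector<float>& reference,
                           const std::vector<float>& captured,
                           std::size_t max_lag) {
    std::size_t best_lag = 0;
    double best_score = 0.0;
    for (std::size_t lag = 0; lag <= max_lag; ++lag) {
        double score = 0.0;
        // Correlate reference[i] with the sample captured `lag` samples later.
        for (std::size_t i = 0;
             i < reference.size() && i + lag < captured.size(); ++i) {
            score += static_cast<double>(reference[i]) * captured[i + lag];
        }
        if (score > best_score) {
            best_score = score;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

The cost grows with `max_lag` times the signal length, so a real implementation would need something cheaper; the point is only that a wrong or missing delay leaves the adaptive filter searching for the echo path, which fits the convergence period described above.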

grishka on 10 May 2019
