Godot-proposals: Add support for recording and sending VoIP data

Created on 22 May 2020 · 18 comments · Source: godotengine/godot-proposals

Describe the project you are working on:
Multiplayer shooting game

Describe the problem or limitation you are having in your project:
I am trying to send sound packets over the network to the other player to improve player interaction in my multiplayer game.

Describe the feature / enhancement and how it helps to overcome the problem or limitation:
Adding a sound recorder that records sounds and saves them as temporary .wav files (or any other supported audio format) that can be sent over the internet or other networks. It could also be useful for other purposes.

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:
A SoundRecorder node would be able to save sounds such as voice data. This could be useful for players sending greetings to each other over a network instead of having to type a message during gameplay. Voice data could also be used to authenticate users, helping to prevent data manipulation by hackers and improving security for users.

If this enhancement will not be used often, can it be worked around with a few lines of script?:
A few lines of code wouldn't cut it.

Is there a reason why this should be core and not an add-on in the asset library?:
Different platforms have different sound recorder classes. A core sound recorder node would give access to all of those platforms at once, without having to rewrite the code for each platform. A plugin wouldn't be able to manage that.

audio network

Source

IoneGod

Most helpful comment

OK, I've spent the past few days starting to figure out libOpus.
What I've got here is a proof of concept, mainly a way for me to learn how Opus works and how we might integrate it with Godot.

Here is a proof of concept GDNative library wrapping libOpus:
libopus-gdnative

And here is a demo project using the gdnative library:
libopus-gdnative-demo (_only compiled for windows x64 currently_)

One issue I ran into is that libOpus only accepts a few select sample rates, and Godot's 44.1 kHz is not one of them. The closest Opus has is 48 kHz.
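For reference, libOpus accepts 8, 12, 16, 24 and 48 kHz input, in frames of 2.5, 5, 10, 20, 40 or 60 ms. A small Python sketch of that constraint; the helper names are hypothetical and not part of libOpus or Godot:

```python
# Sample rates and frame durations accepted by libOpus's encoder/decoder.
OPUS_SAMPLE_RATES = (8000, 12000, 16000, 24000, 48000)
OPUS_FRAME_MS = (2.5, 5, 10, 20, 40, 60)


def opus_rate_for(rate: int) -> int:
    """Smallest Opus-supported rate that is >= rate, capped at 48 kHz."""
    for supported in OPUS_SAMPLE_RATES:
        if supported >= rate:
            return supported
    return OPUS_SAMPLE_RATES[-1]


def samples_per_frame(rate: int, frame_ms: float) -> int:
    """Samples per channel in one Opus frame; raises for invalid combinations."""
    if rate not in OPUS_SAMPLE_RATES:
        raise ValueError(f"{rate} Hz is not an Opus sample rate")
    if frame_ms not in OPUS_FRAME_MS:
        raise ValueError(f"{frame_ms} ms is not an Opus frame duration")
    return int(rate * frame_ms / 1000)


print(44100 in OPUS_SAMPLE_RATES)   # False
print(opus_rate_for(44100))         # 48000
print(samples_per_frame(48000, 20))  # 960
```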

It looks like Godot has a cubic resampler for playback, AudioStreamPlaybackResampled, but ideally we'd be able to resample the input from the microphone. If that's possible with AudioStreamPlaybackResampled, I haven't figured it out yet.
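Resampling the microphone input from 44.1 kHz to 48 kHz is conceptually simple. The Python sketch below uses 4-point Catmull-Rom cubic interpolation on mono float samples; it illustrates the idea and is not necessarily the formula AudioStreamPlaybackResampled uses. The function names are hypothetical.

```python
def catmull_rom(y0: float, y1: float, y2: float, y3: float, t: float) -> float:
    """Cubic interpolation between y1 and y2 at fraction t in [0, 1)."""
    return y1 + 0.5 * t * (
        (y2 - y0)
        + t * (2 * y0 - 5 * y1 + 4 * y2 - y3 + t * (3 * (y1 - y2) + y3 - y0))
    )


def resample(samples, src_rate: int, dst_rate: int):
    """Resample a mono float list from src_rate to dst_rate.

    No low-pass filter is applied, which is tolerable when upsampling
    (e.g. 44.1 kHz -> 48 kHz) but not for downsampling.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        k = int(pos)
        t = pos - k
        # Clamp neighbour indices at the edges instead of reading out of bounds.
        y0 = samples[max(k - 1, 0)]
        y1 = samples[min(k, last)]
        y2 = samples[min(k + 1, last)]
        y3 = samples[min(k + 2, last)]
        out.append(catmull_rom(y0, y1, y2, y3, t))
    return out
```

For example, `resample(mic_samples, 44100, 48000)` turns 441 input samples (10 ms) into 480 output samples. A production resampler, such as the libraries discussed later in this thread, would use a proper band-limited filter.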

This was causing me some problems until I found a great hack. I get 44.1 kHz audio from Godot's microphone API, then tell libOpus that it is in fact 48 kHz audio. The compressed audio is distorted as a result (sped up and pitched up by a factor of 48000/44100). On the decode side, libOpus produces the distorted 48 kHz audio, which I hand off to Godot while telling it the audio is actually 44.1 kHz, and thus it de-warps it xD
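The arithmetic behind the hack, as a short Python sketch (the variable names are illustrative, and 44.1 kHz is assumed to be the rate the microphone data is actually captured at):

```python
GODOT_RATE = 44100  # assumed: rate the captured samples really have
OPUS_RATE = 48000   # rate libOpus is told the samples have

# Encoding 44.1 kHz samples as if they were 48 kHz speeds the signal up
# (and raises its pitch) by this factor while it is inside Opus.
warp = OPUS_RATE / GODOT_RATE

# A 20 ms Opus frame at the declared 48 kHz holds 960 samples,
# but 960 samples of real 44.1 kHz audio span about 21.77 ms.
frame_samples = OPUS_RATE * 20 // 1000
real_frame_ms = frame_samples / GODOT_RATE * 1000

# A 200 Hz voice component is seen by the encoder at about 217.7 Hz;
# playing the decoded samples back at 44.1 kHz undoes the shift.
encoded_pitch = 200 * warp
restored_pitch = encoded_pitch / warp

print(f"warp factor: {warp:.4f}")                    # warp factor: 1.0884
print(f"real frame length: {real_frame_ms:.2f} ms")  # real frame length: 21.77 ms
```

Because the decoder emits exactly the samples that were fed in (modulo compression loss), the round trip preserves duration and pitch; the cost is that the encoder works on a slightly shifted signal.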

As great as that is, if we really wanted first-class support for VoIP, I think we'd need the microphone API to let us specify the sample rate, as well as mono vs. stereo. There is no need for stereo PCM data from a microphone, at least not for VoIP.
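Until the microphone API offers mono capture, the downmix can be done before encoding. libOpus's `opus_encode()` takes interleaved 16-bit samples (`opus_encode_float()` takes floats instead). A Python sketch, assuming captured frames arrive as `(left, right)` float pairs in [-1, 1]; the helper names are hypothetical:

```python
import struct


def stereo_to_mono(frames):
    """Average (left, right) float frames into a mono float list."""
    return [(left + right) * 0.5 for left, right in frames]


def float_to_pcm16(samples):
    """Clamp floats to [-1, 1] and pack them as little-endian signed 16-bit PCM."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


frames = [(0.5, 0.5), (1.0, 0.0), (-2.0, -2.0)]
mono = stereo_to_mono(frames)  # [0.5, 0.5, -2.0]
pcm = float_to_pcm16(mono)     # out-of-range -2.0 is clamped to -32767
print(len(pcm))                # 6
```

Dropping the second channel halves the data before compression even starts.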

Lastly, as expected, the compression ratio is just fantastic. In simple demos, I was seeing greater than 100x size reduction over the raw PCM audio.
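How large the ratio is depends on the raw format it is measured against and on the Opus bitrate. Illustrative arithmetic only: the raw format (44.1 kHz, stereo, 16-bit) and the bitrates below are assumptions, not numbers measured in the demo.

```python
RAW_RATE_HZ, RAW_CHANNELS, RAW_BITS = 44100, 2, 16  # assumed raw capture format
raw_bps = RAW_RATE_HZ * RAW_CHANNELS * RAW_BITS     # 1_411_200 bit/s


def compression_ratio(opus_bps: int) -> float:
    """Raw PCM bitrate divided by the Opus bitrate."""
    return raw_bps / opus_bps


for opus_bps in (6_000, 12_000, 24_000):  # plausible voice bitrates, not measured
    print(f"{opus_bps // 1000} kbit/s -> {compression_ratio(opus_bps):.1f}x")
# 6 kbit/s -> 235.2x
# 12 kbit/s -> 117.6x
# 24 kbit/s -> 58.8x
```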

Wavesonics on 2 Jun 2020

All 18 comments

Current workflow for recording: https://docs.godotengine.org/en/latest/tutorials/audio/recording_with_microphone.html
Lengthy discussion about VoIP: godotengine/godot#18133
VoIP demo: https://github.com/cbarsugman/godot-voip-demo

Jummit on 22 May 2020

Audio recording is already supported since Godot 3.1. That said, it may not work on all platforms due to bugs (see https://github.com/godotengine/godot/issues/33184).

Calinou on 22 May 2020

Even though @Calinou reacted with a :confused: to my proposal in the lengthy discussion mentioned above (without commenting, so I'm not sure which part was confusing or bad; I guess it might have been my not-so-professional wording after reading through various overcomplicated proposals for third-party service bindings), I just want to mention again that Godot already successfully implements WebRTC data channels for multiplayer on (AFAIK) all platforms. So it might be a good idea to also wrap the WebRTC audio channels and expose those to allow for VoIP.

nonchip on 24 May 2020

@nonchip WebRTC adds a significant amount of complexity on its own. I'd advise not relying on it unless you need to support HTML5 exports somehow. Most networked games don't need to support HTML5 exports, so I would prefer an easier-to-set-up solution. (Do you have STUN/TURN servers at hand? To my knowledge, they are pretty much required for WebRTC.)

Calinou on 24 May 2020

@Calinou Good point, it might be worth considering for the HTML5 export of the VoIP solution, though.
Also, about STUN/TURN: you technically don't need it, but you really want it, because doing it any other way is a pain. I'm running a spreed WebRTC service myself that has it all included, but it's pretty much meant as a "go to that website and start talking" kind of thing. I looked into setting up STUN/TURN manually and decided my sanity was more important :P

nonchip on 24 May 2020

I believe that in order to implement real-time voice streaming, we're still missing one part of WebRTC: #813

That is, of course, if you want to do it using WebRTC.

Wavesonics on 28 May 2020

I've been digging into this a little more recently. I implemented a VOIP demo similar to the one @Jummit linked, and that highlighted some of the issues here to me.

I have some time right now where I could probably get a real solution implemented, so I thought I'd get some input on what a real solution would actually look like.

Here is the breakdown of the problem as I see it:

1) Recording audio:

This was solved in 3.1. Maybe it could be made a little friendlier with something like a SoundRecorder node, as @iapps207 suggested. But even if not, it works as it exists today.

2) Sending the data over the wire:

There are a variety of ways we could accomplish this, and we probably don't want to be too prescriptive here. But the problem with how my demo and the cbarsugman demo work is that they are not truly streaming the audio: they record, then send the whole audio buffer. It's simple, but pretty terrible for real-time communication. So here are the options as I see them:

A) Send via existing network methods (_rset or rpc argument_). Depending on how this is configured, all data will pass through the server on the way to each client. As far as I can tell, this will work like my existing demo and not be truly real-time.
B) WebRTC MediaStreams: not implemented yet (#813). This is very prescriptive, but it would be extremely easy to set up, and using STUN/TURN would allow peer-to-peer connections when available, saving lots of bandwidth on the server.
C) WebRTC without MediaStreams: I haven't gone too deep here yet, but I can't see why we couldn't just use the existing WebRTC data channel. This just puts the burden on the sender and receiver to handle the data properly as audio.
D) Some sort of lower-level system based on existing Godot networking. I don't think this would require any new features, but in my opinion it would have to work alongside the existing high-level multiplayer API. I haven't tried to combine the high-level API with low-level networking; is it possible?
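Whichever transport is chosen, the difference between recording a clip and streaming is that microphone input is cut into small fixed-size frames as it arrives and each frame is sent on its own. A Python sketch of such a framer; the class name and the 4-byte sequence-number header are hypothetical, and the Opus encoding step is left as a comment so it runs without libOpus:

```python
import struct
from typing import Iterator, List

FRAME_SAMPLES = 960           # 20 ms of mono audio at 48 kHz (an Opus frame size)
HEADER = struct.Struct("!I")  # hypothetical header: 32-bit big-endian sequence number


class VoiceFramer:
    """Buffers arbitrary-sized microphone chunks and yields fixed-size packets."""

    def __init__(self, frame_samples: int = FRAME_SAMPLES) -> None:
        self.frame_samples = frame_samples
        self.pending: List[float] = []
        self.sequence = 0

    def push(self, chunk: List[float]) -> Iterator[bytes]:
        """Add newly captured samples; yield one packet per complete frame."""
        self.pending.extend(chunk)
        while len(self.pending) >= self.frame_samples:
            frame = self.pending[: self.frame_samples]
            del self.pending[: self.frame_samples]
            # A real client would Opus-encode `frame` here; raw floats are
            # packed instead so the sketch has no external dependencies.
            payload = struct.pack(f"!{len(frame)}f", *frame)
            yield HEADER.pack(self.sequence) + payload
            self.sequence = (self.sequence + 1) & 0xFFFFFFFF
```

`push()` is a generator, so the caller iterates it (e.g. `for packet in framer.push(chunk): send(packet)`) each time new microphone data is available. Because a late voice frame is useless, such packets are usually sent unreliably; the sequence number lets the receiver restore order and detect gaps.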

3) Encoding the data for transit:

All of this is moot at the moment, because the data returned from the Microphone API is (_as it should be_) a WAV. Obviously, that data is far too large for a real-time VoIP application. And as it stands right now, there is no audio encoder exposed to the scripting interface (_or, from my discussion with @Calinou, even in the engine at all_). _So I think this is the first issue that must be addressed._ From some research, it looks like Opus is the best open codec for voice data, so I cloned their repo and have been poking around the docs.

My question to anyone here is: should we have libOpus in the engine itself for this purpose? I would certainly lean toward yes, but I can see this being only for VoIP, and so maybe too specific a use case.

I asked around on the Opus IRC channel, and it looks like the higher-level Opus libs are specifically for file access or HTTP streams. So we'll probably be stuck with just the base libOpus...

Last thing to note: If we do go with opus, it has the added advantage of being the codec used by all browsers for WebRTC Media Streams. So it might pair nicely with an implementation of that.

I'm definitely looking for feedback/suggestions. Am I totally off the mark on anything here? Is this even a thing people are interested in?

Wavesonics on 29 May 2020

Just some notes here from looking around at solutions to the Sample Rate Conversion problem.

The most common one I've found is libsamplerate, which is written in C and looks good overall. ~~The problem is it's GPL~~ (_It switched to BSD 2 in 2018_)

Here is a C++ library which is MIT license: r8brain

That might be a good option if any of this work was ever considered for inclusion in Godot.

Lots of info about sample rate libs here: https://ccrma.stanford.edu/~jos/resample/Free_Resampling_Software.html

Wavesonics on 2 Jun 2020

Doesn't Opus include its own resampler? I read that somewhere on a forum while searching for a solution to this specific issue.

Calinou on 2 Jun 2020

libOpus doesn't appear to, at least not that I could find looking through its docs.

I think its derivatives like opusenc or opusfile might.

Wavesonics on 2 Jun 2020

I was discussing this with iFire on Discord, and he apparently had a PR that not only added libOpus support but also added the next component, which I have not yet addressed: being able to actually stream the decoded audio into Godot's audio system.

https://github.com/fire/godot/commit/37ec39028e8f93147978606521f7a432dde6afc7

He said it was rejected, but we didn't have time to get into the details.

If any of the core contributors have insight into what was wrong with it, and how or whether it could be changed to be acceptable, I'd love to discuss it!

Lastly, from my discussion with iFire, the existing AudioEffectRecord will probably not be ideal for streaming audio in its current form, as it performs a large buffer reallocation.
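The usual way to avoid repeated reallocation in a streaming path is a fixed-capacity ring buffer: the producer (network plus decoder) appends frames, the audio mixer consumes them, and the storage is never resized. A single-threaded Python sketch of the data structure only; an engine-side version would be C++ and would need thread-safe (e.g. lock-free single-producer/single-consumer) indices. The class and method names are hypothetical.

```python
from typing import Iterable, List


class RingBuffer:
    """Fixed-capacity FIFO of float samples; storage is allocated once."""

    def __init__(self, capacity: int) -> None:
        self.data = [0.0] * capacity  # never resized after construction
        self.capacity = capacity
        self.read_pos = 0
        self.size = 0

    def write(self, samples: Iterable[float]) -> int:
        """Append as many samples as fit; return how many were written."""
        written = 0
        for s in samples:
            if self.size == self.capacity:
                break  # full: the caller decides whether to drop or retry
            self.data[(self.read_pos + self.size) % self.capacity] = s
            self.size += 1
            written += 1
        return written

    def read_samples(self, n: int) -> List[float]:
        """Pop n samples, padding with silence (0.0) if the buffer runs dry."""
        out = []
        for _ in range(n):
            if self.size == 0:
                out.append(0.0)  # underrun: play silence rather than block
                continue
            out.append(self.data[self.read_pos])
            self.read_pos = (self.read_pos + 1) % self.capacity
            self.size -= 1
        return out
```

In this arrangement the mixer would call `read_samples()` once per audio callback with the number of frames it needs, so playback never waits on the network.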

I'm going to pursue my current approach (_providing libOpus as a GDNative library_) as a stop-gap, but that PR looks like a much more comprehensive starting point for providing true first-class support for streaming VoIP.

Wavesonics on 2 Jun 2020

@Calinou I found this in their FAQ; maybe opus-tools is what you had read about?

> How do I use 44.1 kHz or some other sampling rate not directly supported by Opus?
>
> Tools which read or write Opus should inter-operate with other sampling rates by transparently performing sample rate conversion behind the scenes whenever necessary. In particular, software developers should not use Opus Custom for 44.1 kHz support, except in the very specific circumstances outlined above.
>
> Note that it's generally preferable for a decoder to output at 48kHz, even when you know the original input was 44.1kHz. This is not only because you can skip resampling, but also because many cheaper audio interfaces have poor quality output for 44.1kHz.
>
> The opus-tools package source code contains a small, high quality, high performance, BSD licensed resampler which can be used where resampling is required.

So maybe their small BSD licensed resampler would be a good option:
https://opus-codec.org/release/dev/2018/09/18/opus-tools-0_2.html

Wavesonics on 9 Jun 2020

I polished up my add-on and put it on the Asset Library here:
https://godotengine.org/asset-library/asset/650

It's certainly far from the real streaming solution we'd like to get to, but I think I'm using it to pretty good effect in my project.

The lag is obvious, but people seem to adjust to it pretty quickly. If you want to see what the experience is like, my project is here: https://github.com/FugitiveTheGame/Fugitive3D/

On the path to true streaming audio, the next biggest blocker is the need for direct access to an audio buffer inside Godot that we can write frames to directly. @fire's PR that I linked above looks like a good starting point to me, but I honestly haven't dug into that part of the issue much yet.

Transport of the audio is still an issue, but much more solvable in various ways.

Wavesonics on 15 Jun 2020

@Calinou @Wavesonics @nonchip you should check out Vivox, a Unity product that can be integrated into other game engines: https://unity.com/products/vivox

IoneGod on 25 Jun 2020

@iapps207 Vivox is a proprietary library, which is therefore unsuitable for inclusion in Godot. Nothing prevents a third-party from publishing a module for it though.

Calinou on 25 Jun 2020

@Wavesonics have you made any discoveries on this front since then?

For reference, I found the relevant PR here with a little more details: https://github.com/godotengine/godot/pull/35402
It includes some messages from @reduz on the topic, but that's about it.

Here's the original issue from @fire: https://github.com/godotengine/godot-proposals/issues/399

It seems they were closed in favor of a better implementation, because of issues with packet ordering and delays.
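A common answer to packet ordering and delay is a jitter buffer: hold a few packets back, release them in sequence order, and report gaps so the decoder can conceal them (libOpus performs packet loss concealment when `opus_decode()` is given a NULL packet). A minimal Python sketch with hypothetical names; it is not what the closed PR implemented, and it ignores sequence-number wraparound:

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Reorders voice packets by sequence number behind a small fixed delay.

    pop() returns None while the buffer is still filling, b"" for a packet
    that has not arrived in time (a gap to conceal), or the next payload.
    """

    def __init__(self, delay_packets: int = 3) -> None:
        self.delay = delay_packets  # packets held back to absorb jitter
        self.heap: List[Tuple[int, bytes]] = []
        self.next_seq: Optional[int] = None

    def push(self, seq: int, payload: bytes) -> None:
        if self.next_seq is not None and seq < self.next_seq:
            return  # too late: its playback slot has already passed
        heapq.heappush(self.heap, (seq, payload))

    def pop(self) -> Optional[bytes]:
        if self.next_seq is None:
            if len(self.heap) < self.delay:
                return None  # still filling the initial delay
            self.next_seq = self.heap[0][0]
        # Discard duplicates of packets that were already played.
        while self.heap and self.heap[0][0] < self.next_seq:
            heapq.heappop(self.heap)
        seq = self.next_seq
        self.next_seq += 1
        if self.heap and self.heap[0][0] == seq:
            return heapq.heappop(self.heap)[1]
        return b""  # missing: caller should conceal this frame
```

The receiver would call `pop()` once per frame period (e.g. every 20 ms) and feed the result to the decoder; a larger `delay_packets` tolerates more network jitter at the cost of added latency.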

auderer on 28 Sep 2020

@auderer No, I haven't spent any time on this recently. The Opus plugin I released on the Asset Library allows some simple forms of VoIP to work. But the roadblock to true VoIP now is lower-level access to an audio buffer, like the one the PR you linked provides. For Opus in particular, we need to stream individual Opus packets, decompress them, and insert them directly into an audio stream buffer. As far as I'm aware, that's not possible with the current audio system implementation.

Wavesonics on 2 Oct 2020
