Godot: We might have to dump Opus.. here's why!

Created on 10 Jan 2017 · 20 Comments · Source: godotengine/godot

This is long and open to discussion. I was thinking about modernizing the audio engine. The main reasons why Godot's audio engine was written the way it is were:

Until a few years ago, devices traditionally had very little memory. Additionally, decoding compressed audio with the traditional libraries was CPU intensive and required allocating/deallocating memory (which is forbidden and must never be done in audio threads).

This combination of facts resulted in audio implementations where streaming audio (music) and samples (sound effects) had to be divided. Streaming happened from disk (OGG files in Godot are streamed from disk), while samples were played from memory, generally uncompressed.

The method for streaming music works like this:

-A thread decodes the audio from disk, then fills a ringbuffer
-The audio thread reads from the ringbuffer and writes audio

This method has the drawback that starting playback or making any kind of change to the audio results in high latency, and seeking, resampling, etc. are hard.
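For illustration, here is a minimal sketch of that producer/consumer pattern (not Godot's actual code): a decoder thread fills a ring buffer, and the audio thread drains it without locking or allocating.

```cpp
// Single-producer / single-consumer ring buffer sketch.
#include <atomic>
#include <cstddef>

struct AudioRingBuffer {
    static const size_t SIZE = 16384; // frames; must be a power of two
    float data[SIZE];
    std::atomic<size_t> write_pos{0};
    std::atomic<size_t> read_pos{0};

    size_t frames_available() const {
        return write_pos.load(std::memory_order_acquire) - read_pos.load(std::memory_order_acquire);
    }
    // Decoder thread: caller must not push more than SIZE - frames_available() frames.
    void push(const float *src, size_t count) {
        size_t w = write_pos.load(std::memory_order_relaxed);
        for (size_t i = 0; i < count; i++)
            data[(w + i) & (SIZE - 1)] = src[i];
        write_pos.store(w + count, std::memory_order_release);
    }
    // Audio thread: caller must not pop more than frames_available() frames.
    void pop(float *dst, size_t count) {
        size_t r = read_pos.load(std::memory_order_relaxed);
        for (size_t i = 0; i < count; i++)
            dst[i] = data[(r + i) & (SIZE - 1)];
        read_pos.store(r + count, std::memory_order_release);
    }
};
```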

Nowadays computers and mobile devices have plenty of memory, so loading the OGG files into memory and streaming them from there (just like you would do for regular sound effects) should be possible without much of a problem... however, one problem remains:

Both libogg and libvorbisfile perform memory allocations during decode, which makes the library unsuitable for streaming from the audio thread.

So it crossed my mind today: what if there was a way to stream Vorbis, MP3 or any other format from memory directly, without allocating memory? Nowadays you can have a full thread for this... did no one think of this? We could unify the SAMPLE/STREAM classes in Godot if this was possible.

This is when I remembered stb_vorbis.c , which is an alternative implementation of Vorbis. I discarded it back in the day because it lacked seeking support, so I went to check today and I found this:

// Originally sponsored by RAD Game Tools. Seeking sponsored
// by Phillip Bennefall, Marc Andersen, Aaron Baker, Elias Software,
// Aras Pranckevicius, and Sean Barrett.

Aha! Someone went through this reasoning before; Aras is one of the leads of Unity, so I was definitely on the right track :)

So, for the new audio engine, my proposal is to ditch all audio formats that are RT-unsafe (Opus and Vorbis via libvorbisfile) and simply use stb_vorbis and WAV (via IMA-ADPCM) instead. This would allow us to use AudioStream for pretty much everything and enormously simplify the audio engine.
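A rough sketch of what that unified path could look like (not the proposed engine API): the whole .ogg file is kept in memory and the mixer pulls decoded frames directly in the audio callback, assuming stb_vorbis keeps steady-state decoding allocation-free, which is the premise of the proposal.

```cpp
#include "stb_vorbis.c" // single-file library

#include <vector>

struct VorbisMemoryStream {
    std::vector<unsigned char> ogg_data; // the full .ogg file, loaded at import/load time
    stb_vorbis *decoder = nullptr;

    bool open() {
        int error = 0;
        // All setup allocations happen here, outside the audio thread.
        decoder = stb_vorbis_open_memory(ogg_data.data(), (int)ogg_data.size(), &error, nullptr);
        return decoder != nullptr;
    }
    // Audio thread: decode straight into the mix buffer; returns frames produced.
    int mix(float *out, int channels, int frames) {
        return stb_vorbis_get_samples_float_interleaved(decoder, channels, out, frames * channels);
    }
    void seek(unsigned int sample) {
        stb_vorbis_seek(decoder, sample);
    }
    ~VorbisMemoryStream() {
        if (decoder)
            stb_vorbis_close(decoder);
    }
};
```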

The only reason we could still support Opus is its very good voice compression ratio (great for in-game voice), but we would not support it as an AudioStream (unless we can figure out a way to stream it in a realtime-safe mode somehow).

Any thoughts?

Labels: discussion, audio

Most helpful comment

@reduz FYI me being one of sponsors for seeking implementation has nothing whatsoever to do with Unity (otherwise it would be "Unity" in the list). I paid my personal money, because I thought that would be a good thing to do. Unity does not use stb_vorbis.

All 20 comments

The only real problem I see is that abandoning the traditional sample playback code will make streaming have more latency in the HTML5 backend :| Right now we are using a WebAudio implementation of samples and it works pretty well... having a single stream API will undoubtedly force it to go via the regular HTML5 streaming, which is processed in the main thread (adding more lag).

AFAIK, once WebAssembly is out, browsers will eventually add thread support for it, so this will be a thing of the past, but we are talking a year or more ahead...

@reduz If WebAssembly gets thread support eventually, I'd go for the audio change. I'd rather design for the future than maintain an old system if you're breaking compatibility anyway. The GLES3 change already leaves out older devices, and the web export will at least work. What magnitude of latency are you talking about in the web version?

But then again I'm no audio expert..

Probably around 100-200 ms, as opposed to the 15-50 ms you usually expect from a game.


Basically, take the minimum FPS you might find barely acceptable for a game (I don't know, 10-15 fps?). The audio will be processed between frames, so you have to leave enough room for the audio to be sent to WebAudio.
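(For example, at 10 fps there are roughly 100 ms between frames, so the block handed to WebAudio has to cover at least that much audio plus some scheduling slack, which is roughly where the 100-200 ms figure above comes from.)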


I know that a lot of people use streaming formats for sound effects, but how common is it that these effects are longer than 1-2 seconds? If they're short we could just convert them to ADPCM on export, and if they're too long, stream them; you'd still have one sound player node for both types (so unify the SamplePlayer, StreamPlayer, etc. into one). Long sound files are usually used for voice or background music, where low latency is not crucial, so there's no need to sacrifice formats.

There might be some corner cases where a long sound file needs to be played with no latency (maybe a very complex sound game). I'm not opposed to supporting those by having the best possible implementation of the streaming code, but I don't know how much code it would duplicate to support both threadable and non-threadable streaming...


Could lip sync be an issue for choice-dependent dialog voice files? Maybe it can be corrected easily by delaying the audio or effects beforehand, but it just crossed my mind while reading.

@reduz FYI me being one of sponsors for seeking implementation has nothing whatsoever to do with Unity (otherwise it would be "Unity" in the list). I paid my personal money, because I thought that would be a good thing to do. Unity does not use stb_vorbis.

@aras-p Thanks for the clarification! And since you are here, may I ask how Unity decodes OGG audio data and sends it to the mixing thread?

@reduz I have no idea. I know we use FMOD as the audio engine, but whether Vorbis decoding is done by FMOD itself, or some other piece of code, I don't know...

What you seem to be missing is that Opus itself does not allocate memory, it's the container parser opusfile that allocates. And if you're dealing with containers, you're already not real time.

So maybe, if allocation is so deathly bad for the mixing process, pre-walk the entire Ogg stream into pre-sized arrays of packet sizes and packet pointers, and hand those packets off to libopus yourself instead of using opusfile?

I mean, cripes, talking about transcoding to ADPCM at the build stage, just to spare CPU cycles? If so, what is the point of even supporting high-compression lossy formats on input? There is no point in even using high-compression lossy formats at the authoring stage; they're designed for distribution. Authoring should be in lossless formats, compression optional.
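Roughly, the pre-walk idea above could look like this (a sketch assuming libogg + libopus; error handling, channel mapping and a proper check of the OpusHead/OpusTags headers are glossed over). All parsing and allocation happens at load time; the audio thread only calls opus_decode_float() on packets already in memory.

```cpp
#include <ogg/ogg.h>
#include <opus/opus.h>
#include <cstring>
#include <vector>

struct PreparsedOpus {
    std::vector<std::vector<unsigned char>> packets; // raw Opus packets, extracted at load time
    OpusDecoder *decoder = nullptr;
    size_t next_packet = 0;
};

// Load time: walk the whole Ogg file in memory and pull out the Opus packets.
bool preparse_ogg_opus(const unsigned char *file_data, long file_len, PreparsedOpus &out) {
    ogg_sync_state oy;
    ogg_stream_state os;
    ogg_page og;
    ogg_packet op;
    ogg_sync_init(&oy);

    char *buf = ogg_sync_buffer(&oy, file_len);
    memcpy(buf, file_data, file_len);
    ogg_sync_wrote(&oy, file_len);

    bool stream_open = false;
    long packet_index = 0;
    while (ogg_sync_pageout(&oy, &og) == 1) {
        if (!stream_open) {
            ogg_stream_init(&os, ogg_page_serialno(&og));
            stream_open = true;
        }
        ogg_stream_pagein(&os, &og);
        while (ogg_stream_packetout(&os, &op) == 1) {
            if (packet_index++ < 2)
                continue; // skip the OpusHead / OpusTags header packets
            out.packets.emplace_back(op.packet, op.packet + op.bytes);
        }
    }
    if (stream_open)
        ogg_stream_clear(&os);
    ogg_sync_clear(&oy);

    int err = 0;
    out.decoder = opus_decoder_create(48000, 2, &err); // allocation happens here, not in the mixer
    return err == OPUS_OK && !out.packets.empty();
}

// Audio thread: decode the next packet straight into the mix buffer, no allocations.
int mix_next_packet(PreparsedOpus &s, float *pcm, int max_frames) {
    if (s.next_packet >= s.packets.size())
        return 0;
    const std::vector<unsigned char> &p = s.packets[s.next_packet++];
    return opus_decode_float(s.decoder, p.data(), (opus_int32)p.size(), pcm, max_frames, 0);
}
```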

Yeah, I don't like that people are using OGG or MP3 for sound effects, but that seems to be the problem here...


@kode54 I don't think pre-parsing the Opus blocks from the container into memory would be a problem, but I'm not sure that Opus does not allocate memory during playback. I think that's probably not the case, but I have no idea how to find information about that.

@punto- No, the problem is that having two separate paths for playing audio, one for SFX and one for music, is just a ton more code to maintain. Merging them means a lot less code.

So I can confirm that libopus never ever allocates memory anywhere outside of the functions with _create() in the name. That means it'll never do memory allocation during encoding or decoding. Opus is designed to run in real-time even on tiny DSPs, so it's very careful about allocation. In fact, you can even compile it without libc!

That leaves Ogg. I believe that the current libogg may do some buffer resizing on decode, but those should be rare, and likely only happen at the beginning of a stream. If that's still a problem, Ogg is a pretty simple format and if you're just decoding a file from the beginning without seeking, you can implement the Ogg parsing in very few lines.
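(As a side note, libopus also exposes opus_decoder_get_size() / opus_decoder_init(), so even the one-time decoder allocation can be placed in memory the engine manages itself. A minimal sketch:)

```cpp
#include <opus/opus.h>
#include <cstdlib>

// Place the decoder state in caller-provided memory instead of letting
// opus_decoder_create() allocate it; a static or pooled buffer works too.
OpusDecoder *make_decoder_in_own_memory(opus_int32 sample_rate, int channels) {
    int size = opus_decoder_get_size(channels);      // bytes needed for the decoder state
    OpusDecoder *dec = (OpusDecoder *)malloc(size);  // or a buffer from a preallocated pool
    if (dec && opus_decoder_init(dec, sample_rate, channels) != OPUS_OK) {
        free(dec);
        dec = nullptr;
    }
    return dec;
}
```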

@jmvalin Thanks for the information! Unfortunately no one on our team is familiar with the Ogg format (and for anyone untrained in these kinds of file formats, I admit it's a bit scary), so I'm afraid it may be impossible for us to do this. If libopus or any of the official samples/code provided a way to do decoding directly in the audio thread, we would be glad to add Opus back.

@reduz Have you actually tried using libopus+libogg? Did you observe anything going wrong? I had a look at the libogg code, and basically it will only use realloc() when its current working buffer is too small, which means it's going to be very rare. Not only that, but on modern systems a small realloc() is no longer a big deal RT-wise (no worse than many other things that can happen). I assume we're not talking about sub-millisecond latency, right?

Also, from what I read, the audio originally comes from a file anyway, which has much worse latency issues. If the idea is to read it into memory and then have low-latency decoding-from-memory, then it wouldn't be hard to simply do the Ogg unpacking in the "read-from-file" part, and then store the raw Opus packets in memory.

In any case, there's probably a dozen different ways this can work without much difficulty, assuming you're actually interested in Opus support. If so, you can ask further questions in #opus on irc.freenode.net or on the Opus mailing list.

@jmvalin Can you provide sample code that shows a way to decode directly in the audio thread? It would be helpful for retaining Opus support.

@reduz is not familiar with the code, and not having such an example will mean dropping support for Opus.

@jmvalin We are interested in Opus support, but honestly we just need a ready-to-go solution. We implemented Vorbis with stb_vorbis instead of libvorbis/libogg because it provided this packaged and ready to use. Most contributors here do their work in their free time and are not that experienced with audio programming (much less container formats).

You mention things such as "packets" that are probably obvious to you, but we honestly have little idea what you are referring to. All we need is to read audio frames from a function, mix them, and eventually seek. That's it.

The Ogg specification is available here and here. For Opus, you can assume two additional things hold:

  1. There is a maximum size in bytes for packets. This makes writing an allocation-free Ogg parser possible in the first place. Not sure about the exact number though; it's not stated explicitly in the spec...

  2. There is only one logical stream, the main Opus stream, per physical stream (for what "logical"/"physical" stream means, consult the spec :P)

Just out of curiosity, what's the current (3.1.1) behavior for OGG files? This topic is fairly old, and I was curious whether it is now safe to use .ogg files for all sound effects, or whether those are still streamed and should be avoided for everything except music. It would be nice if small .ogg files were simply decompressed into memory at load time and played back quickly in realtime. Should we still be using .wav files for performance?

What's the status right now regarding this issue? And what are the plans for 4.0?
