Monogame: OpenGL implementation of VertexBuffer generates garbage

Created on 18 Sep 2019  路  12Comments  路  Source: MonoGame/MonoGame


Hi, I've been trying to figure out what generates garbage in VertexBuffer.SetData and I believe I have nailed it. In OpenGL implementation most of the methods are wrapped in a Threading.BlockOnUIThread with a lambda. The thing is that this lambda refences local variables, like here:
c# private void PlatformSetDataInternal<T>(int offsetInBytes, T[] data, int startIndex, int elementCount, int vertexStride, SetDataOptions options, int bufferSize, int elementSizeInBytes) where T : struct { Threading.BlockOnUIThread(() => SetBufferData(bufferSize, elementSizeInBytes, offsetInBytes, data, startIndex, elementCount, vertexStride, options)); }
Even though the lambda it self may be referenced as a pointer to code, local variables referenced in this lambda have to be stored somewhere, and internally memory is allocated for them on heap (https://stackoverflow.com/a/7133064). I've tried to get rid of allocations in this case, but I believe it might be impossible. Actions are later stored in a List, so that they can be run on a main thread. Since there are different lambdas with different local variable counts and types referenced in each, it's not possible to hack it in a way that these arguments are stored in a List of some value type.

Right now I see the following possible solutions:
1) Create a class for each lambda type used in Threading.BlockOnUIThread, and keep them in an object pool.
2) Make user responsible for calling all these methods on ui thread, and just throw an exception if these methods are not called on ui thread (by a preprocessor directive or exposing additional methods)
3) Ignore this issue

What version of MonoGame does the bug occur on:

  • dc4b541e3 commit

What operating system are you using:

  • Windows
  • Linux

What MonoGame platform are you using:

  • DesktopGL
OpenGL Performance

Most helpful comment

This is an old discussion, but for anybody who reads about local functions not allocating

But the first example does not allocate, C# magic

C# compiler is pretty smart about local functions and instead of generating a class it will generate a struct.
For instance:

    private void PlatformSetDataInternal(int p)
    {
        void Buffer() => BufferData(p);  

        Buffer();
    } 

    private void BufferData(int p) {}

Then for local function void Buffer() the compiler will generate this:

    [StructLayout(LayoutKind.Auto)]
    [CompilerGenerated]
    private struct <>c__DisplayClass1_0
    {
        public C <>4__this;

        public int p;
    }

so it is stack allocated. But that is an extremely thin ice, because once you pass it to a delegate that is a reference type like Action:

    public static void BlockOnUIThread(Action action)

the compiler no longer sees any reason in generating a struct, because it can no longer be used as such, so once you got this:

   private void PlatformSetDataInternal(int p)
    {
        void Buffer() => BufferData(p);

        Threading.BlockOnUIThread(Buffer);
    }

it decides to generate a class:

    [CompilerGenerated]
    private sealed class <>c__DisplayClass1_0
    {
        public C <>4__this;

        public int p;

        private void <PlatformSetDataInternal>g__Buffer|0()
        {
            <>4__this.BufferData(p);
        }
    }

same goes if you make your local function to return a type that requires compiler time class generation like an iterator block IEnumerator<T>.

So unless we get a value type delegates and thus stack allocated closures it won't be possible to pass local functions as arguments without garbage allocation.

For those who want dive a bit deeper, examples I've used:
local func as struct
local func as class

All 12 comments

Hi!

I think that _2_ should be avoided. MonoGame is designed to have a consistent behavior across all targets with the same API. So I guess the solution would be to find a way to cache that.

I've also observed this and mitigated it with pretty simple code.
I used a local method nested inside the API method, e.g.

private void PlatformSetDataInternal<T>(int offsetInBytes, T[] data, int startIndex, int elementCount, int vertexStride, SetDataOptions options, int bufferSize, int elementSizeInBytes) where T : struct
{
    void Body() => SetBufferData(bufferSize, elementSizeInBytes, offsetInBytes, data, startIndex, elementCount, vertexStride, options);

    if(Threading.IsOnUIThread())
        Body();
    else
        Threading.BlockOnUIThread(Body);
}

Local methods are just private methods that you can nest inside methods (no hidden allocations).
I don't remember what C# version the develop branch/different platforms use but it's surely not C# 7, so you would need to inline the Body() method:

private void PlatformSetDataInternal<T>(int offsetInBytes, T[] data, int startIndex, int elementCount, int vertexStride, SetDataOptions options, int bufferSize, int elementSizeInBytes) where T : struct
{
    if(Threading.IsOnUIThread())
        SetBufferData(bufferSize, elementSizeInBytes, offsetInBytes, data, startIndex, elementCount, vertexStride, options);
    else
        Threading.BlockOnUIThread(() => 
            SetBufferData(bufferSize, elementSizeInBytes, offsetInBytes, data, startIndex, elementCount, vertexStride, options););
}

Pooling the lambdas seems like a lot of work, but would function fine I suppose.

That technique does not prevent the allocation. The parameter values are used in the lambda or local function, so when you pass the function the current parameter values need to be stored. That's what's causing the allocation, capturing the parameter values in a closure, not the lambda by itself. If you use a lambda that does not capture variables outside its own scope, there's no allocation because the lambda can be stored in the program like a regular function. So the pooling solution would have us pool the _parameters_ that the lambda uses.

I don't think it was a good idea to support multithreaded OpenGL in the first place, but if we want to prevent the allocations here, I think the only acceptable solution fitting with MG's design is to go with pooling.

void Body() => SetBufferData(bufferSize, elementSizeInBytes, offsetInBytes, data, startIndex, elementCount, vertexStride, options);

// also written as
void Body()
{
    SetBufferData(bufferSize, elementSizeInBytes, offsetInBytes, data, startIndex, elementCount, vertexStride, options);
}

This is not a closure, it's a private method. Only Threading.BlockOnUIThread(Body); would actually allocate a closure, not the private method call.

SetBufferData(bufferSize, elementSizeInBytes, offsetInBytes, data, startIndex, elementCount, vertexStride, options);

This uses parameters outside the scope of the local method. If you pass the local method as a method group, to be able to execute it you need to know the value of those parameters at execution time. Therefore it does create a closure when you pass that method as a method group. The runtime creates a class that can hold the parameter values and creates an instance of it. Otherwise, when the function is actually executed it would not be able to know the parameter values because they're not on the stack anymore.

I'm not actually sure if this allocates (I believe it does, because I think local methods are only different in that they are scoped to the method that contains them but I'm not sure):

void DoSomething(string value)
{
    void DoSomethingInternal()
    {
        Console.WriteLine(value);
    }

    DoSomething();
}

But this definitely allocates:

void DoSomething(string value)
{
    void DoSomethingInternal()
    {
        Console.WriteLine(value);
    }

    Task.Run(DoSomethingInternal);
}

image
_Observe how valueOnStack3 is assigned after the method call._

Yes you are right about the lifetime of stack values, but local methods are practically just syntactic sugar for methods.
MonoGame doesn't support local methods because of the C# 7 requirement nonetheless. The best bet would be pooling or just writing out the method call twice.

Yes, Task.Run needs a delegate, so DoSomethingInternal will allocate a closure.
But the first example does not allocate, C# magic 馃寛

@TechnologicalPizza Oh, I see where you're coming from know! I thought you were talking about not having the closure allocation in general, not just in case you are on the UI thread. Sorry about that. What I tried to say - using your first example - was that
Threading.BlockOnUIThread(Body);
allocates a closure.

But the first example does not allocate, C# magic

Didn't know about that, pretty cool!

Great that we understood eachother, so now it's just to decide on the appropriate solution.

Keep in mind that System.Span<T> and System.Memory<T> will be available in the near future. I have a fork that uses Span<T> in a variety of places (Texture2D.SetData, [Vertex/Index]Buffer.SetData, GraphicsDevice APIs) where arrays are often used, and OpenGL is quite a hassle about this.

My solution was to add Memory<T> overloads so GL platforms can still call API's through other threads, where the Span<T> overloads throw if not on the UI thread (which was suggested solution 2).
DirectX supports the Span<T> overloads from any thread.

This is an old discussion, but for anybody who reads about local functions not allocating

But the first example does not allocate, C# magic

C# compiler is pretty smart about local functions and instead of generating a class it will generate a struct.
For instance:

    private void PlatformSetDataInternal(int p)
    {
        void Buffer() => BufferData(p);  

        Buffer();
    } 

    private void BufferData(int p) {}

Then for local function void Buffer() the compiler will generate this:

    [StructLayout(LayoutKind.Auto)]
    [CompilerGenerated]
    private struct <>c__DisplayClass1_0
    {
        public C <>4__this;

        public int p;
    }

so it is stack allocated. But that is an extremely thin ice, because once you pass it to a delegate that is a reference type like Action:

    public static void BlockOnUIThread(Action action)

the compiler no longer sees any reason in generating a struct, because it can no longer be used as such, so once you got this:

   private void PlatformSetDataInternal(int p)
    {
        void Buffer() => BufferData(p);

        Threading.BlockOnUIThread(Buffer);
    }

it decides to generate a class:

    [CompilerGenerated]
    private sealed class <>c__DisplayClass1_0
    {
        public C <>4__this;

        public int p;

        private void <PlatformSetDataInternal>g__Buffer|0()
        {
            <>4__this.BufferData(p);
        }
    }

same goes if you make your local function to return a type that requires compiler time class generation like an iterator block IEnumerator<T>.

So unless we get a value type delegates and thus stack allocated closures it won't be possible to pass local functions as arguments without garbage allocation.

For those who want dive a bit deeper, examples I've used:
local func as struct
local func as class

The only reason for using a local method was to not write the SetData(...) code twice.

Only Threading.BlockOnUIThread(Body); would actually allocate a closure, not the private method call.

It wasn't really meant to reduce garbage allocs but to reduce duplicated code 馃槈

馃憤 just wanted to give more clarity on the topic.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dazinator picture dazinator  路  5Comments

griseus picture griseus  路  5Comments

rds1983 picture rds1983  路  5Comments

Ellesent picture Ellesent  路  5Comments

Grabiobot picture Grabiobot  路  5Comments