Godot version:
master / 511742eb68790345abf594b42b6c25b38360c3df
OS/device including version:
_Manjaro Linux 17.1.1_
Issue description:
When running example code that creates new Sprite instances and manipulates them (actual code below), the C# version runs almost 130 times slower than the equivalent code written in GDScript.
Steps to reproduce:
Attach the following scripts to a node one at a time and compare the results after running them:
```c#
using Godot;

public class Test : Node
{
    public override void _Ready()
    {
        var start = OS.GetTicksMsec();
        for (var i = 0; i < 100000; i++)
        {
            // Create a Sprite, touch two properties, then free it immediately.
            var s = new Sprite();
            s.SetPosition(new Vector2(100, 100));
            s.SetName("Hello");
            s.Free();
        }
        // Elapsed time in milliseconds.
        GD.Printt("C#:", OS.GetTicksMsec() - start);
    }
}
```
The code above prints `38701`, while the code below prints `300`.
```gdscript
extends Node

func _ready():
	var start = OS.get_ticks_msec()
	for i in range(100000):
		var s = Sprite.new()
		s.set_position(Vector2(100, 100))
		s.set_name("Hello")
		s.free()
	printt("GDScript:", OS.get_ticks_msec() - start)
```
CC @neikeq
I'm going to make a wild guess and say it's the garbage collection. For an extreme example like this you'd want to use object pooling to get rid of the GC.
yes, most likely the garbage collector
I timed each of the four lines in the loop body separately and got this (10000 iterations, times in msecs):
C# 1: 11659
C# 2: 7
C# 3: 33
C# 4: 135
GDScript 1: 63
GDScript 2: 16
GDScript 3: 20
GDScript 4: 40
Line 3 is probably a bit slower because of string marshaling, not much we can do about it.
Line 4 is probably slower because of the code generated for `free`: internal call -> ptrcall -> `Object::call` with a `String` that must be converted to `StringName` every time, when we could make it internal call -> `Object::call` with a cached `StringName`.
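To illustrate the difference between converting a name on every call and reusing a cached one, here is a minimal C++ sketch. It is not Godot's `StringName` implementation: `NameId`, `NameTable`, and both helper functions are hypothetical stand-ins, and the "cost" is modelled only as the number of string lookups performed.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Toy stand-in for an interned name: comparing two handles is an integer compare.
struct NameId {
    std::size_t id;
    bool operator==(const NameId &other) const { return id == other.id; }
};

// Hypothetical interning table that counts how many string lookups it performed.
class NameTable {
public:
    NameId intern(const std::string &text) {
        ++lookups;
        auto it = ids.find(text);
        if (it != ids.end()) {
            return NameId{it->second};
        }
        const std::size_t id = ids.size();
        ids.emplace(text, id);
        return NameId{id};
    }

    std::size_t lookups = 0;

private:
    std::unordered_map<std::string, std::size_t> ids;
};

// Slow path: the method name arrives as a string and is interned on every call.
std::size_t call_with_string(NameTable &table, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        NameId method = table.intern("free");
        (void)method; // a real binding would dispatch on `method` here
    }
    return table.lookups;
}

// Fast path: intern once up front, then reuse the cached handle.
std::size_t call_with_cached_name(NameTable &table, int iterations) {
    const NameId cached = table.intern("free");
    for (int i = 0; i < iterations; ++i) {
        NameId method = cached;
        (void)method;
    }
    return table.lookups;
}
```

With a fresh table, the first function performs one lookup per iteration, while the second performs a single lookup regardless of the iteration count.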
Line 1 is clearly the code that takes most of the time. I find it hard to believe the internal call for constructing the native instance and tying it to the `MonoObject *` takes that long, so it must be the GC. I will compare the internal call, without the GC allocations for Sprite, to confirm.
My conclusion is the same as neikeq. Heap memory allocation/deallocation is a well known weakness of C#. You really need to implement object pooling in order to speed up your code. Doing this in your GDScript code would probably help as well. Just as in neikeq's examples above, the actual set position calls are 2x faster in C# than GDScript.
Here's a basic rule to follow for C#:
Value types (struct, int, float, etc) get created on the stack: very fast
Classes get created on the heap: very slow, use sparingly
Vector2 and Vector3 are structs, therefore they're very fast.
If you need an example of object pooling, it looks like this:
This will make your memory allocations essentially disappear unless they are absolutely necessary.
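The pooling pattern itself is language-independent. Below is a minimal sketch in C++, assuming hypothetical `Bullet` and `BulletPool` types that do not come from this thread; a C# pool would hold its spare objects in a collection in the same way.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pooled object; stands in for something expensive to construct.
struct Bullet {
    float x = 0.0f;
    float y = 0.0f;
    bool active = false;
};

// Minimal object pool: released objects are kept and handed out again
// instead of being destroyed and re-allocated.
class BulletPool {
public:
    // Returns a recycled object when one is available, otherwise allocates.
    std::unique_ptr<Bullet> acquire() {
        if (free_list.empty()) {
            ++allocations;
            return std::make_unique<Bullet>();
        }
        std::unique_ptr<Bullet> bullet = std::move(free_list.back());
        free_list.pop_back();
        return bullet;
    }

    // Resets the object to its default state and stores it for reuse.
    void release(std::unique_ptr<Bullet> bullet) {
        *bullet = Bullet{};
        free_list.push_back(std::move(bullet));
    }

    std::size_t allocations = 0; // number of real heap allocations so far

private:
    std::vector<std::unique_ptr<Bullet>> free_list;
};

// Acquire and release repeatedly; only the first acquire allocates.
std::size_t churn(BulletPool &pool, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::unique_ptr<Bullet> b = pool.acquire();
        b->active = true;
        pool.release(std::move(b));
    }
    return pool.allocations;
}
```

Calling `churn` on a fresh pool allocates once and recycles that object on every later iteration, which is the effect described above.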
Here's a real world example of object pooling in an AI system I'm writing, it's still a work in progress:
https://github.com/NathanWarden/godot_ai_csharp/blob/master/Godot_Project/BehaviorTree/AttackSystem/Weapons/LauncherWeapon3D.cs
That slowdown is kind of insanely slow, though.
100,000 iterations giving you 11.659 seconds means freeing ONE sprite costs roughly 0.1 second (but packed in big GC stalls).
@Zylann I agree, that does seem to be excessively slow even for the garbage collector.
@Zylann That's what it would take for 1000 sprites, not one; but my output was actually with 10.000 iterations, so 100 sprites. But that's making assumptions, so here are some tests:
Sprite x 1: 3 msecs
Sprite x 10: 14 msecs
Sprite x 100: 124 msecs
Sprite x 1000: 1232 msecs
Sprite x 10000: 12023 msecs
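Dividing each total by its sprite count makes the scaling explicit. A small C++ sketch of that arithmetic follows; the `Sample` type and helper names are only illustrative, and the numbers are copied from the measurements above.

```cpp
#include <vector>

// One measurement: how many sprites were created and the total time in msecs.
struct Sample {
    int sprites;
    double total_msec;
};

// Timings copied from the list above.
std::vector<Sample> measured() {
    return {{1, 3.0}, {10, 14.0}, {100, 124.0}, {1000, 1232.0}, {10000, 12023.0}};
}

// Average cost of a single sprite for one measurement.
double msec_per_sprite(const Sample &s) {
    return s.total_msec / s.sprites;
}
```

For the largest run this gives 12023 / 10000 ≈ 1.2 msecs per sprite, so the cost grows roughly linearly with the number of sprites.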
I just tested with the internal call only and it took roughly 120 msecs for 10.000 iterations, so it's definitely the GC allocations.
I wanted to test the performance with release configurations, so I replaced the Debug assembly with one built with /p:Config=Release and the result was (10.000 iterations):
GDScript 1: 62
GDScript 2: 11
GDScript 3: 16
GDScript 4: 40
C# 1: 11018
C# 2: 6
C# 3: 30
C# 4: 121
Pretty much the same... so I decided to test it with a Godot release template, and surprise:
GDScript 1: 19
GDScript 2: 4
GDScript 3: 7
GDScript 4: 8
C# 1: 1792
C# 2: 3
C# 3: 8
C# 4: 34
So if anyone is making release benchmarks, make sure to not only build the assembly for release, but also use a release template.
EDIT: Added GDScript times
I'm starting to think the issue isn't with the garbage collector, because the garbage collector only gets hit periodically and only after the objects it's cleaning up go out of scope. Unless I'm missing something, the performance hit isn't being caused when anything is going out of scope, since nothing has gone out of scope.
I'm getting the biggest performance hit on the node creation.
A couple of notes:
I think there are some steps that could be taken:
For instance on point 2:
```c#
Sprite[] sprites = GodotSharp.Spawn<Sprite>(10000);
```
@neikeq Wow, that's a big difference! :) So, maybe it's the debug code in the editor that causes the performance hit? That would make sense.
> So, maybe it's the debug code in the editor that causes the performance hit? That would make sense.
Well yes and no. Debug code does impact performance (check GDScript comparison between debug and release), but it doesn't explain 1792 ms for 1. vs 19 ms for GDScript.
> > So, maybe it's the debug code in the editor that causes the performance hit? That would make sense.
>
> Well yes and no. Debug code does impact performance (check GDScript comparison between debug and release), but it doesn't explain 1792 ms for 1. vs 19 ms for GDScript.
Yes, my example was to show that it improves a lot in release builds, not that it was the cause for the performance hit.
I still find it weird that it takes that long though. I've tried to reproduce the same slowdown with custom classes with the exact same construction behavior, except the internal call functions they call are empty, and that took about 2 ms... It must be something with the internal call. I'm not sure what I measured wrong when I discarded that, but I'll check again.
EDIT: Well, I basically removed the internal call from the constructors and nothing changed, so I'm speechless :P I'll have to profile later.
Tested C# code on macOS - 10000 iterations take 57000 msec (release_debug Godot, Debug assembly).
The Xcode time profiler points to string conversion in `godot_icall_ClassDB_get_method` (80% weight).
@bruvzg Looks like we found that out at the same time :D
Now Sprite x 10.000 on a release template takes 48 ms.
> Line 4 is probably slower because of the code generated for `free`: internal call -> ptrcall -> `Object::call` with a `String` that must be converted to `StringName` every time, when we could make it internal call -> `Object::call` with a cached `StringName`.
@neikeq @karroffel I'm binding AngelScript to Godot.
The `String` to `StringName` conversion also takes a large part of the function. In this case it takes about 28% of the time in this code block:
```
for (int i = 0; i < 100000; i++) {
    godot::Sprite s;
    s.set_position(Vector2(100, 100));
    s.set_name("Hello");
    s.free();
}
```
I read the implementation of `StringName` and found it hard to speed this up by caching `StringName` objects, as their contents may be changed by another `StringName` constructor.
@reduz Do you have any idea about that?
@Geequlim You can safely cache StringName. Have a look at core/core_string_names.h.
@neikeq Thank you!
Here is a simple benchmark of the AngelScript binding for the code below:
```
for (int i = 0; i < 100000; i++) {
    godot::Sprite s;
    s.set_position(Vector2(100, 100));
    s.set_name("Hello");
    s.free();
}
```
GDScript running the same version of the test takes 251 ms.
`Object::call` takes 307 ms.
`Object::call` takes 422 ms.
@Geequlim if you are binding angelscript (which is statically typed, right?) remember you have ptrcall.