Godot: Skeleton performance is low on GLES2 Android

Created on 8 Apr 2020 · 33 Comments · Source: godotengine/godot

Godot version:
3.2.2 02ed72c373

OS/device including version:
Linux mint 19.3 / Galaxy S8+

Issue description:
FPS drops to around 40 when using a skeleton on GLES2.
It stays at a steady 60 FPS on GLES3 with the same scene.
I tested GLES2 first and then GLES3, each for the same amount of time.

[Screenshots: GLES2]

[Screenshots: GLES3]

(ignore the distorted mesh with bones)

Steps to reproduce:

  1. Download the attached project
  2. Run it on Android with GLES2
  3. Wait about a minute or a little more and watch the FPS start to drop
  4. Run it on Android with GLES3
  5. See a fairly steady 60 FPS even after several minutes

Minimal reproduction project:
skeleton_performance_gles2.zip

bug android rendering

All 33 comments

I think GLES2 uses a software path for skeletons (see define USE_SKELETON_SOFTWARE). I don't remember why but I guess it probably means that the GPU implementation requires features that are not available on all GLES2 devices.

This issue is ironic for me.
I need GLES2 for my Android game for device coverage, stability, and performance.
GLES3 crashes on many devices, but gives better performance with Skeleton.
I can't choose either... :cry:

Well the thing is that if your device supports GLES3, it should also support the GPU skeleton path on GLES2, unless for some weird reason the driver vendors decided not to implement the float texture support and vertex textures on GLES2 even though their GLES3 (and thus hardware) supports them.

If it's not taking the software path though, that would be a bug as the GLES2 GPU path shouldn't be drastically slower than the GLES3 (I don't have specific knowledge about this though, this is just an expectation).

If you are using the software path, we could still look into possible ways to optimize it for speed.

> the driver vendors decided not to implement the float texture support and vertex textures on GLES2 even though their GLES3 (and thus hardware) supports them

I guess this is the case.
According to the test results in the OP, it runs well on GLES3.

Well it's not the same code on GLES2 and GLES3, so that doesn't tell us which code path it uses on GLES2.

Please check this as I mentioned: https://github.com/godotengine/godot/issues/37696#issuecomment-611127505

You can add a print_line, or check your device specs with http://opengles.gpuinfo.org/ (there's an Android app to generate the report if your device is not already in the list).

Here are my phone specs:
http://opengles.gpuinfo.org/displayreport.php?id=4476

I confirmed that it uses the software skeleton path.

In godot/drivers/gles2/rasterizer_storage_gles2.cpp:

    config.use_skeleton_software = (config.float_texture_supported == false) || (config.max_vertex_texture_image_units == 0);
    // temporary debug output to see which path is selected
    if (config.use_skeleton_software) {
        print_line("use_skeleton_software = true");
    } else {
        print_line("use_skeleton_software = false");
    }

Output:

    --- Debugging process started ---
    Godot Engine v3.2.2.rc.custom_build.97fe589ff - https://godotengine.org
    OpenGL ES 2.0 Renderer: Mali-G71
    use_skeleton_software = true

The logic is correct, as without either float texture or vertex texture read the hardware path won't work. Looking at the phone specs it looks like it doesn't support float texture in GLES2.
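To make the fallback condition concrete, here is a minimal sketch of the same check as a standalone function. `SkeletonCaps` and `use_skeleton_software` are hypothetical names used for illustration only, not engine code; the condition itself is copied from the line quoted above.

```cpp
// Hypothetical stand-in for the two capability fields used by the check.
struct SkeletonCaps {
    bool float_texture_supported;
    int max_vertex_texture_image_units;
};

// Mirrors the quoted condition: the software skeleton path is chosen when
// either float textures or vertex texture reads are unavailable.
bool use_skeleton_software(const SkeletonCaps &caps) {
    return !caps.float_texture_supported ||
           caps.max_vertex_texture_image_units == 0;
}
```

A device that reports vertex texture units but no float texture support, as the report above appears to show, ends up on the software path under this rule.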

If I remember correctly, the skeleton software path may be horribly inefficient. I had not seen that approach before, and I suspect it was chosen because it was easier to retrofit to the existing pipeline, rather than for efficiency.

Probably more standard hardware skinning (passing the matrices in an array or uniform), or even software skinning, would be faster. But they might be a bit more involved to fit into the existing framework. It might be worth adding both, because some hardware has bugs around which hardware methods are supported, and software skinning will always be supported.

Same problem here. I'm working on a scene with 10 characters with skeletons, and the FPS stays at 2 or 3 on GLES2, but it runs normally at 60 FPS on GLES3. This frame rate makes the game unplayable.

I would really like this problem to be solved ASAP because I intend to publish this game soon. This incompatibility is becoming a huge obstacle for us, and I don't want to publish with GLES3 because of a lot of other incompatibilities.

This problem occurs on this device (Samsung Galaxy A8+):

[Image: IMG-20200616-WA0020]

The problem occurs on the Samsung Galaxy S8+ too, and did not occur on the Zenfone 5, Zenfone Selfie and Moto Z 2 Play that I tried.

> Same problem here. I'm working on a scene with 10 characters with skeletons, and the FPS stays at 2 or 3 on GLES2

How many vertices per model, out of interest? You may need to drop your vertex count: high-vertex-count skinned models are unlikely to work well with fallback methods. You could, for example, ship 2 variations of each skinned mesh and switch at runtime depending on your frame rate.

Or perhaps there is something else going on, depending on your models. 2 to 3 FPS is quite low.
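As a rough illustration of the "two mesh variations" idea, here is a minimal sketch of a selector that switches to the low-detail mesh when a smoothed frame rate falls below a threshold. `SkinnedMeshLodSelector` is a hypothetical helper, not a Godot API, and the thresholds and smoothing factor are assumptions you would tune per project.

```cpp
// Hypothetical helper: decides whether to show the low-detail skinned mesh.
// Two thresholds give hysteresis so the choice does not flicker.
class SkinnedMeshLodSelector {
public:
    SkinnedMeshLodSelector(double low_fps, double high_fps)
        : low_fps_(low_fps), high_fps_(high_fps) {}

    // frame_seconds: duration of the last frame in seconds.
    // Returns true when the low-detail mesh should be used.
    bool update(double frame_seconds) {
        if (frame_seconds <= 0.0) {
            return use_low_detail_;  // ignore invalid samples
        }
        const double fps = 1.0 / frame_seconds;
        // Exponential moving average to smooth out single slow frames.
        smoothed_fps_ = has_sample_ ? smoothed_fps_ * 0.9 + fps * 0.1 : fps;
        has_sample_ = true;
        if (smoothed_fps_ < low_fps_) {
            use_low_detail_ = true;
        } else if (smoothed_fps_ > high_fps_) {
            use_low_detail_ = false;
        }
        return use_low_detail_;
    }

private:
    double low_fps_;
    double high_fps_;
    double smoothed_fps_ = 0.0;
    bool has_sample_ = false;
    bool use_low_detail_ = false;
};
```

In a game this would be fed the frame delta once per frame, and the result would toggle visibility between the two mesh variants.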

@pouleyKetchoupp actually already wrote a software skinning implementation recently as part of:
https://github.com/godotengine/godot-proposals/issues/784

At the time we suggested to reduz that it might be of use as a software skinning fallback, but he was against it (I can't find the IRC logs). He may have believed it wouldn't be faster than the existing fallback. Regardless, I'm aiming to try this out experimentally for 4.x, as well as some alternate hardware skinning implementations. I believe @endragor also did some earlier research in this area.

> @pouleyKetchoupp actually already wrote a software skinning implementation recently as part of:
> godotengine/godot-proposals#784

The software skinning I've implemented is currently limited to GLES3. It crashes with GLES2 in exported games, because RasterizerStorageGLES2::mesh_surface_get_array is only allowed in tool builds.

https://github.com/godotengine/godot/blob/7f6767470dda159832162e0670ecbe1fbdfa7e71/drivers/gles2/rasterizer_storage_gles2.cpp#L2671-L2673

I haven't investigated this problem yet, so I'm not sure how it works in the editor and if it would be possible to either make this functionality available for non-tool, or change the skinning implementation to update the vertex buffer in a different way.

Ah I was hoping you'd got around that. :smile: That was the bit I was going to copy lol. Yeah if there's no support for dynamic VBs in GLES2 we will have to write for 4.x.

> How many vertices per model, out of interest? You may need to drop your vertex count: high-vertex-count skinned models are unlikely to work well with fallback methods. You could, for example, ship 2 variations of each skinned mesh and switch at runtime depending on your frame rate.

@lawnjelly In Blender: 2,155 vertices; in Godot: 14,850 on GLES2 and 2,358 on GLES3. I don't believe that is a high count. The project is low poly by nature and I had a lot of difficulty reducing it even to that; I don't know if it can be reduced further without losing a considerable amount of quality.

If I run the project with only one character in the scene, the FPS is around 20, but with 2 characters it drops to 3. The drop occurs as soon as the scene begins, not after some minutes as in the original report by @volzhs, and if I put the object in the scene without a skeleton it runs at 60 FPS normally.

After posting, I tried a Samsung Galaxy S9 with a Snapdragon and the problem does not occur there; apparently it only happens with Samsung's Exynos chipset.

Screenshots of my animated model:

[3 images]

@lawnjelly As an update, after checking again on 3.2 branch, dynamic VB is supported after #34794. I had made my original tests on a custom branch based on 3.1. So my code for software skinning can be used with GLES2.

The error in non-tool builds is still there but it can be removed since retrieving mesh array data is actually supported.

@Host32 Sorry I only just saw this.

14K skinned verts will indeed toast a lot of GLES2 devices, even best case. It is interesting the discrepancy between GLES2 and GLES3 (not sure how this is measured, debug monitor?). Are you using shadows? That could be causing problems too, each shadow might be causing another skinning pass (I haven't really examined this stuff yet, @clayjohn will know more). I would try turning shadows off to confirm this.

One extra advantage of software skinning is that you can reuse the same skinned mesh for shadow passes.
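A rough sketch of why that reuse helps: with CPU skinning the skinned vertices can be computed once per frame and handed to every pass (the main pass plus each shadow pass). `SkinnedMeshCache` is a hypothetical helper for illustration, not part of Godot, and it assumes frame numbers are non-negative.

```cpp
#include <vector>

// Hypothetical per-mesh cache: skins at most once per frame and returns the
// same buffer to every render pass that asks for it during that frame.
struct SkinnedMeshCache {
    std::vector<float> positions;  // skinned positions, xyz per vertex
    long long skinned_frame = -1;  // frame the buffer was last filled for
    int skin_calls = 0;            // how often the skinning work actually ran

    template <typename SkinFn>
    const std::vector<float> &get(long long frame, SkinFn skin) {
        if (frame != skinned_frame) {
            skin(positions);  // expensive part, done once per frame
            skinned_frame = frame;
            ++skin_calls;
        }
        return positions;
    }
};
```

With one main pass and four shadow passes in a frame, `skin_calls` goes up by one rather than five, whereas a skinning step that runs in the vertex shader is repeated for every pass.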

> @lawnjelly As an update, after checking again on 3.2 branch, dynamic VB is supported after #34794. I had made my original tests on a custom branch based on 3.1. So my code for software skinning can be used with GLES2.
>
> The error in non-tool builds is still there but it can be removed since retrieving mesh array data is actually supported.

Retrieving mesh array data shouldn't be necessary for skinning. Maybe it is for historical reasons in the functions that are currently available for dynamic use. The relationship only needs to be one way.

> Retrieving mesh array data shouldn't be necessary for skinning. Maybe it is for historical reasons in the functions that are currently available for dynamic use. The relationship only needs to be one way.

In my case with subdivision, I need to retrieve weights from the array data so I can apply skinning from the mesh directly without storing any extra information. But yeah, there are probably better ways to implement software skinning within the GLES2 code.

@lawnjelly

> 14K skinned verts will indeed toast a lot of GLES2 devices, even best case.

I didn't see that in practice. All the devices I tested with 1 GB of RAM and weak CPUs could run the project at 60 FPS; only this specific chipset has problems with the animations.

> not sure how this is measured, debug monitor?

I have no idea. I see it when enabling "View Information" in the editor.

> Are you using shadows?

No. I tested it in every possible way and got the same results. The only difference when enabling shadows is that performance also drops a little on GLES3, but nothing really significant. I did extensive work to optimize the shaders and use as few passes as possible, so the rendering cost is very low. The only factor reducing performance to the point of making the game unplayable is the use of skeletal animations on the characters.

> @lawnjelly As an update, after checking again on 3.2 branch, dynamic VB is supported after #34794. I had made my original tests on a custom branch based on 3.1. So my code for software skinning can be used with GLES2.
>
> The error in non-tool builds is still there but it can be removed since retrieving mesh array data is actually supported.

@pouleyKetchoupp What do I need to do to test your algorithm? Can I compile the project from your fork, or do I need to wait for these changes to be merged into the official branch? I'm in a bit of a hurry for this solution, as it could compromise our launch schedule.

Commit 906b5e7f3f has been merged since stable 3.2, so it is there if you are using 3.2.x.
I also tested in various ways, but the result is always the same when using skeleton animation, as @Host32 found.
GLES3 performs well but crashes on many devices; GLES2 performs badly on some devices but runs fine on every device.
_I am eager for this to be solved soon._

@Host32 @volzhs I've just made a quick implementation of the software skinning I'm using for subdivision in MeshInstance directly:
https://github.com/nekomatata/godot/commit/4bc6923bb70f1f4da0dcdf727641fa95d24aa6ee

If you want to test it, you need to compile a custom version of the 3.2 branch including this commit. Make sure it's the latest 3.2 branch to get #40235 otherwise you'll be spammed with errors at runtime.
In order to test on Android, you also need to compile custom templates using the same changes.

Then you can just check "Software Skinning" property in a mesh instance to test it.

> @Host32 @volzhs I've just made a quick implementation of the software skinning I'm using for subdivision in MeshInstance directly:
> nekomatata@4bc6923
>
> If you want to test it, you need to compile a custom version of the 3.2 branch including this commit. Make sure it's the latest 3.2 branch to get #40235 otherwise you'll be spammed with errors at runtime.
> In order to test on Android, you also need to compile custom templates using the same changes.
>
> Then you can just check "Software Skinning" property in a mesh instance to test it.

I suspect that implementation could be sped up quite a bit too, but it would be interesting to see the comparison with the hardware method, I will try your version later. :+1:

EDIT - Updated figures are in post below. Using OP's test project on desktop (Intel integrated GPU).

Things get interesting once you start adding lights, presumably because the software skinning is a one off cost and can be reused for shadow passes.

Ok I've now forced SKELETON_SOFTWARE path (which as I said is very inefficient). The results are very enlightening:

RELEASE BUILDS

  • Using the OP's project, with joints hidden, just showing Beta_Surface2.
  • Screen is 320x240, shadow map 256x256 (in order to decrease fill rate effects as we are interested in vertex processing)
  • Intel Core i5-7500T 2.7 GHz
  • Intel HD graphics 630
  • Software skinning has the modification I mentioned a few posts down.

4 directional lights:

Software Skinning 477fps
Hardware (SKELETON_SOFTWARE) 18fps
Hardware (Main method) 387fps

1 directional light

Software Skinning 550fps
Hardware (SKELETON_SOFTWARE) 62fps
Hardware (Main method) 1170fps

Lights off

Software Skinning 580fps
Hardware (SKELETON_SOFTWARE) 283fps
Hardware (Main method) 3055fps

Software skinning beats the current hardware fallback path by about 2x with lights off, and by an increasing margin as lights are added. These lights are directional, so they may use shadow splits, which increases the advantage of the one-off skinning.
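Taking the figures above at face value, the ratios between software skinning and the SKELETON_SOFTWARE fallback work out as in this small sketch. `speedup` is just an illustrative helper, and the numbers in the comments are copied from the three result sets above.

```cpp
// Ratio of two frame rates; a value above 1 means the first one is faster.
double speedup(double fps_a, double fps_b) {
    return fps_a / fps_b;
}

// Software skinning vs. SKELETON_SOFTWARE fallback:
//   4 directional lights: speedup(477, 18)  is 26.5
//   1 directional light:  speedup(550, 62)  is about 8.9
//   lights off:           speedup(580, 283) is about 2.05
```

Against the main texture-based hardware method the picture is different: software skinning is only ahead in the 4-light case (477 vs 387 fps) and behind in the other two.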

P.S. For anyone wanting to force USE_SKELETON_SOFTWARE to compare, add the test line here to rasterizer_storage_gles2.cpp, line 6061, in order to hard code it.

    // the use skeleton software path should be used if either float texture is not supported,
    // OR max_vertex_texture_image_units is zero
    config.use_skeleton_software = (config.float_texture_supported == false) || (config.max_vertex_texture_image_units == 0);
    // test
    config.use_skeleton_software = true;

@pouleyKetchoupp wow. I just tested it with my project.
without software skinning, it runs 35~40 fps on Galaxy s8+
with software skinning, it's pretty stable 60 fps! yes!

@pouleyKetchoupp I found that all skeleton nodes play the same animation at the same time with duplicated instances when using software skinning.

> @pouleyKetchoupp wow. I just tested it with my project.
> without software skinning, it runs 35~40 fps on Galaxy s8+
> with software skinning, it's pretty stable 60 fps! yes!

This is all adding up to a convincing case for making this available in addition to (and possibly, in the long run, replacing) the USE_SKELETON_SOFTWARE path. I spoke to @clayjohn yesterday and he agreed it seemed convincing.

We can try to bring this to @reduz's attention on IRC so we can all discuss it.

I'm not sure about the history of USE_SKELETON_SOFTWARE, but indeed it seems somewhat pointless. A faster implementation (and the one we use) would be to store bone transforms in a uniform vector. The downside is that size of uniforms is limited, but even on older devices it allows about 75 bones per mesh, which is more than enough for most use cases on mobile. The uniform bone limit could be set as a project setting.
This is also significantly faster than the "main method" where transforms are provided through a texture.
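For a feel of where a per-mesh bone limit comes from with uniform-based skinning, here is a back-of-the-envelope sketch. It assumes each bone is stored as a 3x4 matrix (three vec4 uniforms) and that some vectors are reserved for other shader uniforms; both are assumptions for illustration, and `max_uniform_bones` is a hypothetical helper. The comment above does not say how its figure of about 75 was derived.

```cpp
// Estimates how many bones fit into vertex shader uniform storage.
// max_vertex_uniform_vectors: the value of GL_MAX_VERTEX_UNIFORM_VECTORS.
// reserved_vectors: vec4 slots assumed to be used by other uniforms.
int max_uniform_bones(int max_vertex_uniform_vectors, int reserved_vectors) {
    const int vectors_per_bone = 3;  // assumption: 3x4 matrix per bone
    const int available = max_vertex_uniform_vectors - reserved_vectors;
    return available > 0 ? available / vectors_per_bone : 0;
}
```

OpenGL ES 2.0 only guarantees 128 vertex uniform vectors, which under these assumptions leaves room for roughly 40 bones; a device exposing 256 vectors with around 30 reserved would give 75. The real limit would have to be queried from the driver at startup.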

> I'm not sure about the history of USE_SKELETON_SOFTWARE, but indeed it seems somewhat pointless. A faster implementation (and the one we use) would be to store bone transforms in a uniform vector. The downside is that size of uniforms is limited, but even on older devices it allows about 75 bones per mesh, which is more than enough for most use cases on mobile. The uniform bone limit could be set as a project setting.
> This is also significantly faster than the "main method" where transforms are provided through a texture.

Yup indeed I also understood this to be the most common method of skinning for GLES2 (and was thinking in terms of using this for the rewrite of GLES2 3d in 4.x, with software skinning fallback).

> @pouleyKetchoupp I found that all skeleton nodes play the same animation at the same time with duplicated instances when using software skinning.

Could you please share a minimal repro for this case?

@pouleyKetchoupp - I've made a minor modification to the skinning code:

No lights:

Software skinning 199fps -> 580fps
USE_SKELETON_SOFTWARE 285fps

So now software skinning is twice as fast as the old fallback path, even with no lights, and should be even faster with lights.

The modification was to add this at the start:

    // pre get bones
    int num_bones = visual_server->skeleton_get_bone_count(skeleton);
    const int SKIN_MAX_BONES = 128;
    Transform bone_transform[SKIN_MAX_BONES];
    for (int n=0; n<num_bones; n++)
    {
        bone_transform[n] = visual_server->skeleton_bone_get_transform(skeleton, n);
    }

and change the per vertex to this:

            int b0 = bone_id[0];
            int b1 = bone_id[1];
            int b2 = bone_id[2];
            int b3 = bone_id[3];

            Transform transform;
            transform.origin =
                    bone_weight[0] * bone_transform[b0].origin +
                    bone_weight[1] * bone_transform[b1].origin +
                    bone_weight[2] * bone_transform[b2].origin +
                    bone_weight[3] * bone_transform[b3].origin;

            transform.basis =
                    bone_transform[b0].basis * bone_weight[0] +
                    bone_transform[b1].basis * bone_weight[1] +
                    bone_transform[b2].basis * bone_weight[2] +
                    bone_transform[b3].basis * bone_weight[3];

(The aliasing isn't necessary of course).
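For anyone who wants to try the blending outside the engine, here is a self-contained sketch of the same idea (weighted sum of up to four pre-fetched bone transforms, then one transform per vertex). `Vec3`, `Xform`, `blend_bones` and `xform_point` are simplified stand-ins written for this example, not Godot's `Vector3`/`Transform`, and the bounds check on bone indices is an addition that the snippets above do not have.

```cpp
#include <cstddef>

struct Vec3 {
    float x, y, z;
};

// Simplified affine transform: row-major 3x3 basis plus origin.
struct Xform {
    float basis[3][3];
    Vec3 origin;
};

// Blends up to four bone transforms by weight (linear blend skinning).
// Bone indices outside [0, num_bones) are skipped.
Xform blend_bones(const Xform *bones, std::size_t num_bones,
                  const int bone_id[4], const float bone_weight[4]) {
    Xform out = {};  // all zeros
    for (int i = 0; i < 4; ++i) {
        if (bone_id[i] < 0 ||
            static_cast<std::size_t>(bone_id[i]) >= num_bones) {
            continue;
        }
        const Xform &b = bones[bone_id[i]];
        const float w = bone_weight[i];
        for (int r = 0; r < 3; ++r) {
            for (int c = 0; c < 3; ++c) {
                out.basis[r][c] += b.basis[r][c] * w;
            }
        }
        out.origin.x += b.origin.x * w;
        out.origin.y += b.origin.y * w;
        out.origin.z += b.origin.z * w;
    }
    return out;
}

// Applies the blended transform to a vertex position.
Vec3 xform_point(const Xform &t, const Vec3 &v) {
    Vec3 r;
    r.x = t.basis[0][0] * v.x + t.basis[0][1] * v.y + t.basis[0][2] * v.z + t.origin.x;
    r.y = t.basis[1][0] * v.x + t.basis[1][1] * v.y + t.basis[1][2] * v.z + t.origin.y;
    r.z = t.basis[2][0] * v.x + t.basis[2][1] * v.y + t.basis[2][2] * v.z + t.origin.z;
    return r;
}
```

The point of fetching the bone transforms once up front is that the per-vertex loop then only does array lookups and multiply-adds, with no calls back into the server per bone per vertex.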

There are probably still quite a few gains to be had.

I've also worked out the reason for the cliff-like performance drop with lights: the shadow maps default to a size of 4096. I'm going to rerun the tests with a 256 size shadow map and the improved skinning code, and will update the earlier post - DONE.

If anyone wants to test with these modifications, my branch is at:
https://github.com/lawnjelly/godot/tree/soft_skin

@pouleyKetchoupp Something I just noticed, we are not transforming normals I don't think? So it is not exactly like for like at the moment.

For performance reasons in software skinning it can be nice to have the option to not transform normals, but it should be optional. This might be something that could benefit from a per mesh setting, as well as a global setting - you could e.g. not transform normals on enemies, but do transform on the main player.
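A minimal sketch of what such an option could look like per vertex, assuming the blended 3x3 basis is already available. `Basis`, `Vec3` and `skin_normal` are simplified stand-ins for this example, not engine types, and the sketch assumes bones carry only rotation and uniform scale (non-uniform scale would need the inverse transpose of the basis instead).

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// Row-major 3x3 matrix, e.g. the blended bone basis for one vertex.
struct Basis {
    float m[3][3];
};

// Returns the skinned normal, or the untouched bind-pose normal when
// transform_normals is false (the cheaper path).
Vec3 skin_normal(const Basis &b, const Vec3 &n, bool transform_normals) {
    if (!transform_normals) {
        return n;
    }
    Vec3 r;
    r.x = b.m[0][0] * n.x + b.m[0][1] * n.y + b.m[0][2] * n.z;
    r.y = b.m[1][0] * n.x + b.m[1][1] * n.y + b.m[1][2] * n.z;
    r.z = b.m[2][0] * n.x + b.m[2][1] * n.y + b.m[2][2] * n.z;
    // Renormalise: blended or scaled bases do not preserve length.
    const float len = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z);
    if (len > 0.0f) {
        r.x /= len;
        r.y /= len;
        r.z /= len;
    }
    return r;
}
```

The flag could come from a per-mesh property, a global setting, or both, so that, for example, the main player gets correct lighting while distant enemies skip the extra work.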

@lawnjelly I've just pushed a new version with prepared bone transforms and I've made it a draft PR to make it easier to test and make more changes if needed : #40313

Good point for the normals, and it sounds good to have it as an option.

Fixed by #40313.
