Godot: High GPU usage with Particles2D with Radeon

Created on 16 Oct 2020  ·  34 Comments  ·  Source: godotengine/godot

Godot 3.2.3
macOS Catalina 10.15.7
AMD Radeon Pro 570X 4 GB
GLES3

1 - Adding a Particles2D node: the GPU goes up to 70% in the editor.
2 - Adding a texture to the Particles2D node: the GPU goes up to 100% in the editor.

bug confirmed macos editor rendering

All 34 comments

This will likely be fixed by https://github.com/godotengine/godot/pull/42734 in 3.2.4. If you can build from source, can you test from the latest 3.2 branch?

I've just made a unified batching build which has the buffer orphaning, so you are welcome to try this:
https://github.com/lawnjelly/Misc/releases/tag/ewok_v0.17

That said afaik the particles2d uses a multimesh command, which I don't remember changing anything in (unless indirectly).

> I've just made a unified batching build which has the buffer orphaning, so you are welcome to try this:
> https://github.com/lawnjelly/Misc/releases/tag/ewok_v0.17
>
> That said afaik the particles2d uses a multimesh command, which I don't remember changing anything in (unless indirectly).

I tried it, and it doesn't work. (15" MacBook Pro Retina; I tried with the scaled 1440x900 option in System Settings.)
Some cases are better compared to godot_v3.2.3-stable_osx.64, but the problem was not completely fixed.

  • When opened with "Open in Low Resolution", GPU usage drops by half.
  • When the AnimationTree panel is hidden, the GPU is drastically reduced. (not for AnimationPlayer)
  • Dragging with hand tool in low res - 50%
  • Dragging with hand tool - 100%

_AnimationTree_ panel experiment (100%)
https://streamable.com/1pqhh8

low-res _AnimationTree_ experiment (50%)
https://streamable.com/4qoyp8

_Animation panel timeline_ experiment (hiding the panel makes no difference - 100%)
https://streamable.com/ghhp01

That was kind of expected, but worth a try nonetheless. (I'm not really familiar with the particles code and have just left it as is. It doesn't look easily batchable in GLES3.)

It's likely the improvement you are seeing is due to better performance in the rest of the UI, rather than any improvement in the particles.

> When the AnimationTree panel is hidden, the GPU is drastically reduced. (not for AnimationPlayer)

Yes. I also have 100% GPU usage with the AnimationTree node.

Just to clarify, if you are reporting a performance problem that isn't to do with particles, that should be reported in a separate issue, so it can be investigated and fixed.

It is highly likely that the buffer orphaning didn't fix all the issues on Mac, but one issue for one bug is far easier to deal with. :+1:

It is also perfectly fine to reference my build numbers in the issue, if you are using them rather than building from source. They are built on the day of release from 3.2 dev branch, and with the unified batching PR, so it is fairly easy to map them to the commits in core.

> Just to clarify, if you are reporting a performance problem that isn't to do with particles, that should be reported in a separate issue, so it can be investigated and fixed.

There is a contradiction here. @Calinou says that the animation and particles problem has been fixed in this pull request. https://github.com/godotengine/godot/issues/42849#issuecomment-710029823

[screenshot]

No, as it says above, _'this is presuming these issues are due to the same bug'_.

We can't test on Macs because neither @clayjohn nor I have a Mac, so we have to rely on second-hand reports as to whether bugs have been fixed. (I only have Linux machines.)

What is your frame time / FPS like when the GPU usage jumps like that? Sounds to me like V-sync may be turned off or not working.

> What is your frame time / FPS like when the GPU usage jumps like that? Sounds to me like V-sync may be turned off or not working.

Adding a Particles2D node

https://github.com/lawnjelly/Misc/releases/tag/ewok_v0.17
project settings > V-sync ON/OFF

“Open in Low Resolution” = on (1440x900) V-sync ON/OFF
[screenshot]

“Open in Low Resolution” = off (1440x900) V-sync ON/OFF

[screenshot]

When the scene is played, the framerate is 60+ FPS. The problem is in the editor.
[screenshot, 2020-10-16 18:18:40]

Looks like a duplicate of https://github.com/godotengine/godot/issues/40499

It seems on certain hardware setups V-sync is working in the editor

> Looks like a duplicate of #40499
>
> It seems on certain hardware setups V-sync is working in the editor

No, this is not a duplicate. GPU usage is not high in the blank editor; it is high with animated content.
I have also just tested on Windows, so it's not limited to Mac.

Godot 3.2.3 stable - AMD Radeon HD 7350 - 1920x1080 screen resolution (not hi-dpi)
Particles2D = 88% GPU
Animated content = 88% GPU
3D viewport = 90-100%

While the variation in GPU usage depending on window size suggests the 2nd issue may not be to do with buffer stalls, I've still made another Mac only build with a few more tweaks that could possibly be more driver friendly for buffer usage. This is basically random guesswork, but is provided if anyone wants to test.

https://github.com/lawnjelly/Misc/releases/tag/ewok_v0.18

Incidentally, I noticed that the particles system is quite different on GLES2 and GLES3. Do you guys get the same slowdown on both APIs? Also be sure to try both with batching on and off in the editor, this may provide some extra information.

> Incidentally, I noticed that the particles system is quite different on GLES2 and GLES3. Do you guys get the same slowdown on both APIs? Also be sure to try both with batching on and off in the editor, this may provide some extra information.

Particles2D:

  • GLES3: 100 % GPU in editor

CPUParticles2D:

  • GLES3: 100 % GPU in editor
  • GLES2 with 16 K buffer batching: 100 % GPU in editor
  • GLES2 with 64 K buffer batching: 100 % GPU in editor
  • GLES2 without batching: 100 % GPU in editor

It doesn't seem to be a problem with the particle nodes, because GPU particles and CPU particles behave the same in the editor.

AMD Radeon Pro 570X 4 GB

> Looks like a duplicate of #40499
> It seems on certain hardware setups V-sync is working in the editor
>
> No, this is not a duplicate. GPU usage is not high in the blank editor; it is high with animated content.
> I have also just tested on Windows, so it's not limited to Mac.

Note that GPU usage being high only with animated content does not mean clayjohn's hypothesis is incorrect. In the editor (unless continuous update is on, which is slightly different), frames are drawn in response to requests rather than on a schedule.

During normal use, there aren't requests (due to no changes to the UI) so new frames aren't drawn (GPU use is low). When animated content is shown, it makes a request after each frame is drawn, and this will either be limited by vsync, unlimited (in which case 100% GPU) or limited by sleeps in the low processor etc code (that Calinou is familiar with).

So 100% GPU is entirely consistent with vsync / sleeps not working. The big question, rather than GPU percentage, is how does the editor feel / perform? Is it chugging, or is it running smoothly?
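To make the three cases above concrete, here is a minimal sketch of a simplified model. It is an assumption for illustration, not Godot's actual main loop: the function name `redraw_rate_hz` and its parameters are hypothetical. It estimates how many frames per second get drawn while something requests a redraw after every frame.

```cpp
#include <algorithm>
#include <cassert>

// Simplified model (hypothetical, not Godot code): estimate the redraw rate
// while animated content requests a new frame after every drawn frame.
//   vsync_works - whether buffer swaps actually block on the display refresh
//   refresh_hz  - display refresh rate
//   render_usec - time to draw one frame, in microseconds
//   sleep_usec  - per-frame sleep (e.g. from a low processor mode), 0 if none
double redraw_rate_hz(bool vsync_works, double refresh_hz,
                      double render_usec, double sleep_usec) {
    assert(render_usec + sleep_usec > 0.0);
    // Rate if nothing but rendering time and sleeps limit the loop.
    const double uncapped = 1e6 / (render_usec + sleep_usec);
    // Working vsync caps the rate at the display refresh rate.
    return vsync_works ? std::min(uncapped, refresh_hz) : uncapped;
}
```

In this model, a cheap 1 ms frame with no working vsync and no sleep is redrawn 1000 times per second, which would saturate the GPU even though each individual frame is light. With working vsync the same frame is drawn 60 times per second on a 60 Hz display.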

So far I had been interpreting this issue as a performance problem (the buffer usage is primarily concerned with performance problems), but am now getting the impression this is not the case, it is purely an observation of high GPU?

> Particles2D:
> GLES3: 100 % GPU in editor

Thanks. It was worth trying.

If I were to hazard a guess, I would also say the current data leans towards this second bug being to do with the refresh frequency (as clayjohn says, a duplicate of #40499) or fill rate (but this could be a secondary effect of the former). Maybe someone familiar with the macOS platform-specific code could take a look (@bruvzg, @samgreen?) to either confirm or deny the refresh hypothesis, or see if something is going wrong in this area?

EDIT: bruvzg has confirmed that Sleep is working correctly on his Mac.

Or maybe there is a Mac graphics debugger which will tell you how many frames have been rendered in the Godot window (sorry, not familiar with Macs)?

Another thing to try:

  • In project_settings->application->run->frame_delay_msec, try setting this to say 300. Let us know if this has an effect. This should let us know whether sleep is working on your hardware. Let us know what the GPU usage is with this.

Yes, I began to notice a certain lag in the editor and operating system windows, so I started to investigate and found that the Particles2D and AnimationTree nodes produced 100% GPU load.

I tried Low Processor Mode, but it doesn't change the GPU load.

New experiment:

  • New project.
  • Added a Particles2D node (nothing more: no material or texture)

Result:

  • macOS Catalina: 100 % GPU
  • Windows 10: 35 % GPU

AMD Radeon Pro 570X 4 GB

Have you tried this:

> In project_settings->application->run->frame_delay_msec, try setting this to say 300. Let us know if this has an effect. This should let us know whether sleep is working on your hardware. Let us know what the GPU usage is with this.

If this works to reduce GPU it makes it likely there is a problem with the low processor logic or vsync.

Done some testing:

  • I can confirm the issue: GPU usage jumps from 0...8% to 70...90% after adding Particles2D to an empty scene, and CPU usage from 3...6% to 30...35%.
  • It seems only the editor is affected: the exported project never goes above 15% GPU usage with v-sync on (regardless of low-cpu mode), or above 40% with v-sync off.

Xcode profiler seems to give almost no useful info for OpenGL apps (it's focused on Metal apps now).

Here's heaviest CPU stack:

  19  12908.0  Godot_mono (23154) :0
  18  11601.0  Main Thread  0x39066 :0
  17 libdyld.dylib 11477.0  start
  16 Godot 11477.0  main
  15 Godot 10459.0  OS_OSX::run()
  14 Godot 9481.0  Main::iteration()
  13 Godot 8640.0  VisualServerRaster::draw(bool, double)
  12 Godot 6075.0  VisualServerViewport::draw_viewports()
  11 Godot 6034.0  VisualServerViewport::_draw_viewport(VisualServerViewport::Viewport*, ARVRInterface::Eyes)
  10 Godot 6017.0  VisualServerCanvas::render_canvas(VisualServerCanvas::Canvas*, Transform2D const&, RasterizerCanvas::Light*, RasterizerCanvas::Light*, Rect2 const&, int)
   9 Godot 5533.0  RasterizerCanvasGLES3::canvas_render_items(RasterizerCanvas::Item*, int, Color const&, RasterizerCanvas::Light*, Transform2D const&)
   8 GLEngine 1898.0  glDrawArrays_GL3Exec
   7 GLEngine 1329.0  gleDrawArraysOrElements_Entries_Body
   6 GLEngine 1286.0  gleDoDrawDispatchCoreGL3
   5 AMDRadeonX4000GLDriver 954.0  gldUpdateDispatch
   4 AMDRadeonX4000GLDriver 394.0  glrATI_SI_UpdateHardwareState
   3 AMDRadeonX4000GLDriver 216.0  glrATI_SI_ValidatePipelinePrograms
   2 AMDRadeonX4000GLDriver 143.0  glrAMD_LoadConstants
   1 AMDRadeonX4000GLDriver 32.0  glrAMD_Hwl_BindConstantBuffer
   0 AMDRadeonX4000GLDriver 18.0  glrATI_SI_SRDMgrBindBuffer

Am wondering how many frames are being rendered when it goes into this high GPU usage mode?

Can the profiler tell us how many times VisualServerViewport::draw_viewports() is getting called in, say, 10 seconds? That will help us pin down the cause.

The GPU profiler reports a constant frame time of 16.6 ms, but the CPU profiler shows 1432 calls of VisualServerViewport::draw_viewports() in 10 seconds.

The number of [NSOpenGLContext flushBuffer] calls seems to be consistent with a 60 Hz refresh rate.

Also, the editor with a non-focused window (still maximized and fully visible) only uses 30% of GPU in the same conditions.

This is interesting -

Assuming that draw and draw_viewports should only be called once per frame, it sounds like some of the frames are being wasted.
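As a rough check on the numbers reported above, the sketch below computes the average number of draw calls per displayed frame. The helper name `draws_per_displayed_frame` is hypothetical, and the 60 Hz figure is an assumption taken from the flushBuffer observation.

```cpp
#include <cassert>

// Average number of draw calls per displayed frame over a sampling window.
//   draw_calls     - calls counted by the profiler (1432 in the report)
//   window_seconds - length of the sampling window (10 s in the report)
//   refresh_hz     - rate at which frames are presented (assumed 60 Hz)
double draws_per_displayed_frame(double draw_calls, double window_seconds,
                                 double refresh_hz) {
    assert(window_seconds > 0.0 && refresh_hz > 0.0);
    return draw_calls / (window_seconds * refresh_hz);
}
```

With the reported values, `draws_per_displayed_frame(1432.0, 10.0, 60.0)` comes out at roughly 2.4, i.e. about 143 calls per second against 60 presented frames, so on average more than one extra draw per presented frame.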

Could this be problematic?

void OS_OSX::swap_buffers() {
    [context flushBuffer];
}

I don't know this language, but maybe flushBuffer is not doing what we would expect from SwapBuffers.

If NSOpenGLCPSwapInterval is never set (which it doesn't seem to be), the default is to flush without regard to frame rate. This could be the problem.

https://developer.apple.com/documentation/appkit/nsopenglcpswapinterval

There is a function here:

void OS_OSX::_set_use_vsync(bool p_enable) {
    CGLContextObj ctx = CGLGetCurrentContext();
    if (ctx) {
        GLint swapInterval = p_enable ? 1 : 0;
        CGLSetParameter(ctx, kCGLCPSwapInterval, &swapInterval);
    }
}

But this area sounds suspect, and it is worth checking that vsync is working properly on this platform. Is this the correct API function to pair with [context flushBuffer]?

Without vsync, in the absence of a Sleep, GPU 100% would be expected.

Edit: Or do your results indicate that swap_buffers is being called only 60 times per second? (I'm not clear on whether the profile counts ignored calls; I don't know the language etc.)

Could also be VisualServerRaster::draw being called with p_swap_buffers set to false.

Edit: Yes AnimationPlayerEditorPlugin calls VS::get_singleton()->draw(false);, which could be causing these wasted frames. Not clear on why it is called with false yet.

^ This could be it. I'll ask on IRC see if I can find out why this is set to false, it may be a mistake. Actually it may be a mistake but I'm not sure it is being called during the error condition.

> Also, the editor with a non-focused window (still maximized and fully visible) only uses 30% of GPU in the same conditions.

The low processor mode sleep duration increases to 50,000 microseconds (= 20 FPS) when the window isn't focused. You can adjust this in the Editor Settings.

I recommend you do the tests with "Open in low resolution" checked on macOS. Otherwise the GPU is always 100%. https://github.com/godotengine/godot/issues/42849#issuecomment-710249225

> I recommend you do the tests with "Open in low resolution" checked on macOS. Otherwise the GPU is always 100%.

You are right. Down to 60 % GPU.

[screenshot, 2020-10-19 16:28:48]

> I recommend you do the tests with "Open in low resolution"

Obviously it takes fewer GPU resources to render at half resolution, but 60% for an empty scene is still too much.

@bruvzg can you do a RenderDoc capture on a frame that reproduces the issue?

My current theory is that the editor viewport (2D/3D) is updating multiple times per frame. However, the root viewport is still only updating once, as it is supposed to.

> I recommend you do the tests with "Open in low resolution"
>
> Obviously it takes fewer GPU resources to render at half resolution, but 60% for an empty scene is still too much.

@bruvzg Empty scene? Or is Particles2D the added scene? And which Godot build? I have no problem in the empty scene.

@luislodosm can you try these settings? https://github.com/godotengine/godot/issues/39758#issuecomment-710636158

> can you do a RenderDoc capture on a frame that reproduces the issue?

RenderDoc builds on macOS, but crashes instantly with illegal hardware instruction when I try to capture the frame.

@hazarek sure.

  • Low-res + use_vsync = true: GPU 60 %
  • Low-res + use_vsync = false: GPU 100 %
  • Low-res + use_vsync = true + use_nvidia_rect_flicker_workaround = true: GPU 100 %
  • force_fps = 60: No changes.
  • orphan_method: Didn't find this.

@luislodosm I guess you are getting different values from me because your monitor's resolution is high. A 15" MacBook and a 27" iMac will not produce the same result. Does GPU usage decrease when you shrink the window?

> @luislodosm I guess you are getting different values from me because your monitor's resolution is high. A 15" MacBook and a 27" iMac will not produce the same result. Does GPU usage decrease when you shrink the window?

Yes. When I shrink the Godot editor window to 1/4 of the screen size, the GPU load lowers to 50%. I have a 5K iMac.
