Godot: Using SSAO strongly impacts performance (down to 30 FPS on mid-high end GPU)

Created on 16 Mar 2018 · 13 Comments · Source: godotengine/godot

Godot version:
3.0.2

OS/device including version:
Ubuntu 16.04, i7, GTX 965M

Issue description:
With SSAO enabled at 1080p, we get 30 FPS.
The FPS drops regardless of which SSAO settings are used.

Steps to reproduce:
1920x1080 + SSAO = 30FPS

Minimal reproduction project:
Grass_16_03_18.zip

enhancement rendering

All 13 comments

IMO SSAO needs to be improved, both in performance and quality, compared to competing renderers (Blender's edit mode SSAO or Unity's ambient obscurance effects are better, don't suffer from artifacts and look less "dirty").

3.1 alpha3
30fps

If something like TAA is supported in the future, a solution with temporal filtering such as GTAO could improve both quality and performance. This is also what Blender's Eevee uses as far as I know.

Should this not be labeled as a bug? It's completely unusable for me. I'm getting between 11 and 45 FPS in a scene with just two cubes in it, running on a pretty decent computer. Turn SSAO off and it's right back to 60 FPS.

What hardware are you using?

GTX 770 (with freshly updated drivers), i5-4670K, 16 GB RAM, running Godot 3.1.1 on Windows 7. I'll mention too that with SSAO on it hits the processor pretty hard and maxes out one core.

Nothing else looks very strange. The profiler reports that my script functions take 0.00361 sec, but total frame time is between 0.09 and 0.13 sec. The Monitor only shows 9 draw calls. It does not appear to have a memory leak or anything. I've attached my Godot project to this post.

I should mention too that I said between 11 and 45 FPS, but it's usually more like 11. Sometimes, for no discernible reason, it will go up to around 22 or 45 for a while and then back to around 11.

source.zip
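
For reference, the frame times quoted above can be converted to frame rates with a few lines of arithmetic. This is only a sanity check on the reported numbers; the helper name `fps` is made up for the illustration.

```python
# Convert the reported frame times (seconds per frame) to frames per second
# and see how much of each frame the script functions account for.

SCRIPT_TIME_S = 0.00361          # script time reported by the profiler
FRAME_TIMES_S = (0.09, 0.13)     # total frame time range reported above


def fps(frame_time_s: float) -> float:
    """Frames per second for a given frame time in seconds."""
    return 1.0 / frame_time_s


for frame_time in FRAME_TIMES_S:
    script_share = SCRIPT_TIME_S / frame_time
    print(f"{frame_time:.2f} s/frame -> {fps(frame_time):.1f} FPS, "
          f"scripts use {script_share:.1%} of the frame")
```

So 0.09 to 0.13 sec per frame is roughly 11 down to 8 FPS, which fits the "usually more like 11" figure, and the scripts account for only a few percent of each frame.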

Sounds strange to me. SSAO should have almost no effect on the CPU. I'm not sure about the GPU, maybe you can improve performance by adjusting the SSAO settings to a lower quality level.

Both quality and blur settings seem to have no real effect, maybe 1 or 2 FPS difference. I have tried adjusting most of the other settings hoping that I had just set something off of what it should be but nothing seems to help.

I've been looking into the SSAO shader. The implementation is basically Scalable Ambient Obscurance (SAO) from Morgan McGuire.

The Godot implementation uses 15/40/80 samples for low/med/high quality respectively. I think that is a bit of overkill. I think most SSAO algorithms make do with fewer than 32 spp, mostly between 10 and 20 (without temporal sampling). Since sampling is a slow process, I would imagine that there is performance to be gained by reducing the sample count.

I experimented with 8/12/16 spp and the quality difference is minor. I have yet to find a way to profile the performance, and I have not found any way to use RenderDoc/NSight with Godot. Does anyone have a pointer? Also, I am actually a bit surprised that the difference is so small. I would imagine that 8 spp and 80 spp should be night and day.
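
To put those sample counts in perspective, here is a back-of-the-envelope count of depth-buffer fetches per frame at 1920x1080. It assumes one fetch per sample per pixel and ignores the blur passes, so it is a rough cost model and not a measurement; the function name `ssao_fetches` is made up for this sketch.

```python
# Rough count of depth fetches issued by the SSAO pass per frame.
# Assumption: one fetch per sample per pixel, blur passes not counted.

WIDTH, HEIGHT = 1920, 1080

CURRENT_SPP = {"low": 15, "medium": 40, "high": 80}   # current sample counts
REDUCED_SPP = {"low": 8, "medium": 12, "high": 16}    # experimental counts


def ssao_fetches(width: int, height: int, samples_per_pixel: int) -> int:
    """Number of depth fetches for one full-resolution SSAO pass."""
    return width * height * samples_per_pixel


for quality in CURRENT_SPP:
    current = ssao_fetches(WIDTH, HEIGHT, CURRENT_SPP[quality])
    reduced = ssao_fetches(WIDTH, HEIGHT, REDUCED_SPP[quality])
    print(f"{quality:>6}: {current / 1e6:6.1f} M -> {reduced / 1e6:5.1f} M fetches "
          f"({current / reduced:.2f}x fewer)")
```

Under this model the high quality setting drops from about 166 million to about 33 million fetches per frame, a factor of 5.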

One other thing: with both the default 15/40/80 spp and the reduced 8/12/16, there is a noticeable noise pattern. Since the implementation generates an angle value per pixel by hashing the pixel coordinates, such a pattern should not be present. A visible pattern like this is usually what you get when you use a tiled 4x4 noise kernel.
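
As a rough illustration of the per-pixel angle idea, here is a sketch of a coordinate hash plus spiral tap placement in the style of the published SAO reference shader. The hash expression and the spiral layout are written from memory of that reference and have not been checked against Godot's shader, so treat them as assumptions; `NUM_SPIRAL_TURNS = 7` and the function names are also just picked for the example.

```python
import math

NUM_SPIRAL_TURNS = 7  # assumed value; normally tuned per sample count


def rotation_angle(x: int, y: int) -> float:
    """Per-pixel spin angle from an integer hash of the pixel coordinates.

    Python and GLSL agree on precedence here: the expression groups as
    (3 * x) ^ (y + x * y).
    """
    return float((3 * x ^ y + x * y) * 10) % (2.0 * math.pi)


def tap_offset(i: int, num_samples: int, spin: float, radius_px: float):
    """Screen-space offset (in pixels) of sample i on a spiral of the given radius."""
    alpha = (i + 0.5) / num_samples
    angle = alpha * (NUM_SPIRAL_TURNS * 2.0 * math.pi) + spin
    r = alpha * radius_px
    return (r * math.cos(angle), r * math.sin(angle))


# Neighbouring pixels get different spin angles, so their spirals differ.
for pixel in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    spin = rotation_angle(*pixel)
    first_tap = tap_offset(0, 8, spin, radius_px=16.0)
    print(pixel, f"spin={spin:.3f}", f"first tap=({first_tap[0]:.2f}, {first_tap[1]:.2f})")
```

With a tiled 4x4 noise texture the same 16 angles repeat across the screen, whereas a hash like this produces a different angle for most pixels, which is why a regular pattern would be unexpected here.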

@Outrider0x400 there is a trick to use RenderDoc. You put a breakpoint in the platform Main (before it initializes GL), and then inject RenderDoc. Be aware that due to Godot's hilarious amount of draw calls, attaching RenderDoc will murder performance into the low 10s of FPS (in the TPS demo).
I haven't been able to run Nsight, as it can't inject. Launching Godot from either RenderDoc or Nsight instantly closes the window before even reaching Main() (debugged). There is something in Godot's static variables (static initialization runs before main) that crashes debuggers.

@vblanco20-1 Thanks for the pointer! I suppose RenderDoc is probably sufficient to get a rough idea of what's going on. The breakpoint/injection trick is pretty cool.

I finally got RenderDoc to work with Godot. Turns out you need to un-pause the program after injection. Silly me for waiting for minutes for RenderDoc to finish up injection while the program was paused. Duh.

Anyway, I got some screenshots here to show that the current implementation is indeed overkill.
Settings: high quality, 3x3 blur, 1024 x 600
(@Calinou you can use them for your PR if you want to :)

This is what the blurred result looks like for the current version (CV):
[image: curr]

And for the reduced sampling version (RSV):
[image: smallSample]

They look alike IMO, but the CV takes almost 5 times longer to render. Actually, the CV uses so many samples that it would look better without blurring. The blurring is mostly used to counter the high-frequency noise introduced by a lower sample count.

CV, before blurring:
[image: pretty_good]

RSV, before blurring:
[image: pretty_noisy]

Note: SAO approximates the normals from the z-buffer, so that it works with a forward renderer. But the approximated normal is wrong along depth discontinuities, so blurring is required regardless of sample count. Also, SAO actually gives the central sample a rather low weight, and nearby samples a higher weight, to offset the error along depth discontinuities. You can see a noise pattern at the end of the hallway in the blurred CV image, which I at first thought was caused by the sampling pattern. Now I think it is actually caused by the blurring.
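
A minimal sketch of why normals rebuilt from depth go wrong at depth discontinuities. It assumes the view-space positions of a pixel and its right and upper neighbours are already known (in a shader these differences would typically come from screen-space derivatives), and all names are made up for the illustration.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def approx_normal(p, p_right, p_up):
    """Normal estimated from a position and its right/up screen neighbours."""
    return normalize(cross(sub(p_right, p), sub(p_up, p)))


# A wall facing the camera at z = -1: the estimate is the true normal (0, 0, 1).
flat = approx_normal((0, 0, -1), (1, 0, -1), (0, 1, -1))

# Same wall, but the right neighbour belongs to geometry much further away (z = -5).
# The estimate tilts strongly sideways even though the wall itself has not changed.
edge = approx_normal((0, 0, -1), (1, 0, -5), (0, 1, -1))

print("flat:", flat)
print("edge:", edge)
```

The pixel at the edge still lies on the same wall, yet its estimated normal points mostly sideways, so the obscurance computed there is unreliable and has to be smoothed out by the blur.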

Full album: https://imgur.com/a/3HTL69M

SSAO has been completely rewritten for 4.0. You can now render it at half resolution for much better performance, and it also looks much better than it did in 3.x. :D
