Windows 10
NVIDIA GeForce GTX 1050
Rendering the same static scene, with no per-frame calculations, at 1200x800 is unusable beyond Godot 3.0.2: the framerate plummets to 10 fps at that resolution and gets worse as the resolution increases. The issue is not present in 3.0.2, but exists in all versions after it (3.0.3-3.0.6 and 3.1 alpha).
(Side-by-side screenshots comparing 3.0.2 and 3.0.3 were attached here.)
Please provide a link to the scene, or to another minimal reproduction project so that others can test it and diagnose the problem.
Will do as soon as time allows, since the project is relatively complex.
Do you have lights with shadows enabled in your scene? Try disabling shadows or lights and test again. Also check whether you are using PBR materials with normal maps and other special textures. At least in my case, these are the "performance killers" in Godot 3.1.
He is running the same scene in all Godot versions, and those things you mention have always been "performance killers".
Reproduction:
It seems like the performance hit comes from merely having 5 large viewports.
Here's the scene tree:
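(The tree itself was attached as an image; the following is reconstructed from the node paths in the script below, with the node types inferred: the quads are presumably MeshInstances.)

Game (Spatial)
├── gameVp (Viewport)
│   ├── bgVp (Viewport)
│   ├── bgVpQuad (MeshInstance)
│   ├── mainVp (Viewport)
│   └── mainVpQuad (MeshInstance)
├── gameVpQuad (MeshInstance)
├── menuVp (Viewport)
│   ├── mainVp (Viewport)
│   └── mainVpQuad (MeshInstance)
└── menuVpQuad (MeshInstance)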
Here's the Game node script:
extends Spatial

var vps = []

class VP:
    var vp
    var quad

func add_vp(vp, quad):
    var nvp = VP.new()
    nvp.vp = vp
    nvp.quad = quad
    vps.append(nvp)

func _ready():
    var root = get_tree().get_root()
    root.connect("size_changed", self, "resize")
    # Register each viewport together with the quad that displays its texture.
    add_vp($gameVp, $gameVpQuad)
    add_vp($menuVp, $menuVpQuad)
    add_vp($gameVp/bgVp, $gameVp/bgVpQuad)
    add_vp($gameVp/mainVp, $gameVp/mainVpQuad)
    add_vp($menuVp/mainVp, $menuVp/mainVpQuad)
    for i in range(vps.size()):
        var vp = vps[i]
        setup_vpMat(vp.vp, vp.quad)
    yield(get_tree(), "idle_frame")
    resize()

func setup_vpMat(vp, vpQuad):
    # Give each quad its own copy of the viewport-display material.
    var mat = load("res://Materials/ColorViewport.tres").duplicate()
    mat.set_shader_param("vpTex", vp.get_texture())
    vpQuad.material_override = mat

func resize():
    print("vps: ", vps.size())
    for vp in vps:
        update_vp(vp.vp, vp.quad)

func update_vp(vp, vpQuad):
    # Each viewport is rendered at twice the window size.
    var nWinSize = OS.window_size
    var usedSize = nWinSize * 2
    vp.size = usedSize
    print(usedSize)
    vpQuad.scale = Vector3(usedSize.x, usedSize.y, 1)
Here's the shader used for rendering viewports:
shader_type spatial;
render_mode blend_mix, unshaded;

uniform sampler2D vpTex;

// Converts the sampled color from sRGB to linear so the viewport
// texture displays with correct colors.
vec3 adjust_rgb(vec3 color) {
    return mix(pow((color + vec3(0.055)) * (1.0 / (1.0 + 0.055)), vec3(2.4)), color * (1.0 / 12.92), lessThan(color, vec3(0.04045)));
}

void fragment() {
    vec4 albedo_tex = texture(vpTex, UV);
    albedo_tex.rgb = adjust_rgb(albedo_tex.rgb);
    ALBEDO = albedo_tex.rgb;
    ALPHA = albedo_tex.a;
}
NOTE:
_The issue_ here is that nothing (apart from the large viewports themselves) is being rendered; version 3.0.2 handles this fine, but later versions tank performance to unusable levels.
Using v3.1.alpha.custom_build.aeddb30 here, built a few days ago; my OS and GPU are Windows 10 Home v1803 and a GeForce GTX 1060 6GB. Additionally, in case it's needed, I use an i5-2400S as my CPU (yes, it's old, but I've yet to ever actually hit a point where I need to upgrade).
I'm not seeing any issues with performance while the game is idling. The game does seem to have some framerate issues during window resizes, but the framerate smooths out relatively quickly afterwards. This is related to resizing the viewports so often: when I comment out vp.size = usedSize in the update_vp function, those framerate issues during resizes go away.
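For what it's worth, those resize hitches can usually be coalesced so the viewports are only resized once the window size settles. A minimal sketch, assuming a short delay after the last size_changed signal is acceptable (the 0.2 s value and the _on_size_changed name are mine, not from the project):

extends Spatial

var _pending_resize = false

func _ready():
    get_tree().get_root().connect("size_changed", self, "_on_size_changed")

# Coalesce bursts of size_changed signals into a single viewport resize.
func _on_size_changed():
    if _pending_resize:
        return
    _pending_resize = true
    # Wait briefly; further size_changed signals during this window are ignored.
    yield(get_tree().create_timer(0.2), "timeout")
    _pending_resize = false
    resize()  # the existing function that sets vp.size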
If I can ask, why are you using so many viewports? I get the feeling you don't need all those viewports for what you're trying to do.
> If I can ask, why are you using so many viewports? I get the feeling you don't need all those viewports for what you're trying to do.
Well, your feeling is wrong. If anything, 5 viewports is a small number for a non-trivial game. Regardless of what I'm doing, the issue here is that there was a significant performance regression, which is important to address.
Regardless of whether my feeling is wrong, I'd still like to know what you're needing so many viewports for. I've never needed more than the one that Godot gives you by default, and I never thought you'd need any more except for multiplayer or for UI elements, and so it seems very odd to me that you'd need so many. It'd be nice to have an explanation as to why my feeling is wrong, if you feel it is.
I'd be happy to go into the details on Discord (Rokas#0521) if you're curious, so as not to detract from the topic here.
@rokasv I just took the time to run your test project and it is indeed way slower than it should be (~7 FPS for me). However, the speed does not change based on the version. It was the same speed for 3.1 alpha, master, and 3.0.2 for me.
Edit: the only things that made a difference for me were keeping the viewports small and removing MSAA. Both of these changes substantially improved performance.
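For anyone reproducing that second change: in Godot 3.x each Viewport node has its own msaa property, so it needs to be disabled per viewport. A minimal sketch, reusing the vps array from the reproduction script above:

# Disable MSAA on every viewport registered by the reproduction project.
for entry in vps:
    entry.vp.msaa = Viewport.MSAA_DISABLED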
@clayjohn just to make sure, I downloaded 3.0 from https://downloads.tuxfamily.org/godotengine/3.0/ for Windows 64-bit and am still getting a stable 60 fps
I've seen significant performance differences between 32- and 64-bit export versions... I can't confirm that's the case here, but are you guys seeing major performance differences between 32- and 64-bit versions of the game when exported?
@rokasv What you just sent is 3.0, not 3.0.2. I tried it anyway and I still get ~7 FPS. Given what LikeLakers2 is seeing, I'm gonna guess this is a hardware-dependent bug. I am running this on my low-powered laptop with an integrated Intel GPU.
The project I sent gives me 60 fps on 3.0, 3.0.1, 3.0.2, but 5 fps on 3.0.3, 3.0.4, 3.0.5, 3.0.6 and 3.1 alpha.
I find it unlikely that it's a hardware-related bug if the software version matters, assuming no hardware-dependent code was introduced in the minor version updates of the engine. If I get the chance to test it on another machine at some point, I'll do it.
@meld-cp the exact same results show up with 32-bit versions
@rokasv That may be true for your machine, but on LikeLakers2's machine it is always 60 FPS and on mine it is always 7 FPS. So clearly, while on your machine there is a difference in versions, there is also a substantial difference between machines. It is also important to note that for both LikeLakers2 and I there is no difference between versions.
To be clear, I don't think the bug is in Godot, I think that there are probably driver related quirks that are causing the issue.
The important part here isn't the specific FPS, it's the drop of the FPS between versions.
Can you try this: change the var usedSize = nWinSize*2 line (change the 2 constant), or the window size, until you get around 50 fps in 3.0.2, and then try the same setting on 3.0.3 and report the findings, please?
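In other words, something like this in the reproduction script (scale_factor is a name introduced here for the experiment; the original hardcodes 2):

export var scale_factor = 2.0  # tweak until 3.0.2 sits around 50 fps

func update_vp(vp, vpQuad):
    var usedSize = OS.window_size * scale_factor
    vp.size = usedSize
    vpQuad.scale = Vector3(usedSize.x, usedSize.y, 1)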
So me and @rokasv have been talking for some time (it was originally about the viewports, but we eventually switched over to the framerate issue), and he had me try different scales (see his comment just above this one) and give him the FPS. Here are my findings (3.0.2, 3.0.3, 3.0.6, and v3.1.alpha.custom_build.aeddb30):
So we kept testing for a while, and then it dawned on me that GPU memory was a thing. So I tested a little on 3.0.3 and recorded what Task Manager showed my GPU was using.
So that would explain why I have basically perfect 60s except in extreme tests, and yet rokasv has abysmal performance: his 1050 doesn't have the same amount of dedicated RAM that my 1060 does! He's hitting his GPU's RAM limit, and thus it has to fall back to shared memory, which results in horrible performance.
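To put rough numbers on that theory, here is a back-of-the-envelope sketch. It assumes RGBA8 color plus a 32-bit depth buffer per viewport, and that MSAA multiplies that storage by the sample count; the actual allocations in Godot's renderer will differ:

# Rough estimate of render-target memory used by N viewports.
func estimate_viewport_vram_mb(width, height, num_viewports, msaa_samples):
    var bytes_per_pixel = 4 + 4  # assumed: RGBA8 color + 32-bit depth
    var total = width * height * bytes_per_pixel * msaa_samples * num_viewports
    return total / (1024.0 * 1024.0)

# Five viewports at twice a 1200x800 window with 4x MSAA:
# estimate_viewport_vram_mb(2400, 1600, 5, 4) -> roughly 586 MB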
Now the only question that remains, and both of us hope someone here can help: why is his performance not tanking on 3.0.2, but tanking on every other version?
I'd recommend bisecting for the commit that causes the regression, since it's clear the issue arises somewhere between 3.0.2 and 3.0.3. I can't do it because my GPU is so outdated that it has horrible performance regardless of version.
Nice testing, guys; it should make it easier to pinpoint what causes the regression (and maybe also what causes the performance drop with Viewports generally, though that might not be a "bug" per se).
It's important to note that 3.0.3 is the first release that @hpvb did with a new buildsystem, and on Windows in particular using MinGW (GCC 8 + LTO) instead of MSVC 2017. So the performance change could be linked to the compiler/options used.
A simple way to test that would be for one of the affected users to compile both 3.0.2-stable and 3.0.3-stable and see if there's a performance drop. If yes, the issue is not with the buildsystem but with a specific commit that could be bisected.
This sounds like Godot is allocating a lot of resources per viewport. Maybe some of them can be shared and reused across multiple viewports (like render targets / depth buffers) as long as they aren't needed in parallel or after a viewport has finished rendering.
@akien-mga that's a good idea, though I don't have the environment set up for building the thing and the procedure doesn't appear to be that straightforward
Could you or someone here do the compilation of 3.0.2-stable and 3.0.3-stable and link the builds for testing?
@rokasv We could probably do this later today over Discord when I'm not busy, if you'd like. I already have a build environment set up.
Message me on Discord if/when you want to do this.
@akien-mga I was having issues with MinGW in an unrelated project, too.
It turned out that MinGW under Windows does not ship the libm math library found on all Linux systems, but instead tries to link these functions from msvcrt. IIRC, that library stopped at a somewhat ancient C standard and thus does not define a lot of important functions, like cbrt in my case. These functions are instead shipped by MinGW itself and are slow (e.g., cbrt does some 5-10 steps of a Newton approximation instead of using something similar to this implementation).
I was able to solve my problem by linking against OpenLibm instead.
Edit: writeup I found helpful
@rokasv Here's a build of 3.0.3-stable made with the same buildsystem as 3.0.2-stable (official release): https://github.com/GodotBuilder/godot-builds/releases/download/3.0.3-stable-ci/godot.windows.opt.tools.64.exe
@hpvb See https://github.com/godotengine/godot/issues/23400#issuecomment-434737757
@akien-mga confirming that the 3.0.3 build you linked gives the same desirable results as does 3.0.2, whereas the 3.0.3 from https://downloads.tuxfamily.org/godotengine/3.0.3/ gives poor performance
Viewports consume a lot of graphics memory. I get 1 fps at 1500x1000 in Godot 3.0.2. When I make the window smaller, graphics memory usage drops below 2 GB and the FPS goes back up to 60.
Platform: Windows 10, GTX 760 2 GB
I can't reproduce this issue with my Kaby Lake-G AMD GPU on Windows, so it seems this is not a straight-up MinGW issue (even if MSVC seems to solve it for NVIDIA users). I'll make a 3.0.6 build with OpenLibm for testing purposes, so Windows/NVIDIA users can see if that helps.
OK, I've done a build with OpenLibm here: https://tmm.cx/nextcloud/s/dwxdSZsG7WfG9WK It is much larger, as it has debug symbols in it in case someone wants to try to profile it.
I've also noticed something interesting: it seemed that 3.0.3 would default to the Intel GPU (running at around 15 fps at 1500x1000), whereas 3.0.2 would run on the dedicated GPU by default. If you're testing on a multi-GPU system (either a laptop or a desktop with both an iGPU and a discrete GPU), please ensure that you're testing on the dGPU!
@hpvb just tested it and it seems you're right that the fix comes down to changing the default GPU
The linked 3.0.6 build runs poorly by default, but that's because it defaults to the integrated graphics card. Forcing the editor to use the dedicated GPU solves the issue, and performance becomes comparable to 3.0.2's defaults.
I think I've narrowed down the problem, and I can explain the difference between 3.0.2 and 3.0.3. It is a buildsystem issue, but a very small one: we're not exporting the NvOptimusEnablement symbol properly from 3.0.3 onwards, due to a bug in how we set it on MinGW. I'll fix this.