Godot: Timer stops emitting "timeout" after running for ~8 days

Created on 20 May 2020  路  9Comments  路  Source: godotengine/godot

Godot version:
3.1.2 stable

OS/device including version:
Windows 10

Issue description:
I've built a godot project basically running static 2D image/slideshow on request. There are multiple threads written in gdscript there with UDP server handling requests to show images or slideshow.

After running the project for ~8 days, Timer seemed to stop working all of a sudden.

Steps to reproduce:

  1. everything was good until about keeping the godot project running ~8 days
  2. when issue happened, a long running thread was doing this to play slideshow
         while active:
        emit_signal("update_image", ...) # it's a CONNECT_DEFERRED signal
        slideshow_cur_img += 1  # what will happen if this overflows?
        OS.delay_msec(1000 * slideshow_interval)
  1. an existing timer suddenly stopped working. The _on_timer_timeout was never run again.
    timer = Timer.new()
    timer.set_wait_time(sync_interval_sec)
    timer.connect("timeout", self, "_on_timer_timeout")
    add_child(timer)
    timer.start()
  1. new timer also stops working as well. The code below hung there.
        yield(get_tree().create_timer(wait_sec), "timeout")

While issue happened, all other parts of the godot project continued working.

Minimal reproduction project:
I don't have it now.

bug core

Most helpful comment

Does your application happen to be running at 24 FPS? I think I know the cause.

38880 should fix this, but I haven't tested it since that would take a long time.

All 9 comments

Does your application happen to be running at 24 FPS? I think I know the cause.

38880 should fix this, but I haven't tested it since that would take a long time.

Thanks @aaronfranke . However I didn't set FPS specifically. But that fix makes sense!

To be specific, the starting time of my application was:
2020-05-01 17:35:39

And the last successful timer was triggered at:
2020-05-09 03:07:49

The timer wait_time was set to 3min and the next one never came.

I will have a custom build myself and hopefully I can report back a few weeks later :P

I'm not sure #38880 would fix it, from what I understand @starryalley was not using a Timer with a wait time higher than FLT_MAX (~3.4e+38), but instantiating new Timers each time with a wait time likely within reasonable float bounds.

I think the problem is here, since idle_process_time it's still a float:https://github.com/godotengine/godot/blob/00949f0c5fcc6a4f8382a4a97d5591fd9ec380f8/scene/main/scene_tree.h#L255

I'm not sure #38880 would fix it, from what I understand @starryalley was not using a Timer with a wait time higher than FLT_MAX (~3.4e+38), but instantiating new Timers each time with a wait time likely within reasonable float bounds.

For more information, both an existing repeated Timer (instantiated at program start only once) and newly instantiated Timer stopped working. The wait time for the repeated timer is 3 min, and 1 second for the newly created one.

@akien-mga 99.99% of the time, issues with floats aren't issues with being out of the absolute range, they are issues with being out of the accurate range. After 16777216, single-precision floats lose the ability to store all integers (16777215 -> 16777216 -> 16777218, since 16777217 is not a valid float), so if a program depends on incrementing by 1, it will fail at 16777216.

Similarly, if a program depends on incrementing by a smaller number, say, 1/24 per frame for 24 FPS, then it will fail earlier, approximately 24x as early, but in reality depends on how the number rounds to powers of two. If you increment by 1 millisecond, 1000 times per second, it fails after a few hours (this is the same problem as losing millimeter precision after a few km with vectors).

I don't know for sure if #38880 fixes the problem, but I know that using float for time is borked, so it should be replaced with double (or some other implementation). I'll update the PR to also do this for SceneTree get_idle_process_time().

There's a lot of things going on in your description. This really needs a min production project (even though I accept one is hard to make, maybe just isolate one of the things that breaks in a single project). Then we can modify the code to pretend it has been running a long time.

This _could_ simply be a QueryPerformanceCounter wraparound:
https://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows

uint64_t OS_Windows::get_ticks_usec() const {

    uint64_t ticks;
    uint64_t time;
    // This is the number of clock ticks since start
    if (!QueryPerformanceCounter((LARGE_INTEGER *)&ticks))
        ticks = (UINT64)timeGetTime();
    // Divide by frequency to get the time in seconds
    time = ticks * 1000000L / ticks_per_second;
    // Subtract the time at game start to get
    // the time since the game started
    time -= ticks_start;
    return time;
}

LARGE_INTEGER is signed 64 bit value. If clock frequency is high you _could_ get wraparound. Depends on the frequency and how long running I guess (I don't know what is typical on windows).

Also the expression

time = ticks * 1000000L / ticks_per_second;

could be problematic. I wasn't sure on the operator precedence so tried it in godbolt and sure enough it does the multiply before the divide, so that adds to the chance of an overflow. (It must do thinking about it, otherwise you'd end up with fractions getting rounded as integers! :grin: )

You could gain another 1000000 on the range by doing this step at 128 bit.

A little postcard maths:

2^63 = 9.22x10e+18
resolution dropped by multiply
2^63 / 1000000 = 9.22x10e12
typical clock freq, 1.4 million
9.22x10e12 / 1.4 million = 6588000 seconds
minutes = 109802
hours = 1830
days = 76

So depending on clock frequency, and how long the PC has been on (if QueryPerformanceCounter resets at boot?) this could be an issue. Whether it is responsible for the bug seen I don't know.

I think the problem is here, since idle_process_time it's still a float:

https://github.com/godotengine/godot/blob/00949f0c5fcc6a4f8382a4a97d5591fd9ec380f8/scene/main/scene_tree.h#L255

idle_process_time being a float isn't necessarily a problem, provided it is promoted to a double in calculations where it needs to be a double. And of course provided that idle_process_time itself isn't being wrongly calculated....

Can anyone check if it is fixed?
Related PR was already merged and cherrypicked.

Was this page helpful?
0 / 5 - 0 ratings