Godot: Iterating over large string incredibly slow

Created on 7 May 2018 · 10Comments · Source: godotengine/godot

Godot version:
3.x (tip)

OS/device including version:
Ubuntu

Issue description:
A for loop over large strings (eg one loop over a million characters) is much slower than looping over the same amount of characters in smaller string (eg ten thousand loops over 100 characters)

It is negligible to do ten thousands short strings (less than a second) but takes many minutes (in my case 15 minutes) to do one long string even though both tasks contain the same number of characters.

Steps to reproduce:

var time_stamp = 0

func print_time_stamp(t):
    var new_time_stamp = OS.get_unix_time()
    var elapsed = new_time_stamp - time_stamp
    var minutes = elapsed / 60
    var seconds = elapsed % 60
    var str_elapsed = "%02d : %02d" % [minutes, seconds]
    if time_stamp > 0:
        print("[%s] elapsed: %s"%[t, str_elapsed])
    else:
        print("[%s]"%t)
    time_stamp = new_time_stamp

func test_strings():
    print_time_stamp("starting test")
    var filename
    var data
    var f = File.new()
    filename = "res://char100.txt"
    f.open(filename, File.READ)
    data = f.get_as_text()
    f.close()
    print_time_stamp("loaded and ready to do 10000 x 100 character strings")
    var i = 0
    for p in range(0, 10000):
        for ch in data:
            i += 1
    print_time_stamp("Finished 10000 x 100 character strings [%s chars]"%i)

    filename = "res://char1million.txt"
    f.open(filename, File.READ)
    data = f.get_as_text()
    f.close()
    print_time_stamp("loaded and ready to do 1 x 1000000 character string")
    i = 0
    for p in range(0, 1):
        for ch in data:
            i += 1
    print_time_stamp("Finished 1 x 1000000 character string [%s chars]"%i)

result:
[starting test] elapsed: 00 : 00
[loaded and ready to do 10000 x 100 strings] elapsed: 00 : 00
[Finished 10000 x 100 strings [1000000 chars]] elapsed: 00 : 00
[loaded and ready to do 1 x 1000000 string] elapsed: 00 : 00
[Finished 1 x 1000000 strings [1000000 chars]] elapsed: 15 : 36

bug core

Source

dodgyville

Most helpful comment

(long comment, feel free to jump to the end)

I made a more comprehensive script, which implements a few different ways of iterating the string:

Click to see the first script

extends Node

var string = ""
var hardness = 17 # Tune for more detailed versus faster results
var length = 1 << hardness

func _ready():
    print("Running tests, hardness is {hardness}, strings are of {length} chars".format({
        hardness = hardness,
        length = length
    }))
    run("build_string")
    run("loop_iter")
    run("loop_no_char")
    run("loop_get_char")

    get_tree().quit()

func build_string():
    string = "1"
    for i in range(hardness):
        string = string + string
    assert(length == string.length())

func loop_iter():
    var count = 0
    for ch in string:
        count += 1
    assert(count == string.length())

func loop_no_char():
    var count = 0
    for i in string.length():
        count += 1
    assert(count == string.length())

func loop_get_char():
    var count = 0
    for i in string.length():
        var ch = string[i] # Added
        count += 1
    assert(count == string.length())

func run(test):
    var start = OS.get_ticks_msec()
    call(test)
    var time = OS.get_ticks_msec() - start
    print("{test}\t: {rate} msec/char (total: {time} msec)".format({
        test = test,
        time = time,
        rate = "%0.5f" % (float(time) / length)
    }))

Here are some results (Intel® Core™ i7-4790K CPU @ 4.00GHz):

Running tests, hardness is 17, strings are of 131072 chars
build_string    : 0.00002 msec/char (total: 3 msec)
loop_iter       : 0.10909 msec/char (total: 14299 msec) -- 14.3 sec
loop_no_char    : 0.00017 msec/char (total: 22 msec)
loop_get_char   : 0.10883 msec/char (total: 14265 msec) -- ~40 msec faster than loop_iter
Running tests, hardness is 19, strings are of 524288 chars
build_string    : 0.00002 msec/char (total: 12 msec)
loop_iter       : 0.43212 msec/char (total: 226553 msec) -- 3.77 min
loop_no_char    : 0.00016 msec/char (total: 86 msec)
loop_get_char   : 0.43198 msec/char (total: 226484 msec) -- ~70 msec faster than loop_iter

-- Release build, no debugger:
Running tests, hardness is 17, strings are of 131072 chars
build_string    : 0.00000 msec/char (total: 0 msec)
loop_iter       : 0.01970 msec/char (total: 2582 msec) -- 2.6 sec
loop_no_char    : 0.00008 msec/char (total: 11 msec)
loop_get_char   : 0.01981 msec/char (total: 2597 msec) -- 15 msec slower than loop_iter?
Running tests, hardness is 19, strings are of 524288 chars
build_string    : 0.00000 msec/char (total: 2 msec)
loop_iter       : 0.07768 msec/char (total: 40726 msec) -- 40.7 sec
loop_no_char    : 0.00007 msec/char (total: 37 msec)
loop_get_char   : 0.07765 msec/char (total: 40711 msec) -- 15 msec faster than loop_iter

Things of note:

Release build runs roughly 5 times faster, but that's expected with complex loops. Also, the results are a bit fuzzier, since abnormal timings don't average out completely for such brief periods.
loop_get_char caches string.length(), so it manages to get a slight edge over loop_iter.
String length is 4 times bigger in the hardness=19 tests.
loop_iter and loop_get_char time per char is ~4 times larger in the second test. (16x total time)
Therefore, loop_iter and loop_get_char's time per char is proportional to string length.
It is likely that time to access a given character is directly proportional to string's length (and that would be a bug).

To confirm this notion, I made a script which repeatedly takes the first character of a long string:

Click to see the second script

extends Node

var samples = 10000

func _ready():
    print("Running {samples} samples".format({samples = samples}))
    test(build_string(15))
    test(build_string(17))
    test(build_string(19))
    test(build_string(21))
    get_tree().quit()

func build_string(hardness):
    var string = "1"
    for i in range(hardness):
        string = string + string
    return string

func test(string):
    var start = OS.get_ticks_msec()

    for i in range(samples):
        # var ch = string[i % string.length()] # Gives same results
        var ch = string[0]

    var time = OS.get_ticks_msec() - start
    print("{length} chars\t: {rate} msec/char (total: {time} msec)".format({
        length = string.length(),
        samples = samples,
        time = time,
        rate = "%0.5f" % (float(time) / samples)
    }))

The results seem to confirm the theory:

Running 10000 samples
32768 chars     : 0.05510 msec/char (total: 551 msec)
131072 chars    : 0.21620 msec/char (total: 2162 msec)
524288 chars    : 0.86390 msec/char (total: 8639 msec)
2097152 chars   : 3.42740 msec/char (total: 34274 msec)
-- Release build, no debugger:
Running 10000 samples
32768 chars     : 0.01050 msec/char (total: 105 msec)
131072 chars    : 0.04130 msec/char (total: 413 msec)
524288 chars    : 0.15450 msec/char (total: 1545 msec)
2097152 chars   : 0.74510 msec/char (total: 7451 msec)

At this point, I think somehow String::operator[] manages to copy the string underneath, thus creating a linear dependency on string length. This is expected when the operator is used for writing to the string (as this invokes copy-on-write), but is not good when using it for reading.

bojidar-bg on 7 May 2018

👍2

All 10 comments

char1million.txt

dodgyville on 7 May 2018

char100.txt

dodgyville on 7 May 2018

Hi @dodgyville

I tested your script with the given files and the output I get is:

[starting test]
[loaded and ready to do 10000 x 100 character strings] elapsed: 00 : 00
[Finished 10000 x 100 character strings [1000000 chars]] elapsed: 00 : 01
[loaded and ready to do 1 x 1000000 character string] elapsed: 00 : 00
[Finished 1 x 1000000 character string [1000000 chars]] elapsed: 02 : 35

My operating system is a Windows 8.1 64 Bit and my CPU is an Intel Core i5-4690 with 3.50 GHz. I tested it with Godot 3.0.2.

There's no difference in time if I assign the content of the files directly to variables so you don't need to load the file content first. But in this case the editor works really really slow and crashs sometimes. So it seems that the Godot editor can't handle that large strings? Maybe a bug?^^

But even in my case it's quite strange that it takes that much time...
If I try the same thing directly in C++, the iteration is finished immediately after running the program, even if I first build up the whole string with 1000000 characters.

ghost on 7 May 2018

(long comment, feel free to jump to the end)

I made a more comprehensive script, which implements a few different ways of iterating the string:

Click to see the first script

extends Node

var string = ""
var hardness = 17 # Tune for more detailed versus faster results
var length = 1 << hardness

func _ready():
    print("Running tests, hardness is {hardness}, strings are of {length} chars".format({
        hardness = hardness,
        length = length
    }))
    run("build_string")
    run("loop_iter")
    run("loop_no_char")
    run("loop_get_char")

    get_tree().quit()

func build_string():
    string = "1"
    for i in range(hardness):
        string = string + string
    assert(length == string.length())

func loop_iter():
    var count = 0
    for ch in string:
        count += 1
    assert(count == string.length())

func loop_no_char():
    var count = 0
    for i in string.length():
        count += 1
    assert(count == string.length())

func loop_get_char():
    var count = 0
    for i in string.length():
        var ch = string[i] # Added
        count += 1
    assert(count == string.length())

func run(test):
    var start = OS.get_ticks_msec()
    call(test)
    var time = OS.get_ticks_msec() - start
    print("{test}\t: {rate} msec/char (total: {time} msec)".format({
        test = test,
        time = time,
        rate = "%0.5f" % (float(time) / length)
    }))

Here are some results (Intel® Core™ i7-4790K CPU @ 4.00GHz):

Running tests, hardness is 17, strings are of 131072 chars
build_string    : 0.00002 msec/char (total: 3 msec)
loop_iter       : 0.10909 msec/char (total: 14299 msec) -- 14.3 sec
loop_no_char    : 0.00017 msec/char (total: 22 msec)
loop_get_char   : 0.10883 msec/char (total: 14265 msec) -- ~40 msec faster than loop_iter
Running tests, hardness is 19, strings are of 524288 chars
build_string    : 0.00002 msec/char (total: 12 msec)
loop_iter       : 0.43212 msec/char (total: 226553 msec) -- 3.77 min
loop_no_char    : 0.00016 msec/char (total: 86 msec)
loop_get_char   : 0.43198 msec/char (total: 226484 msec) -- ~70 msec faster than loop_iter

-- Release build, no debugger:
Running tests, hardness is 17, strings are of 131072 chars
build_string    : 0.00000 msec/char (total: 0 msec)
loop_iter       : 0.01970 msec/char (total: 2582 msec) -- 2.6 sec
loop_no_char    : 0.00008 msec/char (total: 11 msec)
loop_get_char   : 0.01981 msec/char (total: 2597 msec) -- 15 msec slower than loop_iter?
Running tests, hardness is 19, strings are of 524288 chars
build_string    : 0.00000 msec/char (total: 2 msec)
loop_iter       : 0.07768 msec/char (total: 40726 msec) -- 40.7 sec
loop_no_char    : 0.00007 msec/char (total: 37 msec)
loop_get_char   : 0.07765 msec/char (total: 40711 msec) -- 15 msec faster than loop_iter

Things of note:

Release build runs roughly 5 times faster, but that's expected with complex loops. Also, the results are a bit fuzzier, since abnormal timings don't average out completely for such brief periods.
loop_get_char caches string.length(), so it manages to get a slight edge over loop_iter.
String length is 4 times bigger in the hardness=19 tests.
loop_iter and loop_get_char time per char is ~4 times larger in the second test. (16x total time)
Therefore, loop_iter and loop_get_char's time per char is proportional to string length.
It is likely that time to access a given character is directly proportional to string's length (and that would be a bug).

To confirm this notion, I made a script which repeatedly takes the first character of a long string:

Click to see the second script

extends Node

var samples = 10000

func _ready():
    print("Running {samples} samples".format({samples = samples}))
    test(build_string(15))
    test(build_string(17))
    test(build_string(19))
    test(build_string(21))
    get_tree().quit()

func build_string(hardness):
    var string = "1"
    for i in range(hardness):
        string = string + string
    return string

func test(string):
    var start = OS.get_ticks_msec()

    for i in range(samples):
        # var ch = string[i % string.length()] # Gives same results
        var ch = string[0]

    var time = OS.get_ticks_msec() - start
    print("{length} chars\t: {rate} msec/char (total: {time} msec)".format({
        length = string.length(),
        samples = samples,
        time = time,
        rate = "%0.5f" % (float(time) / samples)
    }))

The results seem to confirm the theory:

Running 10000 samples
32768 chars     : 0.05510 msec/char (total: 551 msec)
131072 chars    : 0.21620 msec/char (total: 2162 msec)
524288 chars    : 0.86390 msec/char (total: 8639 msec)
2097152 chars   : 3.42740 msec/char (total: 34274 msec)
-- Release build, no debugger:
Running 10000 samples
32768 chars     : 0.01050 msec/char (total: 105 msec)
131072 chars    : 0.04130 msec/char (total: 413 msec)
524288 chars    : 0.15450 msec/char (total: 1545 msec)
2097152 chars   : 0.74510 msec/char (total: 7451 msec)

bojidar-bg on 7 May 2018

👍2

Hi @bojidar-bg

it seems that there's no copy operation in case of String::operator[].
The String class inherits from the Vector class and thus has access to the appropriate operator[] function which has the following syntax:

inline const T& operator[](int p_index) const;

So the character at index p_index is returned by const reference. In case of write operations there is a copy-on-write, which is fine.

Maybe it's because of size().
While accessing a character with String::operator[] this method casts an uint32_t pointer to an uint32_t pointer (why???), dereferences this uint32_t pointer and casts it implicitly to an int which is returned as a copy (seems all really strange to me).

In case of one million characters this is done one million times.

The other operations like reinterpret_cast or nullptr checks shouldn't have a big impact.

ghost on 7 May 2018

Actually, there are two problems, one in Variant::get and the other in Variant::iter_next.
In Variant::get, substr is used to make a copy of the one character of the string, so that the string remains immutable. This manages to be slow for large strings.
In Variant::iter_next, operator[] is used to get the character, but it is somehow slow as well.

I'm not sure why in both of those cases some operation which is linear in the string's size occurs. It might or might not be a copy-on-write, but copy-on-write seem like the most likely cause.

bojidar-bg on 7 May 2018

On a tangent in case anyone needs a work around, reading the data byte-by-byte from a file and converting it to a string is a temporary hack that may work for you instead of getting one slow string with f.get_as_text()

        f.open(filename, File.READ)
        var ch
        while not f.eof_reached():
            ch = PoolByteArray([f.get_8()]).get_string_from_ascii()

dodgyville on 8 May 2018

👍1

20411 brings this down to much more acceptable levels.

I think there may be some possibilities in sharing data for substrings.
Perhaps expanding StrRange into something COW aware, or even going one deeper and looking at sharing data for Vector subsequences?
But I think that's a much longer adventure for a another time.

ibrahn on 24 Jul 2018

Oh, and because numbers are always fun and bojidar-bg's test script is neat:

With #20411 on Intel Core i5-4570 CPU @ 3.20GHz

Running tests, hardness is 20, strings are of 1048576 chars
build_string    : 0.00002 msec/char (total: 23 msec)
loop_iter       : 0.00046 msec/char (total: 478 msec)
loop_no_char    : 0.00016 msec/char (total: 166 msec)
loop_get_char   : 0.00061 msec/char (total: 637 msec)

ibrahn on 24 Jul 2018