Cataclysm-dda: Crash when something falls into a giant sinkhole.

Created on 13 Mar 2019  路  11Comments  路  Source: CleverRaven/Cataclysm-DDA

Describe the bug
Was near a giant sinkhole when a gracken was chased into it. Game crashed as soon as it stepped in.

To Reproduce
Steps to reproduce the behavior:

  1. Make a player near a giant sinkhole.
  2. Spawn a gracken and a zombie.
  3. Have the gracken be chased into the sinkhole.

Expected behavior
Didn't expect a crash.

Screenshots
If applicable, add screenshots to help explain your problem.
image
Had this error report, then clicked OK and got this one:
image

Tried to reproduce and got this one:
image

Versions and configuration(please complete the following information):

  • OS: win10
  • Game Version: 0.D 480-gca6fef6
  • Graphics version: tiles
  • Mods loaded:
    "dda",
    "no_npc_food",
    "filthy_morale",
    "novitamins",
    "Medieval_Stuff",
    "makeshift",
    "More_Survival_Tools",
    "no_mutagen",
    "growable-pots",
    "mutant_npcs"

Additional context
Crash Log: crash.log

This is my first bug report, let me know if I messed up. Thanks!

Edit: No crash with Z-levels turned on at world gen.

(S2 - Confirmed) <Crash / Freeze> Monsters Z-levels

Most helpful comment

I finally understood the problem, although it took me quite a good deal of time. It basically breaks down to a user-after-free problem.

I used the save file from @scrubs2009 in #29681 which was really helpful because I could trigger the bug in just 3 keys press, 100% of the time (and also the save file was small so I could easily spot the offending monster). So thanks a lot scrubs! :)

TL;DR

When you change your z-level, all monsters on the level starts doing their moves in game::monmove().
This function is basically a big loop for each monster.

On the first iteration of this loop (or if the map is shifted), it builds the monster factions (basically, zombies are in the "zombies" faction, dogs in the "dogs" factions, etc.).

It happens that a monster might fall down a z-level because it was (or has moved at some point) on a trap. As the monster is not anymore on the same z-level as the player, the monster is simply deleted by the game, but it is not deleted from its monster faction.

Next, when monster::plan() is called, all monsters from the different factions are used. Even if a monster has been deleted from the map, it is still present (technically, its pointer) in the faction and is reused, but the c++ polymorphism fails - leading to a crash - because the monster object is now just a dangling pointer.

Analysis

So everything starts when the player actually change its z-level (by going down in this case) . Note that I really do think that the bug probably can not be triggered if z-level is on.

            case ACTION_MOVE_DOWN:
                if( !u.in_vehicle ) {
                    vertical_move( -1, false );
                }
                break;

So when you go down, the monster on the level you are going to have their turns and start doing their stuff. The majority of the things going on there are happening in game::monmove().

This function is basically a loop, going on for each monster in your z-level. It start by filling the monster factions on the first iteration, which will be used later in monster::plan():

    // Make sure these don't match the first time around.
    tripoint cached_lev = m.get_abs_sub() + tripoint( 1, 0, 0 );

    mfactions monster_factions;
    const auto &playerfaction = mfaction_str_id( "player" );
    for( monster &critter : all_monsters() ) {
        // The first time through, and any time the map has been shifted, recalculate monster factions.
        if( cached_lev != m.get_abs_sub() ) {
            // monster::plan() needs to know about all monsters on the same team as the monster.
            monster_factions.clear();

            // [snip] build the monster faction here [snip]
        }

Now in the same function (in the above big loop iterating on all monsters) we have this other loop:

        while( critter.moves > 0 && !critter.is_dead() ) {
            critter.made_footstep = false;

            // Controlled critters don't make their own plans
            if( !critter.has_effect( effect_controlled ) ) {
                // Formulate a path to follow
                critter.plan( monster_factions );
            }
            critter.move(); // Move one square, possibly hit u
            critter.process_triggers();
            m.creature_in_field( critter );
        }

Now if the monster move onto a trap (during critter.move()above), it might fall a z-level down (this is what happens in the save), which leads to the following stack trace:

>   Cataclysm.exe!trapfunc::ledge(Creature * c, const tripoint & p) Line 1000   C++
    [External Code] 
    Cataclysm.exe!trap::trigger(const tripoint & pos, Creature * creature) Line 176 C++
    Cataclysm.exe!map::creature_on_trap(Creature & c, bool may_avoid) Line 8033 C++
    Cataclysm.exe!monster::move() Line 521  C++
    Cataclysm.exe!game::monmove() Line 4649 C++
    Cataclysm.exe!game::do_turn() Line 1498 C++
    Cataclysm.exe!SDL_main(int argc, char * * argv) Line 689    C++
    Cataclysm.exe!main_getcmdline(...) Line 177 C

The monster is then removed from the current z-level map (it is not deleted yet, just removed from the map).

We then proceed to the next monster on the current z-level (next iteration of the big loop). Remember that the faction lists are not changed since they are only changed during the first iteration and if the map is shifted.

On the next iteration we start with the big loop once again:

    for( monster &critter : all_monsters() ) {

This time, the all_monsters() function will proceed to actually delete (in the c++ sense) the missing monster.

// full stack trace leading to the dtor
>   Cataclysm.exe!monster::`scalar deleting destructor'(unsigned int)   C++
    Cataclysm.exe!std::_Ref_count_obj<monster>::_Destroy() Line 913 C++
    Cataclysm.exe!std::_Ref_count_base::_Decref() Line 113  C++
    Cataclysm.exe!std::_Ptr_base<monster>::_Decref() Line 339   C++
    Cataclysm.exe!std::shared_ptr<monster>::~shared_ptr<monster>() Line 567 C++
    Cataclysm.exe!std::shared_ptr<monster>::operator=(std::shared_ptr<monster> && _Right) Line 555  C++
    Cataclysm.exe!game::non_dead_range<monster>::iterator::valid() Line 13240   C++
    Cataclysm.exe!game::non_dead_range<monster>::iterator::operator++() Line 377    C++
    Cataclysm.exe!game::monmove() Line 4596 C++
    Cataclysm.exe!game::do_turn() Line 1498 C++
    Cataclysm.exe!SDL_main(int argc, char * * argv) Line 689    C++
    Cataclysm.exe!main_getcmdline(...) Line 177 C
    [External Code] 

It's far from being obvious but the for loop with all_monsters() calls the game::non_dead_range<monster>::iterator::operator++():

    iterator &operator++() {
        do {
            ++iter;
        } while( iter != items.end() && !valid() );
        return *this;
    }

Which then calls game::non_dead_range<monster>::iterator::valid():

bool game::non_dead_range<monster>::iterator::valid()
{
    current = iter->lock();
    return current && !current->is_dead();
} 

And the above assignment just delete the monster. I was able to confirm that the problematic monster was deleted here (valid pointer before the call, invalid after; and the pointer is later used in monster::plan).

Next, in the big loop, the current monster (which is a valid one) will call monster::plan():

            if( !critter.has_effect( effect_controlled ) ) {
                // Formulate a path to follow
                critter.plan( monster_factions );

As you may notice, the monster factions are passed in argument to monster::plan(). At least one of the monster, in our case, is an invalid one because it has been deleted earlier.

The crash itself has already been explained in the previous posts above.

Fix

I don't have a precise idea yet on how to fix this bug, I just finished debugging the crash.

I guess the monster factions could be rebuilt when one of the monster has disappeared from the map.

All 11 comments

Guess if #28545 has something to do with this?

Probably related: could't get a crash in a world with experimental Z-levels turned on.

Confirmed

image

This bug is interesting but not necessarily really easy to solve. I also wonder why it hasn't manifested more than that (it seems the code involved here is at least 4 years old).

Start with the code here in monster::plan():

for( monster *const mon_ptr : fac.second ) {
    monster &mon = *mon_ptr;
    float rating = rate_target( mon, dist, smart_planning );

We got a monster from a (hostile) faction and call rate_target() passing the monster reference as the first argument. Now for rate_target() defined here:

float monster::rate_target( Creature &c, float best, bool smart ) const
{
    const int d = rl_dist( pos(), c.pos() );

As you can see rate_target() takes a Creature reference (this is the important point!) so basically in this case a Monster class instance is downcasted to a Creature instance.

The Creature class doesn't offer an implementation of the pos() method (as it is called in c.pos()) as it is pure virtual (defined here):

virtual const tripoint &pos() const = 0;

So, in this precise case, calling c.pos() on a Creature instance is an error because it ands up on a pure virtual call (it is caught by the runtime and raises an abort()) and somehow polymorphism doesn't apply here (creature.pos() should have called the right implementation in monster)....

As of now, the list of classes derived from Creature are:

  • Character
  • monster
  • player
  • npc
  • standard_npc

This probably requires a thorough investigation.

What about making rate_rarget accept a Creature pointer? Then it can dispatch to monster::pos().

In general, Creature is treated as pure virtual, I'm not sure why it was even allowed to get this far (passing Creature as a reference).

Did you try casting to monster?

Thank you both for your answers!

Did you try casting to monster?

I could have "upcasted" (and I guess it would work) but there's a risk that this function might be called with something else than a monster (e.g an NPC or the player). to answer your question, no I haven't tried, even for a test. I'll do!

What about making rate_rarget accept a Creature pointer? Then it can dispatch to monster::pos().

In general, Creature is treated as pure virtual, I'm not sure why it was even allowed to get this far (passing Creature as a reference).

I haven't tried yet. Will do!

I'm still puzzled why polymorphism doesn't work here... We discussed this with @ifreund yesterday but we couldn't come to a definitive conclusion :)

There are functions which check what given Creature reference really is and cast it to whatever necessary in given context, e.g.:

https://github.com/CleverRaven/Cataclysm-DDA/blob/71b58ffe6a173d496eb83558447ce30702e6d819/src/game.cpp#L5424-L5493

Did another round of tests today, without any luck...

Keeping the reference and using a dynamic_cast to monster. It works for some monster, but not for (at least) one:

float monster::rate_target( Creature &c, float best, bool smart ) const
{  
    const auto mon = dynamic_cast<monster*>(&c); // null (at least once)
    tripoint c_pos;
    if (mon) {
        c_pos = mon->pos();
    }
    else {
        c_pos = c.pos();  // crash
    }

Obviously, calling is_monster() (in the calling function) gives true but once in rate_target it yields false:

>? mon.is_monster()  // in caller
true
>? c.is_monster() // in rate_target
false

Changing rate_target() to take a pointer rather than a reference (and obviously changing all callers). Still the same thing, at least one monster fails:

float monster::rate_target( Creature *c, float best, bool smart ) const
{  
    const auto mon = dynamic_cast<monster*>(c); // null (for some monster)
    tripoint c_pos;
    if (mon) {
        c_pos = mon->pos();
    }
    else {
        c_pos = c->pos(); // crash
    }

I read some literature online to get an idea what could trigger a pure virtual call and it always seems to be related to constructors and destructors calling a virtual function.

I spent a good deal of time in the ctor() and dtor() for monster and Creature but couldn't see anything fishy... I'm starting to get out of idea.

Repro file

Note: I'm attaching a save file which has a good chance to trigger the bug. It's not a 100% chance but in at least 50% of the time you'll hit the problematic code and one of the monsters given by the caller will trigger the bug.

You'll start on a staircase (going up), then don't do anything besides pressing "<" to go upstairs. It might trigger.

I don't think the config really matters here.

Zip file SHA-1 hash: A7C5003525E384D858D572DD49BFF5A0B7543DAB
Okeelanta.zip

I finally understood the problem, although it took me quite a good deal of time. It basically breaks down to a user-after-free problem.

I used the save file from @scrubs2009 in #29681 which was really helpful because I could trigger the bug in just 3 keys press, 100% of the time (and also the save file was small so I could easily spot the offending monster). So thanks a lot scrubs! :)

TL;DR

When you change your z-level, all monsters on the level starts doing their moves in game::monmove().
This function is basically a big loop for each monster.

On the first iteration of this loop (or if the map is shifted), it builds the monster factions (basically, zombies are in the "zombies" faction, dogs in the "dogs" factions, etc.).

It happens that a monster might fall down a z-level because it was (or has moved at some point) on a trap. As the monster is not anymore on the same z-level as the player, the monster is simply deleted by the game, but it is not deleted from its monster faction.

Next, when monster::plan() is called, all monsters from the different factions are used. Even if a monster has been deleted from the map, it is still present (technically, its pointer) in the faction and is reused, but the c++ polymorphism fails - leading to a crash - because the monster object is now just a dangling pointer.

Analysis

So everything starts when the player actually change its z-level (by going down in this case) . Note that I really do think that the bug probably can not be triggered if z-level is on.

            case ACTION_MOVE_DOWN:
                if( !u.in_vehicle ) {
                    vertical_move( -1, false );
                }
                break;

So when you go down, the monster on the level you are going to have their turns and start doing their stuff. The majority of the things going on there are happening in game::monmove().

This function is basically a loop, going on for each monster in your z-level. It start by filling the monster factions on the first iteration, which will be used later in monster::plan():

    // Make sure these don't match the first time around.
    tripoint cached_lev = m.get_abs_sub() + tripoint( 1, 0, 0 );

    mfactions monster_factions;
    const auto &playerfaction = mfaction_str_id( "player" );
    for( monster &critter : all_monsters() ) {
        // The first time through, and any time the map has been shifted, recalculate monster factions.
        if( cached_lev != m.get_abs_sub() ) {
            // monster::plan() needs to know about all monsters on the same team as the monster.
            monster_factions.clear();

            // [snip] build the monster faction here [snip]
        }

Now in the same function (in the above big loop iterating on all monsters) we have this other loop:

        while( critter.moves > 0 && !critter.is_dead() ) {
            critter.made_footstep = false;

            // Controlled critters don't make their own plans
            if( !critter.has_effect( effect_controlled ) ) {
                // Formulate a path to follow
                critter.plan( monster_factions );
            }
            critter.move(); // Move one square, possibly hit u
            critter.process_triggers();
            m.creature_in_field( critter );
        }

Now if the monster move onto a trap (during critter.move()above), it might fall a z-level down (this is what happens in the save), which leads to the following stack trace:

>   Cataclysm.exe!trapfunc::ledge(Creature * c, const tripoint & p) Line 1000   C++
    [External Code] 
    Cataclysm.exe!trap::trigger(const tripoint & pos, Creature * creature) Line 176 C++
    Cataclysm.exe!map::creature_on_trap(Creature & c, bool may_avoid) Line 8033 C++
    Cataclysm.exe!monster::move() Line 521  C++
    Cataclysm.exe!game::monmove() Line 4649 C++
    Cataclysm.exe!game::do_turn() Line 1498 C++
    Cataclysm.exe!SDL_main(int argc, char * * argv) Line 689    C++
    Cataclysm.exe!main_getcmdline(...) Line 177 C

The monster is then removed from the current z-level map (it is not deleted yet, just removed from the map).

We then proceed to the next monster on the current z-level (next iteration of the big loop). Remember that the faction lists are not changed since they are only changed during the first iteration and if the map is shifted.

On the next iteration we start with the big loop once again:

    for( monster &critter : all_monsters() ) {

This time, the all_monsters() function will proceed to actually delete (in the c++ sense) the missing monster.

// full stack trace leading to the dtor
>   Cataclysm.exe!monster::`scalar deleting destructor'(unsigned int)   C++
    Cataclysm.exe!std::_Ref_count_obj<monster>::_Destroy() Line 913 C++
    Cataclysm.exe!std::_Ref_count_base::_Decref() Line 113  C++
    Cataclysm.exe!std::_Ptr_base<monster>::_Decref() Line 339   C++
    Cataclysm.exe!std::shared_ptr<monster>::~shared_ptr<monster>() Line 567 C++
    Cataclysm.exe!std::shared_ptr<monster>::operator=(std::shared_ptr<monster> && _Right) Line 555  C++
    Cataclysm.exe!game::non_dead_range<monster>::iterator::valid() Line 13240   C++
    Cataclysm.exe!game::non_dead_range<monster>::iterator::operator++() Line 377    C++
    Cataclysm.exe!game::monmove() Line 4596 C++
    Cataclysm.exe!game::do_turn() Line 1498 C++
    Cataclysm.exe!SDL_main(int argc, char * * argv) Line 689    C++
    Cataclysm.exe!main_getcmdline(...) Line 177 C
    [External Code] 

It's far from being obvious but the for loop with all_monsters() calls the game::non_dead_range<monster>::iterator::operator++():

    iterator &operator++() {
        do {
            ++iter;
        } while( iter != items.end() && !valid() );
        return *this;
    }

Which then calls game::non_dead_range<monster>::iterator::valid():

bool game::non_dead_range<monster>::iterator::valid()
{
    current = iter->lock();
    return current && !current->is_dead();
} 

And the above assignment just delete the monster. I was able to confirm that the problematic monster was deleted here (valid pointer before the call, invalid after; and the pointer is later used in monster::plan).

Next, in the big loop, the current monster (which is a valid one) will call monster::plan():

            if( !critter.has_effect( effect_controlled ) ) {
                // Formulate a path to follow
                critter.plan( monster_factions );

As you may notice, the monster factions are passed in argument to monster::plan(). At least one of the monster, in our case, is an invalid one because it has been deleted earlier.

The crash itself has already been explained in the previous posts above.

Fix

I don't have a precise idea yet on how to fix this bug, I just finished debugging the crash.

I guess the monster factions could be rebuilt when one of the monster has disappeared from the map.

That explains why those crashes only really happen when changing Z levels! Great! I'm glad lab towers will hopefully soon be a viable place to live in without worrying about crashes trapping you on the top floor. I do have one extra thing to note. Updating the game through the launcher, even if already on the current version, fixes whichever crash you were having before you updated. If I had to guess (and this is a complete guess) I would think maybe the launcher's process of updating, copying, and deleting old files clears the monster faction lists to repopulate them with new monsters, therefore getting rid of the now broken one.

Either way, great job and I'm happy I could help.

Was this page helpful?
0 / 5 - 0 ratings