Triplea: AI hangs while seemingly processing a trigger

Created on 2 Dec 2018 · 11 Comments · Source: triplea-game/triplea

Engine version

1.10.0.0.13156

My Operating System

Debian on i386

Map name

Total War: December 1941

Can you describe how to trigger the error?

Run the game. After a few rounds the game gets stuck during an AI turn while processing a trigger (at least this is the last entry in the history).

I noticed this behaviour in multiple different games.

Do you have the exact error text? Please copy/paste if so

There is no error message. The AI processing seemingly gets stuck without any messages.

Instead of this error, what should have happened?

The AI should finish its turn after a while.

Any additional information that may help

Problem

All 11 comments

I took a brief look at this. Not sure how helpful the following analysis is, but I'm leaving it here while it's fresh in my head.

I loaded the attached save game and broke in with the debugger. It appears the AI is running some battles through the battle calculator during NCM. Most of those battles seem to run fine, but it eventually gets to "Japan attack ExiledAllies in Celebes". That's the one that hangs in the sense that it never ends--it just keeps running round after round (e.g. I broke in during round 51,527).

At this point, the engine is popping and executing actions from the execution stack until it gets to the last one, which was created by the following code:

https://github.com/triplea-game/triplea/blob/610f0cc75f5c87df89a8c8b972a3c4e9783ba2a9/game-core/src/main/java/games/strategy/triplea/delegate/MustFightBattle.java#L878-L902

When that last action gets executed, isOver is still false, so it ends up pushing the entire loop back onto the stack once again (line 897), and everything repeats ad infinitum. I didn't debug why isOver is failing to get set to true, but that seems to be key.
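The re-push behaviour described above can be sketched in isolation. This is not the TripleA execution stack or the `MustFightBattle` code; `LoopingBattleSketch`, its fields, and the `maxRounds` guard are all hypothetical names made up for illustration. It only shows how a final action that re-queues the round whenever `isOver` is false loops forever when nothing ever sets `isOver`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a battle driven by an execution stack.
// None of these names are taken from the TripleA code base.
class LoopingBattleSketch {
  private final Deque<Runnable> stack = new ArrayDeque<>();
  private final int roundsUntilOver; // a negative value means "never over", as in the hang
  private boolean isOver = false;
  private int round = 0;

  LoopingBattleSketch(int roundsUntilOver) {
    this.roundsUntilOver = roundsUntilOver;
  }

  // Pushes the steps of one round; the action pushed first runs last.
  private void pushRound() {
    stack.push(() -> {
      // Last action of the round: if the battle is not over, queue another round.
      if (!isOver) {
        pushRound();
      }
    });
    stack.push(() -> {
      // Stand-in for the fire and casualty steps, the only place isOver can change.
      round++;
      if (round == roundsUntilOver) {
        isOver = true;
      }
    });
  }

  // Runs until the stack is empty or the guard trips; returns the rounds fought.
  int run(int maxRounds) {
    pushRound();
    while (!stack.isEmpty()) {
      if (!isOver && round >= maxRounds) {
        throw new IllegalStateException("battle not over after " + round + " rounds");
      }
      stack.pop().run();
    }
    return round;
  }
}
```

For example, `new LoopingBattleSketch(3).run(10)` ends after three rounds, while `new LoopingBattleSketch(-1).run(100)` trips the guard instead of spinning indefinitely.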

I went back as far as 1.10.0.0.12970, and the hang still occurs with the attached save game. I couldn't go back any farther than that because 12970 was the last of the Checkstyle member name fixes. So any build before that would be incompatible with the save game.

@ssoloff Did you happen to dump any of the info about the battle like what units each side has left? I'll try to eventually take a look at the save game.

@ron-murhammer I did not think to do that, but I still have the debugging environment set up. Here's what some selected fields from the MustFightBattle for "Japan attack ExiledAllies in Celebes" look like at round 3277:

Field | Value
:-- | :--
attackerLostTuv | 0
attackingUnits | [japaneseInfantry owned by Japan, japaneseInfantry owned by Japan]
attackingUnitsRetreated | []
attackingWaitingToDie | []
bombardingUnits | []
defenderLostTuv | 0
defendingAa | []
defendingUnits | [americanHeavyStrategicBomber owned by Usa, americanAirTransport owned by Usa]
defendingUnitsRetreated | []
defendingWaitingToDie | []
isOver | false
killed | []
offsensiveAa | []
whoWon | NOTFINISHED

Please let me know if this isn't the information you were asking for. I'm still not sure where the best breakpoint hotspots are for debugging battles, so I could be viewing this state at a totally inappropriate time.

I did note something new and interesting this time in the debugger. It appears this same battle is running concurrently on two battle calculator threads. Here are some traces I added that include the thread name (format is [<thread-name>] (<round>): <battle title>):

[ProAi ConcurrentOddsCalculator Worker-1] (3241): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-1] (3242): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-1] (3243): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-0] (3276): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-1] (3244): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-1] (3245): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-1] (3246): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-1] (3247): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-1] (3248): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-0] (3277): Japan attack ExiledAllies in Celebes
[ProAi ConcurrentOddsCalculator Worker-1] (3249): Japan attack ExiledAllies in Celebes

When I first saw this I thought that the same MustFightBattle instance getting run on two threads at the same time is what's causing isOver to never seem to be true (e.g. it might get set to true on one thread but then reset back to false by the battle step running on the other thread).

However, upon closer inspection in the debugger, they don't appear to be the same MustFightBattle _instances_. They're just two different instances of the same battle running at the same time.
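One way to make that distinction visible in the traces themselves, rather than only in the debugger, is to append an identity hash to each line. The helper below is a sketch: `BattleTraceSketch` and its `trace` method are hypothetical and not part of TripleA. `System.identityHashCode` is a standard JDK call; two different instances usually report different values, although that is not strictly guaranteed.

```java
// Hypothetical trace helper; the format mirrors the traces quoted above,
// with an identity hash appended to tell instances apart.
class BattleTraceSketch {
  static String trace(Object battle, int round, String title) {
    return String.format(
        "[%s] (%d): %s @%08x",
        Thread.currentThread().getName(),
        round,
        title,
        System.identityHashCode(battle));
  }
}
```

If the two worker threads printed different hash values for the same battle title, that would point to separate instances, which is what the debugger showed here.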

A little more information after setting a breakpoint in Fire#rollDice()... For some reason, both the attacking units and the defending units in this battle end up with a power of zero and a roll target of zero. Because the total unit power is zero, no dice are ever rolled, and no casualties are ever selected. LL dice are enabled, ~if that makes a difference.~ (EDIT: It doesn't. The zero power calculations are actually taking place in DiceRoll#getUnitPowerAndRollsForNormalBattles(); we never make it into the actual LL code.)

Okay, this is starting to make sense. Unit strength is going negative when territory effects are accounted for. The negative strength is subsequently normalized to zero.

So the territory effects in Celebes ([TerritoryEffect{name=Island}, TerritoryEffect{name=Mountain}, TerritoryEffect{name=Jungle}]) seem to be preventing any of the attackers or defenders from being able to inflict damage.
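A minimal sketch of that calculation, assuming each territory effect contributes a simple additive modifier. The class `UnitPowerSketch`, its method names, and every number used with it are illustrative assumptions; the real logic lives in DiceRoll#getUnitPowerAndRollsForNormalBattles() and is not reproduced here.

```java
import java.util.List;

// Hypothetical model of unit power under territory effects; not TripleA code.
class UnitPowerSketch {
  // Adds every territory-effect modifier to the base strength,
  // then clamps the result to the range [0, diceSides].
  static int effectivePower(int baseStrength, List<Integer> effectModifiers, int diceSides) {
    int strength = baseStrength;
    for (int modifier : effectModifiers) {
      strength += modifier;
    }
    return Math.max(0, Math.min(strength, diceSides));
  }

  // Sums the clamped power of all units on one side of the battle.
  static int totalPower(List<Integer> baseStrengths, List<Integer> effectModifiers, int diceSides) {
    int total = 0;
    for (int base : baseStrengths) {
      total += effectivePower(base, effectModifiers, diceSides);
    }
    return total;
  }
}
```

With a made-up base strength of 2 and three made-up modifiers of -1 each, the raw strength is -1 and `effectivePower` clamps it to 0. If every unit on both sides ends up at 0, the total power is 0 for both sides and no dice are ever rolled.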

@ron-murhammer Has code ever existed to detect this condition? That is, that a battle is basically unresolvable because both sides have zero power? Just wondering if this is something that's never come up (hard to believe) or a regression because said code has been accidentally removed.

The Battle Calculator is perhaps an easier way to reproduce this bug.

I set up the Battle Calculator to fight a battle between two Japanese infantry and one British air transport and one British strategic bomber on Celebes (with the territory effects enabled). Running it, the progress window is displayed forever. Even clicking Cancel won't kill the battle. The only way out was to kill the process.

I reproduced the Battle Calculator scenario in 1.10.0.0.13220, 1.9.0.0.13066, and 1.9.0.0.12226. I couldn't try any older stable engine versions because TWW 2.7.7.2 appears to use some newer map XML that is not compatible with stable versions prior to 12226.

@ssoloff Great investigation.
On the cancel button issue:
It's not surprising that cancel doesn't really work well.
In the end, all the ConcurrentOddsCalculator does is invoke the cancel method of the individual OddsCalculators, which only cancels an individual OddsCalculator on the next run of an individual battle. Because the battle never ends, the calculator won't cancel.
For the ConcurrentOddsCalculator at least it could make sense to expose the individual futures to be able to cancel them, using an interrupt if necessary, or somehow change the default OddsCalculator#cancel implementation to throw an interrupt if the battle takes too long. (Maybe wrap battle execution in another future or something.)
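The future-based idea can be sketched with the standard `java.util.concurrent` classes. `CancellableSimulationSketch`, `fightForever`, and `runWithTimeout` are hypothetical names, and the sketch assumes the battle loop checks the thread's interrupt flag once per round, which an existing loop would have to be changed to do.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of cancelling a runaway simulation through its Future.
class CancellableSimulationSketch {
  // Stand-in for a battle that never ends; it checks the interrupt flag once per round.
  static long fightForever() throws InterruptedException {
    long round = 0;
    while (true) {
      if (Thread.interrupted()) {
        throw new InterruptedException("cancelled in round " + round);
      }
      round++;
    }
  }

  // Returns true if the simulation finished in time, false if it was cancelled.
  static boolean runWithTimeout(long timeoutMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Callable<Long> task = CancellableSimulationSketch::fightForever;
    Future<Long> future = executor.submit(task);
    try {
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupts the worker thread
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      future.cancel(true);
      return false;
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Calling `CancellableSimulationSketch.runWithTimeout(100)` returns false after roughly 100 ms. Note that `cancel(true)` only delivers an interrupt; a loop that never checks the flag would keep running.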
This, however, might cause issues with cleaning up the GameData so it can be reused later. What we really need is some sort of "SimulatedGameData": an object that wraps a GameData without cloning it but provides a "write protection layer" that stores the changes that would be performed on the GameData without actually piping them through.
This way we wouldn't need to worry about cleaning stuff up and could just drop the object + we could easily clean up the ConcurrentOddsCalculator code.
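A very reduced sketch of such a write protection layer, modelling the wrapped data as a plain map. Real GameData is far richer than a map, so this only illustrates the overlay idea; `OverlaySketch` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical overlay: reads fall through to the base data,
// writes are recorded locally and never reach the base.
class OverlaySketch<K, V> {
  private final Map<K, V> base; // shared, never modified through this wrapper
  private final Map<K, V> changes = new HashMap<>();

  OverlaySketch(Map<K, V> base) {
    this.base = base;
  }

  // Returns the locally recorded value if there is one, else the base value.
  V get(K key) {
    return changes.containsKey(key) ? changes.get(key) : base.get(key);
  }

  // Records a change without touching the base data.
  void put(K key, V value) {
    changes.put(key, value);
  }

  // Read-only view of the changes recorded so far.
  Map<K, V> pendingChanges() {
    return Collections.unmodifiableMap(changes);
  }
}
```

Dropping the overlay object discards every simulated change, so the base data needs no cleanup before the next simulation.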

@ssoloff I figured it was something like that. I think there are methods to deal with attacker or defender having 0 strength but probably not both. I'll take a look. My general thought is if both attacker and defender end up with 0 strength then attacker should have to retreat or units die if unlimited rounds.
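One possible reading of that rule, written out as a sketch. This is not the fix that was eventually implemented; `StalemateSketch`, the `Outcome` values, and the `attackerCanRetreat` flag are assumptions made for illustration.

```java
// Hypothetical decision rule for a battle in which neither side can hit.
class StalemateSketch {
  enum Outcome { CONTINUE, ATTACKER_RETREATS, ATTACKERS_DIE }

  static Outcome resolve(int attackerPower, int defenderPower, boolean attackerCanRetreat) {
    // As long as either side can still inflict damage, the battle goes on.
    if (attackerPower > 0 || defenderPower > 0) {
      return Outcome.CONTINUE;
    }
    // Both sides are at zero power: end the battle instead of looping.
    return attackerCanRetreat ? Outcome.ATTACKER_RETREATS : Outcome.ATTACKERS_DIE;
  }
}
```

Checked once per round, a rule like this ends the Celebes battle on the first round in which both totals are zero.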

This is not stated very directly, but under the intended v1 rules you are obliged to retreat if your attacking force is left with only transports, which can be read as being obliged to retreat whenever you are left with only attack-0 units. So this is actually an unimplemented item of the default ruleset.

However, this says nothing about the case where you are left with only attack-0 units and cannot retreat, as that could never happen on the map those rules refer to.

@sumpfralle This should now be fixed in the latest pre-release. I tested it both with your save game, which now continues past the hang, and with the battle calculator scenario that @ssoloff describes.

Cool - thank you!
btw: it was a pleasure for me to follow your exchange of thoughts while investigating the issue.
Keep on having fun!
