Triplea: UnitScroller ConcurrentModificationException

Created on 24 Jan 2020  路  10Comments  路  Source: triplea-game/triplea

How can the problem be recreated?

  • Unknown
  • Observed during an AI vs AI game with the action panel side bar collapsed
java.util.ConcurrentModificationException
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1660)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
    at org.triplea.java.collections.CollectionUtils.getMatches(CollectionUtils.java:44)
    at games.strategy.engine.data.UnitCollection.getMatches(UnitCollection.java:192)
    at games.strategy.triplea.ui.unit.scroller.UnitScrollerModel.getMoveableUnits(UnitScrollerModel.java:34)
    at games.strategy.triplea.ui.unit.scroller.UnitScroller.lambda$drawUnitAvatarPane$4(UnitScroller.java:167)

When the problem is fixed, what should have happened instead?

  • No error message

Which Engine Version are you using?

e8379e10091463f0832265b814a1886e67ac4ad8

Problem

All 10 comments

I can confirm that I just saw this error as well during an all AI game on revised.

I apologize if this is off-topic, but does the engine have good handling reg. concurrency? Such that the game state is not read or written to concurrently by the AI, the game step, the UI, etc.? And maybe also always establishes a "happens-before" relationship as needed? If this comment is off-topic and/or other, please delete this comment.

EDIT: Concurrency tends to be a tricky topic which trips up many/most new as well as experienced developers in my experience, for multiple different good reasons. I believe I have more experience, understanding and knowledge of it than most reg. multiple aspects, at least reg. the JVM, though I don't have much opportunity to contribute to the engine at the moment. I might be able to offer some minor advice or suggestions if you would like it, however.

There was recently this pull request which might be related to this issue, though I haven't looked into it:
https://github.com/triplea-game/triplea/commit/65cf02c9a5d23a68615d2edf71b1dfa867d487f6#diff-2fb707b9c3af7d3afbc2ebc9ad9bf7b0

@MelvinWM Generally, yes. As the game data has a read/write lock that can be used to avoid concurrency issues. I actually think I saw this error happen even before that PR, its just that the issue that PR fixed happen way more frequently than this error.

(note that I made an edit to one of my comments).

@ron-murhammer Thank you. Given the existing approach of having a single mutable game state instance, using a lock might well make good sense. And a read/write lock instead of a regular lock should help decrease blocking. My impression is that you (the current developers) have done a very good or excellent job taking over from the early development that has been done (and early development can definitely be far from easy, of course). The current programming interface for using the lock seems to me a bit error-prone, though it can be difficult to avoid or prevent such possible issues reg. concurrency, at least while using shared mutable state.

Possible bugs that I can see as such:

  1. Great care should be taken to always take the game state lock when necessary. This is more obvious reg. state of type GameState, where the instance is likely the main game state and shared. And less obvious with game state parts of type Unit, List<Unit>, Territory, etc. For those instances that are not part of the main game state and confined to a single thread (for instance deeply copying from parts of the main game state) or that are immutable, it should be fine. But for those mutable instances that are part of the main game state, using the lock will likely be necessary in the different places that state is used AFAICT.
  2. The developer has to be careful to use the write lock and not the read lock if any kind of mutations to the game state is made.
  3. The developer should always use try/finally to ensure release of the lock.
  4. The developer should always release the corresponding kind of lock when acquiring (ie. write if write). (might be easy to catch early when testing, however).
  5. The developer should remember to release the lock. (might be easy to catch early when testing, however).

Possible bugs 4 and 5 are likely not a problem in practice, since I would expect them to be caught early. 3 is likely not an issue in practice either, while 2 might be a little bit of an issue. And possible bug 1 is likely the biggest issue relatively speaking in practice AFAICT. There seems to be a number of places in at least the UI where parts from the main game state are referenced in fields, like a Territory instance or a list of Unit instances.

(there is of course also the risk of overuse or misuse of locking, which might lead to unnecessary blocking, as well as buggy interacting locks possibly causing liveness issues such as deadlocks, though I would guess that the game state lock is only very, very rarely used together with other locks, so I would guess that this is unlikely to be an issue).

I cannot help but wonder whether issue 1 might be the cause here. Some state, such as a list of units, that should have been protected by a lock whenever used, might not be protected in a specific place in the UI and/or elsewhere. That kind of bug might be very difficult to prevent in general, especially due to the typical difficulty of reproducing concurrency errors. But, if you can reproduce it here, it might be worth looking into.

If this was a greenfield project early on, it might have made sense to store immutable IDs for the different kinds of units instead of the mutable objects themselves, though that might have challenges in itself of course. An immutable game state might also make sense, if going the whole way, though that might likewise have challenges.

As for possible bugs 3, 4 and 5, it might be possible to create a nicer interface for preventing those parts using function literals, but I would guess that the type system in Java and checked exceptions might prevent a nice interface for that, at least for some cases. The basic concept would be to create a library function, which calls the argument block function in try/finally with the appropriate acquire/release calls. But those possible bugs are likely not that big of an issue either way.

I hope this comment might be useful, though it might be a bit long :slightly_smiling_face: .

@MelvinWM Agree with your analysis. In this case, I'm guessing the code needs to at least grab a read lock.

I don't think I agree a good fix here is a read lock. The swing thread is already single threaded. If there is a concurrent modification, it means the last rendering result is going to be changed anyways. Doing that at a slower cadence does not fundamentally help. IMO making an immutable copy of the list of units and rendering that is probably a nicer way to go, particularly considering there would be a re-rendering that is queued behind that initial rendering operation.

I'll note as well sonar does point out places where we fail to release locks, it's also likely the case that many locks are not needed to begin with, and the comment around the game locking code has always struck me as concerning/hacky:

  • This class is not terribly good for multithreading as it locks globally on all calls, but that

  • is ok, as this code is meant more for when you are considering your ambitious multi-threaded code

    • a mistake, and you are trying to limit the damage.

Said another way, I suspect the bug is probably more that we're using ArrayList.java instead of a copy-on-write thread-safe variant. Adding a lock around it rather than making more immutable strikes me as a more complicated solution.

@DanVanAtta The part reg. Sonar checking for unreleased locks sounds really good, and a clear benefit of automated checking like Sonar. Looking at LockUtil, it seems like it has some runtime detection and logging reg. these issues as well.

Reg. the current solution for concurrency and mutable game state.

I think the solution with having a game state lock is not the nicest solution, but it IS a relatively simple solution - whenever you want to read from and/or write to the game state, you first take the lock. It can be a bit/somewhat error-prone, and it can also end up blocking at various times, but given that there is mutable state shared across threads, it is as such a working solution that is not complex. And avoiding complex solutions is a very, very good idea regarding concurrency.

It might make good sense to look into having more immutable state, though there is partially the existing code base to consider (it might be a lot of work), and partially the runtime efficiency, since this is a game engine. And given that it is a game engine, the requirements reg. correctness are not as high as they typically otherwise would be. Of course, immutable state can in certain cases be more efficient, since you never need to copy it apart from a reference to it, and in certain cases (depending on data structure and usage) "updates" can internally share the unmodified parts (such as for persistent collections and the like). I don't know whether it is really feasible to support something like persistent collections in this case, however.

Concurrency and memory consistency

(Warning, minor rant :smiley: ).

Reg. at which points it might be superfluous to take the lock when reading (and writing) mutable state shared across threads, extreme care should be taken reg. this. A major reason for this is that if you do not ensure that memory consistency reg. the Java Memory Model is kept, you can end up with really weird and surprising runtime behaviour (for instance due to the JIT making assumptions reg. memory consistency when doing optimizations). The weird behaviour that can happen is not nearly as bad as undefined behaviour in C and C++ (where in theory absolutely anything and everything horrible can happen, see for instance http://catb.org/jargon/html/N/nasal-demons.html and https://kristerw.blogspot.com/2017/09/why-undefined-behavior-may-call-never.html ), but it is still weird. See for instance http://jcip.net/ , such as http://jcip.net/listings/Holder.java :

package net.jcip.examples;

/**
 * Holder
 * <p/>
 * Class at risk of failure if not properly published
 *
 * @author Brian Goetz and Tim Peierls
 */
public class Holder {
    private int n;

    public Holder(int n) {
        this.n = n;
    }

    public void assertSanity() {
        if (n != n)
            throw new AssertionError("This statement is false.");
    }
}

The AssertionError might be thrown in certain cases if the object is not published properly. And there might be other weird behaviour that can happen if there are memory consistency bugs. (As a side-note, "Java: Concurrency in Practice" is a really good book, though it is a bit old by now).

The safest approach would therefore be to seek to always take the lock when reading from and/or writing to the shared mutable state.

Many very intelligent, smart and experienced developers have tried reasoning about these topics and issues without fully reading up on the relevant theory and understanding it and have ended up getting pain from it, and even experts in concurrency do take great care when mutable memory is shared across threads. It is also one of the main reasons for the popularity of functional programming, since sharing fully immutable state across threads is generally much, much easier to reason about, work with, etc. than mutable state. (of course, for those that develop libraries, they have to ensure that their immutable data structures are still actually thread-safe when doing internal optimization tricks, see for instance https://www.scala-lang.org/api/2.12.10/scala/collection/immutable/List.html and the note reg. mutable state; this bug hid there for several years as far as I know, though it does seem like they fixed it in Scala 2.13: https://www.scala-lang.org/api/2.13.1/scala/collection/immutable/List.html ).

Reg. approaches for fixing the bug

Reg. using copying, or thread-safe collections, I think you are right for those cases where the data is (deeply) immutable, however, in multiple cases, the data types (such as Unit, TripleAUnit, Territory, etc.) in the collections are typically mutable. And that means that you both have to ensure that the collection is thread-safe itself or is handled thread-safely if it is shared across threads, and also have to ensure that the instances contained by it are thread-safe or are handled thread-safely. There is also aspects such as at how updates to game state are shared/propagated, as well as if you are using (deep) copying, how frequently and how much copying you are doing. I am not certain, but copying for instance the whole game state frequently might be both time-consuming, memory-intensive, as well as put pressure on the GC, given that this is a game engine, though I might well be overestimating the memory consumption reg. the game state. Though it also depends on how much of the game state is copied, if only a small fraction is overall copied, it is less of an issue.

Ideas reg. handling the blocking and error-proneness

I believe I know too little of the architecture to really give good input reg. that, including reg. game state handling and concurrency, but I do wonder whether one might be able to set up a system where the UI-system has its own game state. The UI game state would then receive updates from the main game state applied in the UI thread, and the UI code could then simply read directly from the UI game state without any need for locking, which would prevent the issues reg. blocking and error-proneness. However, I can easily imagine this not being straight-forward in practice for a variety of reasons, and it may well have some other issues, such as the UI game state possibly lagging behind the main game state. It would also require that applying changes (not computing them) is fast enough for them to be applied in the UI thread. And it would also increase how much memory is used in total as far as I can see.

I also do not know how much of an issue blocking might be reg. the game engine in practice, I have not investigated or tested it.

Took a look at this specific exception and came up with three different levels of fix that each result in this exception no longer being reproducible:

  1. Unit scroller should not even be rendered during an all-AI game: https://github.com/triplea-game/triplea/pull/5966
  2. Reduced concurrency contention by moving computations to be same-thread rather than delayed and done on EDT: https://github.com/triplea-game/triplea/pull/5967
  3. Made collection computations generically thread-safe: https://github.com/triplea-game/triplea/pull/5968
Was this page helpful?
0 / 5 - 0 ratings

Related issues

General-Dru-Zod picture General-Dru-Zod  路  5Comments

Khobai picture Khobai  路  9Comments

DanVanAtta picture DanVanAtta  路  8Comments

DanVanAtta picture DanVanAtta  路  5Comments

ron-murhammer picture ron-murhammer  路  5Comments