TripleA: EDT lock-up detection and reporting

Created on 31 Jul 2018 · 5 Comments · Source: triplea-game/triplea

Following https://github.com/triplea-game/triplea/issues/3661, where the EDT locked up and we could not reproduce it, I was wondering if it's a good idea to monitor the EDT. If it does not pick up new tasks within a reasonable amount of time, say 3 to 10 seconds, we would consider it crashed and record a thread dump to a file. Then, on restart (or right away, if we can interrupt the EDT), we would let the user know about the lock-up and ask them to send us the thread dump file. With that, the next time something like #3661 happens, we would hopefully have the information needed to solve it.
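Recording the dump needs nothing beyond the JDK's `java.lang.management` API. The sketch below is a minimal, hypothetical helper (the class name, method names and dump file location are assumptions, not existing TripleA code): it captures every thread's state, the lock it is waiting on and that lock's owner, writes the result to a file, and lets the next startup check whether a dump was left behind.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper: saves a thread dump to disk so it can be reported after a restart. */
final class ThreadDumpWriter {
  private ThreadDumpWriter() {}

  /** Captures every live thread, including which lock it waits on and who owns that lock. */
  static String captureThreadDump() {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    StringBuilder dump = new StringBuilder();
    for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
      dump.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
      if (info.getLockName() != null) {
        dump.append(" waiting on ").append(info.getLockName());
      }
      if (info.getLockOwnerName() != null) {
        dump.append(" held by \"").append(info.getLockOwnerName()).append('"');
      }
      dump.append('\n');
      // ThreadInfo.toString() truncates stacks, so print every frame explicitly.
      for (StackTraceElement frame : info.getStackTrace()) {
        dump.append("    at ").append(frame).append('\n');
      }
      dump.append('\n');
    }
    return dump.toString();
  }

  /** Writes a fresh dump to the given file, replacing any older one. */
  static Path writeThreadDump(Path file) throws IOException {
    Path parent = file.toAbsolutePath().getParent();
    if (parent != null) {
      Files.createDirectories(parent);
    }
    return Files.writeString(file, captureThreadDump(), StandardCharsets.UTF_8);
  }

  /** Called at startup: a dump left by a previous session means a lock-up should be reported. */
  static Optional<Path> findPendingDump(Path file) {
    return Files.isRegularFile(file) ? Optional.of(file) : Optional.empty();
  }
}
```

Once the user has been shown the report, the file would need to be deleted or archived so the prompt does not reappear on every start.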

Googling around, looks like there could be ways to detect an EDT lock-up: http://www.java2s.com/Code/Java/Event/MonitorstheAWTeventdispatchthreadforeventsthattakelongerthanacertaintimetobedispatched.htm
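The linked page is about timing how long individual events take to dispatch. A simpler alternative, sketched here under my own assumptions (class and method names are hypothetical), is a heartbeat: a background thread periodically posts a tiny task to the EDT and flags a lock-up if that task has not run within the threshold, e.g. 5 seconds from the range suggested above. The UI-thread executor is a constructor parameter so the logic can be exercised without a display.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.swing.SwingUtilities;

/** Hypothetical EDT watchdog based on heartbeats posted to the UI thread. */
final class EdtWatchdog implements AutoCloseable {
  private final Executor uiExecutor;
  private final long thresholdNanos;
  private final Runnable onLockup;
  private final ScheduledExecutorService checker =
      Executors.newSingleThreadScheduledExecutor(task -> {
        Thread thread = new Thread(task, "edt-watchdog");
        thread.setDaemon(true); // monitoring must never keep the JVM alive
        return thread;
      });

  // Cleared by the UI thread when the heartbeat runs, read by the checker thread.
  private volatile boolean heartbeatPending;
  // Only touched by the single checker thread.
  private long postedAt;
  private boolean reported;

  EdtWatchdog(Executor uiExecutor, long threshold, TimeUnit unit, Runnable onLockup) {
    this.uiExecutor = uiExecutor;
    this.thresholdNanos = unit.toNanos(threshold);
    this.onLockup = onLockup;
  }

  /** Watches the real Swing event dispatch thread. */
  static EdtWatchdog forSwing(long threshold, TimeUnit unit, Runnable onLockup) {
    return new EdtWatchdog(SwingUtilities::invokeLater, threshold, unit, onLockup);
  }

  EdtWatchdog start(long checkPeriod, TimeUnit unit) {
    checker.scheduleWithFixedDelay(this::check, 0, checkPeriod, unit);
    return this;
  }

  private void check() {
    long now = System.nanoTime();
    if (!heartbeatPending) {
      // The previous heartbeat ran, so the UI thread is responsive: post a new one.
      reported = false;
      postedAt = now;
      heartbeatPending = true;
      uiExecutor.execute(() -> heartbeatPending = false);
    } else if (!reported && now - postedAt > thresholdNanos) {
      // The heartbeat is still queued behind something that is not finishing.
      reported = true; // report each lock-up only once
      try {
        onLockup.run();
      } catch (RuntimeException e) {
        e.printStackTrace(); // an uncaught exception would cancel all future checks
      }
    }
  }

  @Override
  public void close() {
    checker.shutdownNow();
  }
}
```

In the client this might be started once, e.g. `EdtWatchdog.forSwing(5, TimeUnit.SECONDS, onLockup).start(1, TimeUnit.SECONDS)`, with `onLockup` writing the thread dump. A heartbeat only shows that the EDT has not reached the queued task; it cannot distinguish a real deadlock from a very slow operation, which is why the dump itself is what matters.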

Discussion


@DanVanAtta
What does "EDT" actually stand for? Is it this one?
https://en.wikipedia.org/wiki/Event_dispatching_thread
Thank you.

@panther2 Exactly

@DanVanAtta I don't think such a deadlock detector would be as useful as we'd like.
In my opinion, the underlying problem is the real thing to tackle: there are too many places in the code where blocking operations, like slow IO, are executed on the EDT, freezing the UI.
What we should do from here is find expensive operations in the code, add assertions that they're not being executed on the EDT, and see if something breaks.
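Such an assertion can be a one-line guard at the top of each slow method. A minimal sketch with hypothetical names (`EdtGuards`, `assertNotOnEdt`, and the example `MapFileReader` are illustrations, not existing code). It throws instead of using Java's `assert` keyword, because `assert` is disabled unless the JVM is started with `-ea` and would stay silent in a normal launch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.swing.SwingUtilities;

/** Hypothetical guard for code that must never run on the Swing event dispatch thread. */
final class EdtGuards {
  private EdtGuards() {}

  /** Fails fast on the EDT, so a slow call shows up as a stack trace instead of a frozen UI. */
  static void assertNotOnEdt(String operation) {
    if (SwingUtilities.isEventDispatchThread()) {
      throw new IllegalStateException(operation + " must not run on the EDT");
    }
  }
}

/** Hypothetical example of a slow operation guarded at its entry point. */
final class MapFileReader {
  private MapFileReader() {}

  static byte[] read(Path file) throws IOException {
    EdtGuards.assertNotOnEdt("Reading map file " + file);
    return Files.readAllBytes(file);
  }
}
```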

@panther2 EDT = event dispatching thread. It's a special thread in Swing where all UI rendering work is done. If that thread is locked up, then UI buttons, etc. will not work.
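The standard Swing way to keep slow work from locking up that thread is to run it on a background thread and hand only the result back to the EDT, which is what `javax.swing.SwingWorker` does: `doInBackground()` runs on a worker thread and `done()` runs on the EDT. The wrapper below (`OffEdt.run` is a hypothetical name) is a minimal sketch of that pattern, not existing TripleA code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.function.Consumer;
import javax.swing.SwingWorker;

/** Hypothetical helper: slow work off the EDT, UI update back on the EDT. */
final class OffEdt {
  private OffEdt() {}

  static <T> SwingWorker<T, Void> run(
      Callable<T> slowWork, Consumer<T> updateUi, Consumer<Throwable> onError) {
    SwingWorker<T, Void> worker =
        new SwingWorker<T, Void>() {
          @Override
          protected T doInBackground() throws Exception {
            // Worker thread: the EDT stays free to repaint and handle clicks.
            return slowWork.call();
          }

          @Override
          protected void done() {
            // EDT: safe to touch Swing components here.
            try {
              updateUi.accept(get());
            } catch (ExecutionException e) {
              onError.accept(e.getCause());
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            } catch (CancellationException e) {
              // Cancelled by the caller: nothing to show.
            }
          }
        };
    worker.execute();
    return worker;
  }
}
```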

@RoiEXLab even if we do not detect a deadlock per se, wouldn't it still be useful to find slow/expensive operations running on the EDT? It seems like we are discussing two different directions for solving the same problem: one, detecting known expensive code running on the EDT; two, detecting unknown expensive code running on the EDT.

Offhand, both seem ideal. The first is a culture shift toward adding a lot of "this is not the EDT" assertions; the second, if we can get it working reasonably well, I like because it will catch any problems we miss.

@RoiEXLab hmm, going back to the original problem, the issue is not a slow IO operation; it's a lock on the EDT that is never released.
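If the cause really is a lock that is never released, the JVM can sometimes name it directly: `ThreadMXBean.findDeadlockedThreads()` reports threads caught in a cycle of object monitors or `java.util.concurrent` locks. The sketch below (the `DeadlockCheck` name is hypothetical) turns that into one readable line per thread. It only catches true cycles; an EDT waiting on a lock whose owner is stuck on something else, such as a condition that is never signalled, will not be reported, which is where a full thread dump is still needed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check for threads stuck in a lock cycle. */
final class DeadlockCheck {
  private DeadlockCheck() {}

  /** Returns one line per deadlocked thread: who it is, which lock it waits for, who holds it. */
  static List<String> describeDeadlocks() {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    List<String> report = new ArrayList<>();
    long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
    if (ids == null) {
      return report;
    }
    for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
      if (info == null) {
        continue; // the thread ended between the two calls
      }
      report.add(String.format("\"%s\" waits for %s held by \"%s\"",
          info.getThreadName(), info.getLockName(), info.getLockOwnerName()));
    }
    return report;
  }
}
```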

> The underlying problem is the real thing to tackle in my opinion,

The underlying problem is actually not known; that is the thing. Presumably it's not IO, as the EDT is locked indefinitely. The end goal is really to automate the "run a thread dump" task so we do not need to rely on developers reproducing a lock-up in a situation where they can get a thread dump.

Trying to keep inappropriate operations off the EDT is a good goal too, but on its own it won't solve the problem presented here.

