Lila: bad performance with FF66 and FF67dev

Created on 9 May 2019 · 14 Comments · Source: ornicar/lila

Starting with Lichess v2 I experience some severe issues.

symptoms:

  • sluggish performance, mostly noticeable when scrolling through a game's moves. It does not feel smooth at all but used to before v2
  • occasional display issues, mostly when using the analysis board. When requesting computer analysis, the graph used to slowly build up as the analysis comes in. Now sometimes it doesn't draw or it draws but there is a "gap" in the beginning right around where the precomputed opening analysis ends and server analysis starts
  • occasional stalls. Firefox would stop responding for a few seconds (not only the lichess tab) and sometimes crash completely if you wouldn't reload the site. This is usually accompanied by severe display issues. For example pieces would not be drawn correctly or at all with only the board and the move highlights being displayed.

I uploaded a video comparing Chromium vs Firefox here. (valid 7 days - anyone got a better idea?) Notice the glitches in the evaluation graph as I hover over it in Firefox.

steps to reproduce and setup used:

FF66 with and without addons, FF67dev without addons and all default settings, hardware acceleration enabled (performance is as bad without), Windows 10, Intel IGP (HD520, latest drivers)

With that, just use the site. Sometimes it is possible to reproduce some effects by going to about:support in Firefox and hitting "Trigger Device Reset" under the "Graphics"-section (present in the Developer Edition)

possible cause:

No idea really - I suspect it has something to do with hardware acceleration though. I noticed that GPU usage in Chromium is about double that of Firefox when scrolling through the moves. Not when scrolling through the page though. Also Firefox takes severe hits in FPS (according to the built-in devtools) whereas Chromium doesn't.

might be related to #5052 and #5072

bug

All 14 comments

second point was fixed in edd8f07068977a9bf36034a6ca0ff83be286c336 (not yet live)

@niklasf thanks for the quick reply.

Small update on my part:

I use a _setup with 2 monitors_. _2560x1440 main display_ (OS scaling enabled) and _1280x1024 external screen_. When I use lichess on that external screen, the glitches in the evaluation graph (see video) are not visible. When I only use _half the screensize of the external screen, lichess feels very smooth_ and responsive, just as it does in Chromium in all situations. Using fullscreen is better than on the main display, but not perfect. _Dragging the browser window to the main screen makes Lichess very sluggish again. It also triggered a stall as described above, with the board being either completely black or otherwise glitchy_. Might this be a Firefox issue after all? But then again, why did it not happen in v1? Also lichess is the only site with such severe issues for me at the moment.

Another update: about:support in Firefox shows

Failure Log
(0) CP+[GFX1-]: (gfxWindowsPlatform) Detected device reset: 1
(1) CP+[GFX1-]: (gfxWindowsPlatform) scheduled device update.
(2) CP+[GFX1-]: LayerManager::EndTransaction skip RenderLayer().
(3) CP+[GFX1-]: A content-only TDR is detected.

Lichess was the only site I used after a browser restart and making sure the failure log was empty before.

> about:support in Firefox shows

These messages indicate that the graphics driver crashed and it was seamlessly restarted, together with a reset of the GPU. If this happens obviously graphics won't be smooth.
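For anyone who wants to check their own about:support output for the same pattern, here is a minimal TypeScript sketch. It is only an illustration: `summarizeFailureLog` and `FailureLogSummary` are hypothetical names, not part of Firefox or lichess, and the matching relies solely on the message wording quoted in this thread, which may differ between Firefox versions.

```typescript
// Hypothetical helper, not a Firefox or lichess API.
// It counts the "device reset" entries and flags a content-only TDR
// in text copied from the "Failure Log" section of about:support.
interface FailureLogSummary {
  deviceResets: number;
  contentOnlyTdr: boolean;
}

function summarizeFailureLog(log: string): FailureLogSummary {
  const lines = log
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  let deviceResets = 0;
  let contentOnlyTdr = false;
  for (const line of lines) {
    // Wording taken from the log quoted in this thread (assumption:
    // other Firefox versions may phrase these messages differently).
    if (/Detected device reset/i.test(line)) deviceResets += 1;
    if (/content-only TDR/i.test(line)) contentOnlyTdr = true;
  }
  return { deviceResets, contentOnlyTdr };
}

// The four entries reported above.
const sampleLog = [
  "(0) CP+[GFX1-]: (gfxWindowsPlatform) Detected device reset: 1",
  "(1) CP+[GFX1-]: (gfxWindowsPlatform) scheduled device update.",
  "(2) CP+[GFX1-]: LayerManager::EndTransaction skip RenderLayer().",
  "(3) CP+[GFX1-]: A content-only TDR is detected.",
].join("\n");

const logSummary = summarizeFailureLog(sampleLog);
console.log(logSummary);
```

An empty failure log after a browser restart gives zero resets, which makes it easy to tell whether a single browsing session triggered one.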

This is a tricky situation. This is Intel's fault as it is their driver that is breaking, but getting Intel to fix it might be pretty hard (where even to report bugs?). If you can reliably reproduce this error by dragging the browser window, feel free to report it in Firefox's Bugzilla. We might be able to analyze which part of the rendering is crapping out Intel's driver, which then might at least give some clue what part of the lichess redesign is related to it.

Another thing to try is to see if the oldest Firefox supported by lichess had the issue. If it did not, the mozregression tool can be used to pinpoint the Firefox change that triggers the bug in the Intel drivers. Which might or might not give further clues :-/
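To illustrate the idea behind that kind of regression hunt: it is essentially a bisection between a known-good and a known-bad build. The sketch below is a generic TypeScript version of that idea, not mozregression's actual code. `firstBadIndex`, the `isBad` predicate and the version numbers in the example are all hypothetical, and it assumes the builds are ordered oldest to newest and that the bug, once introduced, stays present.

```typescript
// Generic bisection over an ordered list of builds (oldest first).
// Assumptions: builds[0] is known good, the last build is known bad,
// and every build after the first bad one is also bad.
function firstBadIndex<T>(builds: T[], isBad: (build: T) => boolean): number {
  if (builds.length < 2) {
    throw new Error("need at least one good and one bad build");
  }
  let good = 0;                 // index of the newest build known to be good
  let bad = builds.length - 1;  // index of the oldest build known to be bad
  while (bad - good > 1) {
    const mid = good + Math.floor((bad - good) / 2);
    if (isBad(builds[mid])) {
      bad = mid;   // regression is at mid or earlier
    } else {
      good = mid;  // regression is after mid
    }
  }
  return bad;
}

// Hypothetical example: major versions 58..68, with a made-up
// regression in 63. In practice `isBad` is a human trying the build.
const versions = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68];
const culprit = versions[firstBadIndex(versions, (v) => v >= 63)];
console.log(`first bad version: ${culprit}`);
```

The catch is that `isBad` has to give a reliable answer for every build tested, so a crash that only happens occasionally is hard to bisect this way.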

Unfortunately, due to the distance between "lichess HTML" and "Intel Graphics driver", these kinds of problems are very hard to fix.

And obviously, if Intel publishes new drivers, you want to try them right away...

Thanks @gcp as well. As for the drivers - due to lack of a newer driver, I tried an old one (~1.5 years) without any success. Also I am not quite sure how to conveniently try an older Firefox version. However, I tried a newer one (Nightly 68.0a1 (2019-05-09)) and that did work very well at first glance. Also it did not crash or have rendering issues while testing. I noticed another thing though.

Performance on the analysis board apparently depends on game length (in terms of moves). I tried rapidly scrolling through the moves and also hovering over the evaluation graph.

| game | performance |
| --- | --- |
| game 1 |FF66/67: _slow_, FF Nightly: _rather smooth_, Chrome: _perfect_|
| game 2 | FF66/67: _extremely slow_, FF Nightly: _slow_, Chrome: _perfect_|

These subjective impressions are backed by the respective browsers' performance measurements. Average and minimum FPS are lowest in 66 and 67dev, almost twice as bad as in Nightly for me. No idea how helpful these reports are - if this is getting out of hand or there are better ways for me to contribute, let me know.
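For reference, this is roughly how average and minimum FPS can be derived from frame timestamps. It is a minimal TypeScript sketch under my own assumptions, not how the browser devtools compute their numbers: `fpsStats` and `FpsStats` are hypothetical names, the timestamps are assumed to be in milliseconds and strictly increasing, and "minimum FPS" is taken to mean the rate implied by the single longest frame.

```typescript
interface FpsStats {
  averageFps: number; // frames rendered divided by total elapsed time
  minFps: number;     // rate implied by the longest gap between two frames
}

// Timestamps are in milliseconds, one per rendered frame, strictly increasing.
function fpsStats(frameTimestampsMs: number[]): FpsStats {
  const n = frameTimestampsMs.length;
  if (n < 2) {
    throw new Error("need at least two frame timestamps");
  }
  let longestFrameMs = 0;
  for (let i = 1; i < n; i++) {
    const delta = frameTimestampsMs[i] - frameTimestampsMs[i - 1];
    if (delta <= 0) {
      throw new Error("timestamps must be strictly increasing");
    }
    if (delta > longestFrameMs) longestFrameMs = delta;
  }
  const totalMs = frameTimestampsMs[n - 1] - frameTimestampsMs[0];
  return {
    averageFps: (1000 * (n - 1)) / totalMs,
    minFps: 1000 / longestFrameMs,
  };
}

// Made-up sample: three quick frames and one 50 ms stall.
const stats = fpsStats([0, 10, 20, 70, 80]);
console.log(stats);
```

In a page, the timestamps could be collected from the argument passed to a `requestAnimationFrame` callback while scrolling through the moves. A single long stall barely moves the average but drags the minimum down sharply, which is why both numbers are worth reporting.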

> Also I am not quite sure how to conveniently try an older Firefox version.

We have a semi-automatic tool that lets you try old Firefox versions to see which one fixed/broke something: https://mozilla.github.io/mozregression/ This might help understand what improved things in Nightly too (my guess would be the enabling of WebRender).

There are two things here now that are at risk of being conflated, if they're not the same: the tendency of your drivers to crash on lichess and the scrolling performance in the analysis board. I assume that if you test the scrolling and it is extremely slow, you don't necessarily get those "device reset" "TDR is detected." messages in about:support? I'm talking about release Firefox here as you indicated that you don't see those GPU driver crash messages in Nightly.

Specifically, referring to your original post, 2) and especially 3) look like they are caused by broken drivers, and that's likely not something either we (Mozilla) or lichess can do much about, but 1) might be a different problem.

I tested the two games you posted. Scrolling through them is very smooth in both cases for me. But when hovering the analysis graph, Firefox does clearly lag while Chrome does not. I'll file a bug against Firefox for this. (https://bugzilla.mozilla.org/show_bug.cgi?id=1550511) That said, this existed on the old lichess site I believe, so it doesn't explain why the scrolling is slow for you.

Thanks!

  1. Tried mozregression (cool tool) to no avail since I cannot reliably reproduce the crash. It gave me a chance to quickly test older versions though.
  2. > I assume that if you test the scrolling and it is extremely slow, you don't necessarily get those "device reset" "TDR is detected." messages in about:support

     Correct!

  3. Nightly also crashed once now (but with other FF running in the background - so maybe one killed the other via the driver?)

  4. All Firefox builds starting from 58 (not tested releases prior to that) are very fast and responsive (including the lag while hovering!) _if_ you resize the UI in a manner so that the notation is beneath the board and not next to it. That is also why it was so fast when I used half of my external screen. It is as fast on my main screen if I resize it that way. So I assume that there is an issue with the responsive UI in that configuration that ultimately manages to kill the driver. So it might still be an issue with that but at least it seems we know what causes it on lichess.

Thanks, this is really useful for us too.

Analysis of the problem with the evaluation graph:
https://bugzilla.mozilla.org/show_bug.cgi?id=1550511#c5

The lichess v2 layout uses grid and flex, new CSS techniques that v1 likely did not use yet. The combination of nesting those with an SVG graph inside (the analysis graph) hits a bad case: any change in the SVG (by hovering the cursor and highlighting a different move) causes the layout of the entire page to be recomputed because it's not understood that the SVG isn't changing in size itself.
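One general way to keep changes inside a subtree from forcing a relayout of the whole page is to give the chart container an explicit size and mark it with CSS containment. The TypeScript sketch below shows that idea only; it is not necessarily the optimization proposed in the Bugzilla comment, and I have not checked it against the lichess or Highcharts code. `isolateChartLayout` and `Styleable` are hypothetical names, the pixel sizes are made up, and browser support for the `contain` property varies by version.

```typescript
// Minimal structural type so the function works with a real HTMLElement
// in the browser and with a plain stub object elsewhere.
interface Styleable {
  style: { setProperty(name: string, value: string): void };
}

// Gives the container a fixed box and tells the browser that its
// contents cannot affect layout outside of it.
function isolateChartLayout(
  container: Styleable,
  widthPx: number,
  heightPx: number,
): void {
  container.style.setProperty("width", `${widthPx}px`);
  container.style.setProperty("height", `${heightPx}px`);
  container.style.setProperty("overflow", "hidden");
  // "layout": internal layout is isolated from the rest of the page.
  // "size": the box size does not depend on the children.
  container.style.setProperty("contain", "layout size");
}

// Stub that records what would be written to the element's style.
const recorded = new Map<string, string>();
const fakeContainer: Styleable = {
  style: {
    setProperty: (name, value) => {
      recorded.set(name, value);
    },
  },
};

isolateChartLayout(fakeContainer, 800, 200); // made-up dimensions
console.log(Object.fromEntries(recorded));
```

The trade-off is that a fixed pixel size has to be updated by hand whenever the responsive layout changes the space available to the graph.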

@ornicar @niklasf The above comment contains an optimization suggestion that should make the site faster in all browsers. If this is not fixable on lichess' side because it's too deep in Highcharts, I'll try talking to the Highcharts people as well.

Looks like we can address the bad case in Firefox, but it will obviously take quite a while before any versions with these optimizations roll out.

@ana-ka Something interesting to test now is to check if switching tabs from "Computer Analysis" to "Crosstable" or "FEN-PGN" makes the scrolling smooth. If it is still slow, can you perhaps help us capture a profile? https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem

@gcp

> @ana-ka Something interesting to test now is to check if switching tabs from "Computer Analysis" to "Crosstable" or "FEN-PGN" makes the scrolling smooth.

It is still very sluggish in both FF66 with my addons and dev/nightly without any.

> If it is still slow, can you perhaps help us capture a profile? https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem

Done. Here is what I did:

  1. Opened up game 2 from above
  2. Made sure that I am in a fullscreen/maximized window with the notation next to the board
  3. Switched bottom selection to FEN & PGN
  4. With my cursor over the board, I scrolled through the moves as rapidly as possible from the starting position to the final one

Results:
FF66 with all my addons enabled
Nightly with all defaults

I used the default profiler values. If you need something else, let me know.

Quick update: Just tested my Linux install (Manjaro, Kernel 5.0.15-1) to verify that it is not Windows/driver specific. Scrolling performance is comparably bad there with the notation next to the board and silky smooth with the notation beneath. Maybe the reason why @gcp cannot reproduce is because he has a very powerful machine? I am on a 2 core i7 laptop with HT on.

When testing on Linux, Chrome is as smooth as on Windows. Firefox seems very slightly faster, but that might be misleading. I just did a quick run with game 2 from above. It is definitely slower than Chrome. I am talking about the scrolling through the moves. The laggy hovering over the evaluation graph is another thing, which of course happens on Linux, too. So in the end the driver crashes on Windows might be Intel's or my fault. Also they appear to happen everywhere on lichess, not just with the analysis board open. I didn't notice them on any other site and I cannot reliably trigger them. Seems random to me. The bad performance though cannot be blamed on drivers or Windows. That seems to be an issue with FF/lichess.

As usual - if you need any more info or if I can be of any assistance, let me know.

> Maybe the reason why @gcp cannot reproduce is because he has a very powerful machine?

I tested this on a tablet (in landscape mode), scrolling is extremely smooth so I really can't reproduce it. In any case the profile should be enough to see what's going on even if it doesn't reproduce here.

(Just checking: this isn't because the engine is running doing infinite analysis, is it? Cfr. issue #5072)

> (Just checking: this isn't because the engine is running doing infinite analysis, is it? Cfr. issue #5072)

Nope, local engine is off.

Edit: Drivers crash on other sites, too. Lichess seems to be especially bad though. This is getting a bit out of hand so maybe let's not talk about that anymore. At least not in this context. Maybe we can discuss this issue elsewhere - Bugzilla or something. Really strange - this didn't happen weeks/months ago.

Is the latest Firefox doing any better?
