Sway: Automatic VRR management

Created on 6 Mar 2020 · 21Comments · Source: swaywm/sway

With https://github.com/swaywm/sway/pull/5063, users have a way to unconditionally enable VRR. However this can cause some flickering on some monitors. Flickering seems to happen on higher-end monitors which have a larger VRR range. It seems like big enough refresh rate variations will cause flickering.

I think there are two ways to fix this. Both options aren't necessarily exclusive.

Do like xf86-video-amdgpu: add a way for clients to opt-in

This would mean designing a Wayland protocol to let clients opt-in to VRR. Only clients with a mostly-fixed frame rate would opt-in. This excludes clients such as video players and web browsers, and includes games.

The compositor would only enable VRR if a client is fullscreen and has opted in.

The protocol should probably operate at the wl_surface level. We'll want to let Mesa opt-in if possible. Not sure how to let clients override Mesa's decision (this is also an issue with the current X11 implementation).

This is not great, because we miss some use-cases (power savings for web browsers, lowering the refresh rate to the video's frame rate for video players) and it relies on Mesa's VRR blacklist (I don't really like the blacklist).

Avoid flickering in the compositor

Extract the VRR range from the EDID
Add a configurable maximum refresh rate variation (perform experiment to find a good default value)
Delay page-flips so that we prevent the refresh rate from varying too much
- Make the page-flip happen later if a client commits a new buffer too early
- Force a page-flip if no client commits a new buffer

This is a little bit more involved. However this allows us to take advantage of VRR for power savings and video playback. Also this doesn't require any cooperation with clients or Mesa, the compositor can do everything on its own.

enhancement proposal

Source

emersion

❤11

Most helpful comment

Honestly, I think the second proposal is the only one making sense to me. And rather than having the compositor doing that, the kernel should be doing it since it is as easy to implement the logic there as in the compositor.

As for "Force a page-flip if no client commits a new buffer", why is that needed? The HW or driver should do that transparently for you anyway, otherwise they are broken because the KMS interface does not mandate compositors to push new frames constantly.

mupuf on 9 Mar 2020

👍3

All 21 comments

Does enabling/disabling adaptive sync work via for_window criteria (as a temporary measure, I agree with you that a blacklist is probably not the way to go)?

for_window [app_id="firefox"] adaptive_sync off?

aidanharris on 7 Mar 2020

The second certainly does seem like the more preferable option, if it works in practice.

Not sure how to let clients override Mesa's decision

Probably would need to be an EGL and Vulkan extension, if Mesa wanted this at all.

ascent12 on 7 Mar 2020

Does enabling/disabling adaptive sync work via for_window criteria (as a temporary measure, I agree with you that a blacklist is probably not the way to go)?

This would only be desirable if we go for the first option. In this case, indeed having a way for users to override adaptive sync for each app would be nice.

emersion on 8 Mar 2020

mupuf on 9 Mar 2020

👍3

And rather than having the compositor doing that, the kernel should be doing it since it is as easy to implement the logic there as in the compositor.

I asked about this last XDC, and kernel devs wanted user-space to do it. I'll try asking again to see if that's the consensus.

Doing automatic page-flips at arbitrary times in the kernel could surprise user-space. It might just be fine, not sure.

As for "Force a page-flip if no client commits a new buffer", why is that needed?

Let's say a full-screen client renders at 60FPS, and then suddenly decides to stop completely. To avoid flicker, we need to perform page-flips to slowly ramp-down to the lowest possible refresh rate.

emersion on 10 Mar 2020

All of what I am saying is assuming what I would call a sane HW design: the HW will repeat a frame if the kernel asks to (write to a register) or if vblank_max_ns is exceeded (this value being set by the kernel based on the EDID). This enables soft-queuing of the frame and not having to wait for the exact time to push the 'trigger' bit.

Doing automatic page-flips at arbitrary times in the kernel could surprise user-space. It might just be fine, not sure.

Automatic flips should happen only to repeat a frame that has already been presented. This is something the userspace gets no feedback of AFAIK, and this is what the HW should be doing anyway (once vblank has been stretched to the max and no new frame came in).

Let's say a full-screen client renders at 60FPS, and then suddenly decides to stop completely. To avoid flicker, we need to perform page-flips to slowly ramp-down to the lowest possible refresh rate.

Right, but the kernel has all the information needed to do the ramp down on its own, just set vblank_max_ns to min(edid_vblank_max_ns, vblank_max_ns_cur + vblank_max_change_ns). A similar solution can be used for ramping up ;)

Now, I think the ramping up and down could be improved by providing expected flip times for a frame (and maybe it expiration date?) so as to let the kernel select the frame closest to the actual flip time.

mupuf on 10 Mar 2020

👍1

One more thought: We should add VRR_REFRESH_LIMIT (maximum change per frame), VRR_REFRESH_MIN (minimum refresh speed, clamped by the EDID mode) and VRR_REFRESH_MAX (maximum refresh speed, clamped by the EDID mode) as connector properties.

This way, the userspace would be in full control without needing to keep the CPU busy to guarantee pushing the next frame at the exact right time.

mupuf on 10 Mar 2020

https://lists.freedesktop.org/archives/dri-devel/2020-March/258940.html

emersion on 11 Mar 2020

Hi,
I've been using adaptive sync for a few years and would like to add my thoughts from a user's perspective. I used it on Windows (which I don't have anymore) years ago and then only tried it out on X once support landed, so my experience on those platforms might not be up to date.

First of all, to what's been discussed on the list: My monitor has a VRR range from 40 up to 144 Hz and I don't have flickering with the current Sway implementation. It can also show the refresh rate in a HUD, so it's easy to see and if you need something to test, you could ping me.

About having A-sync enabled on the "desktop":

Nobody should be tortured with a 30 or 40 Hz mouse cursor
The mouse cursor shouldn't disturb playback of any VRR content

E.g. if I play a 60 FPS video in the background and move my mouse, Sway renders at 144 Hz and as 144%60>0, the video begins to stutter. The same is true for games with an overlay cursor or anything else. Having the refresh rate follow the mouse cursor unconditionally basically breaks the benefit of what VRR wants to achieve: smooth playback of content. But if it doesn't, we have problem 1 again. That's why I think it makes more sense for full screen or needs more complex rules. I used to think it made sense to always enable VRR years ago, but changed my mind here.

Next, I consider flickering a hardware issue. After all, the display's EDID says something like "I can run from 35 to 144 Hz" and not "I can run 35-144 but please just change 10 Hz at a time". I've used 3 adaptive sync compatible setups with supported ranges from 35-90, 56-144 and 40-144 Hz. None of them flickered when going from min to max and back quickly. There seems to be a reason why some LCDs would support a constant 30 Hz rate without flickering but their adaptive sync range still starts above 40 Hz. I could currently buy only 13 different models that claim their minimum AS rate is 30 Hz. 156 Models with 40 Hz minimum and 474 models with 48 Hz min.

My theory is that most screens aren't actually capable of going through their full range but they still advertise it because a) AMD has a feature called LFC on Windows (see below), and b) because A-sync is only enabled in specific cases (full screen, whitelisted) on Windows and xf86-video-amdgpu, which mitigates those issues.

Force a page-flip if no client commits a new buffer

This is what LFC (low framerate compensation) on AMD/Windows does. LFC doubles/triples/... frames in order to stay within adaptive sync ranges. E.g. if the display supports 56-144 Hz a-sync range, but framerate is 40 FPS, then 80 Hz would be sent to the screen, which is obviously better than showing unsynced 40 FPS on a 56 Hz rate.
There is also some inertia in LFC, it tends to enable frame doubling early and disable it late, so it kind of prefers higher refresh rates.

It might be worth asking AMD if they plan LFC in amdgpu as well, maybe flickering issues would be resolved then without further changes. LFC is a useful feature after all, because it extends the usable VRR range so IMHO that'd make more sense than the compositor working around display hardware issues, but that's really just an opinion.
Users could also build their own EDID with a higher minimum rate which is tested and proven working and override it at boot/module load time. I've done this with VRR ranges and it works without any software changes.

One more thing to the current sway VRR state: I've noticed Firefox "pollutes" the refresh rate across all workspaces of an output. So if it plays a video or animation, and I change to a different workspace, set another container to full screen or even have swaylock in the foreground, the display refreshes at FF's rate. I haven't noticed that with other programs, once they're invisible, they don't affect refresh rate anymore. Maybe a blacklist would make sense after all.

grmat on 14 Mar 2020

👍1

Harry agreeds:

There are no slew rate registers in current AMD HW but I also think
slewing would best be done in kernel space, either directly in HW by HW
that supports it or in SW for HW that doesn't support it.

The mouse cursor shouldn't disturb playback of any VRR content

Harry has pointed this out too.

For fullscreen apps, it's pretty clear that we want to wait for the client to submit a frame. If we show multiple surfaces (regular desktop), it's less clear. We can't predict whether the client will submit a frame or not.

This is what LFC (low framerate compensation) on AMD/Windows does. LFC doubles/triples/... frames in order to stay within adaptive sync ranges. E.g. if the display supports 56-144 Hz a-sync range, but framerate is 40 FPS, then 80 Hz would be sent to the screen, which is obviously better than showing unsynced 40 FPS on a 56 Hz rate.

Wayland can't predict the refresh rate of clients. We'd probably need a protocol to let clients queue multiple frames and attach a timestamp ("don't display this frame prior to this timestamp").

One more thing to the current sway VRR state: I've noticed Firefox "pollutes" the refresh rate across all workspaces of an output. So if it plays a video or animation, and I change to a different workspace, set another container to full screen or even have swaylock in the foreground, the display refreshes at FF's rate. I haven't noticed that with other programs, once they're invisible, they don't affect refresh rate anymore. Maybe a blacklist would make sense after all.

Hm. Firefox requests frame callbacks even when it's not rendering. But we shouldn't send these frame callbacks if the window isn't visible, this might be a Sway bug here.

emersion on 18 Mar 2020

Wayland can't predict the refresh rate of clients. We'd probably need a protocol to let clients queue multiple frames and attach a timestamp

This would work for something steady like video but generally, when playing games or any other content that reacts to user input (even if it's just scrolling a web page) "pre-rendered" frames are advised against because it introduces significant lag for the benefit of a smoother playback. VRR is for smooth playback and I wouldn't want my compositor to introduce lag.

In my (unprofessional) opinion: the compositor shouldn't do this at all. It updates if there's something new and the kernel driver determines if the display needs an additional refresh or not. I mean, that's what they already do with the minimum refresh rate anyways. Also, I've just looked into the kernel and they already have LFC (I wasn't aware of that). I think that should be the solution to the flicker issue.

LFC also has the potential to add latency, e.g. if the driver decided to display the last frame again and the next frame comes in 1ms after that. Then this frame has to wait for e.g. ~6 ms (if 144 Hz is the upper bound), but that's still better than pre-rendering frames and show each frame with a delay of 1-2 full refreshes.

this might be a Sway bug here

Do you want me to open an issue for that? Need any more info from me?

grmat on 18 Mar 2020

This is what LFC (low framerate compensation) on AMD/Windows does. LFC doubles/triples/... frames in order to stay within adaptive sync ranges. E.g. if the display supports 56-144 Hz a-sync range, but framerate is 40 FPS, then 80 Hz would be sent to the screen, which is obviously better than showing unsynced 40 FPS on a 56 Hz rate.

Wayland can't predict the refresh rate of clients. We'd probably need a protocol to let clients queue multiple frames and attach a timestamp ("don't display this frame prior to this timestamp").

Maybe a protocol for clients to tell the compositor the predicted/preferred framerate would be better, and use that information set up frame doubling if necessary. Or detect a stable framerate over a certain number of frames and use that as the basis for future frame doubling and restart the calculation when a frame outside of the range arrives.

progandy on 18 Mar 2020

It might be worth asking AMD if they plan LFC in amdgpu as well, maybe flickering issues would be resolved then without further changes. LFC is a useful feature after all, because it extends the usable VRR range so IMHO that'd make more sense than the compositor working around display hardware issues, but that's really just an opinion.

LFC is already supported in amdgpu (the kernel driver), and has been for some time. I've used it without problems in X, and I just tested it with Sway (and it worked correctly). It did have bugs in the past (which resulted in flickering) but they should be fixed by now.

aqxa1 on 2 Apr 2020

I just want to add that I get periodic stuttering with VRRTest and Retroarch with adaptive_sync enabled (that doesn't occur when using native X). With VRRTest the stutter seems to occur every 8 seconds or so, with a 144hz refresh rate, target fps of 139, actual fps of 129. The discrepancies between target and actual don't seem to be relevant as they are similar with X as well, however.

If this should be considered a separate bug, I'm happy to open a new issue.

aqxa1 on 4 Apr 2020

Started testing out VRR last night and here are some remarks.

Using 5700XT, Mesa-git and 5.6 kernel, displays are Asus MG278 and LG27UK650-W. Tests run on workspace with only wallpaper, also the only workspace assigned to the screen.

The Asus has FPS counter I've observed VRR is working on it. Refresh rate drops to 40Hz immidiately after movement on the screen and after a couple of seconds rises to around 62Hz. Is there a reason for this behaviour, why is it not keeping down to 40Hz? When moving cursor around the screen to keep refresh rate at 144Hz there are black frames every five seconds or so, this happens also when gaming.

The LG does not have FPS counter so tests are a bit limited. Cursor movement is noticeably uneven compared to non-VRR, the display has extended and basic FreeSync modes, both have the same problem. Also tested with browser and scrolling is very unpleasant compared to 60Hz non-VRR. No black frames on this display.

asiantuntija on 11 Apr 2020

@grmat If this issue persists on sway/wlroots master, please open a separate issue. It's hard to track issues if they only exist in comments.

kennylevinsen on 20 Apr 2020

@YaLTeR is implementing this for GNOME here: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1620

emersion on 11 Dec 2020

@YaLTeR is implementing this for GNOME here: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1620

AFAICS this is frame_callback and compositing time delaying we already have within sway, I don't see VRR specific parts in this MR.

Emantor on 11 Dec 2020

The innovation is the automatic delay estimation:

We don't know how long compositing will take, and we also don't know how
late we can swap buffers to still make it for the presentation. So we
estimate these values.

The estimate for the dispatch duration is a maximum of the last 16
frames' dispatch durations. Usually even taking just the last frame's
dispatch duration as the estimate works well enough, but I found that
screen-capturing with OBS Studio increases dispatch duration variability
enough to cause frequent missed frames when using that method. Taking a
maximum of the last 16 frames smoothes out this variability.

To estimate the buffer swap deadline we keep track of the flip time and
check if the presentation succeeded or was delayed. If it succeeded, a
minimum is taken (this deadline was good enough) and if it was delayed a
maximum is taken (this deadline was bad enough). In the stable case (all
presentations take exactly the same time and always succeed) such an
algorithm would never change the estimate (while we want it as low as
possible), however in real testing it is lowered to a good value almost
immediately as the result of duration variations and delayed frames due
to computations on Shell startup.

Compositing duration is naturally quite variable and the estimates
aren't perfect. To take this into account, an additional constant 2 ms
is added to the max render time.

How does it perform in practice? On my desktop with 144 Hz monitors I
get a max render time of 4–5 ms instead of the default 4.9 ms (I had
1 ms manually configured in sway) and on my laptop with a 60 Hz screen I
get a max render time of 4.8–5.5 ms instead of the default 14.7 ms (I
had 5–6 ms manually configured in sway). Weston [1] went with a 7 ms
default.

The main downside is that if there's a sudden heavy batch of work in the
compositing, which would've made it in default 14.7 ms, but doesn't make
it in reduced 6 ms, there is a delayed frame which would otherwise not
be there. Arguably, this happens rarely enough to be a good trade-off
for reduced latency. One possible solution is a "next frame is expected
to be heavy" function which manually increases max render time for the
next frame. This would avoid this single dropped frame at the start of
complex animations.