Also shows up when scrolling through GitHub comments that contain code snippets.
Tested cnn on my configuration (Linux, Intel iGPU).
Got very slow rendering and scrolling, but not because of purple bars (clipping).
For me it's the grey bars (presumably, GPU_TAG_PRIM_BORDER_CORNER).
Contrary to FF's 30-50 ms of GPU time, Servo spends about 10 ms there. The overall experience is much smoother, but still far from ideal. Note that the display is 4K and the GPU is a basic "HD Graphics 520". More profiling is needed to locate the exact culprit.
One of the possible research directions to nail down the fill-rate issues is to re-use the already rendered layers (thus, doing the invalidation like FF currently does outside of WR).
I suggest investigating the issue fully before deciding that invalidation is the solution. Invalidation only handles a subset of cases: for example, it can only handle zooming by turning the content blurry. If the only way to get good performance here is to use invalidation, so be it, but I don't see a reason offhand why GPUs should struggle to render border corners.
Here's a tracking issue for Gecko being worse than Servo: https://bugzilla.mozilla.org/show_bug.cgi?id=1391844
cnn-slow-gpu.zip Here's a single frame yaml recording of cnn from Gecko that runs slowish.
With this recording I get 20ms of GPU time on my MacBook Pro's GeForce GT 750M, but if I switch to my integrated Intel Iris Pro it drops down to 2ms.
But it still runs slowly on the Intel GPU (27fps), so maybe the timing accounting is just different.
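As a rough sanity check on those numbers, a frame rate can be converted to a per-frame time. This is only an illustrative sketch (the helper name `frame_time_ms` is made up here, not part of any profiler): 27fps works out to about 37ms per frame, far more than the 2ms of reported GPU time, which is consistent with the guess that the time is being accounted for somewhere else.

```rust
/// Convert a frame rate into the average time spent per frame, in milliseconds.
/// Hypothetical helper for illustration only.
fn frame_time_ms(fps: f64) -> f64 {
    1000.0 / fps
}

fn main() {
    // 60fps is the usual target for smooth scrolling.
    println!("60 fps -> {:.1} ms per frame", frame_time_ms(60.0));
    // The Intel GPU run mentioned above.
    println!("27 fps -> {:.1} ms per frame", frame_time_ms(27.0));
}
```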
I'll do some profiling of CNN in both Servo and Gecko today, and report any findings here.
Running the yaml frame capture from @jrmuizel on my laptop @ 1920x1080, I see:
~2ms CPU time (should be better, but not awful).
~10ms GPU time (this seems really high).
The primitive count, batching, vertex counts etc all seem reasonable - again, they can be improved but they aren't too bad.
It appears all the GPU time is going into clearing targets - the grey bars below represent time spent clearing render targets:

If I log out all target clears in that capture, we can see:
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 0) TypedRect(1275×1027 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 1) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 2) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 3) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 4) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 5) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 6) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 7) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 8) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 9) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 10) TypedRect(1103×910 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 11) TypedRect(1455×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 12) TypedRect(1455×943 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 13) TypedRect(1455×928 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 14) TypedRect(1455×928 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 15) TypedRect(1455×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 16) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 17) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 18) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 19) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 20) TypedRect(1456×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 21) TypedRect(1456×1073 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 22) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 23) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 24) TypedRect(1756×1044 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 25) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 26) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 27) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 28) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 29) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 30) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 31) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 32) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 33) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 34) TypedRect(1455×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 35) TypedRect(1456×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 36) TypedRect(1274×971 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 37) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 38) TypedRect(1274×971 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 39) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 40) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 41) TypedRect(1274×1067 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 42) TypedRect(1274×1067 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 43) TypedRect(1455×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 44) TypedRect(1141×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 45) TypedRect(1418×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 46) TypedRect(1455×984 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 47) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 48) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 49) TypedRect(1441×1037 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 50) TypedRect(1323×909 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 51) TypedRect(1101×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 52) TypedRect(1455×873 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 53) TypedRect(1336×406 at (0,0))
draw_color_target clear_target None
So we have 53 A8 targets being allocated, each one ~1000x1000 resolution. The clears being issued to those 53 targets are where all the time is being spent.
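To put a number on how much memory those clears touch, the log can be summed up with a small script. The following is a minimal sketch, not part of WebRender: the function names `parse_clear_rect` and `total_cleared_pixels` are made up for illustration, and it assumes each log line carries its size as `TypedRect(<width><separator><height> at ...)`. Since an A8 target stores one byte per pixel, the pixel total is also the number of bytes cleared.

```rust
/// Extract (width, height) from a log line containing
/// `TypedRect(<w><sep><h> at (x,y))`. Any non-digit separator is accepted.
/// Returns None for lines without a rect (e.g. `clear_target None`).
fn parse_clear_rect(line: &str) -> Option<(u64, u64)> {
    let start = line.find("TypedRect(")? + "TypedRect(".len();
    let rest = &line[start..];
    let end = rest.find(" at")?;
    let mut dims = rest[..end]
        .split(|c: char| !c.is_ascii_digit())
        .filter(|s| !s.is_empty());
    let w = dims.next()?.parse().ok()?;
    let h = dims.next()?.parse().ok()?;
    Some((w, h))
}

/// Sum width * height over every line of the log that has a rect.
fn total_cleared_pixels(log: &str) -> u64 {
    log.lines()
        .filter_map(parse_clear_rect)
        .map(|(w, h)| w * h)
        .sum()
}

fn main() {
    // Two lines copied from the capture, plus the color target line with no rect.
    let sample = "\
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 0) TypedRect(1275×1027 at (0,0))
draw_alpha_target clear_target_rect (TextureId { name: 427, target: 35866 }, 1) TypedRect(1101×873 at (0,0))
draw_color_target clear_target None";
    let pixels = total_cleared_pixels(sample);
    // One byte per pixel for an A8 target.
    println!("{} pixels (~{:.1} MB of A8) cleared", pixels, pixels as f64 / 1e6);
}
```

Feeding the whole log through `total_cleared_pixels` gives the per-frame clear cost in bytes, which is an easy figure to compare before and after a fix.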
I need to look into it further to see what the display list looks like, but it seems like it's just going to be a bug either in Gecko or WR, creating a huge number of clip masks when they aren't really needed.
Hopefully this should be relatively simple to fix up once I find the root cause of those (clearly redundant) clears...
@jrmuizel The perf difference you see between your nVidia and Intel GPUs is probably also explained by the results above - it's likely that the clear performance may be very different on those GPUs, and especially noticeable when clearing 53 large targets per frame :)
@jrmuizel Fix is here https://github.com/servo/webrender/pull/1613

Improvements with this fix:
These results seem much more reasonable for that test page :)
For reference, running the real CNN page on Servo, the CPU times in backend / compositor are 0.8ms and 1.0ms. This is probably a combination of (a) Servo missing some items that Gecko is drawing (b) Gecko providing sub-optimal display lists in some cases.
Running Gecko on cnn.com with the patch above still gives quite high GPU times at 4k resolution (~6-8ms on my machine), but it's easily scrolling at 60 fps at 4k.
The remaining work seems to be that there are a lot more draw calls coming from the Gecko display lists. There are a couple of things to look at here. First, there is some plane splitting going on in the Gecko implementation, which seems likely to be a bug - I can't see any reason why that should be happening (and it causes a lot of batch breaks). Second, there is some work I'm planning to do in WR over the next couple of weeks to unify several of the shaders, which should help a lot with CPU and GPU time, due to reduced batching overheads.
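For readers unfamiliar with the term, a "batch break" can be illustrated with a toy model. This is a hypothetical sketch and not WebRender's actual batching code: `BatchKey` and `count_draw_calls` are invented names, and the model simply assumes that consecutive primitives can share a draw call only when their keys match.

```rust
/// Hypothetical batch key: two primitives can share a draw call
/// only if they use the same shader and texture.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct BatchKey {
    shader: u32,
    texture: u32,
}

/// Count draw calls when consecutive primitives with equal keys are merged.
/// Every change of key in the sequence is a batch break.
fn count_draw_calls(keys: &[BatchKey]) -> usize {
    let mut calls = 0;
    let mut prev: Option<BatchKey> = None;
    for &key in keys {
        if prev != Some(key) {
            calls += 1;
            prev = Some(key);
        }
    }
    calls
}

fn main() {
    let a = BatchKey { shader: 0, texture: 1 };
    let b = BatchKey { shader: 1, texture: 1 };
    // Primitives grouped by key: few batch breaks.
    println!("grouped: {} draw calls", count_draw_calls(&[a, a, a, b, b]));
    // Same primitives interleaved, e.g. because ordering is forced: many breaks.
    println!("interleaved: {} draw calls", count_draw_calls(&[a, b, a, b, a]));
}
```

In this model, anything that forces an ordering between primitives increases the number of key changes, while merging shaders reduces the number of distinct keys, so both effects show up directly in the draw call count.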
What do you mean when you say plane splitting?
The plane splitting that gets invoked when you have 3d transformed divs that require splitting to display correctly. I can't imagine there's anything on that page which actually requires it; I suspect we're accidentally triggering it in WR when we don't need to.
cc @metajack
This seems OK to me now - I see ~4ms GPU time when scrolling on CNN in Gecko now. Feel free to re-open if there are still issues.