Feature request summary
Per @addyosmani's request, I'm following up with a feature request after today's discussion here: https://twitter.com/scottjehl/status/1278372113716576258, which duplicates Léonie Watson's earlier tweet.
It would be useful to know when Assistive Tech is able to interact with and communicate page content, so that timing can be factored into existing metrics that represent page "readiness". Measurements that note or approximate the time when the accessibility tree is built would be interesting to see and understand, as would, per Léonie's tweet, the time of the first accessibility API query.
It'd also be interesting to know which existing metrics are not relevant to Assistive Tech (for example, is FCP happening before the accessibility tree is created due to blocking JS?). Or, in an SSR scenario, is the accessibility tree created initially one way and later "hydrated" into a much different state?
What is the motivation or use case for changing this?
To see metrics that measure when a page is usable to all users, including those using AT.
How is this beneficial to Lighthouse?
Accessibility is part of Lighthouse scoring criteria.
One thing that will be interesting is the way different browsers build the Accessibility Tree (AcT) and then handle Acc API calls.
Chrome is a good illustration of the difference these things can make. As I understand it, Chrome used to build the AcT in the content process, then proxy Acc API calls in from the application process. Now it builds the AcT in the content process and then caches the entire thing within the application process, where it can be queried via the Acc API.
In the first incarnation there was a performance hit on every Acc API call. In the current incarnation, although there can be a noticeable performance hit whilst the initial AcT is being built and cached (notably on large and/or JS-heavy pages), once the cached AcT is available everything thereafter is pretty performant.
Firefox, on the other hand, proxies Acc API calls from the application process to the AcT in the content process, but it uses intelligent caching to bundle related information into what's returned, so the overall number and frequency of API calls is reduced.
Performance can sometimes seem sluggish with an AT in Firefox, but usually only with large and/or JS-heavy pages.
The AT itself is also an important part of the puzzle, though that will likely emerge as a measurable and comparable metric if we're able to measure the time to first AcT interaction/first Acc API call. NVDA, for example, is considerably more performant in Firefox than JAWS is, and that has as much to do with the screen reader itself as with the browser.
Ultimately, I'd love to see a cross-browser metric (or a few) around the AT for reasons @LJWatson points out—the differences between how different engines create the AT and the implications for how we build are anything but widely known and understood.
But, that's a long and different process. At least having some information available in Lighthouse would start to surface the issue a bit more and provide a starting point.
It's the whole chicken-and-egg problem: I suspect cross-browser metrics will be super helpful here and will highlight opportunities for improvement, but until we have _something_ like this exposed somewhere, it's hard to know to what extent. Lighthouse feels like it could be a good place to start.
Just a note that there's a side thread over at webpagetest where Pat has some feedback that could be useful here. https://github.com/WPO-Foundation/webpagetest/issues/1369
cc @anniesullie, we'd love to hear what the Chrome Speed Metrics team is working on in this area, and whether there are any plans to do this in the future :)
The AcT (Accessibility Tree) is a well-defined thing, and we have decent observability into it. Not as good as the DOM tree, but pretty good. I like the idea of getting a Time To A11y Tree First Built metric. Though keep in mind the tree will keep changing as scripts load in and content is added to the page.
Adding another possibility to the brainstorm, I can imagine a metric that considers how quickly the AcT settles into its "final" state (defining "final" is TBD, much like "fully loaded"). It could be computed much like Speed Index, assuming there's a decent calculation for determining tree similarity.
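As a very rough illustration of what a tree-similarity calculation might look like, here's a sketch that flattens puppeteer-style snapshot nodes into "role|name" strings and takes their Jaccard similarity. This is only a placeholder, not a real tree-diff algorithm, and the node shape is assumed from accessibility.snapshot().

```ts
// Sketch: a naive similarity score between two accessibility snapshots.
// Assumes nodes shaped like puppeteer's accessibility.snapshot() output
// (role, name, children). Jaccard similarity over flattened "role|name"
// strings is just a stand-in for a real tree-similarity algorithm.
interface AXSnapshotNode {
  role?: string;
  name?: string;
  children?: AXSnapshotNode[];
}

function flatten(node: AXSnapshotNode, out: string[] = []): string[] {
  out.push(`${node.role ?? ''}|${node.name ?? ''}`);
  for (const child of node.children ?? []) flatten(child, out);
  return out;
}

function treeSimilarity(a: AXSnapshotNode, b: AXSnapshotNode): number {
  const setA = new Set(flatten(a));
  const setB = new Set(flatten(b));
  const intersection = [...setA].filter((s) => setB.has(s)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 1 : intersection / union;
}
```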
A note on the instrumentation that currently exists:
puppeteer actually has some great work, culminating in the accessibility.snapshot() method. Behind the scenes, it uses Accessibility.getFullAXTree from the devtools protocol, plus some more work to flesh out a solid picture of the AcT.
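For reference, a minimal sketch of reading the tree this way (the URL is just a placeholder):

```ts
import puppeteer from 'puppeteer';

// Dump the accessibility tree for a page using puppeteer's
// accessibility.snapshot(), which returns the "interesting" AcT nodes
// (role, name, children, ...) as a plain object.
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'load' });

  const snapshot = await page.accessibility.snapshot();
  console.log(JSON.stringify(snapshot, null, 2));

  await browser.close();
})();
```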
The protocol (and thus pptr) doesn't have events that indicate "AccessibilityTreeChanged", so right now, in order to understand how it changes, it'd need to be polled. Hopefully what @LJWatson said about the perf hit indicates that polling would be decently performant. Regardless, we're in a lab scenario, so there's no user-perceivable impact anyhow. :) If this exploration works out, perhaps some "change" events could be added to the protocol so the approach could be optimized a bit.
I think some prototyping here is the next best step.
With some straightforward puppeteer scripting, someone can make a basic _Time To First AcT_ metric and also explore the _AcT Speed Index_ idea. Once built, there's always a good amount of metric validation necessary to understand how well the numbers we get track the intent of the metric. Testing on a variety of webpages/webapps is key here.
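As one possible starting point, here's a rough sketch of such a prototype: it polls Accessibility.getFullAXTree over a CDP session while the page loads and records when the first non-trivial tree appears. The 50ms interval and the "more than just a root node" threshold are arbitrary assumptions that validation would need to refine.

```ts
import puppeteer from 'puppeteer';

// Sketch of a "Time To First AcT" prototype, assuming polling is good
// enough in a lab setting. The poll interval and node-count threshold
// are placeholder choices, not validated values.
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const client = await page.target().createCDPSession();
  await client.send('Accessibility.enable');

  const navStart = Date.now();
  const navigation = page.goto('https://example.com', { waitUntil: 'networkidle0' });

  let timeToFirstAcT: number | undefined;
  const poller = setInterval(async () => {
    try {
      const { nodes } = await client.send('Accessibility.getFullAXTree');
      // Treat "more than just a root node" as the tree having been built.
      if (timeToFirstAcT === undefined && nodes.length > 1) {
        timeToFirstAcT = Date.now() - navStart;
      }
    } catch {
      // Ignore transient protocol errors while the page is mid-navigation.
    }
  }, 50);

  await navigation;
  clearInterval(poller);

  console.log(`Time To First AcT (approx): ${timeToFirstAcT}ms`);
  await browser.close();
})();
```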
I'm happy to give some guidance if anyone has questions about the protocol underpinnings here.
It could be computed much like Speed Index, assuming there's a decent calculation for determining tree similarity.
This is the part that gives me the most pause. This won't be nearly as simple as "sum all the color values". Some cursory googling for "tree similarity algorithms" wasn't encouraging.
Do you think something TTI-like would be too noisy? I'm thinking of some kind of settling metric like "time until N seconds between accessibility tree changes"?
It could be prototyped for different values of N and run across a large number of sites multiple times to assess stability.
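For the settling idea, the computation on top of polled snapshots could be as simple as the sketch below: given timestamped hashes of successive tree snapshots, report the time of the last change that's followed by a quiet window of N seconds. The sample shape and hashing approach are assumptions, not an agreed definition.

```ts
// Sketch: compute when the accessibility tree "settled", i.e. the time of
// the last observed change that is followed by at least quietWindowMs of
// no further changes. Returns undefined if the tree never stayed quiet
// that long within the observed samples.
interface Sample {
  timestamp: number; // ms since navigation start
  treeHash: string;  // e.g. a hash of a serialized AcT snapshot
}

function accessibilitySettleTime(
  samples: Sample[],
  quietWindowMs: number,
): number | undefined {
  if (samples.length === 0) return undefined;
  let lastChange = samples[0].timestamp;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i].treeHash !== samples[i - 1].treeHash) {
      lastChange = samples[i].timestamp;
    }
  }
  const lastObserved = samples[samples.length - 1].timestamp;
  return lastObserved - lastChange >= quietWindowMs ? lastChange : undefined;
}
```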
This is a really great reply, @paulirish . Thanks for considering.
I think any visibility in Lighthouse for accessibility tree timing would be great, since it'd help spread awareness of how architectural decisions in page delivery impact AT users' experience. I particularly like the idea of using a "settled" state to represent a sort of "Accessible-Ready" metric in Lighthouse, assuming that represents when things become reliable to use. I'm not sure, though, whether there are parallels to, say, visual rendering, where earlier moments are meaningful to AT users before the whole thing is ready; I'll defer to the experts on the particulars there. Excited for progress here.
Do you think something TTI-like would be too noisy? I'm thinking of some kind of settling metric like "time until N seconds between accessibility tree changes"?
I think that would devolve in some common cases regarding carousels.
I think that would devolve in some common cases regarding carousels.
Speed Index has this same problem but benefits from the fact that "Visual complete idle" isn't used for the end time. I wonder if TTI itself could be used as the "AcT Complete" snapshot and then the tree similarity magic could walk back from there?
EDIT: Of course we would need to validate that TTI is actually later than AcT complete :)