Lighthouse: Ideas for trimming the LHR

Created on 5 Feb 2019 · 11 comments · Source: GoogleChrome/lighthouse

The LHR is getting big, often 1MB or more. Some unnecessary things we could trim:

  • [ ] a11y per-node explanations are on every details item, often repeated verbatim on multiple nodes, and we don't use them in the report
  • [ ] tap-targets includes all possible bad targets. We should limit the number (see the sketch after this list).
  • [ ] full page screenshot could be recaptured with lower quality (or reduced height) https://github.com/GoogleChrome/lighthouse/pull/11689
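
A minimal sketch of that tap-targets cap (the 25-item limit and the skippedCount field are assumptions, not existing Lighthouse behavior):

// Hypothetical: keep only the first N items of an audit's details table and
// record how many were dropped so the report could still say "and X more".
function truncateDetailsItems(details, maxItems = 25) {
  if (!details || !Array.isArray(details.items)) return details;
  return {
    ...details,
    items: details.items.slice(0, maxItems),
    skippedCount: Math.max(0, details.items.length - maxItems), // assumed field
  };
}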

Feel free to add to the list.

P1.5


All 11 comments

Timings are ~9KB


We could fix the numbers so we don't have 100000 digits past the decimal :)
"measure" -> "m" (actually, is that property even necessary?)

Could we embed the i18n data in the report renderer, or would that get complicated real quick?

  • a11y per-node explanations are on every details item, often repeated verbatim on multiple nodes, and we don't use them in the report

We removed this once before and an explicit bug report made us add it back, because for some nodes the information is specific to that node (color contrast, for example): https://github.com/GoogleChrome/lighthouse/issues/5402. We should be careful about removing it.

Could we embed the i18n data in the report renderer, or would that get complicated real quick?

There's enough blubber elsewhere in the LHR to remove that I'd want to do this last. No need to hamper the beautiful simplicity and flexibility of i18n yet IMO :D

Also worth clarifying here what we're worried about. Is it API transfer sizes? Is it storage size on disk? In a database? JSON parse times? Biggest gzipped wins will probably be different from biggest uncompressed wins, and some strategies might reduce LHR size but not really help certain use cases.

Example: I can think of many ways in which we could greatly shrink total API bytes by splitting the LHR into its dynamic and static components just for transport.
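
To make that concrete with a hedged sketch (the split and property names here are invented, not a real Lighthouse transport format): audit titles and descriptions are identical for every run of a given Lighthouse version, so they could be fetched once and cached, while only per-run values travel with each report.

// Hypothetical transport split: strings shared by every run of this
// Lighthouse version go in `static`; everything else is per-run.
function splitForTransport(lhr) {
  const staticParts = {};
  const dynamicParts = {};
  for (const [id, audit] of Object.entries(lhr.audits)) {
    const {title, description, ...rest} = audit;
    staticParts[id] = {title, description}; // cacheable per lighthouseVersion
    dynamicParts[id] = rest;                // unique to this run
  }
  return {version: lhr.lighthouseVersion, static: staticParts, dynamic: dynamicParts};
}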

We removed this once before and an explicit bug report made us add it back, because for some nodes the information is specific to that node (color contrast, for example).

argh, I forgot about that. I wonder if there's some deduping we could do then... some of the strings are quite long and occur multiple times.
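
One possible shape for that deduping, as a sketch (the stringTable indirection is invented, not an LHR feature):

// Sketch: store each distinct explanation once in a shared table and replace
// per-node copies with an index into it.
function internExplanations(items) {
  const table = [];
  const indexOf = new Map();
  const deduped = items.map(item => {
    if (typeof item.explanation !== 'string') return item;
    if (!indexOf.has(item.explanation)) {
      indexOf.set(item.explanation, table.length);
      table.push(item.explanation);
    }
    return {...item, explanation: indexOf.get(item.explanation)};
  });
  return {stringTable: table, items: deduped};
}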

Also worth clarifying here what we're worried about. Is it API transfer sizes? Is it storage size on disk? In a database? JSON parse times?

I think all the above. gzip is certainly worth keeping in mind, but we also have a certain responsibility to people saving these somewhere :)

Speaking of which, if you do --output json, we JSON.stringify(lhr, null, 2) by default, so a lot of that size is whitespace. We might consider not doing that (people can always beautify it themselves if a human needs to see it) or doing some middle-ground JSON pretty print like we do with saved traces (one line per trace event instead of one line per trace event property + braces)
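
A rough sketch of that middle ground, assuming "one line per array element, compact otherwise" is the shape we'd want (this is not the actual trace-saving code):

// Sketch: emit each array element on its own line but keep the element itself
// compact; objects get one line per key. Output is still valid JSON.
function stringifyMiddleGround(value) {
  if (Array.isArray(value)) {
    return '[\n' + value.map(item => '  ' + JSON.stringify(item)).join(',\n') + '\n]';
  }
  if (value && typeof value === 'object') {
    const entries = Object.entries(value).map(
      ([key, v]) => `${JSON.stringify(key)}: ${stringifyMiddleGround(v)}`);
    return '{' + entries.join(',\n') + '}';
  }
  return JSON.stringify(value);
}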

Little hack for a sunburst viz of an LHR's size....

  1. Go to https://vasturiano.github.io/sunburst-chart/example/large-data/
  2. Copy an LHR into your clipboard.
  3. lhr = <paste>
  4. run this in console or snippets:
// Treat only plain objects (not arrays, dates, etc.) as recursable.
isPlainObject = function (obj) {
  return Object.prototype.toString.call(obj) === '[object Object]';
};

// Recursively turn a value into sunburst nodes: arrays/objects become
// {name, children}; leaves become the length of their pretty-printed JSON.
function calcObjSize(obj) {
  // recurse if array or plain object.
  if (Array.isArray(obj) || isPlainObject(obj)) {
    return Object.entries(obj).map(([key, value]) => {
      const node = {name: key};
      const nodeValue = calcObjSize(value);
      node[typeof nodeValue === 'number' ? 'value' : 'children'] = nodeValue;
      return node;
    });
  }
  return JSON.stringify(obj, null, 2).length;
}

data = {
  name: 'lhr',
  children: calcObjSize(lhr),
};

// `Sunburst` and `color` are globals already defined on the example page.
document.querySelector('#chart').innerHTML = '';

Sunburst()
  .data(data)
  .color(d => color(d.name))
  .minSliceAngle(.4)
  .showLabels(false)
  .tooltipContent((d, node) => `Size: <i>${node.value}</i>`)
  (document.getElementById('chart'));

example:

(screenshot of the resulting sunburst chart)

Nice!! Where does partSizes come from though? I'm getting

VM50:20 Uncaught ReferenceError: partSizes is not defined
    at <anonymous>:20:13

Oh it was renamed calcObjSize 👍

I'm seeing images + the diagnostic hidden audits taking up ~75+% of the size. Maybe we should focus on a diagnostic audit solution and image deduping?
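
For the image half, a hedged sketch of what deduping could look like (the reference format is made up; the renderer would need to understand it):

// Sketch: hoist each distinct data: URL into a shared table and leave a small
// reference string behind instead of a repeated base64 blob.
function dedupeDataUrls(node, imageTable = new Map()) {
  if (typeof node === 'string' && node.startsWith('data:image/')) {
    if (!imageTable.has(node)) imageTable.set(node, imageTable.size);
    return `__img_${imageTable.get(node)}__`; // hypothetical reference format
  }
  if (Array.isArray(node)) return node.map(n => dedupeDataUrls(n, imageTable));
  if (node && typeof node === 'object') {
    const out = {};
    for (const [key, value] of Object.entries(node)) {
      out[key] = dedupeDataUrls(value, imageTable);
    }
    return out;
  }
  return node;
}

The table itself would then be serialized once alongside the report.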

diagnostic hidden audits

From a CNN one I was looking at, network-requests.js was huge, but almost entirely from their gigantic URLs (and having 200 of them).

I was thinking we could stop including query strings for that audit...or enough of the query string that each one is still unique. In many cases they're ad URLs, so they likely aren't available for/worth tracking down after the fact anyways.
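
A sketch of the "keep just enough of the query string" version (the 20-character step is arbitrary, and identical full URLs would still collide):

// Sketch: drop query strings, then grow a query prefix back in chunks only
// when the shortened URL would collide with one we've already emitted.
function shortenUrls(urls) {
  const seen = new Set();
  return urls.map(url => {
    const [base, query = ''] = url.split('?');
    let candidate = base;
    for (let len = 20; seen.has(candidate) && len <= query.length + 20; len += 20) {
      candidate = `${base}?${query.slice(0, len)}`;
    }
    seen.add(candidate);
    return candidate;
  });
}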

what's so bad about big LHRs anyway?

Is it API transfer sizes? Is it storage size on disk? In a database? JSON parse times?
I think all the above

To Paul's point, I'm not sure I buy the argument that all of those things are important :)

what's so bad about big LHRs anyway?

Let's close in favor of more specific, future issues, which I'll bet will start happening quickly as lightbrary, lighthouse-ci, and/or more web.dev history spin up and somebody has to start looking at disk quota :P
