Lighthouse: Incorrect main document used when retrieving page load errors

Created on 21 Oct 2020  Â·  5Comments  Â·  Source: GoogleChrome/lighthouse

Provide the steps to reproduce

  1. Run LH programmatically on page that redirects a similarly matching URL (e.g. same startsWith(request.url)), where the redirect response does not return text/html as its mime type.

_More Information_

I'm not sure how common this for non-authenticated redirects, but in our case we are running LH on authenticated pages using cookie-based login (similar to this doc, but with Node API).

The redirect response when hitting the requestedUrl (prior to auth) does not return HTML, and so the content-type response header is empty. It then redirects to the authentication server, and finally back to the originally requested document this time with the correct mime type text/html.

What is the current behavior?

LH reports error NOT_HTML even though document loaded in the final URL after redirecting is text/html.

I don't see any documentation on this, but according @brendankenny in #11430, it should be the final matching URL (that is an html document) when loading the page.

The error occurred starting in version 6.2.0 of LH from #11042.

I have a fix locally by using the _last_ matching HTML document (as opposed to the first) in findMainDocument(records, finalURL).

A naive one-line fix that works on my site in test is:

// should probably search backwards using loop
const mainResource = records.slice().reverse().find(request => finalURL.startsWith(request.url) &&
        URL.equalWithExcludedFragments(request.url, finalURL));

As for the logic when finalUrl is not supplied, I think the comment on line 452 still applies.

What is the expected behavior?

LH doesn't fail when valid redirect eventually responds with HTML on a matching finalUrl.

Environment Information

  • Affected Channels: Node
  • Lighthouse version: >=6.2.0
  • Chrome version: N/A
  • Node.js version: 12.9.2
  • Operating System: MacOS (and Docker Linux Container)

Related issues

  • #11042
  • #11430
P1.5 bug needs-discussion

Most helpful comment

Hey @patrickhulce, I should have a PR up soon. All good from my end to contribute and CLA has been signed.

All 5 comments

Thanks for filing @bfrost2893! A change to findMainDocument is definitely an ultimate solution, but is a sneakily massive change that affects many performance metrics (see https://github.com/GoogleChrome/lighthouse/issues/8984). We'll likely do that work in a major version bump in 8.0.0 far from now.

The more intermediate fix is better resolving the ultimate document just for this error detection similar to how our redirects audit catches client-side redirects.

Makes sense re: findMainDocument. It sounds like using the "final document" instead of the "main document" causes some perf metrics to appear much faster, ignoring the cost of redirects?

I tested locally and confirmed it fixes our issue with a new function NetworkAnalyzer.findFinalDocument() based off the redirects audit as you said. Is this the kind of fix you were thinking?

// network-analyzer.js

class NetworkAnalyzer {
  // ...

  static findFinalDocument(networkRecords, finalURL) {
    const documentRequests = [];
    let networkRecord = networkRecords.find(record => record.url === finalURL);
    while (networkRecord) {
      documentRequests.push(networkRecord);
      networkRecord = networkRecord.redirectDestination;
    }

    if (!documentRequests.length) {
      return undefined;
    }

    return documentRequests[documentRequests.length - 1];
  }
}

and then in GatherRunner

// gather-runner.js

class GatherRunner {
  // ...

  static getPageLoadError(passContext, loadData, navigationError) {
    const {networkRecords} = loadData;
    /** @type {LH.Artifacts.NetworkRequest|undefined} */
    let mainRecord;
    try {
      mainRecord = NetworkAnalyzer.findMainDocument(networkRecords, passContext.url);
    } catch (_) {}

    // Modification: retrieve `finalRecord` using `NetworkAnalyzer.findFinalDocument()`
    let finalRecord; 
    try {
      finalRecord = NetworkAnalyzer.findFinalDocument(networkRecords, passContext.url);
    } catch (_) {}

    const networkError = GatherRunner.getNetworkError(mainRecord);
    const interstitialError = GatherRunner.getInterstitialError(mainRecord, networkRecords);

    // Modification: pass `finalRecord` instead of `mainRecord`
    const nonHtmlError = GatherRunner.getNonHtmlError(finalRecord);

    // ... rest of function
  }
}

Should getNetworkError() and getInterstitialError() ever expect finalRecord as opposed to mainRecord?

Is this the kind of fix you were thinking?

Roughly, yes :) interested in a PR if you've come this far already? 😃

Should getNetworkError() and getInterstitialError() ever expect finalRecord as opposed to mainRecord?

I can't currently think of a case where it would matter. Those errors are for fatal page errors so the redirect problem (the difference between your case and current findMainDocument) shouldn't apply.

@patrickhulce sorry for the delay. Figuring out the legal approval on my end before signing the CLA and submitting a contribution 🙂

Hey @patrickhulce, I should have a PR up soon. All good from my end to contribute and CLA has been signed.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mohanrohith picture mohanrohith  Â·  3Comments

shellscape picture shellscape  Â·  3Comments

devrsi0n picture devrsi0n  Â·  3Comments

nl-igor picture nl-igor  Â·  3Comments

sanprieto picture sanprieto  Â·  3Comments