startsWith(request.url)), where the redirect response does not return text/html as its mime type._More Information_
I'm not sure how common this for non-authenticated redirects, but in our case we are running LH on authenticated pages using cookie-based login (similar to this doc, but with Node API).
The redirect response when hitting the requestedUrl (prior to auth) does not return HTML, and so the content-type response header is empty. It then redirects to the authentication server, and finally back to the originally requested document this time with the correct mime type text/html.
LH reports error NOT_HTML even though document loaded in the final URL after redirecting is text/html.
I don't see any documentation on this, but according @brendankenny in #11430, it should be the final matching URL (that is an html document) when loading the page.
The error occurred starting in version 6.2.0 of LH from #11042.
I have a fix locally by using the _last_ matching HTML document (as opposed to the first) in findMainDocument(records, finalURL).
A naive one-line fix that works on my site in test is:
// should probably search backwards using loop
const mainResource = records.slice().reverse().find(request => finalURL.startsWith(request.url) &&
URL.equalWithExcludedFragments(request.url, finalURL));
As for the logic when finalUrl is not supplied, I think the comment on line 452 still applies.
LH doesn't fail when valid redirect eventually responds with HTML on a matching finalUrl.
Related issues
Thanks for filing @bfrost2893! A change to findMainDocument is definitely an ultimate solution, but is a sneakily massive change that affects many performance metrics (see https://github.com/GoogleChrome/lighthouse/issues/8984). We'll likely do that work in a major version bump in 8.0.0 far from now.
The more intermediate fix is better resolving the ultimate document just for this error detection similar to how our redirects audit catches client-side redirects.
Makes sense re: findMainDocument. It sounds like using the "final document" instead of the "main document" causes some perf metrics to appear much faster, ignoring the cost of redirects?
I tested locally and confirmed it fixes our issue with a new function NetworkAnalyzer.findFinalDocument() based off the redirects audit as you said. Is this the kind of fix you were thinking?
// network-analyzer.js
class NetworkAnalyzer {
// ...
static findFinalDocument(networkRecords, finalURL) {
const documentRequests = [];
let networkRecord = networkRecords.find(record => record.url === finalURL);
while (networkRecord) {
documentRequests.push(networkRecord);
networkRecord = networkRecord.redirectDestination;
}
if (!documentRequests.length) {
return undefined;
}
return documentRequests[documentRequests.length - 1];
}
}
and then in GatherRunner
// gather-runner.js
class GatherRunner {
// ...
static getPageLoadError(passContext, loadData, navigationError) {
const {networkRecords} = loadData;
/** @type {LH.Artifacts.NetworkRequest|undefined} */
let mainRecord;
try {
mainRecord = NetworkAnalyzer.findMainDocument(networkRecords, passContext.url);
} catch (_) {}
// Modification: retrieve `finalRecord` using `NetworkAnalyzer.findFinalDocument()`
let finalRecord;
try {
finalRecord = NetworkAnalyzer.findFinalDocument(networkRecords, passContext.url);
} catch (_) {}
const networkError = GatherRunner.getNetworkError(mainRecord);
const interstitialError = GatherRunner.getInterstitialError(mainRecord, networkRecords);
// Modification: pass `finalRecord` instead of `mainRecord`
const nonHtmlError = GatherRunner.getNonHtmlError(finalRecord);
// ... rest of function
}
}
Should getNetworkError() and getInterstitialError() ever expect finalRecord as opposed to mainRecord?
Is this the kind of fix you were thinking?
Roughly, yes :) interested in a PR if you've come this far already? 😃
Should getNetworkError() and getInterstitialError() ever expect finalRecord as opposed to mainRecord?
I can't currently think of a case where it would matter. Those errors are for fatal page errors so the redirect problem (the difference between your case and current findMainDocument) shouldn't apply.
@patrickhulce sorry for the delay. Figuring out the legal approval on my end before signing the CLA and submitting a contribution 🙂
Hey @patrickhulce, I should have a PR up soon. All good from my end to contribute and CLA has been signed.
Most helpful comment
Hey @patrickhulce, I should have a PR up soon. All good from my end to contribute and CLA has been signed.