Lighthouse: Fatal trace logic error - expected start event, got X

Created on 28 Mar 2019 · 27 comments · Source: GoogleChrome/lighthouse

I attached the assets from a LR run that resulted in this error message.

assets.txt

I'd have provided the LHR too, but it's on a remote machine, too big, and scp keeps crashing (side note, what???)

Some notes:

  • could not reproduce this locally, via either node lighthouse or a local WRS. It only seems to happen in production LR environments
  • provided assets were from running against bloomberg.com. @exterkamp has a whole list of other URLs it happens on too.
  • does not happen in prod LR
  • happens on ~5% of requests in canary LR

Related issues

#7691

needs-priority

Most helpful comment

Hi, I am also getting this error when I run the Lighthouse test from the CI. It never happens when I run the test locally.
This error appears in the Diagnostics of performance metrics for:

  1. Reduce JavaScript execution time
  2. Minimize main-thread work

Is there a work-around I can put in place?

All 27 comments

The issue here is that there's a v8.callFunction child event that is trying to be larger than its V8.Execute parent. It's only by 1 microsecond, so maybe if we wanted to be forgiving we could try to adjust event end times within some allowable threshold?
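A sketch of that forgiving approach (hypothetical helper, not Lighthouse's actual code; the threshold value is an assumption): trim a child event's duration when it overshoots its parent's end by no more than a small tolerance.

```javascript
// Hypothetical helper (not Lighthouse code): forgive a child trace event
// that ends slightly after its parent by trimming its duration, but only
// within a small tolerance. All timestamps are in microseconds.
const OVERSHOOT_TOLERANCE_US = 2; // assumed threshold

function clampChildDur(parentEndTs, childStartTs, childDur) {
  const overshoot = (childStartTs + childDur) - parentEndTs;
  if (overshoot > 0 && overshoot <= OVERSHOOT_TOLERANCE_US) {
    return parentEndTs - childStartTs; // trim to end with the parent
  }
  return childDur; // in range, or too far off to silently fix
}
```

A 1 µs overshoot like the one described here would get trimmed; a large overshoot would be left alone and still surface as a fatal error.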

that definitely seems weird; you would think the trace system wouldn't let that be possible (logging an event isn't responsible for its own timing, right?). But I also vaguely remember dealing with something like this back when we were figuring out timestamp sorting.

@paulirish is trace viewer flexible in this way?

when we were figuring out timestamp sorting.

Yeah, I remember having trouble there, but that was when they were equal. This time it is actually 1 microsecond after its parent should have ended.

but that was when they were equal though.

Ah, I was thinking about when we were looking at tts, since in theory if two things are in the same thread their tts should put them in the same order that ts does, but somehow it turns out that's not always true (or wasn't at the time).

Even less of an analogous situation :) but I think I remember trace viewer (or devtools timeline model?) just rolling with it since it sometimes happens, so I'm curious if it does the same for this.
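For context, a minimal sketch of the kind of tie-breaking sort being discussed (illustrative only; `sortEvents` is not a Lighthouse API): order same-thread events by wall-clock `ts`, falling back to thread time `tts` on ties, since tts alone can disagree with ts order.

```javascript
// Illustrative sketch: sort same-thread trace events by ts, breaking
// ties with tts (which, per the discussion above, can't be trusted as
// a primary ordering on its own).
function sortEvents(events) {
  return events
    .slice()
    .sort((a, b) => (a.ts - b.ts) || ((a.tts || 0) - (b.tts || 0)));
}

const sorted = sortEvents([
  {ts: 2, tts: 1},
  {ts: 1, tts: 5},
  {ts: 1, tts: 2},
]);
// → [{ts: 1, tts: 2}, {ts: 1, tts: 5}, {ts: 2, tts: 1}]
```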

The issue here is that there's a v8.callFunction child event that is trying to be larger than its V8.Execute parent. It's only by 1 microsecond, so maybe if we wanted to be forgiving we could try to adjust event end times within some allowable threshold?

Could you point out the relevant trace events here? My first time looking at this trace and dunno how to find this myself

Could you point out the relevant trace events here? My first time looking at this trace and dunno how to find this myself

I popped the trace into main-thread-tasks-test.js and put a debugger on the throw line :)

I deleted the setup but can split it back out again if you need the timestamps?

I deleted the setup but can split it back out again if you need the timestamps?

i gots it

(screenshot)

just the good parts:

{
"traceEvents": [
  {"pid":25,"tid":25,"ts":0,"ph":"M","cat":"__metadata","name":"thread_name","args":{"name":"CrRendererMain"}},
  {"pid":2,"tid":2,"ts":5369216088,"ph":"I","cat":"disabled-by-default-devtools.timeline","name":"TracingStartedInBrowser","s":"t","tts":126876,"args":{"data":{"frameTreeNodeId":1,"persistentIds":true,"frames":[{"frame":"CC0F3EE5A17C1A7B522189106CB19588","url":"about:blank","name":"","processId":25}]}}},
  {"pid":25,"tid":25,"ts":5369233648,"ph":"R","cat":"blink.user_timing","name":"navigationStart","tts":553722,"args":{"frame":"CC0F3EE5A17C1A7B522189106CB19588","data":{"documentLoaderURL":"https://www.bloomberg.com/","isLoadingMainFrame":true,"navigationId":"8E68D13D5E2F56F4F5DC6A966190ED1A"}}},
  {"pid":25,"tid":25,"ts":5369487132,"ph":"R","cat":"loading,rail,devtools.timeline","name":"firstPaint","tts":780878,"args":{"frame":"CC0F3EE5A17C1A7B522189106CB19588","data":{"navigationId":"8E68D13D5E2F56F4F5DC6A966190ED1A"}}},
  {"pid":25,"tid":25,"ts":5369487134,"ph":"R","cat":"loading,rail,devtools.timeline","name":"firstContentfulPaint","tts":780981,"args":{"frame":"CC0F3EE5A17C1A7B522189106CB19588","data":{"navigationId":"8E68D13D5E2F56F4F5DC6A966190ED1A"}}},
  {"pid":25,"tid":25,"ts":5370649595,"ph":"B","cat":"v8","name":"V8.Execute","tts":1298190,"args":{},"meta":"current task.parent"},
  {"pid":25,"tid":25,"ts":5370649600,"ph":"E","cat":"v8","name":"V8.Execute","tts":1298194,"args":{"runtime-call-stats":{}},"meta":"event"},
  {"pid":25,"tid":25,"ts":5370649595,"ph":"X","cat":"v8","name":"v8.callFunction","dur":6,"tdur":6,"tts":1298189,"args":{},"meta":"current task"}
]}

The surrounding v8 events might be relevant, but the above does give the same error.
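For readers unfamiliar with the trace format: `"ph":"B"`/`"ph":"E"` are a paired begin/end (here for `V8.Execute`), while `"ph":"X"` is a "complete" event carrying its own `dur`. Plugging in the numbers from the excerpt shows the 1 µs overlap:

```javascript
// Reading the excerpt above: the parent is a B/E pair, the child is a
// complete ('X') event with its own duration.
const parentStartTs = 5370649595;  // V8.Execute 'B' event ts
const parentEndTs = 5370649600;    // V8.Execute 'E' event ts
const childEndTs = 5370649595 + 6; // v8.callFunction 'X': ts + dur

const parentDurUs = parentEndTs - parentStartTs; // 5 µs
const overlapUs = childEndTs - parentEndTs;      // the offending 1 µs
```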

diff --git a/lighthouse-core/computed/main-thread-tasks.js b/lighthouse-core/computed/main-thread-tasks.js
index 8efb0f32..207f436e 100644
--- a/lighthouse-core/computed/main-thread-tasks.js
+++ b/lighthouse-core/computed/main-thread-tasks.js
@@ -124,6 +124,9 @@ class MainThreadTasks {
         tasks.push(newTask);
         currentTask = newTask;
       } else {
+        while (currentTask.parent && currentTask.event.ph === 'X') {
+          currentTask = currentTask.parent;
+        }
         if (currentTask.event.ph !== 'B') {
           throw new Error(
             `Fatal trace logic error - expected start event, got ${currentTask.event.ph}`);

@patrickhulce does this make any sense? passed all the tests + error doesn't happen

Kinda, but it's definitely a fatal trace logic error for trace events to overlap like that. Later logic will generate nonsensical numbers if there are two events executing at once like this, or if events aren't directly nested within their parents. We would have to force the X event to end at the timestamp of the B event if we go this route.

We should definitely add a test for this case too, so something would've caught it. I can get on that :)
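A sketch of that clamping variant (field names like `endTime` are assumed for illustration, not Lighthouse's actual task shape), building on the unwinding loop in the patch above:

```javascript
// Illustrative sketch: unwind past complete ('X') parent tasks, clamping
// each one's end time so no child outlives it and later nesting math
// stays consistent.
function unwindCompleteTasks(currentTask, eventTs) {
  while (currentTask.parent && currentTask.event.ph === 'X') {
    // Force the X task to end no later than the event being processed.
    currentTask.endTime = Math.min(currentTask.endTime, eventTs);
    currentTask = currentTask.parent;
  }
  return currentTask;
}

// Example: a 'B' root with an 'X' child whose endTime overshoots by 6 µs.
const root = {parent: undefined, event: {ph: 'B'}, endTime: 100};
const child = {parent: root, event: {ph: 'X'}, endTime: 106};
const resumed = unwindCompleteTasks(child, 100);
// resumed === root; child.endTime is clamped to 100
```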

Hi, I am also getting this error when I run the Lighthouse test from the CI. It never happens when I run the test locally.
This error appears in the Diagnostics of performance metrics for:

  1. Reduce JavaScript execution time
  2. Minimize main-thread work

Is there a work-around I can put in place?

We are still seeing intermittent Fatal trace logic error - expected start event, got X when testing our React site from the Lighthouse 5.0.0 system on web.dev.

Thanks for letting us know @andystalick! 5.0 shipped with a few of the workarounds but was still relatively conservative. It sounds like we'll have to be a bit more aggressive in our recovery from Chrome errors to fully take care of this.

@patrickhulce Seeing this on LH 5.1 too

Thanks @benschwarz there was a lot more discussion around these cases in https://github.com/aslushnikov/tracium/issues/8 which fixes will be making their way over for our tracehouse refactor in the next few weeks.

In case this info helps: at Sauce we are only experiencing this issue (Fatal trace logic error - expected start event, got X) when running a Lighthouse test on Windows using Chrome 75 (see performance.json); on macOS it works flawlessly (see performance.json).

To give some context: we are using Puppeteer and Lighthouse to trace the browser when being automated via WebDriver and we experience the error above only for Chrome v75 on Windows. Looking into the flame chart of a simple trace (e.g. for a page like https://this-page-intentionally-left-blank.org/) we see more than just ordering issues, e.g.

Chrome v74 Windows (working_tracelog.txt):
(screenshot)

Chrome v75 Windows (borked_tracelog.txt):
(screenshot)

I see the same issue when using Tracium to get all computed main-thread tasks. If I remove the following 3 events from the borked tracelog, I get proper results back:

// ...
{"pid":4048,"tid":3652,"ts":67225000,"ph":"X","cat":"v8","name":"LocalWindowProxy::Initialize","dur":19000,"args":{"IsMainFrame":true}},
// ...
{"pid":4048,"tid":3652,"ts":67258000,"ph":"X","cat":"disabled-by-default-devtools.timeline","name":"RunTask","dur":7000,"args":{}},
// ...
{"pid":4048,"tid":3652,"ts":67276000,"ph":"X","cat":"disabled-by-default-devtools.timeline","name":"RunTask","dur":29000,"args":{}},
// ...

I will look more into this to see if I can make Lighthouse more forgiving without impacting the performance results.
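A brute-force sketch of that workaround (illustrative only; the `ts` values are the three events quoted above): drop the irreconcilable `X` events before computing main-thread tasks.

```javascript
// Illustrative workaround: filter out the three known-bad complete ('X')
// events by their ts values before handing the trace to the task logic.
const BAD_EVENT_TS = new Set([67225000, 67258000, 67276000]);

function dropBadEvents(traceEvents) {
  return traceEvents.filter(e => !(e.ph === 'X' && BAD_EVENT_TS.has(e.ts)));
}
```

This is a per-trace hack, not a general fix; a real fix would detect irreconcilable events rather than hard-code their timestamps.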

Thanks @christian-bromann! This effort is mostly underway (#9174 #9230), but your info does confirm what we're seeing in that sometimes a few trace events are just irreconcilable and the only option is to drop some of them to get back to a sensible state.

So I've googled the error that I'm getting from PageSpeed Insights and found myself on this page. Is this the same error we would be getting there, and if so, is it just a waiting game before this gets resolved for the PageSpeed Insights site?

@cjboranp yes it's likely the same error and it's a waiting game before #9491 is merged and deployed to PSI.

@patrickhulce is there an effort to fix this in Chrome directly? I logged an issue in Chromium because this problem also messes up the performance tab in DevTools, e.g. here

@christian-bromann not that I'm aware of. You've already filed in the correct place though for it to be looked into.

We are also experiencing this issue.

For information, the problem is still present on version 5.2.0!
(screenshot)

@Thanaen the fixes for these issues are in 5.6.0, so that would be expected.

Oh, okay!
Sorry for the inconvenience!

We expect this has been fixed by #9785 and #9786. The relative silence since that release went out is reassuring too, so we'll go ahead and close this out but if anyone runs into this in >=6.0.0 and can provide a trace, we'd love to hear about it :)
