Browser-laptop: browser unresponsive when `write: broken pipe` errors occur while pulling the latest state

Created on 31 Jul 2018 · 8 Comments · Source: brave/browser-laptop

Description

While I was pulling the latest state using a new profile, Brave suddenly became unresponsive after I attempted to switch to a different preference page. I tried clicking around a few times and noticed that the entire browser had become unresponsive. I could still open a tab, but it was blank and nothing loaded. I took a look at the terminal and noticed the following:

unresponsive
unresponsive
unresponsive
unresponsive

After a few seconds, the terminal was spammed with A LOT of the following errors:

ERROR[07-31|15:12:27.976] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:27.977] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:27.978] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:27.978] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:27.979] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:27.980] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:27.980] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:27.997] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:28.002] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:28.003] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe
ERROR[07-31|15:12:28.007] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe

Once the errors finished spamming my terminal, Brave became responsive/usable again and started pulling the latest blocks:

INFO [07-31|15:12:29.272] Imported new block headers count=2048 elapsed=2.526s    number=3575999 hash=983476…203135 ignored=0
INFO [07-31|15:12:31.974] Imported new block headers count=2048 elapsed=2.658s    number=3578047 hash=e02b8a…0800ca ignored=0
INFO [07-31|15:12:34.430] Imported new block headers count=2048 elapsed=2.416s    number=3580095 hash=52e428…97cfee ignored=0
INFO [07-31|15:12:36.811] Imported new block headers count=2048 elapsed=2.337s    number=3582143 hash=b6e631…79fe1a ignored=0

Steps to Reproduce

  1. using the ETH Wallet PR, start Brave using a new profile
  2. enable ETH Wallet using about:preferences#ethwallet and restart the browser

Sometimes when you're pulling the latest state, you'll run into this issue.

Actual result:

The entire browser becomes unresponsive while `ERROR[07-31|15:12:28.007] write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe` errors are being spammed in the terminal due to issues with geth.

Expected result:

Even though geth fails to connect/pull the latest state, it should never cause the entire browser to become completely unresponsive.

Reproduces how often:

I've seen it happen a few times but the % isn't high. However, as mentioned below under the Additional Information section, @LaurenWags also ran into the same issue.

Brave Version

about:brave info:

Brave: 0.25.0 
V8: 6.7.288.46 
rev: b85dfa16ae78413d47b0ef76fd2e5971b2a5f44b 
Muon: 7.1.6 
OS Release: 17.7.0 
Update Channel:  
OS Architecture: x64 
OS Platform: macOS 
Node.js: 7.9.0 
Brave Sync: v1.4.2 
libchromiumcontent: 67.0.3396.103

Reproducible on current live release:

Currently not reproducible on the live release as this feature hasn't been released.

Additional Information

@LaurenWags ran into this as well when we were debugging her connectivity issues. Once Brave connected to the nodes and started pulling the latest state, the browser became unresponsive for about 10s before the terminal was spammed with several write tcp 127.0.0.1:8546->127.0.0.1:52489: write: broken pipe errors.

QA/test-plan-specified · bug · feature/ETH-Wallet · priority/P2 · release-notes/exclude · release/blocking · reverted

All 8 comments

Currently running commit https://github.com/brave/browser-laptop/commit/50840f4b9424f539ffd3d58dd10459369d21829f and my browser started to show `unresponsive` in the terminal, but recovered (no mention of the broken pipe). During this time all open tabs went white, and any new tab I opened was also white. Brave did recover after about a minute, during which time the geth process's usage skyrocketed and it consumed a ton of CPU.

2-3 minutes later, browser went unresponsive again (still no mention of broken pipes), and all tabs are white again. Brave has not recovered from this state (been about 20 minutes).

When I quit Brave, I got a ton of messages about the broken pipe - same as in the description.

I suspect the browser might become unresponsive, because the Eth-Wallet app is equally chatty on errors when this happens. I haven't been able to repro this state myself tho.

@flamsmark the current theory is that geth is starving the OS of some resource (maybe descriptors?). I looked it up: geth does not seem to have any option to limit the number of connections, but we can limit the number of peers to something smaller.
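
For readers who want to try the peer-limit idea, here is a minimal sketch of building a geth command line with a lower cap. It assumes geth's `--maxpeers` command-line flag; the helper name `build_geth_args` and the value 10 are hypothetical, and how Brave actually spawns geth is not shown in this thread.

```python
def build_geth_args(max_peers, base_args=("geth",)):
    """Return an argv list for geth with a capped peer count.

    Assumption: geth accepts `--maxpeers <n>`. The helper itself is
    illustrative and is not taken from the Brave code base.
    """
    if max_peers < 0:
        raise ValueError("max_peers must be non-negative")
    # Keep any existing arguments and append the peer cap at the end.
    return [*base_args, "--maxpeers", str(max_peers)]


if __name__ == "__main__":
    # 10 is an arbitrary example value, not a recommendation from the thread.
    print(build_geth_args(10))
```

The list can be handed to whatever process-spawning call the embedding application uses. Note that a lower peer cap only reduces pressure on the OS; it does not address the error loop itself.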

Reducing peers may be good in general, and may even help ameliorate this bug, but an EPIPE from write(2) means the write caller should stop looping, close the fd for the write end of the pipe, and arrange for higher-level recovery. It sounds like that is not happening. Where does the code live that issues the write logged as getting EPIPE? Can someone debug it and get a stack trace and source coordinates?
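
The stop, close, and recover pattern described above can be sketched as follows. This is an illustration in Python of the general POSIX behavior, not code from geth (which is written in Go) or from Brave; the function name `send_until_broken` and its return convention are hypothetical.

```python
import socket


def send_until_broken(sock, messages):
    """Send each message in turn; stop at the first broken pipe.

    Returns (sent_count, broken). When the peer has gone away, the
    socket is closed here and the caller is told via `broken=True`
    so it can arrange higher-level recovery (for example, reconnecting)
    instead of retrying the same write over and over.
    """
    sent = 0
    for msg in messages:
        try:
            sock.sendall(msg)
        except (BrokenPipeError, ConnectionResetError):
            # EPIPE / ECONNRESET: the peer is gone, so further writes
            # on this descriptor cannot succeed. Release it and stop.
            sock.close()
            return sent, True
        sent += 1
    return sent, False


if __name__ == "__main__":
    writer, reader = socket.socketpair()
    reader.close()  # simulate the client end disappearing
    print(send_until_broken(writer, [b"block header"] * 3))
```

The key design point is that the error ends the loop. A writer that logs the failure and keeps writing to the same descriptor would produce a burst of identical errors like the one in the description.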

Verified on Windows x64 with
• 0.23.100 5e197a1a877c9967aecc838b9a7ca528ee5b46bd
• Muon 8.0.8
• libchromiumcontent 68.0.3440.84

@mrose17 We've probably reduced the probability/severity of this issue, but I'd prefer to actually solve it at the root cause.

fwiw I see unresponsive and the browser freezing up even when geth is not running:

GETH: spawned
Failed to configure static nodes peers ENOENT: no such file or directory, open '/home/user/.config/brave-development/ethereum/ropsten/geth/static-nodes.json'
GETH exit: Code: 2 | Signal: null
GETH close: Code: 2 | Signal: null

a few moments later...

unresponsive

I checked that geth had indeed exited

I believe that https://github.com/brave/browser-laptop/pull/15029 should resolve most, if not all, of these cases.
