Windows build of our electron app is consistently failing. yarn install --frozen-lockfile failed to download https://registry.yarnpkg.com/date-fns/-/date-fns-2.12.0.tgz and https://registry.yarnpkg.com/@material-ui/icons/-/icons-4.9.1.tgz. Failure to download @material-ui/icons was reported with ESOCKETTIMEDOUT. However, I expected the buildhost to have a well-provisioned network as it was a Github Actions runner. Linux build was working fine.
I assumed that high-latency disk IO may be a reason and managed to get a test-case that reproduces the issue reliably: ESOCKETTIMEDOUT is reliably triggered on Linux when small and realistic delay (8ms) is injected to disk IO system calls.
ESOCKETTIMEDOUT being reported because of slow disk IO is very confusing behavior, as it sounds like temporary network error while the root cause is different. It does not match my understanding of _"Developing Javascript projects shouldn't leave the door open to surprises"_ motto, so I'm reporting this test-case as a separate issue despite possible duplicates in the issue tracker. :slightly_smiling_face:
Command
ubuntu $ docker run --privileged --rm -ti --net=host node:12-buster /bin/bash
docker # apt update && apt install -y strace
docker # mkdir ~/prj && cd ~/prj
docker # strace -f -o /dev/null -e trace=stat -e inject=stat:delay_exit=8000 yarn add @material-ui/icons@^4.5.1
What is the current behavior?
yarn add v1.22.4
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
info There appears to be trouble with your network connection. Retrying...
error An unexpected error occurred: "https://registry.yarnpkg.com/@material-ui/icons/-/icons-4.9.1.tgz: ESOCKETTIMEDOUT".
info If you think this is a bug, please open a bug report with the information provided in "/root/prj/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/add for documentation about this command.
What is the expected behavior?
If I run exactly same command with delay_exit=1 (0.001ms) instead of delay_exit=8000 (8ms), I get the expected behavior:
yarn add v1.22.4
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
warning " > @material-ui/[email protected]" has unmet peer dependency "@material-ui/core@^4.0.0".
warning " > @material-ui/[email protected]" has unmet peer dependency "react@^16.8.0".
warning " > @material-ui/[email protected]" has unmet peer dependency "react-dom@^16.8.0".
[4/4] Building fresh packages...
success Saved lockfile.
success Saved 3 new dependencies.
info Direct dependencies
鈹斺攢 @material-ui/[email protected]
info All dependencies
鈹溾攢 @babel/[email protected]
鈹溾攢 @material-ui/[email protected]
鈹斺攢 [email protected]
Done in 53.01s.
Steps to Reproduce
First, strace adds some overhead on it's own and it may affect reproducibility. E.g. yarn add @material-ui/icons@^4.5.1 is Done in 5.76s. in the very same environment without strace wrapper. That's why I compare strace-with-delay to strace-without-delay and don't compare it to "clean" run.
Second, I've taken stat() call from the following:
strace -f -o ~/yarn-trace yarn add @material-ui/icons@^4.5.1grep -F AccessAlarmsRounded.d.ts ~/yarn-trace. It had 5 openat() calls, 4 lstat() calls, 1 stat() call, 1 chmod() call. So I've taken stat(/usr/local/share/.cache/yarn/v6/.../AccessAlarmsRounded.d.ts) as a place to inject delay to.Third, I've taken 8ms delay assuming that there is single stat() system call per unpacked file and I was emulating HDD-based system having 125 IOPS performance. It's all a ballpark estimate: 1ms delay works on my system, 2ms fails with ESOCKETTIMEDOUT once but manages to install a package after retry, 4ms and 8ms fail reliably.
Fourth, as soon as TCP buffering is involved (see comment on TCP ZeroWindow later), available network bandwidth and size of socket buffer may be also a factor playing a role in bug reproducibility. I've reproduced the bug with these exact variable values with Ubuntu 16.04 laptop connected by 100 Mbit/s link in Russian St. Petersburg and on Linode VM in Newark (see below).
Fifth, your node build may interact with OS kernel a bit differently, e.g. it may use open() instead of openat(). So, if the test-case fails for you, try to increase the injected latency for the disk-related system call or change a disk-related system call. I reproduced the issue on Ubuntu 18.04 VM in Linode Newark availability zone, but I had to use openat as a latency-injection point instead of stat. 4 statx() and 3 openat() syscalls were made for the aforementioned filename at that VM.
Comments and assumptions
SQLite has "faster than FS" benchmarks showing that Windows had pretty bad performance (compared to Linux) while operating with lots of small files. Both date-fns and @material-ui/icons have thousands of files as well as packages mentioned in "Possible duplicates" section. That explains that Windows users are suffering way more from ESOCKETTIMEDOUT happening while installing packages with thousands of files.
@FredyC came to the same idea that high-latency HDD being used instead of low-latency SSD triggers the ESOCKETTIMEDOUT in https://github.com/yarnpkg/yarn/issues/6221#issuecomment-412844630
@Hinaser made an excellent comment describing packet capture https://github.com/yarnpkg/yarn/issues/5259#issuecomment-389096161 yarn probably stops reading from a socket (so client OS sends TCP ZeroWindow) and eventually closes the socket from the client side.
I assume that node or yarn is busy unpacking well-compressed tarball full of small files and does not restart reading from socket for long enough time, so ESOCKETTIMEDOUT is triggered. I assume that the code also does not disable socket timeout while putting stream in paused state.
I assume, the possible fix is to download .tgz to a temporary file with some timeouts for network interactions and to unpack it without any timeouts as disk can't write faster anyway. Unfortunately, I'm not familiar with yarn codebase to provide a good PR.
Environment
12.18.21.22.4 node:12-buster running on top of Ubuntu 16.04 or 18.04yarn-error.log is the following:
Arguments:
/usr/local/bin/node /opt/yarn-v1.22.4/bin/yarn.js add @material-ui/icons@^4.5.1
PATH:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Yarn version:
1.22.4
Node version:
12.18.2
Platform:
linux x64
Trace:
Error: https://registry.yarnpkg.com/@material-ui/icons/-/icons-4.9.1.tgz: ESOCKETTIMEDOUT
at ClientRequest.<anonymous> (/opt/yarn-v1.22.4/lib/cli.js:141375:19)
at Object.onceWrapper (events.js:421:28)
at ClientRequest.emit (events.js:315:20)
at TLSSocket.emitRequestTimeout (_http_client.js:709:9)
at Object.onceWrapper (events.js:421:28)
at TLSSocket.emit (events.js:327:22)
at TLSSocket.Socket._onTimeout (net.js:481:8)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
npm manifest:
No manifest
yarn manifest:
No manifest
Lockfile:
No lockfile
Possible duplicates:
[email protected] with 29090 files on Windows[email protected] with 4742 files on Windows 10[email protected] on macOS 10.13; 14 Mib rxjs-6.5.3 on Windows[email protected] on Windowsmaterial-design-icons on Windows[email protected] with 3115 files on Windows 10material-design-icons-3.0.1 with 89814 files on Windows 10lottie-react-native-2.3.2 with 6275 files on Windows 10rxjs-5.5.12 with 3661 files on Windows@material-ui/[email protected] with 15667 files on Windowsnpm-6.11.3 with 4086 files on Windows@carbon/[email protected] with 6165 files on Windows[email protected] with 1489 files on WindowsWindows 10.
yarn v1.22.4
node v10.19.0
WSL
Executed:
npm init react-app my-ap
Failed:
[2/4] Fetching packages...
error An unexpected error occurred: "https://registry.yarnpkg.com/rxjs/-/rxjs-6.5.4.tgz: ESOCKETTIMEDOUT".
I'm experiencing the same exact issue.
It's a massive issue for us, since it's blocking deploys to production.
Anyone has any idea how to solve this?
What changed since last week?
@AmirTugi the workaround that works for me is to do yarn config set network-timeout 300000 to raise timeout to 5m from 30s.
However, that's just a workaround, not a fix.
Right, I tried to raise it to 1000000 and it didn't work.
So that's an arbitrary work-around :)
We are facing this issue on our pipeline servers too.. Has anything changed on yarn?
me too, I had tries more than 30 times these days and always got timeout, which is so annoying
Even, I see a the same problem in the app center build services. Can anyone please look into this issue?
Is there any yarn status page available?
Even, I see a the same problem in the app center build services. Can anyone please look into this issue?
Is there any yarn status page available?
The previous failures I observed corresponded to npm outages shown on this page - https://status.npmjs.org/
@darkk did a wonderful description of the bug. I proposed a PR with his proposal fix in mind:
tgz file in cache - no uncompress at the same time. This part is still constrained by network timeouttgz file in cache folder to uncompress and do other stuff (compute checkums, build metadata ...)I'm also seeing this on my builds on a Digital Ocean (SSD) build server the last couple of days (since setting the build server up).
Most helpful comment
Even, I see a the same problem in the app center build services. Can anyone please look into this issue?
Is there any yarn status page available?