In a new server installation (on a VPS provided by OVH) running Ubuntu 18.x, Docker itself works flawlessly and fast, but docker-compose sometimes hangs indefinitely for no apparent reason. This happens with every command: version, ps, up, logs, and so on.
I tried everything regarding DNS and name resolution, even disabling systemd-resolved.
When it works it is fast. When it hangs I can see the /tmp/_Mxyz123123 directory being created.
Running with --verbose hangs before producing any output, and then completes flawlessly.
Output of docker-compose version
docker-compose version 1.24.0, build 0aa59064
docker-py version: 3.7.2
CPython version: 3.6.8
OpenSSL version: OpenSSL 1.1.0j 20 Nov 2018
Output of docker version
Client:
Version: 18.09.5
API version: 1.39
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:43:57 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.5
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:10:53 2019
OS/Arch: linux/amd64
Experimental: false
Output of docker-compose config
(Make sure to add the relevant -f and other flags)
... not related to a specific docker-compose file
Observed result: sometimes hangs.
Expected result: runs fast.
No stacktrace, but I managed to strace the behavior.
[...]
openat(AT_FDCWD, "/tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile> <0.000038>
fstat(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000020>
write(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>, "# Generated automatically from M"..., 69632) = 69632 <0.000130>
fchmod(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>, 0700) = 0 <0.000050>
write(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>, "type.o -o Modules/xxsubtype$(E"..., 43) = 43 <0.000029>
close(3</tmp/_MEIsYFQ2P/lib/python3.6/config-3.6m-x86_64-linux-gnu/Makefile>) = 0 <0.000021>
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2088645a10) = 3110 <0.000276>
rt_sigaction(SIGINT, {sa_handler=0x404840, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 <0.000051>
rt_sigaction(SIGKILL, {sa_handler=0x404840, sa_mask=[KILL], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, 0x7ffe82f7d900, 8) = -1 EINVAL (Invalid argument) <0.000031>
rt_sigaction(SIGTERM, {sa_handler=0x404840, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 <0.000028>
wait4(-1,
[... here it hangs and then after 114s it continues output ...]
wait4(-1, docker-compose version 1.24.0, build 0aa59064
docker-py version: 3.7.2
CPython version: 3.6.8
OpenSSL version: OpenSSL 1.1.0j 20 Nov 2018
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3110 <114.945565>
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3110, si_uid=0, si_status=0, si_utime=43, si_stime=6} ---
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, {sa_handler=0x404840, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, 8) = 0 <0.000094>
rt_sigaction(SIGKILL, {sa_handler=SIG_DFL, sa_mask=[KILL], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f2087c55f20}, 0x7ffe82f7d900, 8) = -1 EINVAL (Invalid argument) <0.000150>
[...]
Linux vps[omit] 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 18.04.2 LTS
docker-compose installed with curl
I have a lot of other docker-compose environments working and this is the first time I've seen this behavior. Any idea on what to check or where to look?
@xcash Thank you for the ticket. We are going to investigate. Do you have a minimal docker-compose.yaml triggering that issue? Or does the compose file contain some specific elements (like volumes or a dedicated network)?
@jcsirot No, it happens even when issuing a simple docker-compose version without a docker-compose.yaml, so it doesn't seem related to the compose file. As the strace above shows, it seems to hang before even checking for it, while probing the Docker subsystem.
Btw, the compose file mounts local volumes, uses the default network, and uses a bitnami/nginx image.
This happens to me as well, running on a Google Compute Engine instance with the same docker versions as above. Also Ubuntu 18.04. Even docker-compose --help does this, so no docker-compose.yaml is involved.
Same behavior for me as well. I'm running on a Digital Ocean droplet running Ubuntu 18.04.2 LTS with 4.15.0-48-generic x86_64 GNU/Linux kernel. Installed via curl today.
Same problem for me, Linode with Ubuntu 18.04.2. Going back to version 1.23.0 fixed it, although I had to recreate the containers.
It seems like after the last update of Docker itself the problem is gone... can you guys (@hschalamonek @bitbuck3t @will-emmerson ) check it too?
@xcash I'm running on Docker version 18.09.6, build 481bc77, and it's still happening. Ubuntu 18.04.2 LTS x64 with everything up to date.
As a workaround: call the docker-compose command once, and after that issue your desired command.
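For example, a rough illustration of that workaround (docker-compose up -d here just stands in for whatever command you actually want to run):
# warm-up invocation; this is the call that may hang once, output discarded
docker-compose version > /dev/null 2>&1
# then run the command you actually wanted
docker-compose up -d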
@xcash was your strace capture (from the original issue) captured with -fff to follow forks/clones?
If I strace locally (running 1.24.0), which doesn't hang (there's a brief pause, ~1s), I do notice:
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f07acbaea10) = 16880
rt_sigaction(SIGINT, {sa_handler=0x404840, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f07acbe88e0}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGKILL, {sa_handler=0x404840, sa_mask=[KILL], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f07acbe88e0}, 0x7ffdb3f8c1f0, 8) = -1 EINVAL (Invalid argument)
rt_sigaction(SIGTERM, {sa_handler=0x404840, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f07acbe88e0}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 16880
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16880, si_uid=1000, si_status=0, si_utime=60, si_stime=8} ---
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f07acbe88e0}, {sa_handler=0x404840, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f07acbe88e0}, 8) = 0
rt_sigaction(SIGKILL, {sa_handler=SIG_DFL, sa_mask=[KILL], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f07acbe88e0}, 0x7ffdb3f8c1f0, 8) = -1 EINVAL (Invalid argument)
rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f07acbe88e0}, {sa_handler=0x404840, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f07acbe88e0}, 8) = 0
openat(AT_FDCWD, "/tmp/_MEI9VTJSw", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
e.g. at the point you have a hang I have a child which is spawned and then exits.
I think the /tmp/ stuff is the self-extracting stuff from https://github.com/pyinstaller/pyinstaller. The first exec looks to be some sort of self exec, and it then execs /sbin/ldconfig twice. What I saw was (please excuse my ascii...):
+-- /tmp/dc.16879:execve("./docker-compose-Linux-x86_64", ["./docker-compose-Linux-x86_64", "version"], 0x7fff52d107c0 /* 56 vars */) = 0
` /tmp/dc.16879:clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f07acbaea10) = 16880
`-- /tmp/dc.16880:execve("/home/ijc/development/Docker/compose/compose/docker-compose-Linux-x86_64", ["./docker-compose-Linux-x86_64", "version"], 0xfa0530 /* 58 vars */) = 0
|
+- /tmp/dc.16880:clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb8f4c91a10) = 16881
| `-- /tmp/dc.16881:execve("/sbin/ldconfig", ["/sbin/ldconfig", "-p"], 0x7fb8f425ef00 /* 2 vars */) = 0
|
`- /tmp/dc.16880:clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb8f4c91a10) = 16882
`-- /tmp/dc.16882:execve("/sbin/ldconfig", ["/sbin/ldconfig", "-p"], 0x7fb8efbbb168 /* 2 vars */) = 0
I wonder if there is something which is slowing /sbin/ldconfig on your system unacceptably? e.g. an nfs or autofs mount in the paths it is hitting, or something like that.
FWIW I was running strace -s 4096 -o /tmp/dc -fff ./docker-compose-Linux-x86_64 version. You might also want to add -tt to timestamp all the calls and see if there is an obviously slow bit.
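Putting that together, the full invocation with timestamps would be something like the following (the binary name/path is just where my local compose build happens to live, adjust to taste):
strace -s 4096 -o /tmp/dc -fff -tt ./docker-compose-Linux-x86_64 version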
@xcash I'm running on Docker version 18.09.6, build 481bc77, and it's still happening. Ubuntu 18.04.2 LTS x64 with _everything_ up to date
On that machine (the only one I had with that problem) it disappeared and we use the same version :(
# docker-compose version
docker-compose version 1.24.0, build 0aa59064
# docker version
Client:
Version: 18.09.6
API version: 1.39
Go version: go1.10.8
Git commit: 481bc77
Built: Sat May 4 02:35:57 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.6
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: 481bc77
Built: Sat May 4 01:59:36 2019
OS/Arch: linux/amd64
Experimental: false
@xcash was your strace capture (from the original issue) captured with -fff to follow forks/clones?
Sadly not.
e.g. at the point you have a hang I have a child which is spawned and then exits.
I think the /tmp/ stuff is the self-extracting stuff from https://github.com/pyinstaller/pyinstaller.
I've seen the self-extraction stuff before that line, in the output I omitted from the post.
I wonder if there is something which is slowing /sbin/ldconfig on your system unacceptably? e.g. an nfs or autofs mount in the paths it is hitting, or something like that.
Considering it is a VPS, it's entirely possible something like that is happening behind the scenes.
I tried issuing a docker command (not compose) while docker-compose was locking/locked and it worked seamlessly, but it could've been an operation that didn't hit the filesystem.
FWIW I was running strace -s 4096 -o /tmp/dc -fff ./docker-compose-Linux-x86_64 version. You might also want to add -tt to timestamp all the calls and see if there is an obviously slow bit.
Should this happen again I will use your strace options. :)
Thanks @xcash
Considering it is a VPS, it's entirely possible something like that is happening behind the scenes.
FWIW, one thing I considered mentioning before was that some of the lower-tier virtual machines on AWS (and I presume other clouds) are scheduled in such a way that if they use all their "credit" for a given period they essentially hang (the whole VM is descheduled) until they build up sufficient credit again.
I _think_ this is probably not the case for you, since only docker-compose is hanging, not the whole VM (AIUI, e.g. you see docker cli commands working while in the hung state). Also, hangs of 114s like you are seeing are so close to triggering Linux's default soft-lockup check (120s) that I think someone would have mentioned that if it were relevant.
So I think this is almost certainly not relevant to this case (at least to your particular instance), but worth mentioning just in case.
Should this happen again I will use your strace options. :)
Thank you!
BTW I think running /sbin/ldconfig while you are in the "bad" state should also take a long time, which would be interesting to validate.
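For example, something as simple as the following while the hang is happening, compared against the same command on a healthy run:
time /sbin/ldconfig -p > /dev/null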
I am still having the same problem, with everything upgraded, but can confirm that running docker-compose on its own first does seem to fix it.
@will-emmerson please can you try collecting some logs with strace (see my reply to @xcash above for the runes) while the problem is exhibiting itself.
I ran the following command
strace -s 4096 -o /tmp/dc -fff -tt docker-compose --version
and it created these files:
3220120 May 21 10:34 dc.29714
8457740 May 21 10:34 dc.29717
27423 May 21 10:33 dc.29718
27423 May 21 10:33 dc.29719
What should I do with them?
I also ran strace -c docker-compose --version and it output this:
docker-compose version 1.24.0, build 0aa59064
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
62.83 0.046038 46038 1 wait4
20.47 0.014998 88 170 write
9.71 0.007118 82 87 unlink
2.79 0.002041 8 248 read
0.94 0.000689 4 191 openat
0.83 0.000609 3 231 98 stat
0.61 0.000447 56 8 munmap
0.36 0.000266 22 12 rmdir
0.31 0.000225 1 191 close
0.23 0.000166 7 24 getdents
0.22 0.000160 13 12 mkdir
0.21 0.000153 153 1 clone
0.20 0.000144 1 192 fstat
0.19 0.000140 2 87 fchmod
0.07 0.000053 1 90 lseek
0.02 0.000015 2 9 brk
0.01 0.000010 2 6 2 rt_sigaction
0.01 0.000007 0 17 mmap
0.00 0.000000 0 7 mprotect
0.00 0.000000 0 5 5 access
0.00 0.000000 0 1 getpid
0.00 0.000000 0 1 execve
0.00 0.000000 0 1 readlink
0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.073279 1593 105 total
I've never used strace before.
@will-emmerson easiest is probably to tar (or zip) them up and attach them here, thanks.
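For example, something along these lines, using the filenames from your listing (the archive name is arbitrary; strace wrote the files under /tmp because of -o /tmp/dc):
tar czf dc-traces.tar.gz /tmp/dc.29714 /tmp/dc.29717 /tmp/dc.29718 /tmp/dc.29719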
I'm not sure I want to upload a bunch of logs to a public place when I don't really know what's in them.
@will-emmerson hrm, yes, that's fair. please zip/tar them and mail them to ijc (at) docker.com.
I've sent them.
Received, thanks.
So your process tree is:
29714 `docker-compose --version`
`-- 29717 self exec of `docker-compose --version`
`-- 29718 exec `/sbin/ldconfig` → exec `/sbin/ldconfig.real`
`-- 29719 exec `/sbin/ldconfig` → exec `/sbin/ldconfig.real`
The two processes 29718 and 29719 execute fully in <1s. e.g. dc.29718 is
6678-will-emmerson$ sed -n -e '1p;$p' dc.29718
10:33:59.478903 set_robust_list(0x7fa0b3052a20, 24) = 0
10:33:59.483282 +++ exited with 0 +++
6678-will-emmerson$ sed -n -e '1p;$p' dc.29719
10:33:59.597443 set_robust_list(0x7fa0b3052a20, 24) = 0
10:33:59.602154 +++ exited with 0 +++
Which I think rules out something slowing ldconfig down, at least in your case.
It's the two main processes which take some time:
6678-will-emmerson$ sed -n -e '1p;$p' dc.29714
10:33:58.821291 execve("/usr/local/bin/docker-compose", ["docker-compose", "--version"], 0x7ffea9630aa8 /* 23 vars */) = 0
10:34:49.283590 +++ exited with 0 +++
6678-will-emmerson$ sed -n -e '1p;$p' dc.29717
10:33:59.145695 execve("/usr/local/bin/docker-compose", ["docker-compose", "--version"], 0xa6d630 /* 25 vars */) = 0
10:34:49.268734 +++ exited with 0 +++
In dc.29714 (the parent) there is a gap of ~48s after 10:34:01.454543:
10:33:59.145224 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd813295a10) = 29717
10:33:59.145455 rt_sigaction(SIGINT, {sa_handler=0x404840, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fd8128a5f20}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
10:33:59.145495 rt_sigaction(SIGKILL, {sa_handler=0x404840, sa_mask=[KILL], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fd8128a5f20}, 0x7ffd2ab7dcd0, 8) = -1 EINVAL (Invalid argument)
10:33:59.145521 rt_sigaction(SIGTERM, {sa_handler=0x404840, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fd8128a5f20}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
10:33:59.145545 wait4(-1, 0x7ffd2ab7dedc, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
10:34:01.163587 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.163639 wait4(-1, 0x7ffd2ab7dedc, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
10:34:01.279720 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.279774 wait4(-1, 0x7ffd2ab7dedc, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
10:34:01.305847 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.305904 wait4(-1, 0x7ffd2ab7dedc, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
10:34:01.333863 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.333915 wait4(-1, 0x7ffd2ab7dedc, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
10:34:01.396106 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.396157 wait4(-1, 0x7ffd2ab7dedc, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
10:34:01.454491 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.454543 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 29717
10:34:49.268835 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=29717, si_uid=0, si_status=0, si_utime=44, si_stime=6} ---
10:34:49.268861 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fd8128a5f20}, {sa_handler=0x404840, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fd8128a5f20}, 8) = 0
10:34:49.268891 rt_sigaction(SIGKILL, {sa_handler=SIG_DFL, sa_mask=[KILL], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fd8128a5f20}, 0x7ffd2ab7dcd0, 8) = -1 EINVAL (Invalid argument)
10:34:49.268916 rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fd8128a5f20}, {sa_handler=0x404840, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fd8128a5f20}, 8) = 0
Which seems to be the parent waiting for the child.
In the child (dc.29717) the gap is after 10:34:01.454593:
10:33:59.629565 openat(AT_FDCWD, "/dev/random", O_RDONLY) = 3
10:33:59.629613 poll([{fd=3, events=POLLIN}], 1, -1) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
10:34:01.163675 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.163690 restart_syscall(<... resuming interrupted poll ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
10:34:01.279809 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.279823 restart_syscall(<... resuming interrupted restart_syscall ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
10:34:01.305939 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.305952 restart_syscall(<... resuming interrupted restart_syscall ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
10:34:01.333951 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.333965 restart_syscall(<... resuming interrupted restart_syscall ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
10:34:01.396192 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.396205 restart_syscall(<... resuming interrupted restart_syscall ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
10:34:01.454579 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
10:34:01.454593 restart_syscall(<... resuming interrupted restart_syscall ...>) = 1
10:34:49.023121 close(3) = 0
Aha! So the process is blocked in a poll syscall (poll([{fd=3, events=POLLIN}], 1, -1)) where the only passed fd is fd 3, which is a recently opened fd onto /dev/random (openat(AT_FDCWD, "/dev/random", O_RDONLY) = 3).
The /dev/random device is well known for blocking when there is insufficient entropy on the system, and VPSs/cloud VMs are well known for having a general shortage of entropy.
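As an aside, a quick way to see how much entropy the kernel thinks it has on the affected box is something like:
cat /proc/sys/kernel/random/entropy_avail
A value down in the single digits or low tens would be consistent with /dev/random blocking.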
There is no use of /dev/random in compose (outside of a single unrelated test case), nor is there in docker-py, so this must be in one of our dependencies.
Although I cannot reproduce the hang, I _can_ see the use of /dev/random in my own strace (my system must have plenty of entropy, I guess).
The maddest bit is that after blocking waiting for /dev/random to become readable it immediately closes it and opens /dev/urandom (the non-blocking one) instead and reads from that.
Well, thank you for those logs, they have at least given me a string to pull on and see where it takes me...
No problem, it makes sense as it's a new VPS with very little use. And installing haveged fixes the problem.
Not long before the /dev/random dance I described, I see references to cryptography.hazmat.bindings._openssl.so in the traces.
With that knowledge this comment seems awfully telling:
https://github.com/pyca/cryptography/blob/master/src/_cffi_src/openssl/src/osrandom_engine.c#L123...L127
Which in turn seems to derive from:
https://github.com/pyca/cryptography/issues/4645
https://github.com/pyca/cryptography/pull/4656
So I feel pretty sure in declaring that this isn't a compose issue as such, but some combination of Python's behaviour and lack of entropy on some systems. Installing haveged or some other entropy daemon or otherwise ensuring you have sufficient entropy seems like the right answer.
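For instance, on a Debian/Ubuntu box something along these lines should do it (untested here, and package/service names may differ on other distros):
# install the haveged entropy daemon and make sure it is running
sudo apt-get install haveged
sudo systemctl enable --now haveged
# then re-check the available entropy
cat /proc/sys/kernel/random/entropy_avail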
I'm going to close since this does not appear to be a bug in compose, thanks for your help in getting to the bottom of the issue.
FWIW https://lwn.net/Articles/693189/ is an interesting article on a related subject (I'm not sure whether or not this issue is the one being described there, but the symptoms are certainly similar).
@ijc congrats on the troubleshooting, Ian! :)
It makes sense in my setup too, as the problem disappeared after that VPS went into production and thus started generating entropy.
Yeah! I had a value of 6 in entropy_avail; after installing haveged the number increased to ~2500, and now docker-compose commands run pretty fast.
Very strange issue. @ijc, thank you for your investigation; the cause was so unexpected but interesting to discover.
I don't understand the conclusion of this issue. At the end of it all, what can I do to keep my docker-compose commands from hanging?
I don't understand the conclusion of this issue. At the end of it all, what can I do to keep my docker-compose commands from hanging?
You can try my patch here to keep away from entropy, pynacl, libsodium, paramiko, and anything else that you don't need: https://github.com/docker/compose/issues/6552#issuecomment-508124668
I don't understand the conclusion of this issue. At the end of it all, what can I do to keep my docker-compose commands from hanging?
@sebastialonso After reading this thread I solved it by installing haveged (https://github.com/docker/compose/issues/6678#issuecomment-494398740) on my VPS to get more entropy.
Here is another link on the same subject: https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged
This isn't really fixed, you guys are just giving workarounds and patches... the issue is still happening even on the latest version.
I don't mean any offense by my post, but there's a difference between a closed issue and one that is hanging because the proper solution hasn't been found yet.
Even if the author is satisfied, it doesn't mean that we should ignore this issue, no?
Don't get me wrong.
Thanks @Phuker, but I can't find the file you mention on my machine.
@cfarvidson On my cloud servers I still don't have this issue; it is happening on my local machine. It seems really bad to me to have to install third-party software for a bug inside the docker-compose package, so I guess I'll just wait for some kind of answer.
Hi @M034B5
This isn't really fixed, you guys are just giving workarounds and patches... the issue is still happening even on the latest version.
This is a typical "won't fix" case since this is not an issue in Docker or Compose. It's a general system entropy issue.
If software needs entropy and the system is not providing enough, it's an OS issue. And everything you do to bypass that is a general workaround.
I don't mean any offense by my post, but there's a difference between a closed issue and one that is hanging because the proper solution hasn't been found yet.
Indeed. But here the proper solution is having a good amount of entropy provided by the operating system. :) You are searching for the "proper solution" in the wrong place.
Even if the author is satisfied, it doesn't mean that we should ignore this issue, no?
Don't get me wrong.
Don't get me wrong either, and I'm not the author. I'm only providing what I think is the right point of view on this issue and why it's closed. :)
@sebastialonso
As a recap on this issue: the problem is that Docker requires system entropy.
Sometimes the OS does not provide enough "natural entropy", for a number of reasons.
In these cases you need to "fix the OS" by generating more entropy, and that's what haveged is built for.
Without that, Docker waits for enough entropy, and how long that takes depends on what the OS is doing in the background.
Thanks @xcash, I'll install this haveged tool to solve my issues.
If this was a problem for you please upvote https://github.com/docker/compose/issues/6931 so we can get better docs, which would have avoided lots of wasted time chasing down this problem.
No, it didn't. It turns out my entropy was already high enough before installing that package. I installed it anyway, but everything stays the same.
Why should "docker-compose --help" require system entropy?
That's a good point, and that's precisely what I'm experiencing.
Could be related to the amount of disk/memory the docker daemon needs to operate?
Any update on this issue?
Installing haveged fixed this problem for me.
CentOS 7
docker 19.03.5
docker-compose 1.25.0
I encountered this issue, but realized I was in a Python 3.8 virtualenv; I ran docker-compose up -d outside of it and no longer had any weird stalling issues.
I want to also confirm that installing haveged fixed our problem.
Using Digital Ocean's Droplet, with:
We used this article to install haveged: https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged
I hope this helps anyone just in case.
FWIW, installing jitterentropy-rngd on a small server running Ubuntu 20.04 (via apt install jitterentropy-rngd) resolved this problem for me.
Now cat /proc/sys/kernel/random/entropy_avail shows a reasonable amount of entropy available (even after a fresh boot), and the hangs from docker-compose --help or docker-compose --version are _significantly_ reduced.