On one node with a high load I found multiple autoupdater processes running. Could we use a lockfile to prevent this?
root@somenode:~# uptime
23:48:41 up 3 days, 9:09, load average: 3.03, 2.52, 2.28
root@somenode:~# ps
PID USER VSZ STAT COMMAND
1 root 1388 S /sbin/procd
2 root 0 SW [kthreadd]
3 root 0 SW [ksoftirqd/0]
5 root 0 SW< [kworker/0:0H]
7 root 0 SW< [khelper]
62 root 0 SW< [writeback]
65 root 0 SW< [bioset]
67 root 0 SW< [kblockd]
100 root 0 SW [kswapd0]
147 root 0 SW [fsnotify_mark]
178 root 0 SW< [ath79-spi]
292 root 0 SW< [deferwq]
363 root 0 SWN [jffs2_gcd_mtd3]
446 root 892 S /sbin/ubusd
447 root 772 S /sbin/askfirst ttyS0 /bin/ash --login
603 root 0 SW< [bat_events]
695 root 0 SW< [cfg80211]
830 root 1036 S /sbin/logd -S 16
835 root 1600 S /usr/sbin/haveged -w 1024 -d 32 -i 32 -v 1
943 root 1608 S /sbin/netifd
974 root 1392 S /usr/sbin/crond -f -c /etc/crontabs -l 5
994 root 1152 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
1003 root 788 S /usr/sbin/gluon-crond /lib/gluon/cron
1008 root 1108 S /usr/sbin/gluon-radvd -i br-client -p 2001:bf7:540::/64
1041 root 1132 S /usr/sbin/uhttpd -f -h /lib/gluon/status-page/www -r somenode -x /c
1051 root 916 S /usr/sbin/dnsmasq -x /var/run/gluon-wan-dnsmasq.pid -u root -i lo -p 54 -h -r
1125 root 3292 R /usr/bin/fastd --config - --daemon --pid-file /var/run/fastd.mesh_vpn.pid
1282 root 1392 S /usr/sbin/ntpd -n -p 1.ntp.services.bremen.freifunk.net -p 2.ntp.services.brem
1313 root 1112 S /usr/sbin/alfred -i br-client -b bat0
1334 root 1320 S /usr/sbin/batadv-vis -i bat0 -s
1340 root 812 S odhcp6c -s /lib/netifd/dhcpv6.script -t120 br-client
1386 root 1388 S udhcpc -p /var/run/udhcpc-br-wan.pid -s /lib/netifd/dhcp.script -f -t 0 -i br-
1434 root 812 S odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 br-wan
1640 nobody 924 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf -k
1902 root 2152 S /usr/bin/respondd -g ff02::2:1001 -p 1001 -c return require("gluon.announced")
2557 root 0 SW [kworker/0:3]
3010 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
3013 root 2080 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
3023 root 0 Z [sh]
3129 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
3131 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
3144 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
3145 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
4772 root 0 Z [sh]
4826 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
4827 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
5656 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
5658 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
5662 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
5663 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
5753 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
5757 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
5759 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
5760 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
8352 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
8354 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
8358 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
8359 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
9126 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
9128 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
9132 root 0 Z [sh]
9183 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
9184 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
10611 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
10616 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
10617 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
10618 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
11439 root 1388 S /bin/sh -c /usr/sbin/autoupdater
11444 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater
11445 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
11446 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
11462 root 1584 S /usr/sbin/hostapd -P /var/run/wifi-phy0.pid -B /var/run/hostapd-phy0.conf
11503 root 1576 S /usr/sbin/hostapd -P /var/run/wifi-phy1.pid -B /var/run/hostapd-phy1.conf
12031 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
12033 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
12038 root 0 Z [sh]
12099 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
12100 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
14067 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
14072 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
14074 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
14075 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
14929 root 0 SW [kworker/u2:2]
16847 root 1220 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
16944 root 0 SW [kworker/0:0]
17185 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
17187 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
17191 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
17192 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
17311 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
17316 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
17317 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
17318 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
17435 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
17437 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
17441 root 0 Z [sh]
17500 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
17501 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
18092 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
18094 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
18098 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
18099 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
18953 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
18960 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
18971 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
18972 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
19766 root 1168 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
21013 root 0 SW [kworker/0:1]
21349 root 1508 S {dhcpv6.script} /bin/sh /lib/netifd/dhcpv6.script br-client ra-updated
21350 root 1504 R {dhcpv6.script} /bin/sh /lib/netifd/dhcpv6.script br-client ra-updated
21363 root 1392 S -ash
21368 root 1388 R ps
21369 root 1508 S {dhcpv6.script} /bin/sh /lib/netifd/dhcpv6.script br-client ra-updated
21370 root 1036 R jshn -w
22019 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
22024 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
22025 root 0 Z [sh]
22045 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
22046 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
23324 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
23326 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
23330 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
23331 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
24215 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
24220 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
24221 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
24222 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
25814 root 0 DW [kworker/0:2]
26316 root 0 SW [kworker/u2:1]
26908 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
26911 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
26934 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
26937 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
29222 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
29227 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
29228 root 0 Z [sh]
29300 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
29301 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
29338 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
29340 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
29344 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
29345 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
30421 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
30428 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
30437 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
30438 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
30490 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
30495 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
30496 root 0 Z [sh]
30529 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
30530 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
30612 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
30617 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
30628 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
30629 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
31101 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
31106 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
31108 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
31109 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
31630 root 0 SW [kworker/u2:0]
31905 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
31912 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
31922 root 0 Z [sh]
31994 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
31995 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
32294 root 1388 S /bin/sh -c /usr/sbin/autoupdater --fallback
32299 root 2072 S {autoupdater} /usr/bin/lua /usr/sbin/autoupdater --fallback
32300 root 0 Z [sh]
32359 root 1388 S sh -c wget -T 120 -O- 'http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/s
32360 root 1396 S wget -T 120 -O- http://[2a03:b0c0:3:d0::19:4001]/ffhb-mirror/firmware/stable/s
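For reference, the lockfile idea from the top of this report could look roughly like the sketch below. It is only an illustration: the lock path and the function names are made up here and are not part of the autoupdater. It uses mkdir because creating a directory is atomic, so it does not rely on a flock utility being installed.

```shell
#!/bin/sh
# Hypothetical lock path (an assumption); /tmp is assumed to be cleared on reboot.
LOCKDIR="${LOCKDIR:-/tmp/autoupdater.lock}"

# mkdir is atomic: exactly one caller can create the directory.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null || return 1
    echo "$$" > "$LOCKDIR/pid"   # record the holder for debugging
}

release_lock() {
    rm -rf "$LOCKDIR"
}

# Intended use at the top of a wrapper script (commented out here):
#   acquire_lock || exit 0          # another instance is already running
#   trap release_lock EXIT
#   /usr/sbin/autoupdater "$@"
```

Note that a lock like this would only stop the processes from piling up. A hung instance would keep holding the lock, so the hang itself still needs a fix.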
Weird, wget is blocking, but the -T 120 should kill it after 2 minutes. Is this a normal Gluon build? The default OpenWrt config for busybox's wget doesn't support -T (and ignores it), but on Gluon, it should work.
Yes, it's the build 2015.1.2+bremen2 / gluon-v2015.1.2:
https://github.com/FreifunkBremen/gluon-site-ffhb/tree/v2015.1.2+bremen2
The -T option seems to be built in:
wget --help
BusyBox v1.22.1 (2015-11-07 22:12:39 CET) multi-call binary.
Usage: wget [-c|--continue] [-s|--spider] [-q|--quiet] [-O|--output-document FILE]
[--header 'header: value'] [-Y|--proxy on/off] [-P DIR]
[-U|--user-agent AGENT] [-T SEC] URL...
Retrieve files via HTTP or FTP
-s Spider mode - only check file existence
-c Continue retrieval of aborted transfer
-q Quiet
-P DIR Save to DIR (default .)
-T SEC Network read timeout is SEC seconds
-O FILE Save to FILE ('-' for stdout)
-U STR Use STR for User-Agent header
-Y Use proxy ('on' or 'off')
-T looks like a "timeout between packets", not a time limit for the whole operation. You might want to run wget in a subshell and limit the runtime of that subshell, or switch over to curl, which supports a time limit for the whole operation.
The full wget offers three timeout options:
--dns-timeout, --connect-timeout, and --read-timeout
@corny The full wget is not available here. Using a subshell is IMHO a good idea.
To be more precise: what kind of subshell usage do you have in mind? Using ulimit is probably no good, because it can only limit CPU time, which is probably nowhere near 120 seconds even after hours of running wget.
I just tested something like the following:
( sleep 5 && pgrep -P $$ sleep > /dev/null && kill $(pgrep -P $$ sleep)) & sleep 50
This seems to do what we want: The last sleep is killed after 5 seconds if it wasn't terminated (by ^C) beforehand, in which case nothing happens after the first sleep finishes. (I used sleep as a long-running command because I couldn't reproduce the problem with the hanging autoupdater.)
If this is what you were thinking of, we could simply plug this into line 129 and line 192 of the autoupdater. But the fact that there are two call sites probably indicates that there should be a separate function get_http(url, file), or something like that, which handles calling wget appropriately.
What do you think? If I'm not heading in a totally wrong direction, I'll gladly prepare a PR.
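The watchdog idea from the one-liner above could be wrapped into a reusable function, roughly as sketched below. The name run_with_timeout is hypothetical and not part of the autoupdater; the sketch starts the command in the background, starts a second background job that kills it after the limit, and cancels that job once the command has finished.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND and kills it if it is still
# running after SECONDS. Returns the exit status of COMMAND.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: after the limit, terminate the command if it is still alive.
    # Its output is discarded so a leftover sleep cannot hold a pipe open.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    dog_pid=$!
    wait "$cmd_pid"
    status=$?
    # The command is done; stop the watchdog so it cannot fire later.
    kill "$dog_pid" 2>/dev/null
    wait "$dog_pid" 2>/dev/null
    return "$status"
}
```

A call could then look like run_with_timeout 120 wget -T 120 -O- "$url", with $url standing in for the mirror URL. If the command is killed by the watchdog, the return status is non-zero, so the caller can treat it like any other failed download.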
I'd rather do something in Lua instead of adding shell code (nixio has a fork function which could be used to replace popen with something more powerful, if necessary).
Mostly unrelated to this issue, I've thought about either adding 'exec' to all command calls, or even changing the code not to use /bin/sh at all, to avoid having these unnecessary shell processes...
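The effect of 'exec' can be seen with a small experiment, sketched below with made-up function names. Each function starts a shell that prints its own PID and then runs a second shell that prints its PID as well. With exec, the second shell replaces the first, so no extra process is left waiting.

```shell
#!/bin/sh
# Without exec: the outer shell forks a child and waits for it,
# so the two printed PIDs differ. The trailing "true" keeps shells
# that implicitly exec the last command from doing so here.
without_exec() {
    sh -c 'echo $$; sh -c "echo \$\$"; true'
}

# With exec: the inner command replaces the outer shell,
# so both lines show the same PID.
with_exec() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}
```

Applied to the process list above, this would mean one process per wget call instead of the "sh -c wget ..." and "wget ..." pairs.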
I started implementing this in Lua, and the current version is 43 lines long. However, it only imitates io.popen() and not os.execute(). The latter actually seems to be harder to implement, because SIGCHLD is ignored by Lua and doesn't interrupt functions like nixio.nanosleep(timeout) or nixio.poll({}, timeout). My current implementation relies on having an intermediate process forward the stdout, but see for yourself if this solution would be acceptable.
To paraphrase @NeoRaider: As this isn't a regression, it won't be considered in 2016.1. After that, he wants to have the autoupdater rewritten in C. The approach with an additional Lua process is problematic, because we have no RAM to spare.
Couldn't we just killall the old autoupdater when a new one is called?
I've started working on an autoupdater rewrite that will address this issue.
Workarounds that should fix these issues have been added in freifunk-gluon/packages#147; a nicer fix will follow with the autoupdater rewrite.