It was initially reported here: https://github.com/kubernetes/kubernetes/issues/83903#issuecomment-542867717 but we should really understand why these iptables errors are not showing up on the ipv4 jobs. ~~Is it possible that the different CI hardware is not fast enough to release the lock?~~
Another flake with iptables errors; this time it wasn't ip6tables-restore but the new kindnet, which uses go-iptables:
```
2019-10-16T18:20:33.983136705Z stdout F Node kind-worker2 has CIDR fd00:10:244:0:1::/80
2019-10-16T18:20:39.810650256Z stderr F panic: Can't synchronize rules after 3 attempts: running [/sbin/ip6tables -t nat -C KIND-MASQ-AGENT -j MASQUERADE -m comment --comment kind-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) --wait]: exit status 4: ip6tables: Resource temporarily unavailable.
2019-10-16T18:20:39.810738416Z stderr F
2019-10-16T18:20:39.810745057Z stderr F goroutine 20 [running]:
2019-10-16T18:20:39.810754027Z stderr F main.main.func1(0xc00029e150)
2019-10-16T18:20:39.810761796Z stderr F 	/src/cmd/kindnetd/main.go:79 +0x93
2019-10-16T18:20:39.810768298Z stderr F created by main.main
2019-10-16T18:20:39.810789908Z stderr F 	/src/cmd/kindnetd/main.go:77 +0x32a
```
Job link: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-kind-conformance-parallel-ipv6/1184528903125864450
Jobs to track
It's not different hardware. It's a different OS.
All of the CI is on GCE VMs. Most of them are COS, these are Ubuntu.
IMO:
Are we sure what resource is unavailable?
Is it PIDs, FDs, ...?
Based on your comments, it seems the HW and resources may be a red herring.
If we try a different angle, maybe the iptables implementation used for ipv6 is slower than the ipv4 one. We can see:
- kube-proxy is using iptables v1.6.0
- kindnet and the CNI plugins use iptables v1.8.3 (legacy)
- there is a PR in 1.17 upgrading the kube-proxy image to use 1.8.3
the diff is interesting:
```
# git log 'v1.8.3'...'v1.6.0' --oneline | grep lock
31e4b590 iptables-restore: free the table lock when skipping a table
80d8bfaa iptables: insist that the lock is held.
999eaa24 iptables-restore: support acquiring the lock.
b91af533 iptables: set the path of the lock file via a configure option.
72bb3dbf xshared: using the blocking file lock request when we wait indefinitely
24f81746 xshared: do not lock again and again if "-w" option is not specified
e8f857a5 xtables: Add an interval option for xtables lock wait
```
and regarding ipv6:
```
> git log 'v1.8.3'...'v1.6.0' --oneline | grep ip6 | grep -v nft
bceb483e man: refer to iptables-translate and ip6tables
5839d7fe extensions: libip6t_mh: fix bogus translation error
d89cdfae ip6tables-save: Merge into iptables-save.c
074e307e ip6tables-restore: Merge into iptables-restore.c
9b127b71 extensions: libebt_ip6: fix ip6-dport negation
0800d9b4 ip6tables-translate: Fix libip6t_mh.txlate test
cd79556b xtables: Reserve space for 'opt' column in ip6tables output
b470b8e8 xtables-legacy: fix argv0 name for ip6tables-legacy
bb436ceb xtables-compat: ip6table-save: fix save of ip6 address masks
49f49933 extensions: libip6t_srh.t: Add test cases for psid, nsid, and lsid
155e1c0c extensions: libip6t_srh: support matching previous, next and last SID
992e17d2 xtables-compat: only fetch revisions for ip/ip6
5c8ce9c6 ebtables-compat: add 'ip6' match extension
bda1daa4 extensions: ip6t_{S,D}NAT: add more tests
88fa4543 extensions: ip6t_{S,D}NAT: multiple to-dst/to-src arguments not reported
79fa7cc2 libip6t_icmp6: xlate: remove leftover space
9cd3adbe iptables-restore/ip6tables-restore: add --version/-V argument
bb498c8b extensions: libip6t_ah: Fix translation of plain '-m ah'
837ca1e3 extensions: libxt_iprange: rename "ip saddr" to "ip6 saddr" in ip6tables-xlate
19b38b83 extensions: libip6t_SNAT/DNAT: add square bracket in xlat output when port is specified
c513cc3d ip6tables: Warn about use of DROP in nat table
3e8b1a7c extensions: libip6t_REJECT: Avoid to print the default reject with value in the translation
e7788119 extensions: libip6t_LOG: Avoid to print the default log level in the translation
9e14d433 iptables-translate: pass ipt_entry and ip6t_entry to ->xlate()
```
It is a different pool of nodes, so they could be seeing more load. They also have a different kernel.
The fact that we hit this 3 times spread out over 3 minutes is suspicious, though.
I would wait for the new kube-proxy image with iptables 1.8.3 (https://github.com/kubernetes/kubernetes/pull/82966); at least there are several interesting commits in the new iptables version:
```
31e4b590 iptables-restore: free the table lock when skipping a table
80d8bfaa iptables: insist that the lock is held.
999eaa24 iptables-restore: support acquiring the lock.
b91af533 iptables: set the path of the lock file via a configure option.
72bb3dbf xshared: using the blocking file lock request when we wait indefinitely
24f81746 xshared: do not lock again and again if "-w" option is not specified
e8f857a5 xtables: Add an interval option for xtables lock wait
d89cdfae ip6tables-save: Merge into iptables-save.c
074e307e ip6tables-restore: Merge into iptables-restore.c
```
/cc @danwinship
we were missing the /run/xtables.lock flock file 😞
fixed now; hopefully we won't see more issues here. We should still see about getting https://github.com/kubernetes/kubernetes/pull/82966 in.
Seems that the kindnet issues should be fixed now, but there are new jobs failing with kube-proxy errors about holding the lock:
```
2019-10-21T09:46:27.735306963Z stderr F I1021 09:46:27.731956       1 trace.go:116] Trace[1894656157]: "iptables restore" (started: 2019-10-21 09:46:25.731523879 +0000 UTC m=+119.750710724) (total time: 2.000359876s):
2019-10-21T09:46:27.735356494Z stderr F Trace[1894656157]: [2.000359876s] [2.000359876s] END
2019-10-21T09:46:27.735368093Z stderr F E1021 09:46:27.732060       1 proxier.go:1433] Failed to execute iptables-restore: failed to acquire new iptables lock: timed out waiting for the condition
2019-10-21T09:48:15.922353008Z stderr F E1021 09:48:15.922087       1 proxier.go:736] Failed to ensure that nat chain POSTROUTING jumps to KUBE-POSTROUTING: error checking rule: exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
```
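For context, "timed out waiting for the condition" is the generic poll-timeout message from the apimachinery wait helpers, which is consistent with kube-proxy polling the xtables flock itself because its iptables-restore (1.6.0) cannot take the lock with `-w`. A minimal sketch of that pattern; the poll interval and function names are my assumptions, not kube-proxy's exact code:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// grabXtablesLock polls a non-blocking flock on the xtables lock file until
// the timeout; on expiry wait.PollImmediate returns the
// "timed out waiting for the condition" error seen in the log above.
func grabXtablesLock(path string, timeout time.Duration) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE, 0600)
	if err != nil {
		return nil, err
	}
	if err := wait.PollImmediate(200*time.Millisecond, timeout, func() (bool, error) {
		// LOCK_NB: fail instead of blocking, so we can poll with a deadline
		return syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB) == nil, nil
	}); err != nil {
		f.Close()
		return nil, fmt.Errorf("failed to acquire new iptables lock: %v", err)
	}
	return f, nil
}

func main() {
	f, err := grabXtablesLock("/run/xtables.lock", 5*time.Second)
	if err != nil {
		fmt.Println(err) // e.g. another process held the lock for >5s
		return
	}
	defer f.Close()
	fmt.Println("lock held; safe to run an iptables-restore that can't -w")
}
```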
That just looks like a flaky test case at a glance.
Clashing with kube-proxy on holding the lock is expected as long as it's transient.
We should avoid actually writing when there are no changes to make.
> That just looks like a flaky test case at a glance. Clashing with kube-proxy on holding the lock is expected as long as it's transient. We should avoid actually writing when there are no changes to make.
we are using this: https://github.com/coreos/go-iptables/blob/master/iptables/iptables.go#L165
it shouldn't write if the rule already exists. However, I don't think this is related to kindnet; I would be worried if adding 3 iptables rules took more than 5 seconds XD
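For reference, a minimal sketch of that go-iptables pattern; the chain name and rule spec are copied from the panic earlier in the thread, everything else is illustrative:

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.NewWithProtocol(iptables.ProtocolIPv6)
	if err != nil {
		log.Fatalf("creating ip6tables handle: %v", err)
	}
	// AppendUnique runs a check (-C) first and only appends when the rule
	// is missing, so a steady-state resync performs no writes at all.
	if err := ipt.AppendUnique("nat", "KIND-MASQ-AGENT",
		"-j", "MASQUERADE",
		"-m", "comment", "--comment",
		"kind-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)",
	); err != nil {
		log.Fatalf("ensuring MASQUERADE rule: %v", err)
	}
}
```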
I'm not sure this is the optimal approach versus save, parse, and compare,
but it's been a while.
> I'm not sure this is the optimal approach versus save, parse, and compare, but it's been a while.
absolutely not optimal for scale, but it's much simpler ... and removes the k8s dependency :)
it's kindnet, we can always come back 🤷‍♂️
the time holding the lock is negligible, ~1.2 ms per check (it checks 3 times), and it is going to be pretty constant since it will always have the same number of rules:
```
[pid 15811] 22:09:08.793425 open("/run/xtables.lock", O_RDONLY|O_CREAT, 0600 <unfinished ...>
[pid  1032] 22:09:08.793444 nanosleep({tv_sec=0, tv_nsec=2560000}, <unfinished ...>
[pid 15811] 22:09:08.793464 <... open resumed> ) = 3 <0.000023>
[pid 15811] 22:09:08.793494 flock(3, LOCK_EX|LOCK_NB) = 0 <0.000012>
[pid 15811] 22:09:08.793534 socket(AF_INET6, SOCK_RAW, IPPROTO_RAW) = 4 <0.000022>
[pid 15811] 22:09:08.793583 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 <0.000011>
[pid 15811] 22:09:08.793653 getsockopt(4, SOL_IPV6, IP6T_SO_GET_INFO, "nat\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., [84]) = 0 <0.000015>
[pid 15811] 22:09:08.793721 getsockopt(4, SOL_IPV6, IP6T_SO_GET_ENTRIES, "nat\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., [26584]) = 0 <0.000044>
[pid 15811] 22:09:08.793836 stat("/usr/lib/x86_64-linux-gnu/xtables/libip6t_standard.so", 0x7fffc3c14d30) = -1 ENOENT (No such file or directory) <0.000013>
[pid 15811] 22:09:08.793895 stat("/usr/lib/x86_64-linux-gnu/xtables/libxt_standard.so", {st_mode=S_IFREG|0644, st_size=6208, ...}) = 0 <0.000016>
[pid 15811] 22:09:08.793946 open("/usr/lib/x86_64-linux-gnu/xtables/libxt_standard.so", O_RDONLY|O_CLOEXEC) = 5 <0.000027>
[pid 15811] 22:09:08.794011 read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\6\0\0\0\0\0\0"..., 832) = 832 <0.000022>
[pid 15811] 22:09:08.794076 fstat(5, {st_mode=S_IFREG|0644, st_size=6208, ...}) = 0 <0.000019>
[pid 15811] 22:09:08.794160 mmap(NULL, 2101488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x7fc615cca000 <0.000022>
[pid 15811] 22:09:08.794217 mprotect(0x7fc615ccb000, 2093056, PROT_NONE) = 0 <0.000027>
[pid 15811] 22:09:08.794277 mmap(0x7fc615eca000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5, 0) = 0x7fc615eca000 <0.000024>
[pid 15811] 22:09:08.794371 close(5) = 0 <0.000013>
[pid 15811] 22:09:08.794451 mprotect(0x7fc615eca000, 4096, PROT_READ) = 0 <0.000015>
[pid 15811] 22:09:08.794543 close(4) = 0 <0.000020>
[pid 15811] 22:09:08.794619 exit_group(0) = ?
```
https://github.com/kubernetes/kubernetes/pull/83387/files
Unless you have a very large cluster (e.g., at least 5000 services) or a very constrained CI system, you should not be hitting the 5s iptables timeout. Something is going wrong with the test.