I have been following the Calico "the hard way" documentation and spun up a Kubernetes HA cluster with multiple master nodes. I'm using AWS as my cloud provider, and the nodes are distributed across two AZs. I followed the multi-AZ guide described here and configured an IPPool as below:
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: ippool-1
spec:
  cidr: 10.200.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
Now the calico-node pods come up and end in the "CrashLoopBackOff" state, and the pod logs say the following:
root@ip-10-0-4-193:~# ks get pods
NAME                           READY   STATUS             RESTARTS   AGE
calico-node-p8b6m              0/1     CrashLoopBackOff   22         91m
calico-node-wl8th              0/1     CrashLoopBackOff   23         97m
calico-typha-c4fdd98d9-7jfsr   1/1     Running            0          105m
calico-typha-c4fdd98d9-n7n5f   1/1     Running            0          105m
root@ip-10-0-4-193:~# ks logs calico-node-wl8th
2020-11-08 08:50:30.541 [INFO][8] startup.go 256: Early log level set to info
2020-11-08 08:50:30.541 [INFO][8] startup.go 272: Using NODENAME environment for node name
2020-11-08 08:50:30.541 [INFO][8] startup.go 284: Determined node name: ip-10-0-4-218.ec2.internal
2020-11-08 08:50:30.542 [INFO][8] k8s.go 228: Using Calico IPAM
2020-11-08 08:50:30.542 [INFO][8] startup.go 316: Checking datastore connection
2020-11-08 08:50:30.555 [INFO][8] startup.go 340: Datastore connection verified
2020-11-08 08:50:30.556 [INFO][8] startup.go 95: Datastore is ready
2020-11-08 08:50:30.568 [INFO][8] startup.go 584: Using autodetected IPv4 address on interface eth0: 10.0.4.218/24
2020-11-08 08:50:30.568 [INFO][8] startup.go 647: No AS number configured on node resource, using global value
2020-11-08 08:50:30.568 [INFO][8] startup.go 149: Setting NetworkUnavailable to False
2020-11-08 08:50:30.597 [INFO][8] startup.go 530: FELIX_IPV6SUPPORT is false through environment variable
2020-11-08 08:50:30.612 [INFO][8] startup.go 181: Using node name: ip-10-0-4-218.ec2.internal
2020-11-08 08:50:30.639 [INFO][16] k8s.go 228: Using Calico IPAM
2020-11-08 08:50:30.664 [INFO][16] ipam.go 87: Auto-assign 1 ipv4, 0 ipv6 addrs for host 'ip-10-0-4-218.ec2.internal'
2020-11-08 08:50:30.673 [INFO][16] ipam.go 313: Looking up existing affinities for host handle="ipip-tunnel-addr-ip-10-0-4-218.ec2.internal" host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.679 [INFO][16] ipam.go 381: Trying affinity for 10.200.54.64/26 handle="ipip-tunnel-addr-ip-10-0-4-218.ec2.internal" host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.684 [INFO][16] ipam.go 135: Attempting to load block cidr=10.200.54.64/26 host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.689 [INFO][16] ipam.go 140: The referenced block doesn't exist, trying to create it cidr=10.200.54.64/26 host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.696 [INFO][16] ipam.go 147: Wrote affinity as pending cidr=10.200.54.64/26 host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.700 [INFO][16] ipam.go 156: Attempting to claim the block cidr=10.200.54.64/26 host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.700 [INFO][16] ipam_block_reader_writer.go 183: Attempting to create a new block host="ip-10-0-4-218.ec2.internal" subnet=10.200.54.64/26
2020-11-08 08:50:30.705 [WARNING][16] ipam_block_reader_writer.go 219: Problem creating block while claiming block error=IPAMBlock.crd.projectcalico.org "10-200-54-64-26" is invalid: spec.deleted: Required value host="ip-10-0-4-218.ec2.internal" subnet=10.200.54.64/26
2020-11-08 08:50:30.705 [WARNING][16] ipam.go 159: Error claiming block cidr=10.200.54.64/26 error=IPAMBlock.crd.projectcalico.org "10-200-54-64-26" is invalid: spec.deleted: Required value host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.705 [WARNING][16] ipam.go 396: Couldn't get block for affinity, try next one error=IPAMBlock.crd.projectcalico.org "10-200-54-64-26" is invalid: spec.deleted: Required value handle="ipip-tunnel-addr-ip-10-0-4-218.ec2.internal" host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.705 [INFO][16] ipam.go 413: Block '10.200.54.64/26' provided addresses: [] handle="ipip-tunnel-addr-ip-10-0-4-218.ec2.internal" host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.705 [INFO][16] ipam.go 371: Ran out of existing affine blocks for host handle="ipip-tunnel-addr-ip-10-0-4-218.ec2.internal" host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.710 [INFO][16] ipam.go 436: No more affine blocks, but need to allocate 1 more addresses - allocate another block handle="ipip-tunnel-addr-ip-10-0-4-218.ec2.internal" host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.710 [INFO][16] ipam.go 440: Looking for an unclaimed block handle="ipip-tunnel-addr-ip-10-0-4-218.ec2.internal" host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.713 [INFO][16] ipam_block_reader_writer.go 114: Found free block: 10.200.54.64/26
2020-11-08 08:50:30.714 [INFO][16] ipam.go 452: Found unclaimed block host="ip-10-0-4-218.ec2.internal" subnet=10.200.54.64/26
2020-11-08 08:50:30.714 [INFO][16] ipam_block_reader_writer.go 130: Trying to create affinity in pending state host="ip-10-0-4-218.ec2.internal"subnet=10.200.54.64/26
2020-11-08 08:50:30.723 [INFO][16] ipam_block_reader_writer.go 141: Block affinity already exists, getting existing affinity host="ip-10-0-4-218.ec2.internal" subnet=10.200.54.64/26
2020-11-08 08:50:30.728 [INFO][16] ipam_block_reader_writer.go 149: Got existing affinity host="ip-10-0-4-218.ec2.internal" subnet=10.200.54.64/26
2020-11-08 08:50:30.728 [INFO][16] ipam_block_reader_writer.go 153: Marking existing affinity with current state pending as pending host="ip-10-0-4-218.ec2.internal" subnet=10.200.54.64/26
2020-11-08 08:50:30.735 [INFO][16] ipam.go 135: Attempting to load block cidr=10.200.54.64/26 host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.739 [INFO][16] ipam.go 140: The referenced block doesn't exist, trying to create it cidr=10.200.54.64/26 host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.745 [INFO][16] ipam.go 147: Wrote affinity as pending cidr=10.200.54.64/26 host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.748 [INFO][16] ipam.go 156: Attempting to claim the block cidr=10.200.54.64/26 host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.748 [INFO][16] ipam_block_reader_writer.go 183: Attempting to create a new block host="ip-10-0-4-218.ec2.internal" subnet=10.200.54.64/26
2020-11-08 08:50:30.751 [WARNING][16] ipam_block_reader_writer.go 219: Problem creating block while claiming block error=IPAMBlock.crd.projectcalico.org "10-200-54-64-26" is invalid: spec.deleted: Required value host="ip-10-0-4-218.ec2.internal" subnet=10.200.54.64/26
2020-11-08 08:50:30.751 [WARNING][16] ipam.go 159: Error claiming block cidr=10.200.54.64/26 error=IPAMBlock.crd.projectcalico.org "10-200-54-64-26" is invalid: spec.deleted: Required value host="ip-10-0-4-218.ec2.internal"
2020-11-08 08:50:30.751 [ERROR][16] ipam.go 479: Error getting block for affinity error=IPAMBlock.crd.projectcalico.org "10-200-54-64-26" is invalid: spec.deleted: Required value host="ip-10-0-4-218.ec2.internal" subnet=10.200.54.64/26
2020-11-08 08:50:30.751 [ERROR][16] ipam.go 101: Error assigning IPV4 addresses: IPAMBlock.crd.projectcalico.org "10-200-54-64-26" is invalid: spec.deleted: Required value
2020-11-08 08:50:30.751 [FATAL][16] allocateip.go 158: Unable to autoassign an address error=IPAMBlock.crd.projectcalico.org "10-200-54-64-26" is invalid: spec.deleted: Required value type="ipipTunnelAddress"
Calico node failed to start
Any thoughts on what went wrong and how to fix it?
I just want to complete the Calico "hard way" setup with a fully functioning Kubernetes HA cluster.
root@ip-10-0-4-193:~# calicoctl version
Client Version: v3.14.0
Git commit: c97876ba
Cluster Version: v3.8.0
Cluster Type: typha,kdd,k8s,bgp
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
I thought this output would be helpful for understanding the current situation:
root@ip-10-0-4-193:~# calicoctl ipam show --show-blocks
+----------+---------------+-----------+------------+--------------+
| GROUPING |     CIDR      | IPS TOTAL | IPS IN USE |   IPS FREE   |
+----------+---------------+-----------+------------+--------------+
| IP Pool  | 10.200.0.0/16 |     65536 | 0 (0%)     | 65536 (100%) |
+----------+---------------+-----------+------------+--------------+
root@ip-10-0-4-193:~# calicoctl ipam show --show-configuration
+--------------------+-------+
| PROPERTY           | VALUE |
+--------------------+-------+
| StrictAffinity     | false |
| AutoAllocateBlocks | true  |
+--------------------+-------+
root@ip-10-0-4-193:~# calicoctl ipam show --show-borrowed
+----+----------------+-------+-------------+------+--------------+
| IP | BORROWING-NODE | BLOCK | BLOCK OWNER | TYPE | ALLOCATED-TO |
+----+----------------+-------+-------------+------+--------------+
+----+----------------+-------+-------------+------+--------------+
@fasaxc any ideas?
I'm told that the problem is that the CRD has an optional field marked as required. I think that we fixed this in newer releases, but the "hard way" uses its own manifests that we don't update as regularly, so it probably hasn't got the fix yet.
I can confirm that this approach works perfectly; I didn't see any such issues with it. So we might need to wait before proceeding with the "hard way", then. Thanks for the support.
Ah. The "Hard Way" is not really intended for production installs. It's supposed to be a tutorial for people to learn how all the components of Kubernetes (and Calico) fit together, which is why its manifests don't get updated so often.
If that's not clear, it probably needs to be made more clear.
Exactly. This was in my "hard way" cluster, and that is exactly what I was using it for. Prod uses the given article, and the deployment succeeds with it. Thanks. :)
You could probably work around this in the "hard way" documentation by removing the "deleted" field from the "required" section of the IPAMBlocks CRD (the one named in the error above). Now you will really be getting into the "hard way" :D
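For anyone who wants to try that workaround, here is a minimal sketch of the edit, assuming the CRD manifest has already been parsed into a plain Python dict (for example with a YAML library). The error in the log names the IPAMBlock CRD, so the sample below uses a heavily trimmed, hypothetical fragment of that CRD; the real manifest has more fields and may nest the schema differently depending on the CRD API version. The helper name `drop_required` is made up for illustration.

```python
from copy import deepcopy


def drop_required(manifest, field):
    """Return a copy of `manifest` with `field` removed from every
    `required` list found anywhere in the nested structure."""

    def walk(node):
        if isinstance(node, dict):
            required = node.get("required")
            if isinstance(required, list) and field in required:
                remaining = [name for name in required if name != field]
                if remaining:
                    node["required"] = remaining
                else:
                    # Drop the key rather than leave an empty list behind.
                    del node["required"]
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    result = deepcopy(manifest)  # leave the caller's dict untouched
    walk(result)
    return result


# Hypothetical, trimmed-down stand-in for the IPAMBlock CRD manifest.
crd = {
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "ipamblocks.crd.projectcalico.org"},
    "spec": {
        "versions": [{
            "name": "v1",
            "schema": {"openAPIV3Schema": {"properties": {"spec": {
                "properties": {
                    "cidr": {"type": "string"},
                    "deleted": {"type": "boolean"},
                },
                "required": ["cidr", "deleted"],
            }}}},
        }],
    },
}

patched = drop_required(crd, "deleted")
spec_schema = patched["spec"]["versions"][0]["schema"]["openAPIV3Schema"]["properties"]["spec"]
print(spec_schema["required"])
```

The patched dict would then be written back to YAML and re-applied to the cluster. Making the same one-line change by hand with `kubectl edit crd ipamblocks.crd.projectcalico.org` should be equivalent; the sketch only shows which part of the schema changes.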
But I agree: this sounds like a bug that we already fixed in the main manifests and that still needs to be ported into the "hard way" documentation as well.
First part of fix here: https://github.com/projectcalico/libcalico-go/pull/1347
Second part here: https://github.com/projectcalico/calico/pull/4205