See: https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-all and https://k8s-testgrid.appspot.com/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-kubeadm-gce
All jobs started failing around 5am PST on 12/14, e.g.:
Discussion from #sig-release:
Talked to Dawn; she suspects the cause is that the kubeadm job (via kubernetes-anywhere) uses the vanilla Ubuntu image, and that GCE may have just rolled out a change that triggers a bug in that particular version.
The reason other tests pass is that they use either COS or a special GKE build of the Ubuntu image.
Suggested workaround: use COS or the GKE-specific variant of Ubuntu 16.04 that other jobs use, e.g.:
/cc @enisoc @dchen1107 to make sure I got the wording right
FYI @kubernetes/sig-cluster-lifecycle-bugs @luxas
The failure looks like:
ssh: connect to host 35.226.76.186 port 22: Connection timed out
We looked at the serial console from a live instance and found a kernel panic, e.g.:
SeaBIOS (version 1.8.2-20171012_061934-google)
Total RAM Size = 0x00000000f0000000 = 3840 MiB
CPUs found: 1 Max CPUs supported: 1
found virtio-scsi at 0:3
virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0
virtio-scsi blksize=512 sectors=20971520 = 10240 MiB
drive 0x000f3070: PCHS=0/0/0 translation=lba LCHS=1024/255/63 s=20971520
Booting from Hard Disk 0...
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 4.4.0-21-generic (buildd@lgw01-21) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 (Ubuntu 4.4.0-21.37-generic 4.4.6)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-21-generic root=UUID=98c51306-83a2-49da-94a9-2a841c9f27b0 ro console=ttyS0
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] x86/fpu: Using 'eager' FPU context switches.
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bfffcfff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bfffd000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffbc000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000012fffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.4 present.
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] e820: last_pfn = 0x130000 max_arch_pfn = 0x400000000
[ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT
[ 0.000000] e820: last_pfn = 0xbfffd max_arch_pfn = 0x400000000
[ 0.000000] found SMP MP-table at [mem 0x000f32c0-0x000f32cf] mapped at [ffff8800000f32c0]
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] RAMDISK: [mem 0x37104000-0x37879fff]
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x00000000000F30B0 000014 (v00 Google)
[ 0.000000] ACPI: RSDT 0x00000000BFFFDCD0 000034 (v01 Google GOOGRSDT 00000001 GOOG 00000001)
[ 0.000000] ACPI: FACP 0x00000000BFFFFF00 0000F4 (v02 Google GOOGFACP 00000001 GOOG 00000001)
[ 0.000000] ACPI: DSDT 0x00000000BFFFDD10 0017B2 (v01 Google GOOGDSDT 00000001 GOOG 00000001)
[ 0.000000] ACPI: FACS 0x00000000BFFFFEC0 000040
[ 0.000000] ACPI: FACS 0x00000000BFFFFEC0 000040
[ 0.000000] ACPI: SSDT 0x00000000BFFFF5F0 0008CF (v01 Google GOOGSSDT 00000001 GOOG 00000001)
[ 0.000000] ACPI: APIC 0x00000000BFFFF500 00006E (v01 Google GOOGAPIC 00000001 GOOG 00000001)
[ 0.000000] ACPI: WAET 0x00000000BFFFF4D0 000028 (v01 Google GOOGWAET 00000001 GOOG 00000001)
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000012fffffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x12fff9000-0x12fffdfff]
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 1:2fff5001, primary cpu clock
[ 0.000000] kvm-clock: using sched offset of 11082341458 cycles
[ 0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x000000012fffffff]
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x00000000bfffcfff]
[ 0.000000] node 0: [mem 0x0000000100000000-0x000000012fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000012fffffff]
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xbfffd000-0xbfffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xc0000000-0xfffbbfff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfffbc000-0xffffffff]
[ 0.000000] e820: [mem 0xc0000000-0xfffbbfff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on KVM
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 33 pages/cpu @ffff88012fc00000 s98008 r8192 d28968 u2097152
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 967558
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-21-generic root=UUID=98c51306-83a2-49da-94a9-2a841c9f27b0 ro console=ttyS0
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Memory: 3777668K/3931756K available (8356K kernel code, 1278K rwdata, 3920K rodata, 1476K init, 1292K bss, 154088K reserved, 0K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] Build-time adjustment of leaf fanout to 64.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=1
[ 0.000000] NR_IRQS:16640 nr_irqs:256 16
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [ttyS0] enabled
[ 0.000000] tsc: Detected 2499.998 MHz processor
[ 0.155923] Calibrating delay loop (skipped) preset value.. 4999.99 BogoMIPS (lpj=9999992)
[ 0.157136] pid_max: default: 32768 minimum: 301
[ 0.157828] ACPI: Core revision 20150930
[ 0.159756] ACPI: 2 ACPI AML tables successfully acquired and loaded
[ 0.160707] Security Framework initialized
[ 0.161298] Yama: becoming mindful.
[ 0.161852] AppArmor: AppArmor initialized
[ 0.163307] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 0.166279] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.167644] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.168579] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.169687] Initializing cgroup subsys io
[ 0.170246] Initializing cgroup subsys memory
[ 0.170844] Initializing cgroup subsys devices
[ 0.171475] Initializing cgroup subsys freezer
[ 0.172082] Initializing cgroup subsys net_cls
[ 0.172699] Initializing cgroup subsys perf_event
[ 0.173377] Initializing cgroup subsys net_prio
[ 0.174033] Initializing cgroup subsys hugetlb
[ 0.174651] Initializing cgroup subsys pids
[ 0.175337] CPU: Physical Processor ID: 0
[ 0.176639] mce: CPU supports 32 MCE banks
[ 0.177330] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
[ 0.178102] Last level dTLB entries: 4KB 512, 2MB 0, 4MB 0, 1GB 4
[ 0.192089] Freeing SMP alternatives memory: 28K (ffffffff820b2000 - ffffffff820b9000)
[ 0.199521] ftrace: allocating 31878 entries in 125 pages
[ 0.237766] divide error: 0000 [#1] SMP
[ 0.238495] Modules linked in:
[ 0.238983] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.0-21-generic #37-Ubuntu
[ 0.240027] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 0.241268] task: ffff88012af80000 ti: ffff88012af88000 task.ti: ffff88012af88000
[ 0.242283] RIP: 0010:[<ffffffff81f6f5de>] [<ffffffff81f6f5de>] smp_store_boot_cpu_info+0x51/0x17f
[ 0.243547] RSP: 0000:ffff88012af8beb8 EFLAGS: 00010286
[ 0.244272] RAX: 0000000000000000 RBX: ffffffff81f34f60 RCX: 0000000000000000
[ 0.245250] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88012fc0a180
[ 0.246226] RBP: ffff88012af8bed8 R08: 0000000000000000 R09: 0000000000000001
[ 0.247207] R10: ffffffff81a11ee0 R11: ffffffff81a11ec0 R12: 00000000ffffffff
[ 0.248178] R13: 000000000000a0a0 R14: 000000000000a192 R15: 0000000000000000
[ 0.249131] FS: 0000000000000000(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
[ 0.250204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.250996] CR2: ffff88012ffff000 CR3: 0000000001e0a000 CR4: 00000000001406f0
[ 0.251990] Stack:
[ 0.252286] ffffffff81f34f60 0000000000000100 000000000000a0a0 0000000000000000
[ 0.253445] ffff88012af8bf08 ffffffff81f6f763 ffffffff82089ef8 ffff88012af806a8
[ 0.254602] 0000000000000001 0000000000000000 ffff88012af8bf38 ffffffff81f5a0e5
[ 0.255799] Call Trace:
[ 0.256158] [<ffffffff81f6f763>] native_smp_prepare_cpus+0x57/0x2eb
[ 0.257031] [<ffffffff81f5a0e5>] kernel_init_freeable+0xb3/0x212
[ 0.257898] [<ffffffff81817f30>] ? rest_init+0x80/0x80
[ 0.258660] [<ffffffff81817f3e>] kernel_init+0xe/0xe0
[ 0.259415] [<ffffffff8182488f>] ret_from_fork+0x3f/0x70
[ 0.260163] [<ffffffff81817f30>] ? rest_init+0x80/0x80
[ 0.260886] Code: 53 41 83 cc ff 49 c7 c6 92 a1 00 00 48 89 c7 f3 a5 66 c7 80 da 00 00 00 00 00 0f b7 35 b4 36 fc ff 8b 05 16 88 27 00 8d 44 06 ff <f7> f6 31 d2 89 05 78 4a fc ff 8d 86 ff 7f 00 00 f7 f6 be c0 00
[ 0.266092] RIP [<ffffffff81f6f5de>] smp_store_boot_cpu_info+0x51/0x17f
[ 0.267029] RSP <ffff88012af8beb8>
[ 0.267600] ---[ end trace 5e570ee6dbb3edb9 ]---
[ 0.268247] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.268247]
[ 0.269490] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.269490]
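For anyone triaging similar SSH timeouts, the three lines that matter in a dump like the one above are the fault line (`divide error`), the `RIP` line naming the faulting symbol, and the `Kernel panic` line. Below is a minimal sketch of pulling those out of saved serial console text. It assumes the output has already been captured to a string (for example with `gcloud compute instances get-serial-port-output`); the function name `summarize_console` and the list of fault keywords are illustrative choices, not part of any test tooling.

```python
import re

# First line of a kernel oops, e.g. "[ 0.237766] divide error: 0000 [#1] SMP".
# The keyword list is an assumption and is not exhaustive.
FAULT_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(divide error|general protection fault|BUG|Oops)\b"
)
# Faulting symbol on a RIP line, e.g. "... smp_store_boot_cpu_info+0x51/0x17f".
RIP_RE = re.compile(r"RIP\b.*\s([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Final panic message.
PANIC_RE = re.compile(r"Kernel panic - not syncing: (.*)")


def summarize_console(text):
    """Return the first fault type, faulting symbol, and panic reason found."""
    summary = {"fault": None, "rip_symbol": None, "panic": None}
    patterns = (("fault", FAULT_RE), ("rip_symbol", RIP_RE), ("panic", PANIC_RE))
    for line in text.splitlines():
        line = line.strip()
        for key, pattern in patterns:
            if summary[key] is None:
                match = pattern.search(line)
                if match:
                    summary[key] = match.group(1)
    return summary


# Three lines copied from the console log in this issue.
SAMPLE = """\
[ 0.237766] divide error: 0000 [#1] SMP
[ 0.242283] RIP: 0010:[<ffffffff81f6f5de>] [<ffffffff81f6f5de>] smp_store_boot_cpu_info+0x51/0x17f
[ 0.268247] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
"""

if __name__ == "__main__":
    print(summarize_console(SAMPLE))
```

A summary like this makes it quick to tell an early-boot kernel crash (the instance never starts sshd) apart from an ordinary network or firewall problem when the only visible symptom is `Connection timed out`.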
/reopen
Leaving this open until I see green on https://k8s-testgrid.appspot.com/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-kubeadm-gce or one of the periodics.
The latest kubeadm presubmit run is green:
https://k8s-testgrid.appspot.com/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-kubeadm-gce&width=80
I think we can probably close this now?
Sounds good to me. The job is failing again, but because of new individual test failures, not because of the image.
/close