Test-infra: all kubeadm jobs are failing due to use of ubuntu-1604-xenial-v20160420c image

Created on 15 Dec 2017 · 5 Comments · Source: kubernetes/test-infra

See: https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-all and https://k8s-testgrid.appspot.com/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-kubeadm-gce

All jobs started failing sometime around 12/14 5am PST, eg:

Discussion from #sig-release:

talked to dawn and she suspects it's because the kubeadm job (via kubernetes-anywhere) is using the vanilla ubuntu image, and GCE may have just rolled out some change that triggers a bug in that particular version
the reason other tests pass is because they either use COS or they use a special GKE build of the ubuntu image

Suggested workaround: use COS or a GKE-specific variant of Ubuntu 16.04 that other jobs use, e.g.:

  • ubuntu-gke-1604-lts
  • ubuntu-gke-1604-xenial-v20171108-1
  • ubuntu-gke-1604-xenial-v20170816-1
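
As a rough illustration of the difference between the failing image and the suggested ones, here is a minimal Python sketch that classifies an image name and pulls out the build date embedded in it. The helper names (`is_gke_variant`, `build_date`) are hypothetical and not part of test-infra or kubernetes-anywhere, and the sketch assumes the `-vYYYYMMDD` naming pattern seen in the names above holds in general.

```python
import re
from datetime import date
from typing import Optional

# Assumption: dated images embed their build date as "-vYYYYMMDD".
IMAGE_DATE = re.compile(r"-v(\d{4})(\d{2})(\d{2})")


def is_gke_variant(image: str) -> bool:
    """Return True for the GKE-specific Ubuntu images named in this issue."""
    return image.startswith("ubuntu-gke-")


def build_date(image: str) -> Optional[date]:
    """Extract the build date from an image name, or None if it has none."""
    match = IMAGE_DATE.search(image)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)
```

For the image in the title, `is_gke_variant` is false and `build_date` gives 20 April 2016, which matches the April 2016 kernel build seen in the serial console output.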

/cc @enisoc @dchen1107 to make sure I got the wording right
FYI @kubernetes/sig-cluster-lifecycle-bugs @luxas

The failure looks like:

ssh: connect to host 35.226.76.186 port 22: Connection timed out

We looked at the serial console from a live instance and found a kernel panic, eg:

SeaBIOS (version 1.8.2-20171012_061934-google)
Total RAM Size = 0x00000000f0000000 = 3840 MiB
CPUs found: 1     Max CPUs supported: 1
found virtio-scsi at 0:3
virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0
virtio-scsi blksize=512 sectors=20971520 = 10240 MiB
drive 0x000f3070: PCHS=0/0/0 translation=lba LCHS=1024/255/63 s=20971520
Booting from Hard Disk 0...
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.4.0-21-generic (buildd@lgw01-21) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 (Ubuntu 4.4.0-21.37-generic 4.4.6)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-21-generic root=UUID=98c51306-83a2-49da-94a9-2a841c9f27b0 ro console=ttyS0
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] x86/fpu: Using 'eager' FPU context switches.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bfffcfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bfffd000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffbc000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000012fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.4 present.
[    0.000000] Hypervisor detected: KVM
[    0.000000] e820: last_pfn = 0x130000 max_arch_pfn = 0x400000000
[    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
[    0.000000] e820: last_pfn = 0xbfffd max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [mem 0x000f32c0-0x000f32cf] mapped at [ffff8800000f32c0]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Using GB pages for direct mapping
[    0.000000] RAMDISK: [mem 0x37104000-0x37879fff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F30B0 000014 (v00 Google)
[    0.000000] ACPI: RSDT 0x00000000BFFFDCD0 000034 (v01 Google GOOGRSDT 00000001 GOOG 00000001)
[    0.000000] ACPI: FACP 0x00000000BFFFFF00 0000F4 (v02 Google GOOGFACP 00000001 GOOG 00000001)
[    0.000000] ACPI: DSDT 0x00000000BFFFDD10 0017B2 (v01 Google GOOGDSDT 00000001 GOOG 00000001)
[    0.000000] ACPI: FACS 0x00000000BFFFFEC0 000040
[    0.000000] ACPI: FACS 0x00000000BFFFFEC0 000040
[    0.000000] ACPI: SSDT 0x00000000BFFFF5F0 0008CF (v01 Google GOOGSSDT 00000001 GOOG 00000001)
[    0.000000] ACPI: APIC 0x00000000BFFFF500 00006E (v01 Google GOOGAPIC 00000001 GOOG 00000001)
[    0.000000] ACPI: WAET 0x00000000BFFFF4D0 000028 (v01 Google GOOGWAET 00000001 GOOG 00000001)
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000012fffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x12fff9000-0x12fffdfff]
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 1:2fff5001, primary cpu clock
[    0.000000] kvm-clock: using sched offset of 11082341458 cycles
[    0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000012fffffff]
[    0.000000]   Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x00000000bfffcfff]
[    0.000000]   node   0: [mem 0x0000000100000000-0x000000012fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000012fffffff]
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[    0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[    0.000000] PM: Registered nosave memory: [mem 0xbfffd000-0xbfffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xc0000000-0xfffbbfff]
[    0.000000] PM: Registered nosave memory: [mem 0xfffbc000-0xffffffff]
[    0.000000] e820: [mem 0xc0000000-0xfffbbfff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 33 pages/cpu @ffff88012fc00000 s98008 r8192 d28968 u2097152
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 967558
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-21-generic root=UUID=98c51306-83a2-49da-94a9-2a841c9f27b0 ro console=ttyS0
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 3777668K/3931756K available (8356K kernel code, 1278K rwdata, 3920K rodata, 1476K init, 1292K bss, 154088K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  Build-time adjustment of leaf fanout to 64.
[    0.000000]  RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=1
[    0.000000] NR_IRQS:16640 nr_irqs:256 16
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [ttyS0] enabled
[    0.000000] tsc: Detected 2499.998 MHz processor
[    0.155923] Calibrating delay loop (skipped) preset value.. 4999.99 BogoMIPS (lpj=9999992)
[    0.157136] pid_max: default: 32768 minimum: 301
[    0.157828] ACPI: Core revision 20150930
[    0.159756] ACPI: 2 ACPI AML tables successfully acquired and loaded
[    0.160707] Security Framework initialized
[    0.161298] Yama: becoming mindful.
[    0.161852] AppArmor: AppArmor initialized
[    0.163307] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.166279] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.167644] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
[    0.168579] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
[    0.169687] Initializing cgroup subsys io
[    0.170246] Initializing cgroup subsys memory
[    0.170844] Initializing cgroup subsys devices
[    0.171475] Initializing cgroup subsys freezer
[    0.172082] Initializing cgroup subsys net_cls
[    0.172699] Initializing cgroup subsys perf_event
[    0.173377] Initializing cgroup subsys net_prio
[    0.174033] Initializing cgroup subsys hugetlb
[    0.174651] Initializing cgroup subsys pids
[    0.175337] CPU: Physical Processor ID: 0
[    0.176639] mce: CPU supports 32 MCE banks
[    0.177330] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
[    0.178102] Last level dTLB entries: 4KB 512, 2MB 0, 4MB 0, 1GB 4
[    0.192089] Freeing SMP alternatives memory: 28K (ffffffff820b2000 - ffffffff820b9000)
[    0.199521] ftrace: allocating 31878 entries in 125 pages
[    0.237766] divide error: 0000 [#1] SMP 
[    0.238495] Modules linked in:
[    0.238983] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.0-21-generic #37-Ubuntu
[    0.240027] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[    0.241268] task: ffff88012af80000 ti: ffff88012af88000 task.ti: ffff88012af88000
[    0.242283] RIP: 0010:[<ffffffff81f6f5de>]  [<ffffffff81f6f5de>] smp_store_boot_cpu_info+0x51/0x17f
[    0.243547] RSP: 0000:ffff88012af8beb8  EFLAGS: 00010286
[    0.244272] RAX: 0000000000000000 RBX: ffffffff81f34f60 RCX: 0000000000000000
[    0.245250] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88012fc0a180
[    0.246226] RBP: ffff88012af8bed8 R08: 0000000000000000 R09: 0000000000000001
[    0.247207] R10: ffffffff81a11ee0 R11: ffffffff81a11ec0 R12: 00000000ffffffff
[    0.248178] R13: 000000000000a0a0 R14: 000000000000a192 R15: 0000000000000000
[    0.249131] FS:  0000000000000000(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
[    0.250204] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.250996] CR2: ffff88012ffff000 CR3: 0000000001e0a000 CR4: 00000000001406f0
[    0.251990] Stack:
[    0.252286]  ffffffff81f34f60 0000000000000100 000000000000a0a0 0000000000000000
[    0.253445]  ffff88012af8bf08 ffffffff81f6f763 ffffffff82089ef8 ffff88012af806a8
[    0.254602]  0000000000000001 0000000000000000 ffff88012af8bf38 ffffffff81f5a0e5
[    0.255799] Call Trace:
[    0.256158]  [<ffffffff81f6f763>] native_smp_prepare_cpus+0x57/0x2eb
[    0.257031]  [<ffffffff81f5a0e5>] kernel_init_freeable+0xb3/0x212
[    0.257898]  [<ffffffff81817f30>] ? rest_init+0x80/0x80
[    0.258660]  [<ffffffff81817f3e>] kernel_init+0xe/0xe0
[    0.259415]  [<ffffffff8182488f>] ret_from_fork+0x3f/0x70
[    0.260163]  [<ffffffff81817f30>] ? rest_init+0x80/0x80
[    0.260886] Code: 53 41 83 cc ff 49 c7 c6 92 a1 00 00 48 89 c7 f3 a5 66 c7 80 da 00 00 00 00 00 0f b7 35 b4 36 fc ff 8b 05 16 88 27 00 8d 44 06 ff <f7> f6 31 d2 89 05 78 4a fc ff 8d 86 ff 7f 00 00 f7 f6 be c0 00 
[    0.266092] RIP  [<ffffffff81f6f5de>] smp_store_boot_cpu_info+0x51/0x17f
[    0.267029]  RSP <ffff88012af8beb8>
[    0.267600] ---[ end trace 5e570ee6dbb3edb9 ]---
[    0.268247] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.268247] 
[    0.269490] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.269490]
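
A log this long is easier to triage if the few lines that matter are pulled out automatically. The following is a minimal sketch, not part of test-infra, that scans serial console text for the kernel version, the first oops line, the faulting symbol, and the panic reason. `summarize_console` and `PanicReport` are hypothetical names, and the patterns are based only on the line shapes visible in the log above.

```python
import re
from typing import NamedTuple, Optional


class PanicReport(NamedTuple):
    kernel: Optional[str]           # e.g. "4.4.0-21-generic"
    oops: Optional[str]             # first line carrying an "[#N]" oops counter
    faulting_symbol: Optional[str]  # function named on the RIP line
    panic: Optional[str]            # text after "Kernel panic - not syncing:"


# Leading "[    0.237766] " style timestamp.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def summarize_console(log: str) -> PanicReport:
    """Pick the first occurrence of each interesting line from console text."""
    kernel = oops = symbol = panic = None
    for raw in log.splitlines():
        line = TIMESTAMP.sub("", raw).strip()
        if kernel is None and line.startswith("Linux version "):
            kernel = line.split()[2]
        elif oops is None and re.search(r"\[#\d+\]", line):
            oops = line
        elif symbol is None and line.startswith("RIP"):
            found = re.search(r"\]\s+(\w+)\+0x", line)
            if found:
                symbol = found.group(1)
        elif panic is None and line.startswith("Kernel panic - not syncing:"):
            panic = line.split(":", 1)[1].strip()
    return PanicReport(kernel, oops, symbol, panic)
```

Fed the console output above, this would report the `divide error` oops in `smp_store_boot_cpu_info` on kernel `4.4.0-21-generic`, followed by the "Attempted to kill init!" panic.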
kind/bug sig/cluster-lifecycle

All 5 comments

/reopen
leaving this open until I see green on https://k8s-testgrid.appspot.com/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-kubeadm-gce or one of the periodics

The latest kubeadm presubmit run is green:

https://k8s-testgrid.appspot.com/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-kubeadm-gce&width=80

I think we can probably close this now?

Sounds good to me. The job is failing again, but for new individual test failures, not because of the image.

/close
