Origin: origin 3.9.0 - Unable to create vSphere storage - nodeVmDetail is empty

Created on 3 May 2018 · 12 comments · Source: openshift/origin

Unable to create vSphere storage with origin 3.9.0

Error-Message: "Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []"

Version
oc v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://cp-lb-01.cloud.mycompany.com:443
openshift v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
Additional Information

I found this bug ticket.
The fix for that bug consists of the ClusterRole "system:vsphere-cloud-provider" and the ClusterRoleBinding "system:vsphere-cloud-provider", so I have listed the contents of my current ClusterRole and ClusterRoleBinding below.

Maybe these issues are related to my problem:
https://github.com/kubernetes/kubernetes/issues/58927
https://github.com/vmware/kubernetes/issues/450
If this issue is related, then the fix is in K8s 1.9.4 with this commit.

I experimented a lot with the OpenShift configuration after the Ansible deployment, which is why I am including all the relevant snippets.

I rewrote my configuration to the new style, using this documentation.

Steps To Reproduce
  1. Install OpenShift 3.9 with the vSphere cloud-provider
  2. Check for clusterrole
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    annotations:
      authorization.openshift.io/system-only: "true"
      openshift.io/reconcile-protect: "false"
      rbac.authorization.kubernetes.io/autoupdate: "true"
    creationTimestamp: 2018-04-26T16:32:27Z
    labels:
      kubernetes.io/bootstrapping: rbac-defaults
    name: system:vsphere-cloud-provider
    namespace: ""
    resourceVersion: "1675333"
    selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/system%3Avsphere-cloud-provider
    uid: 6896110e-496f-11e8-a170-00505694394e
  rules:
  - apiGroups:
    - ""
    resources:
    - nodes
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - ""
    resources:
    - events
    verbs:
    - create
    - patch
    - update
  3. Check for clusterrolebinding
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    annotations:
      openshift.io/reconcile-protect: "false"
      rbac.authorization.kubernetes.io/autoupdate: "true"
    creationTimestamp: 2018-04-26T16:32:27Z
    labels:
      kubernetes.io/bootstrapping: rbac-defaults
    name: system:vsphere-cloud-provider
    namespace: ""
    resourceVersion: "1674944"
    selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/system%3Avsphere-cloud-provider
    uid: 6897dfcb-496f-11e8-a170-00505694394e
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: system:vsphere-cloud-provider
  subjects:
  - kind: ServiceAccount
    name: vsphere-cloud-provider
    namespace: kube-system
  4. Configure vSphere cloud-provider on master
    /etc/origin/master/master-config.yaml
...
kubernetesMasterConfig:
  apiServerArguments:
    cloud-provider:
    - "vsphere"
    cloud-config:
    - "/etc/origin/cloudprovider/vsphere.conf"
    runtime-config:
    - apis/settings.k8s.io/v1alpha1=true
    storage-backend:
    - etcd3
    storage-media-type:
    - application/vnd.kubernetes.protobuf
  controllerArguments:
    cloud-config:
    - /etc/origin/cloudprovider/vsphere.conf
    cloud-provider:
    - vsphere
...

/etc/origin/cloudprovider/vsphere.conf

[Global]
        user = "MyAdminUser" 
        password = "MySuperSecurePassword" 
        port = "443" 
        insecure-flag = "1" 
        datacenters = "OCP-Datacenter" 
        datastore = "iscsi-hdd" 
[VirtualCenter "10.y.y.xxx"]

[Workspace]
        server = "10.y.y.xxx"
        datacenter = "OCP-Datacenter"
        default-datastore = "iscsi-hdd"
        folder = "/OCP-Datacenter/vm"
[Disk]
    scsicontrollertype = pvscsi
  5. Create a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: 2018-04-26T16:25:02Z
  name: slow
  resourceVersion: "43413"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/slow
  uid: 5ee47fb3-496e-11e8-a170-00505694394e
parameters:
  datastore: iscsi-hdd
  diskformat: thin
  fstype: ext3
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
  6. Configure vSphere cloud-provider on node
    /etc/origin/node/node-config.yaml
...
kubeletArguments: 
  cloud-provider:
  - "vsphere"
...
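A quick way to sanity-check the cloud-provider configuration from the steps above is to parse vsphere.conf and confirm that the keys the provisioner reads are actually present. A minimal sketch in Python using the standard-library `configparser` (the required-key list here is my own assumption derived from the snippets above, not an authoritative schema):

```python
import configparser

# Keys the vSphere provisioner appears to need, based on the
# vsphere.conf shown above (assumption, not an official schema).
REQUIRED = {
    "Global": ["user", "password", "datacenters"],
    "Workspace": ["server", "datacenter", "default-datastore", "folder"],
}

def check_vsphere_conf(text):
    """Parse a vsphere.conf fragment and return missing keys per section."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    missing = {}
    for section, keys in REQUIRED.items():
        if not cp.has_section(section):
            missing[section] = keys[:]
            continue
        absent = [k for k in keys if not cp.has_option(section, k)]
        if absent:
            missing[section] = absent
    return missing

sample = """
[Global]
user = "MyAdminUser"
password = "secret"
datacenters = "OCP-Datacenter"
[Workspace]
server = "10.0.0.1"
datacenter = "OCP-Datacenter"
default-datastore = "iscsi-hdd"
folder = "/OCP-Datacenter/vm"
"""
print(check_vsphere_conf(sample))  # -> {} (no missing keys)
```

This only catches missing keys, not wrong values; it would not have caught the hardware-version problem discussed below in the thread, but it rules out the most common misconfiguration quickly.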
Current Result

Provisioning Failed: Failed to provision volume with StorageClass "fast": Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []

Log in origin-master-controller:

Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482751    2728 pv_controller_base.go:402] resyncing PV controller
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482821    2728 pv_controller_base.go:529] storeObjectUpdate updating claim "openshift-ansible-service-broker/etcd" with version 7264
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482844    2728 pv_controller.go:228] synchronizing PersistentVolumeClaim[openshift-ansible-service-broker/etcd]: phase: Pending, bound to: "", bindCompleted: false, boundByController: false
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482865    2728 pv_controller.go:310] synchronizing unbound PersistentVolumeClaim[openshift-ansible-service-broker/etcd]: no volume found
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482892    2728 pv_controller.go:648] updating PersistentVolumeClaim[openshift-ansible-service-broker/etcd] status: set phase Pending
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482907    2728 pv_controller.go:693] updating PersistentVolumeClaim[openshift-ansible-service-broker/etcd] status: phase Pending already set
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482921    2728 pv_controller_base.go:529] storeObjectUpdate updating claim "test-storage/test-storage" with version 1875650
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482932    2728 pv_controller.go:228] synchronizing PersistentVolumeClaim[test-storage/test-storage]: phase: Pending, bound to: "", bindCompleted: false, boundByController: false
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482940    2728 pv_controller.go:310] synchronizing unbound PersistentVolumeClaim[test-storage/test-storage]: no volume found
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482947    2728 pv_controller.go:1315] provisionClaim[test-storage/test-storage]: started
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482954    2728 pv_controller.go:1523] scheduleOperation[provision-test-storage/test-storage[0aa90544-4eb0-11e8-a35a-005056943169]]
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482975    2728 pv_controller.go:1334] provisionClaimOperation [test-storage/test-storage] started, class: "slow"
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.483463    2728 event.go:218] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-ansible-service-broker", Name:"etcd", UID:"e65f983f-4953-11e8-bfa6-00505694394e", APIVersion:"v1", ResourceVersion:"7264", FieldPath:""}): type: 'Normal' reason: 'FailedBinding' no persistent volumes available for this claim and no storage class is set
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.493299    2728 vsphere_volume_util.go:114] Setting fstype as "ext3"
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.493314    2728 vsphere_volume_util.go:137] VSANStorageProfileData in vsphere volume ""
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.493330    2728 vsphere.go:1007] Starting to create a vSphere volume with volumeOptions: &{CapacityKB:3145728 Tags:map[kubernetes.io/created-for/pvc/namespace:test-storage kubernetes.io/created-for/pvc/name:test-storage kubernetes.io/created-for/pv/name:pvc-0aa90544-4eb0-11e8-a35a-005056943169] Name:kubernetes-dynamic-pvc-0aa90544-4eb0-11e8-a35a-005056943169 DiskFormat:thin Datastore:iscsi-hdd VSANStorageProfileData: StoragePolicyName: StoragePolicyID: SCSIControllerType:}
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: E0503 10:58:19.505559    2728 vsphere_util.go:199] Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: E0503 10:58:19.505581    2728 vsphere.go:1059] Failed to get shared datastore: Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.505596    2728 vsphere.go:1111] The canonical volume path for the newly created vSphere volume is ""
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.505612    2728 pv_controller.go:1425] failed to provision volume for claim "test-storage/test-storage" with StorageClass "slow": Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.505943    2728 event.go:218] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"test-storage", Name:"test-storage", UID:"0aa90544-4eb0-11e8-a35a-005056943169", APIVersion:"v1", ResourceVersion:"1875650", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' Failed to provision volume with StorageClass "slow": Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
Expected Result

PV and PVC creation should be successful



All 12 comments

@openshift/sig-storage

@jsafrane assigning you directly since you were involved with the attached BZ, lets just make sure the fix is in master for origin

Just to let you know that I am facing a similar problem. I installed the vSphere provider with Ansible; I am not sure that is the proper way to do it, though.

[OSEv3:vars]
...
openshift_cloudprovider_kind='vsphere'
openshift_cloudprovider_vsphere_username='[email protected]'
openshift_cloudprovider_vsphere_password='S3cr3t!'
openshift_cloudprovider_vsphere_host='vcsa-1.lss1.domain.tld'
openshift_cloudprovider_vsphere_datacenter='Datacenter'
openshift_cloudprovider_vsphere_datastore='datastore2'
oc v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://console.oshift.lss1.domain.tld:8443
openshift v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
May  8 15:52:04 master origin-master-controllers: I0508 15:52:04.375030   22540 vsphere.go:1007] Starting to create a vSphere volume with volumeOptions: &{CapacityKB:1024 Tags:map[kubernetes.io/created-for/pv/name:pvc-fd6b880f-52c6-11e8-a0bc-005056b9ed4a kubernetes.io/created-for/pvc/namespace:my-project-olc kubernetes.io/created-for/pvc/name:my-storage] Name:kubernetes-dynamic-pvc-fd6b880f-52c6-11e8-a0bc-005056b9ed4a DiskFormat: Datastore:datastore2 VSANStorageProfileData: StoragePolicyName: StoragePolicyID: SCSIControllerType:}
May  8 15:52:04 master origin-master-controllers: E0508 15:52:04.383825   22540 vsphere_util.go:199] Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
May  8 15:52:04 master origin-master-controllers: E0508 15:52:04.383877   22540 vsphere.go:1059] Failed to get shared datastore: Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []

Opened https://github.com/openshift/origin/pull/19648 to remove the need for client access from the VMware cloud provider altogether.

@gnufied Why do you think that PR #19648 fixes this problem? Do I have a problem with the node to vSphere connection?

@ReadmeCritic The linked BZ was caused by the vSphere cloud provider being unable to fetch node info from the api-server. It is possible that this BZ is different, so I am going to try to isolate that.

Seeing the exact same issue as @Reamer on both OCP (3.9.27) and Origin (v3.9.0+ba7faec-1) deployments. I'm working to test OCP 3.9.30, since it's supposed to be fixed there, but haven't made any progress on diagnosing/fixing the issue with Origin installs.

Confirmed the same issue exists on OCP 3.9.30

So I was able to workaround the issue by forcing an older hardware version (11) for my VM:

  • Shutdown each node/master serially
  • Unregister each VM
  • Download/edit each vmx associated with each node/master and update virtualHW.version = "13" to virtualHW.version = "11"
  • Register VM and start
  • Confirm output of cat /sys/class/dmi/id/product_uuid matches cat /sys/class/dmi/id/product_serial
  • Attempt creation of a new PV

This seems related to https://github.com/kubernetes/kubernetes/pull/59602 and should be fixed in k8s 1.9.4 but not 1.9.1 shipping with OCP 3.9.27 or 3.9.30
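
The product_uuid/product_serial check in the workaround above boils down to comparing the same 16 bytes rendered in two different textual formats. A small sketch of that comparison (the serial string format is my assumption based on typical VMware guests; a byte-swapped UUID on newer hardware versions would show up here as a mismatch):

```python
import re

def normalize_dmi(value):
    """Reduce a DMI string (a UUID or a VMware-style serial) to bare lowercase hex."""
    value = value.strip()
    if value.startswith("VMware-"):
        value = value[len("VMware-"):]
    # Drop dashes, spaces, and anything else that is not a hex digit.
    return re.sub(r"[^0-9a-f]", "", value.lower())

def uuid_matches_serial(product_uuid, product_serial):
    """True if /sys/class/dmi/id/product_uuid and product_serial carry the same bytes."""
    return normalize_dmi(product_uuid) == normalize_dmi(product_serial)

# Made-up example values in the two formats seen on a VMware guest:
uuid = "42168f2e-1f1b-4c2a-9d3e-5a6b7c8d9e0f"
serial = "VMware-42 16 8f 2e 1f 1b 4c 2a-9d 3e 5a 6b 7c 8d 9e 0f"
print(uuid_matches_serial(uuid, serial))  # True
```

This is a naive positional hex comparison; if the first three UUID fields are byte-swapped (as can reportedly happen with newer virtual hardware versions), the function returns False, which is exactly the condition the workaround is trying to rule out.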

@liveaverage Thanks for description of your workaround. I'll try it.

@liveaverage Thanks for your workaround. It seems to work correctly.

I updated to OKD 3.10 and it works with the newest VM hardware version 14. Thanks for your help.
```
oc v3.10.0+0c4577e-1
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://s-cp-lb-01.cloud.example.de:443
openshift v3.10.0+7eee6f8-2
kubernetes v1.10.0+b81c8f8
```
