Openshift-ansible: oc version v1.2.0-1-g7386b49 doesn't seem stable

Created on 15 Jun 2016 · 38 Comments · Source: openshift/openshift-ansible

Hi, I am installing Origin from the latest openshift-ansible branch. I am now trying to deploy docker-registry, but it always gives:

Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Tag v1.2.0-1-g7386b49 not found in repository docker.io/openshift/origin-pod"

Could you please tell me where the stable version is so that I can try it out? I see changes daily and I am always stuck on the deployment :(

All 38 comments

@sdodson @tdawson This seems related to the centos sig work being done for origin?

@abutcher this is oc version v1.2.0-1-g7386b49, so it tries to pull images tagged v1.2.0-1-g7386b49, but that tag does not exist in the openshift origin registry on Docker Hub.
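
To illustrate the mismatch: the version reported by oc looks like `git describe` output (release tag, commit count, abbreviated hash), while Docker Hub only carries the plain release tag. Below is a minimal shell sketch, assuming that form holds, that strips the suffix. `release_tag` is a hypothetical helper name, not part of oc or openshift-ansible.

```shell
# Hypothetical helper: strip a trailing "-<commits>-g<hash>" suffix
# (git-describe style) from a version string.
release_tag() {
  printf '%s\n' "$1" | sed -E 's/-[0-9]+-g[0-9a-f]+$//'
}

release_tag v1.2.0-1-g7386b49    # prints v1.2.0
release_tag v1.2.0               # already clean, prints v1.2.0
```

A version without the suffix passes through unchanged, so the helper can be applied to any reported version before comparing it with the tags that are actually published.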

Safe to call this a duplicate of https://github.com/openshift/origin/issues/9315 ?

Yeah, not sure if the origin issue should be considered a dupe of this or
the other way around. It's really in the CentOS packaging process, which is
contained in neither of these two repos :-(

I am in the middle of reworking the CentOS packaging process. If there are certain things that are being left out, now is the time to say what they are.
The issue I see right now is that the origin spec file is still at version 0.0.1 (for testing reasons), which means that each build of the origin RPMs is completely independent of any other build. Thus the tags are going to be different.
I would really like to fix the main origin spec file, and the workflow if needed, so that it is easy for people to build the origin RPM in a consistent manner.

@tdawson: on CentOS 7 it says NetworkManager should also be enabled before installation. Is that really required?

@priyanka5 I'm not the correct person to ask about that. All I know is that it should be the same as RHEL. We haven't done anything special to CentOS to make OpenShift different on it.

Is there a workaround in the meantime? Kinda dead in the water here.

I've tried editing the "image" value by hand to "v1.2.0" in the dc's and in "/etc/origin/master/master-config.yaml", and redeployed but it keeps trying to grab the "v1.2.0-36-g4a3f9c5" tag.

@boweeb, for me it worked to edit the dc's (which is kind of a stupid solution, but okay). My first deployment failed. Editing the dc triggered a new deployment, and it pulled the right image.

I have built a fixed origin for CentOS (origin-1.2.0-2.el7). This will take care of the problem.
https://cbs.centos.org/koji/buildinfo?buildID=11297
It is currently in the CentOS PaaS testing repo, and has been marked to go into the released repo. I'm not sure how long it takes for packages to get into the released repo. Hopefully by Monday, possibly sooner.

Thank you. We are looking forward to it. We installed OpenShift from openshift-ansible on github.com last Friday and it is not working.

@lvthillo can you please explain what you mean by "edit the dc's"?
I have the same issue. I tried changing the master-config.yaml/node-config.yaml files on the master and the node-config.yaml file on each node, switching the line "false" to "true" in the image section. One registry is now deployed, but a second one has the same issue.
Please share some details. Thank you for your time.

@adini83
I meant:

$ oc project default
$ oc get dc
NAME              REVISION   REPLICAS   TRIGGERED BY
docker-registry   6          1          config
router            1          1          config
$ oc edit dc docker-registry

Edit the deploymentConfig of the pod/container to change which image it uses. Edit this section:

        image: openshift/origin-docker-registry:v1.2.0-1-g7386b49
        imagePullPolicy: IfNotPresent

Change the tag of your image (v1.2.0-1-g7386b49) to an existing tag on Docker Hub (v1.2.0).

Changing the dc will automatically trigger a new deployment with the image you specified.
This is a workaround, not a solution for a stable environment.
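
The manual edit described here can also be sketched as a text filter, which helps when several deployment configs need the same change. This is only an illustration: `retag_images` is a hypothetical helper, and it assumes the bad tag always has the git-describe form seen in this thread.

```shell
# Hypothetical helper: on "image:" lines read from stdin, replace a
# git-describe style tag (e.g. v1.2.0-1-g7386b49) with the tag given as $1.
# All other lines are passed through unchanged.
retag_images() {
  sed -E "/^[[:space:]]*(-[[:space:]]+)?image:/ s/:v[0-9][^[:space:]]*-[0-9]+-g[0-9a-f]+\$/:$1/"
}

printf '%s\n' \
  'image: openshift/origin-docker-registry:v1.2.0-1-g7386b49' \
  'imagePullPolicy: IfNotPresent' | retag_images v1.2.0
# prints:
#   image: openshift/origin-docker-registry:v1.2.0
#   imagePullPolicy: IfNotPresent
```

Against a live cluster the filter could sit between an export and a replace of the deployment config, but that pipeline is untested here and, like the manual edit, remains a workaround.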

I'll try it immediately, thank you!

@lvthillo I've made those changes to my DCs and it still tries to pull the bad tag when I start a new deployment. Unfortunately in my case, this leaves OpenShift completely unusable as it's a new install. Installing the fixed package from testing doesn't appear to fix the problem; it simply changes the tag it tries to pull to v1.2.0-rc1-13-g2e62fab.

Hello Steve, I think I found the best solution!
Just delete both the registry and the router, and recreate them with the --images= option.

In my case I have the Origin version and I ran:
1) oadm router --credentials=/etc/origin/master/openshift-router.kubeconfig --images=openshift/origin-haproxy-router:v1.2.0 --service-account=router

2) oadm registry --credentials=/etc/origin/master/openshift-registry.kubeconfig --images=docker.io/openshift/origin-docker-registry:v1.2.0 --service-account=registry

Both worked like a charm for me :)
Let me know.

@adini83 That didn't work initially, but after I went through and applied the ${version} --> v1.2.0 workaround to all of my node-config.yaml files, things started to work. Thanks for the push in the right direction!

origin-1.2.0-2.el7, which hopefully fixes this issue, has finally made it to the CentOS released mirrors.
If anyone hasn't reconfigured to work around this problem, can you please give it another try?

@tdawson Just updated my nodes and will try reverting the config patch and deploying another router in a moment.

@tdawson The new packages don't appear to solve the issue, if anything it's worse. As shown in the dry-run config below, ${version} is now returning v1.2.0-rc1-13-g2e62fab.

[root@os-master01 ~]# oadm router -o yaml --service-account=router --credentials=${ROUTER_KUBECONFIG:-"$KUBECONFIG"}
Flag --credentials has been deprecated, use --service-account to specify the service account the router will use to make API calls
apiVersion: v1
items:
- apiVersion: v1
  kind: ServiceAccount
  metadata:
    creationTimestamp: null
    name: router
- apiVersion: v1
  groupNames: null
  kind: ClusterRoleBinding
  metadata:
    creationTimestamp: null
    name: router-router-role
  roleRef:
    kind: ClusterRole
    name: system:router
  subjects:
  - kind: ServiceAccount
    name: router
    namespace: default
  userNames:
  - system:serviceaccount:default:router
- apiVersion: v1
  kind: DeploymentConfig
  metadata:
    creationTimestamp: null
    labels:
      router: router
    name: router
  spec:
    replicas: 1
    selector:
      router: router
    strategy:
      resources: {}
      rollingParams:
        maxSurge: 0
        maxUnavailable: 25%
        updatePercent: -25
      type: Rolling
    template:
      metadata:
        creationTimestamp: null
        labels:
          router: router
      spec:
        containers:
        - env:
          - name: ROUTER_EXTERNAL_HOST_HOSTNAME
          - name: ROUTER_EXTERNAL_HOST_HTTPS_VSERVER
          - name: ROUTER_EXTERNAL_HOST_HTTP_VSERVER
          - name: ROUTER_EXTERNAL_HOST_INSECURE
            value: "false"
          - name: ROUTER_EXTERNAL_HOST_PARTITION_PATH
          - name: ROUTER_EXTERNAL_HOST_PASSWORD
          - name: ROUTER_EXTERNAL_HOST_PRIVKEY
            value: /etc/secret-volume/router.pem
          - name: ROUTER_EXTERNAL_HOST_USERNAME
          - name: ROUTER_SERVICE_HTTPS_PORT
            value: "443"
          - name: ROUTER_SERVICE_HTTP_PORT
            value: "80"
          - name: ROUTER_SERVICE_NAME
            value: router
          - name: ROUTER_SERVICE_NAMESPACE
            value: default
          - name: ROUTER_SUBDOMAIN
          - name: STATS_PASSWORD
            value: [redacted]
          - name: STATS_PORT
            value: [redacted]
          - name: STATS_USERNAME
            value: [redacted]
          image: openshift/origin-haproxy-router:v1.2.0-rc1-13-g2e62fab
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              host: localhost
              path: /healthz
              port: 1936
            initialDelaySeconds: 10
          name: router
          ports:
          - containerPort: 80
          - containerPort: 443
          - containerPort: 1936
            name: stats
            protocol: TCP
          readinessProbe:
            httpGet:
              host: localhost
              path: /healthz
              port: 1936
            initialDelaySeconds: 10
          resources: {}
        hostNetwork: true
        securityContext: {}
        serviceAccount: router
        serviceAccountName: router
    test: false
    triggers:
    - type: ConfigChange
  status: {}
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: null
    labels:
      router: router
    name: router
  spec:
    ports:
    - name: 80-tcp
      port: 80
      targetPort: 80
    - name: 443-tcp
      port: 443
      targetPort: 443
    - name: 1936-tcp
      port: 1936
      protocol: TCP
      targetPort: 1936
    selector:
      router: router
  status:
    loadBalancer: {}
kind: List
metadata: {}
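
For a dry run like the one above, the only line that matters for this issue is the `image:` entry. Here is a small sketch for pulling just that out of the YAML. `list_images` is a hypothetical helper, and it assumes the image references appear on `image:` lines as in the listing.

```shell
# Hypothetical helper: print the distinct image references found on
# "image:" lines of YAML read from stdin.
list_images() {
  sed -n -E 's/^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]*//p' | sort -u
}

printf '%s\n' \
  '          image: openshift/origin-haproxy-router:v1.2.0-rc1-13-g2e62fab' \
  '          imagePullPolicy: IfNotPresent' | list_images
# prints openshift/origin-haproxy-router:v1.2.0-rc1-13-g2e62fab
```

Piping the dry-run output of `oadm router -o yaml --service-account=router` into `list_images` would show the tag that is about to be deployed before anything is created.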

@tdawson: I tried it just now; it is still trying to pull the wrong image tag:

Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Tag v1.2.0-rc1-13-g2e62fab not found in repository docker.io/openshift/origin-pod"

Maybe this helps?
Edit master-config.yaml and node-config.yaml:

imageConfig:
  format: openshift/origin-${component}:v1.2.0
  latest: false

Restart the master and nodes.
Then we use this command to deploy (this is for an initial router):

# oadm router router --replicas=1 \
    --credentials='/etc/origin/master/openshift-router.kubeconfig' \
    --service-account=router \
    '--images=docker.io/openshift/origin-${component}:v1.2.0'

This is for our initial registry:

oadm registry --config=/etc/origin/master/admin.kubeconfig \
    --credentials=/etc/origin/master/openshift-registry.kubeconfig \
    '--images=docker.io/openshift/origin-${component}:v1.2.0'
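
To make the effect of the format string concrete, here is a rough sketch of the placeholder substitution. It only imitates the idea and is not OpenShift's implementation. `expand_image_format` is a hypothetical helper, and the component names are taken from the image names that appear in this thread.

```shell
# Hypothetical helper: expand the ${component} and ${version} placeholders
# in an image format string.
#   $1 = format string, $2 = component name, $3 = version tag
expand_image_format() {
  printf '%s\n' "$1" | sed -e "s/\${component}/$2/g" -e "s/\${version}/$3/g"
}

# With a ${version} placeholder, the bad build version ends up in the tag:
expand_image_format 'openshift/origin-${component}:${version}' pod v1.2.0-1-g7386b49
# prints openshift/origin-pod:v1.2.0-1-g7386b49

# With the tag pinned in the format, the build version no longer matters:
expand_image_format 'openshift/origin-${component}:v1.2.0' pod v1.2.0-1-g7386b49
# prints openshift/origin-pod:v1.2.0
```

The single quotes around the format string matter in the shell, as in the oadm commands above: without them the shell would try to expand `${component}` itself.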

But I still agree this is a very bad version of Origin. There seems to be a lot wrong.

@lvthillo yes, it works, thanks a lot. I agree there are a lot of bugs in the latest version.

Bug was likely caused by running ansible playbooks against the rpms that carried the bad version number, so even updating the rpms wouldn't fix the issue as you'd still have config laid down from ansible. Re-running the ansible config playbook should correct the problem in the config files.
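
One way to check whether a config file still carries a bad version is to search it for git-describe style tags. This is a sketch only: `find_stale_tags` is a hypothetical helper, and the file paths in the usage comment are assumptions about a typical install that should be adjusted to match yours.

```shell
# Hypothetical helper: print lines (with line numbers) that contain an image
# tag in git-describe form, e.g. ":v1.2.0-1-g7386b49".
# Returns non-zero when no such line is found.
find_stale_tags() {
  grep -n -E ':v[0-9][^[:space:]]*-[0-9]+-g[0-9a-f]+' "$@"
}

# Example usage (paths are assumptions):
#   find_stale_tags /etc/origin/master/master-config.yaml \
#                   /etc/origin/node/node-config.yaml
```

A non-zero exit status means no stale tag was found in the given files, so the function can be used directly in an `if` test after re-running the playbook.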

@dgoodwin thanks for the answer. But which RPMs do we need to install?

@lvthillo just basing this off @tdawson's comment, but I think it should be origin-1.2.0-2.el7. I would expect that to report the correct version, so once that's installed we'd just need to get the config files updated, either manually or by re-running the ansible config playbooks.

@dgoodwin
We are using
https://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm (as in the documentation). Which one do we have to use to install origin-1.2.0-2.el7?
Or is this an issue that you have to fix first before we can install everything the right way for a stable environment?

@dgoodwin Could this bug also be an explanation for openshift/origin#9396?

@lvthillo which docs are you following? I'm not aware how this relates to EPEL, but I believe Troy is building for the CentOS PAAS SIG, which is landing here: http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin/

Could you link me to the issue you're referring to? There doesn't seem to be an issue 9396 in this repo, and that issue in origin doesn't seem to be related to this discussion.

@dgoodwin Oh, okay. It's just an issue we also got on those latest 'weird' versions of OpenShift Origin, and we don't know why it appears.
We're using prereq and advanced.

Ok I think the ansible playbooks do configure the CentOS PAAS repos now so you should be able to update your origin rpm with yum.

@dgoodwin
First I got oc v1.2.0-1-g7386b49 with this in my playbook: 1.2.0-1.git.10183.7386b49.el7. Then I performed the yum update, which updated to 1.2.0-2.el7. Now my oc version is oc v1.2.0-rc1-13-g2e62fab.
I read on the users list about someone with the same issue. @tdawson was talking about a 1.2.0-4.
If I run yum --enablerepo=centos-openshift-origin-testing install origin\* I see I'm using v1.2.0, so this looks fine. But now I have to wait until it's in the released repo of CentOS? And then it's available to use in my ansible/hosts?

Yeah this popped up on #openshift yesterday, @tdawson is looking into it I believe, the new build had a similar bad tag problem.

@dgoodwin Okay, thank you. I'll wait until the new 1.2.0-4 is in the released repo.

origin 1.2.0-4 is now in the released repo. Please test again.
Sorry for the delay.

I haven't heard any complaints. Are we ok if I close this now?

@tdawson it's fine now. Thanks

Glad it's working. Closing issue.
