Openshift-ansible: oc version v1.2.0-1-g7386b49 doesn't seem stable

Created on 15 Jun 2016 · 38 Comments · Source: openshift/openshift-ansible

Hi, I am installing Origin from the latest openshift-ansible branch. I am now trying to deploy docker-registry, but it always fails with:

Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Tag v1.2.0-1-g7386b49 not found in repository docker.io/openshift/origin-pod"

Could you please tell me where the stable version is so that I can try it out? I see changes every day and I always get stuck at the deployment :(

priyanka5

Most helpful comment

I have built a fixed origin for CentOS (origin-1.2.0-2.el7). This will take care of the problem.
https://cbs.centos.org/koji/buildinfo?buildID=11297
It is currently in the CentOS PaaS testing repo and has been marked to go into the released repo. I'm not sure how long it takes for packages to get into the released repo. Hopefully by Monday, possibly sooner.

tdawson on 17 Jun 2016


All 38 comments

@sdodson @tdawson This seems related to the centos sig work being done for origin?

abutcher on 15 Jun 2016

@abutcher this is oc version v1.2.0-1-g7386b49, so it tries to pull images tagged v1.2.0-1-g7386b49, but this tag does not exist in the openshift origin registry on Docker Hub.

ghost on 16 Jun 2016
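
The mismatch described above can be illustrated with a small shell sketch. It assumes the version string follows the `git describe` pattern `<tag>-<commits>-g<hash>`; the thread does not state this, it is inferred from the shape of v1.2.0-1-g7386b49. `release_tag_of` is a hypothetical helper name. Stripping the suffix gives the kind of tag that, per this thread, does exist on Docker Hub (v1.2.0).

```shell
# Hypothetical helper: strip a trailing "-<commits>-g<hash>" suffix
# (git-describe style, an assumption) from a version string.
release_tag_of() {
  printf '%s\n' "$1" | sed -E 's/-[0-9]+-g[0-9a-f]+$//'
}

release_tag_of "v1.2.0-1-g7386b49"       # prints v1.2.0
release_tag_of "v1.2.0-rc1-13-g2e62fab"  # prints v1.2.0-rc1
```

A version that already is a plain tag passes through unchanged.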

Safe to call this a duplicate of https://github.com/openshift/origin/issues/9315 ?

dgoodwin on 16 Jun 2016

Yeah, not sure if the origin issue should be considered a dupe of this or the other way around. The problem is really in the CentOS packaging process, which is contained in neither of these two repos :-(


sdodson on 16 Jun 2016

I am in the middle of reworking the CentOS packaging process. If certain things are being left out, now is the time to say what they are.
The issue I see right now is that the origin spec file is still at version 0.0.1 (for testing reasons), which means that every build of the origin RPMs is completely independent of any other build. Thus the tags are going to be different.
I would really like to fix the main origin spec file, and the workflow if needed, so that it is easy for people to build the origin RPM in a consistent manner.

tdawson on 16 Jun 2016

@tdawson: on CentOS 7 it says NetworkManager should also be enabled before installation. Is that really required?

priyanka5 on 16 Jun 2016

@priyanka5 I'm not the correct person to ask about that. All I know is that it should be the same as RHEL. We haven't done anything special to CentOS to make OpenShift different on it.

tdawson on 16 Jun 2016

Is there a workaround in the meantime? Kinda dead in the water here.

I've tried editing the "image" value by hand to "v1.2.0" in the dc's and in "/etc/origin/master/master-config.yaml", and redeployed but it keeps trying to grab the "v1.2.0-36-g4a3f9c5" tag.

boweeb on 17 Jun 2016

@boweeb, for me it worked to edit the dc's (which is kind of a stupid solution, but okay). My first deployment failed. Editing the dc triggered a new deployment, and it pulled the right image.

ghost on 17 Jun 2016

tdawson on 17 Jun 2016


Thank you. We are looking forward to it. We installed OpenShift from openshift-ansible on github.com last Friday and it does not work.

kfrydrys on 20 Jun 2016

@lvthillo can you please explain what you mean by "edit the dc's"?
I have the same issue. I tried changing the master-config.yaml and node-config.yaml files on the master and the node-config.yaml file on each node, setting the line "false" to "true" in the image section. One registry is now deployed, but a second one has the same issue.
Please share some details. Thank you for your time.

alessiodini on 20 Jun 2016

@adini83
I meant:

$ oc project default
$ oc get dc
NAME              REVISION   REPLICAS   TRIGGERED BY
docker-registry   6          1          config
router            1          1          config
$ oc edit dc docker-registry

// So edit the deploymentConfig of the pod/container (to change which image you want to use).
// Edit this section:

        image: openshift/origin-docker-registry:v1.2.0-1-g7386b49
        imagePullPolicy: IfNotPresent

Change the tag of your image (v1.2.0-1-g7386b49) to a tag that exists on Docker Hub (v1.2.0).

Changing the dc will automatically trigger a new deployment with the image you specified.
This is a workaround, not a solution for a stable environment.

ghost on 20 Jun 2016
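
For anyone who prefers not to edit the dc interactively, here is a rough non-interactive sketch of the same workaround. `pin_image_tag` is a hypothetical helper (not part of oc) that uses sed to rewrite the tag on `image:` lines pointing at openshift/origin-* images. The pipeline through `oc replace` is left commented out because it has not been tested against this Origin version.

```shell
# Hypothetical helper: read YAML on stdin and set the tag of every
# "image: ...openshift/origin-<component>:<tag>" line to $1.
pin_image_tag() {
  sed -E "s#(image: [^[:space:]]*openshift/origin-[a-z-]+):[^[:space:]]+#\1:$1#"
}

# Possible usage (assumes a logged-in oc client; untested sketch):
#   oc get dc docker-registry -n default -o yaml \
#     | pin_image_tag v1.2.0 \
#     | oc replace -f -
```

As with the manual edit, changing the dc should trigger a new deployment through its config change trigger.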

I'll try it right away, thank you!!

alessiodini on 20 Jun 2016

@lvthillo I've made those changes to my DCs and it still tries to pull the bad tag when I start a new deployment. Unfortunately in my case, this leaves OpenShift completely unusable as it's a new install. Installing the fixed package from testing doesn't appear to fix the problem; it simply changes the tag it tries to pull to v1.2.0-rc1-13-g2e62fab.

stevenmirabito on 20 Jun 2016

Hello Steve, I think I found the best solution!
Just delete both the registry and the router, and create them again with the --images= option!

In my case I have the Origin version and I ran:
1) oadm router --credentials=/etc/origin/master/openshift-router.kubeconfig --images=openshift/origin-haproxy-router:v1.2.0 --service-account=router

2) oadm registry --credentials=/etc/origin/master/openshift-registry.kubeconfig --images=docker.io/openshift/origin-docker-registry:v1.2.0 --service-account=registry

Both worked like a charm for me :)
Let me know

alessiodini on 20 Jun 2016

@adini83 That didn't work initially, but after I went through and applied the ${version} --> v1.2.0 workaround to all of my node-config.yamls things started to work. Thanks for the push in the right direction!

stevenmirabito on 21 Jun 2016
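
A sketch of what the ${version} --> v1.2.0 change could look like when scripted. It assumes the imageConfig format line in node-config.yaml still contains the literal ${version} placeholder (for example openshift/origin-${component}:${version}). `pin_format_version` is a hypothetical helper, and the node-config.yaml path in the usage comment is an assumption.

```shell
# Hypothetical helper: on "format:" lines of the config file $1, replace
# the literal ${version} placeholder with the fixed tag $2.
# A backup of the original file is kept as "$1.bak".
pin_format_version() {
  sed -i.bak '/^[[:space:]]*format:/ s#[$][{]version[}]#'"$2"'#' "$1"
}

# Possible usage on each node (path is an assumption):
#   pin_format_version /etc/origin/node/node-config.yaml v1.2.0
# The node service needs a restart afterwards to pick up the change.
```

The ${component} placeholder is left untouched, so each component image still resolves to its own repository.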

origin-1.2.0-2.el7, which hopefully fixes this issue, has finally made it to the CentOS released mirrors.
If anyone hasn't reconfigured to work around this problem, can you please give it another try?

tdawson on 21 Jun 2016

@tdawson Just updated my nodes and will try reverting the config patch and deploying another router in a moment.

stevenmirabito on 21 Jun 2016

@tdawson The new packages don't appear to solve the issue, if anything it's worse. As shown in the dry-run config below, ${version} is now returning v1.2.0-rc1-13-g2e62fab.

[root@os-master01 ~]# oadm router -o yaml --service-account=router --credentials=${ROUTER_KUBECONFIG:-"$KUBECONFIG"}
Flag --credentials has been deprecated, use --service-account to specify the service account the router will use to make API calls
apiVersion: v1
items:
- apiVersion: v1
  kind: ServiceAccount
  metadata:
    creationTimestamp: null
    name: router
- apiVersion: v1
  groupNames: null
  kind: ClusterRoleBinding
  metadata:
    creationTimestamp: null
    name: router-router-role
  roleRef:
    kind: ClusterRole
    name: system:router
  subjects:
  - kind: ServiceAccount
    name: router
    namespace: default
  userNames:
  - system:serviceaccount:default:router
- apiVersion: v1
  kind: DeploymentConfig
  metadata:
    creationTimestamp: null
    labels:
      router: router
    name: router
  spec:
    replicas: 1
    selector:
      router: router
    strategy:
      resources: {}
      rollingParams:
        maxSurge: 0
        maxUnavailable: 25%
        updatePercent: -25
      type: Rolling
    template:
      metadata:
        creationTimestamp: null
        labels:
          router: router
      spec:
        containers:
        - env:
          - name: ROUTER_EXTERNAL_HOST_HOSTNAME
          - name: ROUTER_EXTERNAL_HOST_HTTPS_VSERVER
          - name: ROUTER_EXTERNAL_HOST_HTTP_VSERVER
          - name: ROUTER_EXTERNAL_HOST_INSECURE
            value: "false"
          - name: ROUTER_EXTERNAL_HOST_PARTITION_PATH
          - name: ROUTER_EXTERNAL_HOST_PASSWORD
          - name: ROUTER_EXTERNAL_HOST_PRIVKEY
            value: /etc/secret-volume/router.pem
          - name: ROUTER_EXTERNAL_HOST_USERNAME
          - name: ROUTER_SERVICE_HTTPS_PORT
            value: "443"
          - name: ROUTER_SERVICE_HTTP_PORT
            value: "80"
          - name: ROUTER_SERVICE_NAME
            value: router
          - name: ROUTER_SERVICE_NAMESPACE
            value: default
          - name: ROUTER_SUBDOMAIN
          - name: STATS_PASSWORD
            value: [redacted]
          - name: STATS_PORT
            value: [redacted]
          - name: STATS_USERNAME
            value: [redacted]
          image: openshift/origin-haproxy-router:v1.2.0-rc1-13-g2e62fab
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              host: localhost
              path: /healthz
              port: 1936
            initialDelaySeconds: 10
          name: router
          ports:
          - containerPort: 80
          - containerPort: 443
          - containerPort: 1936
            name: stats
            protocol: TCP
          readinessProbe:
            httpGet:
              host: localhost
              path: /healthz
              port: 1936
            initialDelaySeconds: 10
          resources: {}
        hostNetwork: true
        securityContext: {}
        serviceAccount: router
        serviceAccountName: router
    test: false
    triggers:
    - type: ConfigChange
  status: {}
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: null
    labels:
      router: router
    name: router
  spec:
    ports:
    - name: 80-tcp
      port: 80
      targetPort: 80
    - name: 443-tcp
      port: 443
      targetPort: 443
    - name: 1936-tcp
      port: 1936
      protocol: TCP
      targetPort: 1936
    selector:
      router: router
  status:
    loadBalancer: {}
kind: List
metadata: {}

stevenmirabito on 21 Jun 2016

@tdawson: I tried it just now, and it is still trying to pull the wrong image tag:

Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Tag v1.2.0-rc1-13-g2e62fab not found in repository docker.io/openshift/origin-pod"

priyanka5 on 22 Jun 2016

Maybe this helps?
Edit master-config.yaml and node-config.yaml:

imageConfig:
  format: openshift/origin-${component}:v1.2.0
  latest: false

Restart the master and the nodes.
Then we use this command to deploy (this is for an initial router):

# oadm router router --replicas=1 \
    --credentials='/etc/origin/master/openshift-router.kubeconfig' \
    --service-account=router \
    '--images=docker.io/openshift/origin-${component}:v1.2.0'

This is for our initial registry:

oadm registry --config=/etc/origin/master/admin.kubeconfig \
    --credentials=/etc/origin/master/openshift-registry.kubeconfig \
    '--images=docker.io/openshift/origin-${component}:v1.2.0'

But I still agree this is a very bad version of origin. There seems to be a lot wrong with it.

ghost on 22 Jun 2016


@lvthillo yes, it works, thanks a lot. I agree there are a lot of bugs in the latest version.

priyanka5 on 22 Jun 2016

Bug was likely caused by running ansible playbooks against the rpms that carried the bad version number, so even updating the rpms wouldn't fix the issue as you'd still have config laid down from ansible. Re-running the ansible config playbook should correct the problem in the config files.

dgoodwin on 22 Jun 2016
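
To check whether a host still has a bad tag laid down in its config files, something like the following may help. It is a sketch only: `find_stale_tags` is a hypothetical helper, the regular expression assumes the `-<commits>-g<hash>` suffix seen in this thread, and the node config path in the usage comment is an assumption.

```shell
# Hypothetical helper: print (with line numbers) every line in the given
# files whose image tag still ends in a "-<commits>-g<hash>" suffix.
# Exits non-zero when no such line is found, as grep does.
find_stale_tags() {
  grep -nE ':v[0-9]+(\.[0-9]+)*(-[a-z0-9]+)?-[0-9]+-g[0-9a-f]+' "$@"
}

# Possible usage (node config path is an assumption):
#   find_stale_tags /etc/origin/master/master-config.yaml \
#                   /etc/origin/node/node-config.yaml
```

No output would suggest the files no longer pin a stale build tag. Any hit points at a line to fix by hand or by re-running the config playbook.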

@dgoodwin thanks for the answer. But what are the right RPMS we need to install?

ghost on 22 Jun 2016

@lvthillo just going by @tdawson's comment, but I think it should be origin-1.2.0-2.el7. I would expect that to report the correct version, so once that's installed we'd just need to get the config files updated, either manually or by re-running the ansible config playbooks.

dgoodwin on 22 Jun 2016

@dgoodwin
We are using
https://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm (as in the documentation). Which one do we have to use to install origin-1.2.0-2.el7?
Or is this an issue that you have to fix first before we can install everything the right way for a stable environment?

ghost on 22 Jun 2016

@dgoodwin Could this bug also be an explanation for openshift/origin#9396?

ghost on 22 Jun 2016

@lvthillo which docs are you following? I'm not aware how this relates to EPEL, but I believe Troy is building for the CentOS PAAS SIG, which is landing here: http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin/

Could you link me to the issue you're referring to? There doesn't seem to be an issue 9396 in this repo, and that issue in origin doesn't seem to be related to this discussion.

dgoodwin on 22 Jun 2016

@dgoodwin Oh, okay. It's just an issue we also got on those latest 'weird' versions of OpenShift Origin and we don't know why it appears.
We're using prereq and advanced.

ghost on 22 Jun 2016

Ok I think the ansible playbooks do configure the CentOS PAAS repos now so you should be able to update your origin rpm with yum.

dgoodwin on 22 Jun 2016

@dgoodwin
First I got oc v1.2.0-1-g7386b49 with this in my playbook: 1.2.0-1.git.10183.7386b49.el7. Then I performed the yum update, which updated to 1.2.0-2.el7. Now my oc version is oc v1.2.0-rc1-13-g2e62fab.
I read on the users list about someone with the same issue. @tdawson was talking about a 1.2.0-4.
If I run yum --enablerepo=centos-openshift-origin-testing install origin\* I see I'm using v1.2.0. So this looks fine. But now I have to wait until it's in the released repo of centos? And then it's available to use in my ansible/hosts?

ghost on 23 Jun 2016

Yeah this popped up on #openshift yesterday, @tdawson is looking into it I believe, the new build had a similar bad tag problem.

dgoodwin on 23 Jun 2016

@dgoodwin Okay, thank you. I'll wait until the new 1.2.0-4 is in the released repo.

ghost on 23 Jun 2016

origin 1.2.0-4 is now in the released repo. Please test again.
Sorry for the delay.

tdawson on 24 Jun 2016

I haven't heard any complaints. Are we ok if I close this now?

tdawson on 28 Jun 2016

@tdawson it's fine now. Thanks

ghost on 28 Jun 2016

Glad it's working. Closing issue.

tdawson on 28 Jun 2016
