Argo: Argo and Istio injection-enabled namespaces

Created on 13 Nov 2020 · 5 comments · Source: argoproj/argo

Summary

I expect to be able to run Argo workflows in namespaces with Istio sidecar injection enabled. When Istio injection is disabled, the workflow proceeds to the subsequent steps (i.e. wait-seldon-resource), but when it is enabled, the workflow waits forever on the first step (create-seldon-resource).
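For reference, namespace-wide automatic sidecar injection is normally switched on with Istio's standard istio-injection label. The reporter's namespace manifest is not part of the report, so the following is only a sketch assuming that label on the emtech namespace used in the workflow below. With this label set, every new pod in the namespace gets an istio-proxy sidecar unless the pod itself carries the sidecar.istio.io/inject: "false" annotation.

```yaml
# Sketch (assumption): the reporter's actual namespace manifest is not shown.
# The istio-injection label enables automatic sidecar injection for every
# pod created in this namespace, including Argo workflow step pods.
apiVersion: v1
kind: Namespace
metadata:
  name: emtech
  labels:
    istio-injection: enabled
```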

Diagnostics

AKS 1.16.15

Argo v2.11.7

argo-workflow-emtech.yaml 
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: seldon-batch-process
  namespace: emtech
spec:
  entrypoint: seldon-batch-process
  serviceAccountName: emtech
  volumeClaimTemplates:
  - metadata:
      name: "seldon-pvc"
      ownerReferences:
      - apiVersion: argoproj.io/v1alpha1
        blockOwnerDeletion: true
        kind: Workflow
        name: "{{workflow.name}}"
        uid: "{{workflow.uid}}"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: "2Mi"
  templates:
  - name: seldon-batch-process
    steps:
    - - name: create-seldon-resource            
        template: create-seldon-resource-template
    - - name: wait-seldon-resource
        template: wait-seldon-resource-template
    - - name: download-object-store
        template: download-object-store-template
    - - name: process-batch-inputs
        template: process-batch-inputs-template
    - - name: upload-object-store
        template: upload-object-store-template

  - name: create-seldon-resource-template
    resource:
      action: create
      manifest: |
        apiVersion: machinelearning.seldon.io/v1
        kind: SeldonDeployment
        metadata:
          name: "sklearn"
          namespace: emtech
          annotations:
            sidecar.istio.io/inject: "false"
          ownerReferences:
          - apiVersion: argoproj.io/v1alpha1
            blockOwnerDeletion: true
            kind: Workflow
            name: "{{workflow.name}}"
            uid: "{{workflow.uid}}"
        spec:
          name: "sklearn"
          serviceAccountName: emtech
          predictors:
            - componentSpecs:
              - spec:
                  containers:
                  - name: classifier
                    env:
                    - name: GUNICORN_THREADS
                      value: "3"
                    - name: GUNICORN_WORKERS
                      value: "1"
                    resources:
                      requests:
                        cpu: 50m
                        memory: 100Mi
                      limits:
                        cpu: 50m
                        memory: 1000Mi
              graph:
                children: []
                implementation: SKLEARN_SERVER
                modelUri: gs://seldon-models/sklearn/iris
                name: classifier
              name: default
              replicas: 3

  - name: wait-seldon-resource-template
    script:
      image: bitnami/kubectl:1.18
      command: [bash]
      source: |
        sleep 5
        kubectl rollout status \
            deploy/$(kubectl get deploy -l seldon-deployment-id="sklearn" -o jsonpath='{.items[0].metadata.name}')

  - name: download-object-store-template
    script:
      image: minio/mc:RELEASE.2020-10-03T02-54-56Z
      env:
      - name: MINIO_SERVER_ACCESS_KEY
        valueFrom: 
          secretKeyRef:
            name: argo-minio
            key: accesskey
      - name: MINIO_SERVER_ACCESS_SECRET
        valueFrom:
          secretKeyRef:
            name: argo-minio
            key: secretkey
      - name: MINIO_SERVER_HOST
        value: http://argo-minio.argowf.svc.cluster.local:9000
      volumeMounts:
      - name: "seldon-pvc"
        mountPath: /assets
      command: [sh]
      source: |
        mc config host add minio-local $MINIO_SERVER_HOST $MINIO_SERVER_ACCESS_KEY $MINIO_SERVER_ACCESS_SECRET
        mc cp minio-local/data/input-data.txt /assets/input-data.txt

  - name: process-batch-inputs-template
    script:
      image: seldonio/seldon-core-s2i-python37:1.3.0-dev
      volumeMounts:
      - name: "seldon-pvc"
        mountPath: /assets
      command: [bash]
      source: |
        seldon-batch-processor \
            --deployment-name "sklearn" \
            --benchmark \
            --namespace "emtech" \
            --host "istio-ingressgateway.istio-system.svc.cluster.local" \
            --workers "9" \
            --data-type "data" \
            --payload-type "ndarray" \
            --retries "3" \
            --input-data-path "/assets/input-data.txt" \
            --output-data-path "/assets/output-data.txt"

  - name: upload-object-store-template
    script:
      image: minio/mc:RELEASE.2020-10-03T02-54-56Z
      volumeMounts:
      - name: "seldon-pvc"
        mountPath: /assets
      command: [sh]
      env:
      - name: MINIO_SERVER_ACCESS_KEY
        valueFrom: 
          secretKeyRef:
            name: argo-minio
            key: accesskey
      - name: MINIO_SERVER_ACCESS_SECRET
        valueFrom:
          secretKeyRef:
            name: argo-minio
            key: secretkey
      - name: MINIO_SERVER_HOST
        value: http://argo-minio.argowf.svc.cluster.local:9000
      source: |
        mc config host add minio-local $MINIO_SERVER_HOST $MINIO_SERVER_ACCESS_KEY $MINIO_SERVER_ACCESS_SECRET
        mc cp /assets/output-data.txt minio-local/data/output-data-{{workflow.uid}}.txt

kubectl logs -n argowf $(kubectl get pods -l app=argo-workflow-controller -n argowf -o name) | grep seldon-batch-process


time="2020-11-13T11:51:53Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:51:53Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:51:54Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:51:54Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Updated phase  -> Running" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Creating pvc seldon-batch-process-seldon-pvc" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Steps node seldon-batch-process initialized Running" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="StepGroup node seldon-batch-process-2674055389 initialized Running" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Pod node seldon-batch-process-3626514072 initialized Pending" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Created pod: seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072)" namespace=emtech workflow=seldon-batch-proces
time="2020-11-13T11:52:47Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Workflow update successful" namespace=emtech phase=Running resourceVersion=80544 workflow=seldon-batch-process
time="2020-11-13T11:52:48Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:48Z" level=info msg="Updating node seldon-batch-process-3626514072 message: PodInitializing"
time="2020-11-13T11:52:48Z" level=info msg="Skipped pod seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072) creation: already exists" namespace=emtech podPhase=Pending workflow=seldon-batch-process
time="2020-11-13T11:52:48Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process

time="2020-11-13T11:52:49Z" level=info msg="Workflow update successful" namespace=emtech phase=Running resourceVersion=80562 workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Skipped pod seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072) creation: already exists" namespace=emtech podPhase=Pending workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Skipped pod seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072) creation: already exists" namespace=emtech podPhase=Pending workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:52Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:52Z" level=info msg="Skipped pod seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072) creation: already exists" namespace=emtech podPhase=Pending workflow=seldon-batch-process
time="2020-11-13T11:52:52Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:54Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:54Z" level=info msg="Updating node seldon-batch-process-3626514072 status Pending -> Running"
time="2020-11-13T11:52:54Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:54Z" level=info msg="Workflow update successful" namespace=emtech phase=Running resourceVersion=80621 workflow=seldon-batch-process
time="2020-11-13T11:52:55Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:55Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process



bug

All 5 comments

@omerfsen I can see the step pod is running. It looks like it is stuck in resource creation. Can you provide us with the pod logs: kubectl logs <podname> main

SSHlog.log

Ran the workflow-controller with --loglevel debug; the output is attached.

Can you provide the log of the workflow step pod seldon-batch-process-3626514072?

Worked with @omerfsen offline and figured out the issue. The Argo workflow step pods themselves need Istio injection disabled, which the workflow definition above did not do (there, sidecar.istio.io/inject: "false" is only set on the SeldonDeployment). Adding the sidecar.istio.io/inject: 'false' annotation to each template, as follows, fixed it:

  - name: create-seldon-resource-template
    metadata:
      annotations:
        sidecar.istio.io/inject: 'false'

and

  - name: wait-seldon-resource-template
    metadata:
      annotations:
        sidecar.istio.io/inject: 'false'
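Combining the fix with the original wait step gives the template below. It is a sketch that assumes the same metadata block is also added to the remaining templates (download-object-store-template, process-batch-inputs-template and upload-object-store-template), in line with "each template" above. Note that metadata sits at the same level as script, not inside it. The thread does not state why the step hangs; a likely explanation (an assumption, not confirmed above) is that the injected istio-proxy sidecar keeps running after the step's own containers finish, so the pod never completes and Argo keeps waiting on it.

```yaml
  # Sketch: the original wait step with the fix applied.
  # Template-level metadata is applied to the step pod Argo creates,
  # so the Istio sidecar injector skips this pod.
  # Assumption: the same metadata block is added to every other template.
  - name: wait-seldon-resource-template
    metadata:
      annotations:
        sidecar.istio.io/inject: 'false'
    script:
      image: bitnami/kubectl:1.18
      command: [bash]
      source: |
        sleep 5
        kubectl rollout status \
            deploy/$(kubectl get deploy -l seldon-deployment-id="sklearn" -o jsonpath='{.items[0].metadata.name}')
```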