I expect to be able to run Argo workflows in Istio-enabled namespaces. When Istio injection is disabled, the workflow proceeds to the subsequent steps (i.e. wait-seldon-resource), but when Istio is enabled it waits forever in the first step.
AKS 1.16.15
Argo v2.11.7
argo-workflow-emtech.yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: seldon-batch-process
  namespace: emtech
spec:
  entrypoint: seldon-batch-process
  serviceAccountName: emtech
  volumeClaimTemplates:
  - metadata:
      name: "seldon-pvc"
      ownerReferences:
      - apiVersion: argoproj.io/v1alpha1
        blockOwnerDeletion: true
        kind: Workflow
        name: "{{workflow.name}}"
        uid: "{{workflow.uid}}"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: "2Mi"
  templates:
  - name: seldon-batch-process
    steps:
    - - name: create-seldon-resource
        template: create-seldon-resource-template
    - - name: wait-seldon-resource
        template: wait-seldon-resource-template
    - - name: download-object-store
        template: download-object-store-template
    - - name: process-batch-inputs
        template: process-batch-inputs-template
    - - name: upload-object-store
        template: upload-object-store-template
  - name: create-seldon-resource-template
    resource:
      action: create
      manifest: |
        apiVersion: machinelearning.seldon.io/v1
        kind: SeldonDeployment
        metadata:
          name: "sklearn"
          namespace: emtech
          annotations:
            sidecar.istio.io/inject: "false"
          ownerReferences:
          - apiVersion: argoproj.io/v1alpha1
            blockOwnerDeletion: true
            kind: Workflow
            name: "{{workflow.name}}"
            uid: "{{workflow.uid}}"
        spec:
          name: "sklearn"
          serviceAccountName: emtech
          predictors:
          - componentSpecs:
            - spec:
                containers:
                - name: classifier
                  env:
                  - name: GUNICORN_THREADS
                    value: 3
                  - name: GUNICORN_WORKERS
                    value: 1
                  resources:
                    requests:
                      cpu: 50m
                      memory: 100Mi
                    limits:
                      cpu: 50m
                      memory: 1000Mi
            graph:
              children: []
              implementation: SKLEARN_SERVER
              modelUri: gs://seldon-models/sklearn/iris
              name: classifier
            name: default
            replicas: 3
  - name: wait-seldon-resource-template
    script:
      image: bitnami/kubectl:1.18
      command: [bash]
      source: |
        sleep 5
        kubectl rollout status \
            deploy/$(kubectl get deploy -l seldon-deployment-id="sklearn" -o jsonpath='{.items[0].metadata.name}')
  - name: download-object-store-template
    script:
      image: minio/mc:RELEASE.2020-10-03T02-54-56Z
      env:
      - name: MINIO_SERVER_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: argo-minio
            key: accesskey
      - name: MINIO_SERVER_ACCESS_SECRET
        valueFrom:
          secretKeyRef:
            name: argo-minio
            key: secretkey
      - name: MINIO_SERVER_HOST
        value: http://argo-minio.argowf.svc.cluster.local:9000
      volumeMounts:
      - name: "seldon-pvc"
        mountPath: /assets
      command: [sh]
      source: |
        mc config host add minio-local $MINIO_SERVER_HOST $MINIO_SERVER_ACCESS_KEY $MINIO_SERVER_ACCESS_SECRET
        mc cp minio-local/data/input-data.txt /assets/input-data.txt
  - name: process-batch-inputs-template
    script:
      image: seldonio/seldon-core-s2i-python37:1.3.0-dev
      volumeMounts:
      - name: "seldon-pvc"
        mountPath: /assets
      command: [bash]
      source: |
        seldon-batch-processor \
            --deployment-name "sklearn" \
            --benchmark \
            --namespace "emtech" \
            --host "istio-ingressgateway.istio-system.svc.cluster.local" \
            --workers "9" \
            --data-type "data" \
            --payload-type "ndarray" \
            --retries "3" \
            --input-data-path "/assets/input-data.txt" \
            --output-data-path "/assets/output-data.txt"
  - name: upload-object-store-template
    script:
      image: minio/mc:RELEASE.2020-10-03T02-54-56Z
      volumeMounts:
      - name: "seldon-pvc"
        mountPath: /assets
      command: [sh]
      env:
      - name: MINIO_SERVER_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: argo-minio
            key: accesskey
      - name: MINIO_SERVER_ACCESS_SECRET
        valueFrom:
          secretKeyRef:
            name: argo-minio
            key: secretkey
      - name: MINIO_SERVER_HOST
        value: http://argo-minio.argowf.svc.cluster.local:9000
      source: |
        mc config host add minio-local $MINIO_SERVER_HOST $MINIO_SERVER_ACCESS_KEY $MINIO_SERVER_ACCESS_SECRET
        mc cp /assets/output-data.txt minio-local/data/output-data-{{workflow.uid}}.txt
kubectl logs -n argowf $(kubectl get pods -l app=argo-workflow-controller -n argowf -o name) |grep seldon-batch-process
time="2020-11-13T11:51:53Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:51:53Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:51:54Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:51:54Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Updated phase -> Running" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Creating pvc seldon-batch-process-seldon-pvc" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Steps node seldon-batch-process initialized Running" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="StepGroup node seldon-batch-process-2674055389 initialized Running" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Pod node seldon-batch-process-3626514072 initialized Pending" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Created pod: seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072)" namespace=emtech workflow=seldon-batch-proces
time="2020-11-13T11:52:47Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:47Z" level=info msg="Workflow update successful" namespace=emtech phase=Running resourceVersion=80544 workflow=seldon-batch-process
time="2020-11-13T11:52:48Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:48Z" level=info msg="Updating node seldon-batch-process-3626514072 message: PodInitializing"
time="2020-11-13T11:52:48Z" level=info msg="Skipped pod seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072) creation: already exists" namespace=emtech podPhase=Pending workflow=seldon-batch-process
time="2020-11-13T11:52:48Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:49Z" level=info msg="Workflow update successful" namespace=emtech phase=Running resourceVersion=80562 workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Skipped pod seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072) creation: already exists" namespace=emtech podPhase=Pending workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Skipped pod seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072) creation: already exists" namespace=emtech podPhase=Pending workflow=seldon-batch-process
time="2020-11-13T11:52:50Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:52Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:52Z" level=info msg="Skipped pod seldon-batch-process[0].create-seldon-resource (seldon-batch-process-3626514072) creation: already exists" namespace=emtech podPhase=Pending workflow=seldon-batch-process
time="2020-11-13T11:52:52Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:54Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:54Z" level=info msg="Updating node seldon-batch-process-3626514072 status Pending -> Running"
time="2020-11-13T11:52:54Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:54Z" level=info msg="Workflow update successful" namespace=emtech phase=Running resourceVersion=80621 workflow=seldon-batch-process
time="2020-11-13T11:52:55Z" level=info msg="Processing workflow" namespace=emtech workflow=seldon-batch-process
time="2020-11-13T11:52:55Z" level=info msg="Workflow step group node seldon-batch-process-2674055389 not yet completed" namespace=emtech workflow=seldon-batch-process

@omerfsen I can see that the step pod is running; it looks like it is stuck in resource creation. Can you provide the pod logs with kubectl logs <podname> main?
I ran the workflow-controller with --loglevel debug; the output is attached.
Can you provide the log of the workflow step pod seldon-batch-process-3626514072?
Worked with @omerfsen offline and figured out the issue. The Argo workflow step pods themselves need Istio injection disabled, which the workflow definition above did not do: its sidecar.istio.io/inject: "false" annotation is set only on the SeldonDeployment manifest, not on the workflow templates. Adding the sidecar.istio.io/inject: 'false' annotation to each template as follows fixed it:
- name: create-seldon-resource-template
  metadata:
    annotations:
      sidecar.istio.io/inject: 'false'
and
- name: wait-seldon-resource-template
  metadata:
    annotations:
      sidecar.istio.io/inject: 'false'
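Because the annotation has to be repeated on every template, it can be applied mechanically. The following is a minimal Python sketch, not something from the thread: it assumes the workflow manifest has already been parsed into plain dicts (for example with PyYAML), and the function names annotate_templates and templates_missing_annotation are hypothetical.

```python
import copy
from typing import Any, Dict, List

# Annotation from the fix above; Istio does not inject a sidecar into pods carrying it.
ISTIO_INJECT_KEY = "sidecar.istio.io/inject"
ISTIO_INJECT_VALUE = "false"


def annotate_templates(workflow: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed Workflow in which every template has
    metadata.annotations["sidecar.istio.io/inject"] == "false".

    The annotation is harmless on steps templates, which do not run a pod
    of their own."""
    patched = copy.deepcopy(workflow)  # leave the caller's dict untouched
    for template in patched.get("spec", {}).get("templates", []):
        metadata = template.setdefault("metadata", {})
        annotations = metadata.setdefault("annotations", {})
        annotations[ISTIO_INJECT_KEY] = ISTIO_INJECT_VALUE
    return patched


def templates_missing_annotation(workflow: Dict[str, Any]) -> List[str]:
    """Return the names of templates that do not disable Istio injection."""
    missing = []
    for template in workflow.get("spec", {}).get("templates", []):
        annotations = template.get("metadata", {}).get("annotations", {})
        if annotations.get(ISTIO_INJECT_KEY) != ISTIO_INJECT_VALUE:
            missing.append(template["name"])
    return missing
```

To apply it to argo-workflow-emtech.yaml, one could load the file with PyYAML's yaml.safe_load, pass the result through annotate_templates, and write it back with yaml.safe_dump; PyYAML is an extra dependency that the thread does not mention.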