What happened:
A simple workflow fails to save its logs to S3.
What you expected to happen:
The logs to be saved to S3 correctly.
How to reproduce it (as minimally and precisely as possible):
CronWorkflow:
```yaml
metadata:
  name: lovely-rhino
  namespace: argo
spec:
  workflowSpec:
    templates:
      - name: whalesay
        inputs: {}
        outputs: {}
        metadata: {}
        container:
          name: main
          image: 'docker/whalesay:latest'
          command:
            - cowsay
          args:
            - hello world
          resources: {}
    entrypoint: whalesay
    arguments: {}
  schedule: '30 13 * * *'
  concurrencyPolicy: Forbid
```
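To reproduce, apply the manifest and wait for the scheduled run. A minimal sketch, assuming the manifest above is saved as `lovely-rhino.yaml` (the run name in the last command is from the logs below and will differ per run):
```sh
# Assumes the CronWorkflow manifest above is saved as lovely-rhino.yaml
kubectl apply -n argo -f lovely-rhino.yaml

# After the 13:30 UTC schedule fires, inspect the spawned workflow
argo list -n argo
argo get -n argo lovely-rhino-846d9   # substitute the actual run name
```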
workflow-controller-configmap:
```yaml
artifactRepository:
  archiveLogs: true
  s3:
    endpoint: s3.amazonaws.com
    bucket: example-argo-logs
    keyPrefix: argo-logs
    region: eu-west-1
```
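For context, in Argo 2.x this block is nested under the `config` key of the `workflow-controller-configmap`. A minimal sketch of the full ConfigMap, reusing the values above:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  config: |
    artifactRepository:
      archiveLogs: true
      s3:
        endpoint: s3.amazonaws.com
        bucket: example-argo-logs
        keyPrefix: argo-logs
        region: eu-west-1
```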
Anything else we need to know?:
Environment:
```
$ argo version
2.6.0-rc2

$ kubectl version -o yaml
clientVersion:
  buildDate: "2019-06-19T16:40:16Z"
  compiler: gc
  gitCommit: e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529
  gitTreeState: clean
  gitVersion: v1.15.0
  goVersion: go1.12.5
  major: "1"
  minor: "15"
  platform: darwin/amd64
serverVersion:
  buildDate: "2019-12-22T23:14:11Z"
  compiler: gc
  gitCommit: c0eccca51d7500bb03b2f163dd8d534ffeb2f7a2
  gitTreeState: clean
  gitVersion: v1.14.9-eks-c0eccc
  goVersion: go1.12.12
  major: "1"
  minor: "14+"
  platform: linux/amd64
```
Other debugging information (if applicable):
```
$ kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)
time="2020-02-24T13:48:02Z" level=info msg="Processing workflow" namespace=argo workflow=lovely-rhino-846d9
time="2020-02-24T13:48:03Z" level=info msg="Processing workflow" namespace=argo workflow=lovely-rhino-846d9
time="2020-02-24T13:48:03Z" level=info msg="Updating node &NodeStatus{ID:lovely-rhino-846d9,Name:lovely-rhino-846d9,DisplayName:lovely-rhino-846d9,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Pending,BoundaryID:,Message:ContainerCreating,StartedAt:2020-02-24 13:48:00 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,} status Pending -> Running"
time="2020-02-24T13:48:03Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=8373555 workflow=lovely-rhino-846d9
time="2020-02-24T13:48:03Z" level=info msg="Enforcing history limit for 'lovely-rhino'"
time="2020-02-24T13:48:04Z" level=info msg="Processing workflow" namespace=argo workflow=lovely-rhino-846d9
time="2020-02-24T13:48:17Z" level=info msg="Alloc=8406 TotalAlloc=97908 Sys=70080 NumGC=20 Goroutines=103"
time="2020-02-24T13:48:35Z" level=info msg="Processing workflow" namespace=argo workflow=lovely-rhino-846d9
time="2020-02-24T13:48:35Z" level=info msg="Processing workflow" namespace=argo workflow=lovely-rhino-846d9
time="2020-02-24T13:48:35Z" level=info msg="Updating node &NodeStatus{ID:lovely-rhino-846d9,Name:lovely-rhino-846d9,DisplayName:lovely-rhino-846d9,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Running,BoundaryID:,Message:,StartedAt:2020-02-24 13:48:00 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,} status Running -> Error"
time="2020-02-24T13:48:35Z" level=info msg="Updating node &NodeStatus{ID:lovely-rhino-846d9,Name:lovely-rhino-846d9,DisplayName:lovely-rhino-846d9,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Error,BoundaryID:,Message:,StartedAt:2020-02-24 13:48:00 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,} message: failed to save outputs: timed out waiting for the condition"
time="2020-02-24T13:48:35Z" level=info msg="Updated phase Running -> Error" namespace=argo workflow=lovely-rhino-846d9
time="2020-02-24T13:48:35Z" level=info msg="Updated message -> failed to save outputs: timed out waiting for the condition" namespace=argo workflow=lovely-rhino-846d9
time="2020-02-24T13:48:35Z" level=info msg="Marking workflow completed" namespace=argo workflow=lovely-rhino-846d9
time="2020-02-24T13:48:35Z" level=info msg="Checking daemoned children of " namespace=argo workflow=lovely-rhino-846d9
time="2020-02-24T13:48:35Z" level=info msg="Workflow update successful" namespace=argo phase=Error resourceVersion=8373652 workflow=lovely-rhino-846d9
time="2020-02-24T13:48:35Z" level=warning msg="Workflow 'lovely-rhino-846d9' from CronWorkflow 'lovely-rhino' completed"
Message from the maintainers:
If you are impacted by this bug, please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
Can you please examine the wait container's logs? This might shed more light on the problem.
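For anyone unsure how to do that: every Argo workflow pod runs a sidecar container named `wait`, so its logs can be pulled with `-c wait`. The pod name below is taken from this run; substitute your own:
```sh
# The pod name matches the workflow node shown in the controller logs
kubectl logs -n argo lovely-rhino-846d9 -c wait
```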
Why has this been closed? I believe the issue is not within the container itself, but a time-out in the Kubernetes API that causes the step to be flagged as an error, which is incorrect.
@rmgpinto How did you resolve your issue?
Yes, I forgot to leave the solution: the pod needs IAM permission to write to S3.
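For illustration, a minimal policy along those lines might look like the sketch below. The bucket name is taken from the report; the exact set of actions is an assumption, not the policy actually used:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::example-argo-logs",
        "arn:aws:s3:::example-argo-logs/*"
      ]
    }
  ]
}
```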
In my case, minio was filling up its PVC and running out of space.
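If you suspect the same, one quick check is the free space on MinIO's data volume. The pod name and mount path below are placeholders based on a default MinIO deployment:
```sh
# <minio-pod> and /data are assumptions for a default MinIO install
kubectl -n argo exec <minio-pod> -- df -h /data
```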
It worked for me after configuring the GCS artifact repository in workflow-controller-configmap.
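For reference, a sketch of what that GCS block can look like in the workflow-controller-configmap; the bucket, key format, and secret names are illustrative:
```yaml
artifactRepository:
  archiveLogs: true
  gcs:
    bucket: example-argo-logs            # illustrative bucket name
    keyFormat: argo-logs/{{workflow.name}}
    serviceAccountKeySecret:             # secret holding a GCP service account key
      name: gcs-credentials
      key: serviceAccountKey
```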