What happened:
I set up v2.7.2 in minikube with the PNS executor and found that output-parameter.yaml fails because the wait container cannot obtain the PID of the main container. I'm not sure whether this occurs only in minikube.
Checking the pod's Kubernetes events, I found that the image pull for the main container took longer than the one for the wait container. I suspect this is why the executor cannot get the main container's PID.
I think the deadline in the following code should be configurable:
https://github.com/argoproj/argo/blob/79217bc89e892ee82bdd5018b2bba65425924d36/workflow/executor/pns/pns.go#L149
I will make a fix soon.
What you expected to happen:
output-parameter.yaml can be executed successfully.
How to reproduce it (as minimally and precisely as possible):
Run output-parameter.yaml with v2.7.2 in minikube.
Anything else we need to know?:
N/A
Environment:
$ argo version
argo: v2.7.2
BuildDate: 2020-04-10T19:25:22Z
GitCommit: c52a65aa62426f5e874e1d3f1058af15c43eb35f
GitTreeState: clean
GitTag: v2.7.2
GoVersion: go1.13.4
Compiler: gc
Platform: darwin/amd64
$ kubectl version -o yaml
clientVersion:
buildDate: "2020-04-10T21:53:58Z"
compiler: gc
gitCommit: 7879fc12a63337efff607952a323df90cdc7a335
gitTreeState: clean
gitVersion: v1.18.1
goVersion: go1.14.2
major: "1"
minor: "18"
platform: darwin/amd64
serverVersion:
buildDate: "2018-11-26T12:46:57Z"
compiler: gc
gitCommit: 435f92c719f279a3a67808c80521ea17d5715c66
gitTreeState: clean
gitVersion: v1.12.3
goVersion: go1.10.4
major: "1"
minor: "12"
platform: linux/amd64
Other debugging information (if applicable):
$ argo get output-parameter-ms85z
Name: output-parameter-ms85z
Namespace: default
ServiceAccount: default
Status: Failed
Message: child 'output-parameter-ms85z-1252809517' failed
Created: Tue Apr 14 00:14:34 +0900 (23 seconds ago)
Started: Tue Apr 14 00:14:34 +0900 (23 seconds ago)
Finished: Tue Apr 14 00:14:39 +0900 (18 seconds ago)
Duration: 5 seconds
STEP TEMPLATE PODNAME DURATION MESSAGE
✖ output-parameter-ms85z output-parameter child 'output-parameter-ms85z-1252809517' failed
└---⚠ generate-parameter whalesay output-parameter-ms85z-1252809517 5s failed to save outputs: Failed to determine pid for containerID 3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911: container may have exited too quickly
Wait container logs:
$ kubectl logs output-parameter-ms85z-1252809517 -c wait
time="2020-04-13T15:14:35Z" level=info msg="Starting Workflow Executor" version=vv2.7.2+c52a65a.dirty
time="2020-04-13T15:14:35Z" level=info msg="Creating PNS executor (namespace: default, pod: output-parameter-ms85z-1252809517, pid: 6, hasOutputs: true)"
time="2020-04-13T15:14:35Z" level=info msg="Executor (version: vv2.7.2+c52a65a.dirty, build_date: 2020-04-10T19:30:34Z) initialized (pod: default/output-parameter-ms85z-1252809517) with template:\n{\"name\":\"whalesay\",\"arguments\":{},\"inputs\":{},\"outputs\":{\"parameters\":[{\"name\":\"hello-param\",\"valueFrom\":{\"path\":\"/tmp/hello_world.txt\",\"default\":\"Foobar\"}}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"sleep 1; echo -n hello world \\u003e /tmp/hello_world.txt\"],\"resources\":{}}}"
time="2020-04-13T15:14:35Z" level=info msg="Waiting on main container"
time="2020-04-13T15:14:35Z" level=warning msg="Polling root processes (1m0s)"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 53000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="Secured filehandle on /proc/20/root"
time="2020-04-13T15:14:38Z" level=info msg="containerID docker-3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911 mapped to pid 20"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 53000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="main container started with container ID: 3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911"
time="2020-04-13T15:14:38Z" level=info msg="Starting annotations monitor"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=warning msg="Failed to wait for container id '3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911': Failed to determine pid for containerID 3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911: container may have exited too quickly"
time="2020-04-13T15:14:38Z" level=info msg="/argo/podmetadata/annotations updated"
time="2020-04-13T15:14:38Z" level=error msg="executor error: Failed to determine pid for containerID 3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:289\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:162\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait.func1\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:829\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:292\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:828\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
time="2020-04-13T15:14:38Z" level=info msg="Saving output parameters"
time="2020-04-13T15:14:38Z" level=info msg="Saving path output parameter: hello-param"
time="2020-04-13T15:14:38Z" level=info msg="Copying /tmp/hello_world.txt from base image layer"
time="2020-04-13T15:14:38Z" level=info msg="Successfully saved output parameter: hello-param"
time="2020-04-13T15:14:38Z" level=info msg="No output artifacts"
time="2020-04-13T15:14:38Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2020-04-13T15:14:38Z" level=info msg="Annotating pod with output"
time="2020-04-13T15:14:38Z" level=info msg="Starting deadline monitor"
time="2020-04-13T15:14:38Z" level=info msg="Deadline monitor stopped"
time="2020-04-13T15:14:38Z" level=info msg="Annotations monitor stopped"
time="2020-04-13T15:14:38Z" level=info msg="pid 20: &{root 4096 2147484141 {53000000 63722387678 0x2d10800} {1048599 4188280 1 16877 0 0 0 0 4096 4096 8 {1586790878 249000000} {1586790878 53000000} {1586790878 168000000} [0 0 0]}}"
time="2020-04-13T15:14:38Z" level=info msg="Killing sidecars"
time="2020-04-13T15:14:38Z" level=info msg="Alloc=5794 TotalAlloc=13026 Sys=70080 NumGC=4 Goroutines=8"
time="2020-04-13T15:14:38Z" level=fatal msg="Failed to determine pid for containerID 3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:289\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:162\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait.func1\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:829\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:292\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:828\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
Workflow controller logs:
$ kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)
time="2020-04-13T15:14:10Z" level=info msg="config map" name=workflow-controller-configmap
time="2020-04-13T15:14:10Z" level=info msg="Configuration:\nartifactRepository: {}\ncontainerRuntimeExecutor: pns\nfeatureFlags: {}\nmetricsConfig:\n disableLegacy: false\n enabled: true\n path: /metrics\n port: \"9090\"\npersistence:\n archive: true\n clusterName: local\n connectionPool:\n maxIdleConns: 100\n maxOpenConns: 0\n nodeStatusOffLoad: true\n postgresql:\n database: postgres\n host: postgres\n passwordSecret:\n key: password\n name: argo-postgres-config\n port: \"5432\"\n tableName: argo_workflows\n userNameSecret:\n key: username\n name: argo-postgres-config\npodSpecLogStrategy: {}\ntelemetryConfig:\n disableLegacy: false\n"
time="2020-04-13T15:14:10Z" level=info msg="Persistence configuration enabled"
time="2020-04-13T15:14:10Z" level=info msg="Creating DB session"
time="2020-04-13T15:14:10Z" level=info msg="Persistence Session created successfully"
time="2020-04-13T15:14:10Z" level=info msg="Migrating database schema" clusterName=local dbType=postgres
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="create table if not exists argo_workflows (\n id varchar(128) ,\n name varchar(256),\n phase varchar(25),\n namespace varchar(256),\n workflow text,\n startedat timestamp,\n finishedat timestamp,\n primary key (id, namespace)\n)" changeSchemaVersion=0
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="create unique index idx_name on argo_workflows (name)" changeSchemaVersion=1
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="create table if not exists argo_workflow_history (\n id varchar(128) ,\n name varchar(256),\n phase varchar(25),\n namespace varchar(256),\n workflow text,\n startedat timestamp,\n finishedat timestamp,\n primary key (id, namespace)\n)" changeSchemaVersion=2
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_workflow_history rename to argo_archived_workflows" changeSchemaVersion=3
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="drop index idx_name" changeSchemaVersion=4
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="create unique index idx_name on argo_workflows(name, namespace)" changeSchemaVersion=5
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_workflows drop constraint argo_workflows_pkey" changeSchemaVersion=6
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_workflows add primary key(name,namespace)" changeSchemaVersion=7
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows drop constraint argo_workflow_history_pkey" changeSchemaVersion=8
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows add primary key(id)" changeSchemaVersion=9
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows rename column id to uid" changeSchemaVersion=10
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column uid set not null" changeSchemaVersion=11
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column phase set not null" changeSchemaVersion=12
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column namespace set not null" changeSchemaVersion=13
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column workflow set not null" changeSchemaVersion=14
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column startedat set not null" changeSchemaVersion=15
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column finishedat set not null" changeSchemaVersion=16
time="2020-04-13T15:14:10Z" level=info msg="applying database change" change="alter table argo_archived_workflows add clustername varchar(64)" changeSchemaVersion=17
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="update argo_archived_workflows set clustername = 'local' where clustername is null" changeSchemaVersion=18
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column clustername set not null" changeSchemaVersion=19
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_archived_workflows drop constraint argo_archived_workflows_pkey" changeSchemaVersion=20
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_archived_workflows add primary key(clustername,uid)" changeSchemaVersion=21
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="create index argo_archived_workflows_i1 on argo_archived_workflows (clustername,namespace)" changeSchemaVersion=22
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows drop column phase" changeSchemaVersion=23
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows drop column startedat" changeSchemaVersion=24
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows drop column finishedat" changeSchemaVersion=25
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows rename column id to uid" changeSchemaVersion=26
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows alter column uid set not null" changeSchemaVersion=27
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows alter column namespace set not null" changeSchemaVersion=28
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows add column clustername varchar(64)" changeSchemaVersion=29
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="update argo_workflows set clustername = 'local' where clustername is null" changeSchemaVersion=30
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows alter column clustername set not null" changeSchemaVersion=31
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows add column version varchar(64)" changeSchemaVersion=32
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows add column nodes text" changeSchemaVersion=33
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="backfillNodes{argo_workflows}" changeSchemaVersion=34
time="2020-04-13T15:14:11Z" level=info msg="Backfill node status"
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows alter column nodes set not null" changeSchemaVersion=35
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows drop column workflow" changeSchemaVersion=36
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows add column updatedat timestamp not null default current_timestamp" changeSchemaVersion=37
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows drop constraint argo_workflows_pkey" changeSchemaVersion=38
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="drop index idx_name" changeSchemaVersion=39
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows drop column name" changeSchemaVersion=40
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows add primary key(clustername,uid,version)" changeSchemaVersion=41
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="create index argo_workflows_i1 on argo_workflows (clustername,namespace)" changeSchemaVersion=42
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column workflow type json using workflow::json" changeSchemaVersion=43
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column name set not null" changeSchemaVersion=44
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="create index argo_workflows_i2 on argo_workflows (clustername,namespace,updatedat)" changeSchemaVersion=45
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="create table if not exists argo_archived_workflows_labels (\n\tclustername varchar(64) not null,\n\tuid varchar(128) not null,\n name varchar(317) not null,\n value varchar(63) not null,\n primary key (clustername, uid, name),\n \tforeign key (clustername, uid) references argo_archived_workflows(clustername, uid) on delete cascade\n)" changeSchemaVersion=46
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_workflows alter column nodes type json using nodes::json" changeSchemaVersion=47
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_archived_workflows add column instanceid varchar(64)" changeSchemaVersion=48
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="update argo_archived_workflows set instanceid = '' where instanceid is null" changeSchemaVersion=49
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="alter table argo_archived_workflows alter column instanceid set not null" changeSchemaVersion=50
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="drop index argo_archived_workflows_i1" changeSchemaVersion=51
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="create index argo_archived_workflows_i1 on argo_archived_workflows (clustername,instanceid,namespace)" changeSchemaVersion=52
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="drop index argo_workflows_i1" changeSchemaVersion=53
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="drop index argo_workflows_i2" changeSchemaVersion=54
time="2020-04-13T15:14:11Z" level=info msg="applying database change" change="create index argo_workflows_i1 on argo_workflows (clustername,namespace,updatedat)" changeSchemaVersion=55
time="2020-04-13T15:14:11Z" level=info msg="Node status offloading config" ttl=5m0s
time="2020-04-13T15:14:11Z" level=info msg="Node status offloading is enabled"
time="2020-04-13T15:14:11Z" level=info msg="Workflow archiving is enabled"
time="2020-04-13T15:14:11Z" level=info msg="Starting prometheus metrics server at 0.0.0.0:9090/metrics"
time="2020-04-13T15:14:11Z" level=info msg="Starting Workflow Controller" version=vv2.7.2+c52a65a.dirty
time="2020-04-13T15:14:11Z" level=info msg="Workers: workflow: 8, pod: 8"
time="2020-04-13T15:14:11Z" level=info msg="Starting CronWorkflow controller"
time="2020-04-13T15:14:11Z" level=info msg="Starting workflow TTL controller (resync 20m0s)"
time="2020-04-13T15:14:11Z" level=info msg="Performing periodic GC every 5m0s"
time="2020-04-13T15:14:11Z" level=info msg="Started workflow TTL worker"
time="2020-04-13T15:14:34Z" level=info msg="Processing workflow" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:34Z" level=info msg="Updated phase -> Running" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:34Z" level=info msg="Steps node {output-parameter-ms85z output-parameter-ms85z output-parameter-ms85z Steps output-parameter nil Running 2020-04-13 15:14:34.923722135 +0000 UTC 0001-01-01 00:00:00 +0000 UTC <nil> nil nil [] []} initialized Running" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:34Z" level=info msg="StepGroup node {output-parameter-ms85z-3009655378 output-parameter-ms85z[0] [0] StepGroup output-parameter nil Running output-parameter-ms85z 2020-04-13 15:14:34.923765636 +0000 UTC 0001-01-01 00:00:00 +0000 UTC <nil> nil nil [] []} initialized Running" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:34Z" level=info msg="Pod node {output-parameter-ms85z-1252809517 output-parameter-ms85z[0].generate-parameter generate-parameter Pod whalesay nil Pending output-parameter-ms85z 2020-04-13 15:14:34.923959764 +0000 UTC 0001-01-01 00:00:00 +0000 UTC <nil> nil nil [] []} initialized Pending" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:34Z" level=info msg="Created pod: output-parameter-ms85z[0].generate-parameter (output-parameter-ms85z-1252809517)" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:34Z" level=info msg="Workflow step group node &NodeStatus{ID:output-parameter-ms85z-3009655378,Name:output-parameter-ms85z[0],DisplayName:[0],Type:StepGroup,TemplateName:output-parameter,TemplateRef:nil,Phase:Running,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34.923765636 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-1252809517],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} not yet completed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:35Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=3259 workflow=output-parameter-ms85z
time="2020-04-13T15:14:36Z" level=info msg="Processing workflow" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:36Z" level=info msg="Updating node &NodeStatus{ID:output-parameter-ms85z-1252809517,Name:output-parameter-ms85z[0].generate-parameter,DisplayName:generate-parameter,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Pending,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} message: ContainerCreating"
time="2020-04-13T15:14:36Z" level=info msg="Workflow step group node &NodeStatus{ID:output-parameter-ms85z-3009655378,Name:output-parameter-ms85z[0],DisplayName:[0],Type:StepGroup,TemplateName:output-parameter,TemplateRef:nil,Phase:Running,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-1252809517],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} not yet completed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:36Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=3265 workflow=output-parameter-ms85z
time="2020-04-13T15:14:37Z" level=info msg="Processing workflow" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:37Z" level=info msg="Workflow step group node &NodeStatus{ID:output-parameter-ms85z-3009655378,Name:output-parameter-ms85z[0],DisplayName:[0],Type:StepGroup,TemplateName:output-parameter,TemplateRef:nil,Phase:Running,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-1252809517],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} not yet completed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:38Z" level=info msg="Processing workflow" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:38Z" level=info msg="Updating node &NodeStatus{ID:output-parameter-ms85z-1252809517,Name:output-parameter-ms85z[0].generate-parameter,DisplayName:generate-parameter,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Pending,BoundaryID:output-parameter-ms85z,Message:ContainerCreating,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} status Pending -> Running"
time="2020-04-13T15:14:38Z" level=info msg="Workflow step group node &NodeStatus{ID:output-parameter-ms85z-3009655378,Name:output-parameter-ms85z[0],DisplayName:[0],Type:StepGroup,TemplateName:output-parameter,TemplateRef:nil,Phase:Running,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-1252809517],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} not yet completed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:38Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=3273 workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Processing workflow" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Setting node &NodeStatus{ID:output-parameter-ms85z-1252809517,Name:output-parameter-ms85z[0].generate-parameter,DisplayName:generate-parameter,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Running,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} outputs"
time="2020-04-13T15:14:39Z" level=info msg="Updating node &NodeStatus{ID:output-parameter-ms85z-1252809517,Name:output-parameter-ms85z[0].generate-parameter,DisplayName:generate-parameter,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Running,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:&Outputs{Parameters:[]Parameter{Parameter{Name:hello-param,Default:nil,Value:*Foobar,ValueFrom:&ValueFrom{Path:/tmp/hello_world.txt,JSONPath:,JQFilter:,Parameter:,Default:Foobar,},GlobalName:,},},Artifacts:[]Artifact{},Result:nil,},Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} status Running -> Error"
time="2020-04-13T15:14:39Z" level=info msg="Updating node &NodeStatus{ID:output-parameter-ms85z-1252809517,Name:output-parameter-ms85z[0].generate-parameter,DisplayName:generate-parameter,Type:Pod,TemplateName:whalesay,TemplateRef:nil,Phase:Error,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:&Outputs{Parameters:[]Parameter{Parameter{Name:hello-param,Default:nil,Value:*Foobar,ValueFrom:&ValueFrom{Path:/tmp/hello_world.txt,JSONPath:,JQFilter:,Parameter:,Default:Foobar,},GlobalName:,},},Artifacts:[]Artifact{},Result:nil,},Children:[],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} message: failed to save outputs: Failed to determine pid for containerID 3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911: container may have exited too quickly"
time="2020-04-13T15:14:39Z" level=info msg="Step group node &NodeStatus{ID:output-parameter-ms85z-3009655378,Name:output-parameter-ms85z[0],DisplayName:[0],Type:StepGroup,TemplateName:output-parameter,TemplateRef:nil,Phase:Running,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-1252809517],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} deemed failed: child 'output-parameter-ms85z-1252809517' failed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="node &NodeStatus{ID:output-parameter-ms85z-3009655378,Name:output-parameter-ms85z[0],DisplayName:[0],Type:StepGroup,TemplateName:output-parameter,TemplateRef:nil,Phase:Running,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-1252809517],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} phase Running -> Failed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="node &NodeStatus{ID:output-parameter-ms85z-3009655378,Name:output-parameter-ms85z[0],DisplayName:[0],Type:StepGroup,TemplateName:output-parameter,TemplateRef:nil,Phase:Failed,BoundaryID:output-parameter-ms85z,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-1252809517],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} message: child 'output-parameter-ms85z-1252809517' failed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="node &NodeStatus{ID:output-parameter-ms85z-3009655378,Name:output-parameter-ms85z[0],DisplayName:[0],Type:StepGroup,TemplateName:output-parameter,TemplateRef:nil,Phase:Failed,BoundaryID:output-parameter-ms85z,Message:child 'output-parameter-ms85z-1252809517' failed,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:2020-04-13 15:14:39.808738163 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-1252809517],OutboundNodes:[],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} finished: 2020-04-13 15:14:39.808738163 +0000 UTC" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="step group output-parameter-ms85z-3009655378 was unsuccessful: child 'output-parameter-ms85z-1252809517' failed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Outbound nodes of output-parameter-ms85z-1252809517 is [output-parameter-ms85z-1252809517]" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Outbound nodes of output-parameter-ms85z is [output-parameter-ms85z-1252809517]" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="node &NodeStatus{ID:output-parameter-ms85z,Name:output-parameter-ms85z,DisplayName:output-parameter-ms85z,Type:Steps,TemplateName:output-parameter,TemplateRef:nil,Phase:Running,BoundaryID:,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-3009655378],OutboundNodes:[output-parameter-ms85z-1252809517],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} phase Running -> Failed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="node &NodeStatus{ID:output-parameter-ms85z,Name:output-parameter-ms85z,DisplayName:output-parameter-ms85z,Type:Steps,TemplateName:output-parameter,TemplateRef:nil,Phase:Failed,BoundaryID:,Message:,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:0001-01-01 00:00:00 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-3009655378],OutboundNodes:[output-parameter-ms85z-1252809517],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} message: child 'output-parameter-ms85z-1252809517' failed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="node &NodeStatus{ID:output-parameter-ms85z,Name:output-parameter-ms85z,DisplayName:output-parameter-ms85z,Type:Steps,TemplateName:output-parameter,TemplateRef:nil,Phase:Failed,BoundaryID:,Message:child 'output-parameter-ms85z-1252809517' failed,StartedAt:2020-04-13 15:14:34 +0000 UTC,FinishedAt:2020-04-13 15:14:39.808827328 +0000 UTC,PodIP:,Daemoned:nil,Inputs:nil,Outputs:nil,Children:[output-parameter-ms85z-3009655378],OutboundNodes:[output-parameter-ms85z-1252809517],StoredTemplateID:,WorkflowTemplateName:,TemplateScope:,ResourcesDuration:ResourcesDuration{},} finished: 2020-04-13 15:14:39.808827328 +0000 UTC" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Checking daemoned children of output-parameter-ms85z" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Updated phase Running -> Failed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Updated message -> child 'output-parameter-ms85z-1252809517' failed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Marking workflow completed" namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Checking daemoned children of " namespace=default workflow=output-parameter-ms85z
time="2020-04-13T15:14:39Z" level=info msg="Workflow update successful" namespace=default phase=Failed resourceVersion=3279 workflow=output-parameter-ms85z
time="2020-04-13T15:14:40Z" level=info msg="Labeled pod default/output-parameter-ms85z-1252809517 completed"
Kubernetes pod events from kubectl describe.
$ kubectl describe pod output-parameter-ms85z-1252809517
Name: output-parameter-ms85z-1252809517
Namespace: default
Priority: 0
Node: pfn-argo/192.168.99.107
Start Time: Tue, 14 Apr 2020 00:14:34 +0900
Labels: workflows.argoproj.io/completed=true
workflows.argoproj.io/workflow=output-parameter-ms85z
Annotations: workflows.argoproj.io/node-message:
Failed to determine pid for containerID 3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911: container may have exited too qu...
workflows.argoproj.io/node-name: output-parameter-ms85z[0].generate-parameter
workflows.argoproj.io/outputs:
{"parameters":[{"name":"hello-param","value":"Foobar","valueFrom":{"path":"/tmp/hello_world.txt","default":"Foobar"}}]}
workflows.argoproj.io/template:
{"name":"whalesay","arguments":{},"inputs":{},"outputs":{"parameters":[{"name":"hello-param","valueFrom":{"path":"/tmp/hello_world.txt","d...
Status: Failed
IP: 172.17.0.5
IPs: <none>
Controlled By: Workflow/output-parameter-ms85z
Containers:
wait:
Container ID: docker://e5f0e1af6b453b2c98af21a141a815f442bab3e3591be6704a3abd9716cb37c2
Image: argoproj/argoexec:v2.7.2
Image ID: docker-pullable://argoproj/argoexec@sha256:8e9c6440e7ef62977077632ca56acf0eb4d86b29ba5842a531a53e85ececc5ff
Port: <none>
Host Port: <none>
Command:
argoexec
wait
State: Terminated
Reason: Error
Message: Failed to determine pid for containerID 3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911: container may have exited too quickly
Exit Code: 1
Started: Tue, 14 Apr 2020 00:14:35 +0900
Finished: Tue, 14 Apr 2020 00:14:38 +0900
Ready: False
Restart Count: 0
Environment:
ARGO_POD_NAME: output-parameter-ms85z-1252809517 (v1:metadata.name)
ARGO_CONTAINER_RUNTIME_EXECUTOR: pns
Mounts:
/argo/podmetadata from podmetadata (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-2zw64 (ro)
main:
Container ID: docker://3e76e9a21e2dad264e3e71c06ccfc5fe9966ce0bf95a2ebfe19ba3f2a8690911
Image: docker/whalesay:latest
Image ID: docker-pullable://docker/whalesay@sha256:178598e51a26abbc958b8a2e48825c90bc22e641de3d31e18aaf55f3258ba93b
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
sleep 1; echo -n hello world > /tmp/hello_world.txt
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 14 Apr 2020 00:14:38 +0900
Finished: Tue, 14 Apr 2020 00:14:39 +0900
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-2zw64 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
podmetadata:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
default-token-2zw64:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-2zw64
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m2s default-scheduler Successfully assigned default/output-parameter-ms85z-1252809517 to pfn-argo
Normal Pulled 3m1s kubelet, pfn-argo Container image "argoproj/argoexec:v2.7.2" already present on machine
Normal Created 3m1s kubelet, pfn-argo Created container
Normal Started 3m1s kubelet, pfn-argo Started container
Normal Pulling 3m1s kubelet, pfn-argo pulling image "docker/whalesay:latest"
Normal Pulled 2m58s kubelet, pfn-argo Successfully pulled image "docker/whalesay:latest"
Normal Created 2m58s kubelet, pfn-argo Created container
Normal Started 2m58s kubelet, pfn-argo Started container
Message from the maintainers:
If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
I think the deadline in the following code should be configurable.
I realized my assumption was wrong.
I thought the PNS executor waits for init containers to initialize, but init containers don't exist in some situations.
I got it. Init containers have nothing to do with the PNS executor.
WaitInit doesn't block at all here, even though the function's name suggests it waits.
https://github.com/argoproj/argo/blob/79217bc89e892ee82bdd5018b2bba65425924d36/workflow/executor/pns/pns.go#L149
Is it expected behavior?
It looks like WaitInit is an initialization step for the Wait operation in the wait container. The function's comment makes that clear. In the PNSExecutor case, it is trying to secure the root filesystem for outputs.
// WaitInit is called before Wait() to signal the executor about an impending Wait call.
// For most executors this is a noop, and is only used by the the PNS executor
WaitInit() error
I think the goroutine doesn't wait for the main container.
I've encountered this problem now even when storing outputs in emptyDirs. The problem for me isn't actually saving the outputs, it says it successfully saved the outputs. Here's the wait container logs.
time="2020-04-29T12:24:53Z" level=info msg="Starting Workflow Executor" version=vv2.7.1+2a3f59c.dirty
time="2020-04-29T12:24:53Z" level=info msg="Creating PNS executor (namespace: acquia-polaris-system, pod: code-redeploy-qsmpx-2720976418, pid: 6, hasOutputs: true)"
time="2020-04-29T12:24:53Z" level=info msg="Executor (version: vv2.7.1+2a3f59c.dirty, build_date: 2020-04-07T16:44:22Z) initialized (pod: acquia-polaris-system/code-redeploy-qsmpx-2720976418) with template:\n{\"name\":\"get-envs\",\"arguments\":{},\"inputs\":{},\"outputs\":{\"parameters\":[{\"name\":\"json\",\"valueFrom\":{\"path\":\"/tmp/out.json\"}}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"881217801864.dkr.ecr.us-east-1.amazonaws.com/fn-taskstep:latest\",\"args\":[\"get-envs-by-git\",\"[email protected]:acquia/fn-example-repos.git\",\"e2e-d8-build-code-deploy\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"out\",\"mountPath\":\"/tmp\"}]},\"volumes\":[{\"name\":\"out\",\"emptyDir\":{}}],\"retryStrategy\":{\"limit\":4}}"
time="2020-04-29T12:24:53Z" level=info msg="Waiting on main container"
time="2020-04-29T12:24:53Z" level=warning msg="Polling root processes (1m0s)"
time="2020-04-29T12:24:53Z" level=warning msg="Failed to stat /proc/27/root: stat /proc/27/root: permission denied"
time="2020-04-29T12:25:08Z" level=info msg="main container started with container ID: 0e63f624ba94dbf5b9dc69ee2ad28d2183ce7d2dbea4bfcd50a534ce343bdce9"
time="2020-04-29T12:25:08Z" level=info msg="Starting annotations monitor"
time="2020-04-29T12:25:08Z" level=warning msg="Failed to wait for container id '0e63f624ba94dbf5b9dc69ee2ad28d2183ce7d2dbea4bfcd50a534ce343bdce9': Failed to determine pid for containerID 0e63f624ba94dbf5b9dc69ee2ad28d2183ce7d2dbea4bfcd50a534ce343bdce9: container may have exited too quickly"
time="2020-04-29T12:25:08Z" level=error msg="executor error: Failed to determine pid for containerID 0e63f624ba94dbf5b9dc69ee2ad28d2183ce7d2dbea4bfcd50a534ce343bdce9: container may have exited too quickly
\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/ar
goproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo
/workflow/executor/pns/pns.go:289\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:162\ngithub.com/argoproj/argo/workflo
w/executor.(*WorkflowExecutor).Wait.func1\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:829\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/apimachinery@v0.
16.7-beta.0/pkg/util/wait/wait.go:292\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:828\ngithub.com/argoproj/argo/c
md/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/
argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:766\ngithub.com/spf13/cobra.(*Command
).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]
4-fe5e611709b0/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm
_amd64.s:1357"
time="2020-04-29T12:25:08Z" level=info msg="Saving output parameters"
time="2020-04-29T12:25:08Z" level=info msg="Saving path output parameter: json"
time="2020-04-29T12:25:08Z" level=info msg="Copying /tmp/out.json from from volume mount"
time="2020-04-29T12:25:08Z" level=info msg="Successfully saved output parameter: json"
time="2020-04-29T12:25:08Z" level=info msg="No output artifacts"
time="2020-04-29T12:25:08Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2020-04-29T12:25:08Z" level=info msg="Annotating pod with output"
time="2020-04-29T12:25:08Z" level=info msg="/argo/podmetadata/annotations updated"
time="2020-04-29T12:25:08Z" level=info msg="Starting deadline monitor"
time="2020-04-29T12:25:08Z" level=info msg="Deadline monitor stopped"
time="2020-04-29T12:25:08Z" level=info msg="Killing sidecars"
time="2020-04-29T12:25:08Z" level=info msg="Alloc=4301 TotalAlloc=20985 Sys=71360 NumGC=7 Goroutines=9"
time="2020-04-29T12:25:08Z" level=fatal msg="Failed to determine pid for containerID 0e63f624ba94dbf5b9dc69ee2ad28d2183ce7d2dbea4bfcd50a534ce343bdce9: container may have exited too quickly\ngithub.com/arg
oproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/erro
rs.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/execut
or/pns/pns.go:289\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:162\ngithub.com/argoproj/argo/workflow/executor.(*Wor
kflowExecutor).Wait.func1\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:829\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/
util/wait/wait.go:292\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:828\ngithub.com/argoproj/argo/cmd/argoexec/comm
ands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexe
c/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/g
o/pkg/mod/github.com/spf13/[email protected]/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/c
ommand.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
The failure message on the workflow does state it couldn't save outputs, but that doesn't seem right given the logs above.
failed to save outputs: Failed to determine pid for containerID 0e63f624ba94dbf5b9dc69ee2ad28d2183ce7d2dbea4bfcd50a534ce343bdce9: container may have exited too quickly
It looks like the issue is at https://github.com/argoproj/argo/blob/master/workflow/executor/pns/pns.go#L333 . For some reason, the Stat intermittently fails with "permission denied". On the next check the PID still isn't in the map, so the same error that is raised when outputs can't be saved gets thrown, but the real issue in my case is the intermittent permission denied.
If it helps, I'm running on microk8s locally on ubuntu. All permissions are set up normally on a microk8s install, my user is in the microk8s group, the pod I'm launching does not have a securityContext set. Also, I'm running version 2.7.1. I can upgrade and see if that helps.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm still hitting this issue.
I'm getting this with runAsNonRoot
I'm also having similar issue
PNS executor needs to run-as-root. Check that?
PNS executor needs to run-as-root. Check that?
It does not if you are on containerd or CRI-O and modify the executor image a bit. Docker has some bugs regarding capabilities without root, AFAIK. I am using it rootless with pod security policies. Do you want me to make a pull request with the necessary changes? It depends a bit on https://github.com/argoproj/argo/pull/3785#issuecomment-676052063
@juliusvonkohout can I get you to add your security policy to #4186 as the first step so we can understand the solution?
@wreed @juliusvonkohout I've noted that when one runs PNS with runAsNonRoot and output artifacts, you get the permission denied error: the files are copied, but the wait container cannot determine when the main container exits.
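For anyone trying to reproduce the runAsNonRoot variant of this, a minimal workflow of the shape described above might look like the following. This fragment is illustrative only, not taken from the issue; the names and UID are made up, and whether PNS can still chroot into the main container under this configuration depends on the container runtime, per the containerd/CRI-O discussion above:

```yaml
# Hypothetical reproduction sketch: outputs + runAsNonRoot with the PNS executor.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pns-nonroot-
spec:
  entrypoint: main
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  templates:
    - name: main
      container:
        image: docker/whalesay:latest
        command: [sh, -c]
        args: ["echo -n hello > /tmp/out.txt"]
      outputs:
        parameters:
          - name: out
            valueFrom: {path: /tmp/out.txt}
```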