What happened/what you expected to happen?
I am trying to execute a workflow, and the UI returns this error.
I expect to see the workflow in the UI correctly.
http://localhost:2746/workflows/default/xxxxxx-pipeline-xxxx

What version of Argo Workflows are you running?
v2.11.0
Paste the workflow here, including status:
kubectl get wf -o yaml ${workflow}
Paste the logs from the workflow controller:
kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name) | grep ${workflow}
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
From dev console view:

@simster7 is this fixed by #4028?
@simster7 @sarabala1979 @jessesuen it is pretty clear that node.children can contain node IDs that do not actually exist. We need to make sure we always tolerate these.
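A minimal TypeScript sketch of what "tolerating" dangling child IDs could look like when the UI walks the node graph: a missing child is skipped instead of being dereferenced as undefined. The `NodeStatus` shape, `resolveChildren`, and `walkNodes` are hypothetical names for illustration, not the actual Argo UI code, and the real node status type has many more fields.

```typescript
// Hypothetical, simplified node shape: only the fields this sketch needs.
interface NodeStatus {
  id: string;
  displayName?: string;
  children?: string[];
}

// Nodes keyed by ID; a lookup may miss, so the value type admits undefined.
type NodeMap = { [id: string]: NodeStatus | undefined };

// Resolve a node's children, silently skipping any child ID that has no
// entry in the node map instead of dereferencing undefined.
function resolveChildren(nodes: NodeMap, parentId: string): NodeStatus[] {
  const parent = nodes[parentId];
  if (!parent || !parent.children) {
    return [];
  }
  return parent.children
    .map((childId) => nodes[childId])
    .filter((child): child is NodeStatus => child !== undefined);
}

// Depth-first walk from a root that tolerates dangling child IDs and
// skips nodes it has already visited (e.g. shared children in a DAG).
function walkNodes(nodes: NodeMap, rootId: string): string[] {
  const visited = new Set<string>();
  const order: string[] = [];
  const stack: string[] = [rootId];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (visited.has(id) || !nodes[id]) {
      continue;
    }
    visited.add(id);
    order.push(id);
    // Push in reverse so children are visited in their listed order.
    const children = resolveChildren(nodes, id);
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i].id);
    }
  }
  return order;
}
```

For example, with `root.children = ["a", "missing"]` and no `missing` entry in the map, `walkNodes(nodes, "root")` returns `["root", "a"]` rather than throwing.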
@simster7 does #4028 work for completed workflows?
Yes, this is fixed
by #4028 and #3950.
@simster7 I installed Argo this morning and I still get this error. Is it fixed? Any suggestions?
I have the same issue when navigating to a workflow with a failed step.
20-09-22 15:00 ~ kubectl -n argo exec argo-server-7967846856-kwgsf -c argo-server -- argo version
argo: v2.11.0
BuildDate: 2020-09-17T22:00:47Z
GitCommit: f8e750de5ebab6f3c494c972889b31ef24c73c9b
GitTreeState: clean
GitTag: v2.11.0
GoVersion: go1.13.4
Compiler: gc
Platform: linux/amd64
Update:
Actually, I'm getting this error for all completed workflows.
The problem was solved for me after updating the CRD.
Updates:
The issue is still intermittent.
I can confirm that argo-server v2.10.2 works fine.
@simster7 it sounds like this may not be fixed?
Will take a look
I see that #4028 is not included in 2.11, but I will investigate to see if this is reproducible in master.
@ediezh @saranyaeu2987 Can you guys attach workflows to reproduce this bug? Preferably "live" workflows that have already been run and have a "Status:" field. You can get them by running kubectl get wf <NAME> -o yaml
Inspecting the code, I think that waitOnUpdate will still result in an error, as it returns null. I think we should add a null check here. Maybe you can even remove waitOnUpdate then.
@simster7
kubectl -n argo1 get wf hello-spark-databricks-steps-rtpl9 -o yaml
Attached
wf.txt
@saranyaeu2987 Thanks for attaching the workflow. Question: is this workflow the result of Retrying a previously failed workflow?
The issue is that the workflow is malformed (a node that is referenced by a parent node does not actually exist). This was intended to be fixed by #4028.
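To check whether a particular workflow is malformed in this way, one could scan its node map for child IDs that have no matching entry. This is a sketch under stated assumptions: it reads the JSON printed by `kubectl get wf <NAME> -o json` (the JSON counterpart of the `-o yaml` command above), assumes `status.nodes` is keyed by node ID with each node listing `children` by ID, and uses hypothetical helpers `findDanglingChildren` and `danglingFromWorkflowJson` that are not part of Argo.

```typescript
// Hypothetical, simplified node shape: only the fields this check needs.
interface NodeStatus {
  id: string;
  children?: string[];
}

type NodeMap = { [id: string]: NodeStatus | undefined };

interface DanglingRef {
  parentId: string;
  missingChildId: string;
}

// List every (parent, child) pair where the parent's children array names
// an ID that has no entry in the node map.
function findDanglingChildren(nodes: NodeMap): DanglingRef[] {
  const result: DanglingRef[] = [];
  for (const [parentId, node] of Object.entries(nodes)) {
    for (const childId of node?.children ?? []) {
      if (nodes[childId] === undefined) {
        result.push({ parentId, missingChildId: childId });
      }
    }
  }
  return result;
}

// Assumption: `text` is the JSON output of `kubectl get wf <NAME> -o json`.
// A workflow with no status yet is treated as having no nodes.
function danglingFromWorkflowJson(text: string): DanglingRef[] {
  const wf = JSON.parse(text);
  return findDanglingChildren(wf?.status?.nodes ?? {});
}
```

An empty result means every child reference resolves; a non-empty result pinpoints which parent node points at a missing child.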
I think that waitOnUpdate will still result in an error, as it returns null.
The code calls setState before returning null, which is supposed to trigger a new React render. This is not happening for some reason, will investigate.
which is supposed to trigger a new React render.
This is probably happening, but only after the error is thrown.
Ideally, this is fixed by the back-end/data-cleaning changes already made in #4028.
To add a fail-safe to the UI, I have three options, as I no longer think the waitForUpdate fix is exhaustive.
I prefer either (1) or (2). Thoughts?
@simster7
Question: is this workflow the result of Retrying a previously failed workflow?
It's a newly submitted workflow.
The issue is that the workflow is malformed (a node that is referenced by a parent node does not actually exist).
Can you please explain this to me? This error happens randomly, and it happens within a couple of seconds.
I am able to run this workflow on resubmission. How can a parent node not exist for a newly submitted job?
_On a different note:_
Some of my workflows fail with "Failed to load logs: Internal Server Error" (even though the workflow pod succeeds in k8s). I have described this in https://github.com/argoproj/argo/issues/4092.
P.S. Both https://github.com/argoproj/argo/issues/4092 and https://github.com/argoproj/argo/issues/4079 use the same workflow.
Can you please explain this to me? This error happens randomly, and it happens within a couple of seconds.
I am able to run this workflow on resubmission. How can a parent node not exist for a newly submitted job?
To clarify, this means that Argo's internal data structures are malformed. It does not mean that you (the user) submitted a malformed workflow; if that happens, you get a lint/syntax error instead.
@simster7 I inspected some of my workflows; some of them have the same problem you mentioned above (missing child nodes). But I also get errors with succeeded workflows (all child nodes seem to exist). See attached.
@ediezh Thanks! Was able to reproduce with your workflow as well. Will fix
@ediezh I have included a fix for that issue in #4099
Thanks @simster7