Argo: Cannot read property type of undefined

Created on 21 Sep 2020 · 23 Comments · Source: argoproj/argo

Summary

What happened/what you expected to happen?
I am trying to execute a workflow, and the UI returns this error.
I expect to see the workflow UI render correctly.
http://localhost:2746/workflows/default/xxxxxx-pipeline-xxxx

[screenshot of the error]

Diagnostics

What version of Argo Workflows are you running?
v2.11.0

Paste the workflow here, including status:
kubectl get wf -o yaml ${workflow} 
Paste the logs from the workflow controller:
kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name) | grep ${workflow}


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

bug

All 23 comments

From dev console view:
[screenshot: DevTools - argo workflows]

@simster7 is this fixed by #4028?

@simster7 @sarabala1979 @jessesuen it is pretty clear that node.children can contain node IDs that do not actually exist. We need to make sure we always tolerate these.

@simster7 does #4028 work for completed workflows?

Yes this is fixed

By #4028 and #3950

@simster7 I installed Argo this morning and I still get this error. Is it fixed? Any suggestions?

I have the same issue when navigating to a workflow with a failed step.

 20-09-22 15:00  ~  kubectl -n argo exec argo-server-7967846856-kwgsf -c argo-server -- argo version
argo: v2.11.0
  BuildDate: 2020-09-17T22:00:47Z
  GitCommit: f8e750de5ebab6f3c494c972889b31ef24c73c9b
  GitTreeState: clean
  GitTag: v2.11.0
  GoVersion: go1.13.4
  Compiler: gc
  Platform: linux/amd64

Update:
Actually I'm getting this error for all completed workflows.

The problem was solved for me after updating the CRD.

Updates:
The issue is still intermittent.

I can confirm that argo-server v2.10.2 works fine.

@simster7 it sounds like this may not be fixed?

Will take a look

I see that #4028 is not included in 2.11, but I will investigate to see if this is reproducible in master.

@ediezh @saranyaeu2987 Can you guys attach workflows to reproduce this bug? Preferably "live" workflows that have already been run and have a "Status:" field. You can get them by running kubectl get wf <NAME> -o yaml

Inspecting the code, I think that waitOnUpdate will still result in an error, as it returns null. I think we should add a null check here. Maybe you can even remove waitOnUpdate then.

@simster7
kubectl -n argo1 get wf hello-spark-databricks-steps-rtpl9 -o yaml

Attached
wf.txt

@saranyaeu2987 Thanks for attaching the workflow. Question: is this workflow the result of Retrying a previously failed workflow?

The issue is that the workflow is malformed (a node that is referenced by a parent node does not actually exist). This was intended to be fixed by #4028
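For readers who want to check a workflow for this condition, the following is a minimal sketch, not code from the Argo repository. It assumes the nodes from the workflow's status have been loaded into a map keyed by node ID and that each node may carry a children list of node IDs; the function name findDanglingChildren and the sample IDs are hypothetical.

```typescript
// Minimal shape of the fields used here (an assumption for illustration;
// the real node status object carries many more fields).
interface NodeStatus {
  id: string;
  type: string;
  children?: string[];
}

type NodeMap = Record<string, NodeStatus>;

// Returns [parentId, missingChildId] pairs for every child reference
// that does not resolve to an entry in the node map.
function findDanglingChildren(nodes: NodeMap): Array<[string, string]> {
  const dangling: Array<[string, string]> = [];
  for (const parentId of Object.keys(nodes)) {
    const children = nodes[parentId].children || [];
    for (const childId of children) {
      if (!Object.prototype.hasOwnProperty.call(nodes, childId)) {
        dangling.push([parentId, childId]);
      }
    }
  }
  return dangling;
}

// Hypothetical sample: "wf-1" references "wf-1-b", which has no entry.
const sampleNodes: NodeMap = {
  "wf-1": { id: "wf-1", type: "Steps", children: ["wf-1-a", "wf-1-b"] },
  "wf-1-a": { id: "wf-1-a", type: "Pod" },
};

const danglingPairs = findDanglingChildren(sampleNodes);
console.log(danglingPairs);
```

Any pair reported here is a reference that would yield undefined when looked up, which is consistent with the "Cannot read property type of undefined" error in the title.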

I think that waitOnUpdate will still result in an error, as it returns null.

The code calls setState before returning null, which is supposed to trigger a new React render. This is not happening for some reason, will investigate.

which is supposed to trigger a new React render.

this is probably happening, but after the error is thrown

Ideally this is fixed by the back-end/data-cleaning changes already made in #4028

To add a fail-safe to the UI I have three options, as I no longer think the waitForUpdate fix is exhaustive.

  1. "Silent fail": if a node lists children that don't exist, ignore them silently by simply not rendering them.
  2. "Soft fail": if a node lists children that don't exist, render a placeholder node (probably a node with a "?" or similar icon), and when clicked explain that the node is not present and the workflow is malformed.
  3. "Hard fail": if a node lists children that don't exist, error out the rendering of the whole workflow and provide a message that the workflow is malformed instead.

I prefer either (1) or (2). Thoughts?
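The three options can be sketched roughly as follows. This is an illustration under assumptions, not the fix that was merged: the names resolveChildren, MissingChildPolicy and PLACEHOLDER_TYPE are hypothetical, and the node shape is reduced to the fields needed here.

```typescript
// Reduced node shape for illustration only.
interface NodeStatus {
  id: string;
  type: string;
  children?: string[];
}

type NodeMap = Record<string, NodeStatus>;

// "silent" = option 1, "soft" = option 2, "hard" = option 3.
type MissingChildPolicy = "silent" | "soft" | "hard";

// Hypothetical type marker a renderer could use to draw a "?" node.
const PLACEHOLDER_TYPE = "Missing";

// Looks up each child ID of a parent and applies the chosen policy
// whenever a child ID has no entry in the node map.
function resolveChildren(nodes: NodeMap, parentId: string, policy: MissingChildPolicy): NodeStatus[] {
  if (!Object.prototype.hasOwnProperty.call(nodes, parentId)) {
    return [];
  }
  const resolved: NodeStatus[] = [];
  for (const childId of nodes[parentId].children || []) {
    if (Object.prototype.hasOwnProperty.call(nodes, childId)) {
      resolved.push(nodes[childId]);
    } else if (policy === "hard") {
      throw new Error("malformed workflow: node " + parentId + " references missing child " + childId);
    } else if (policy === "soft") {
      resolved.push({ id: childId, type: PLACEHOLDER_TYPE });
    }
    // policy === "silent": skip the missing child entirely.
  }
  return resolved;
}

// Hypothetical sample: "root" references "b", which does not exist.
const exampleNodes: NodeMap = {
  root: { id: "root", type: "Steps", children: ["a", "b"] },
  a: { id: "a", type: "Pod" },
};

const silentResult = resolveChildren(exampleNodes, "root", "silent");
const softResult = resolveChildren(exampleNodes, "root", "soft");
console.log(silentResult.length, softResult.length);
```

With either "silent" or "soft", the renderer never dereferences an undefined node, so reading a property such as type is safe for every element returned.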

@simster7

Question: is this workflow the result of Retrying a previously failed workflow?

It's a newly submitted workflow.

The issue is that the workflow is malformed (a node that is referenced by a parent node does not actually exist).

Can you please explain this to me? This error happens randomly, and it happens within a couple of seconds.
I am able to run this workflow on resubmission. How can a parent node not exist for a newly submitted job?

_On a different note:_
My workflows have failed with "Failed to load logs: Internal Server Error" (though the workflow pod is successful in k8s). I have described this in https://github.com/argoproj/argo/issues/4092.
P.S. Both https://github.com/argoproj/argo/issues/4092 and https://github.com/argoproj/argo/issues/4079 use the same workflow.

Can you please explain this to me? This error happens randomly, and it happens within a couple of seconds.
I am able to run this workflow on resubmission. How can a parent node not exist for a newly submitted job?

To clarify, this means that Argo's internal data structures are malformed. This does not mean that the user (you) submitted a malformed workflow. When that happens, you get a lint/syntax error.

@simster7 I inspected some of my workflows, and some of them have the same problem you mentioned above (missing children nodes). But I also get errors with succeeded workflows (where all children nodes seem to exist). See attached.

wf.txt

@ediezh Thanks! Was able to reproduce with your workflow as well. Will fix

@ediezh I have included a fix for that issue in #4099

Thanks @simster7
