Argo: Step or dag workflows do not seem to release semaphore locks

Created on 16 Sep 2020  ·  16 Comments  ·  Source: argoproj/argo

Summary

What happened/what you expected to happen?
Running synchronization-tmpl-level.yaml, the locks are acquired but are not released once the step is finished. The workflow keeps running and the other steps stay waiting with the message "Waiting for default/configmap/workflow-synchronization/template lock. Lock status: 0/2" (a DAG behaves the same way). Synchronization works at the workflow level.
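
For context, the semaphore limit is read from a ConfigMap key. The report does not include the ConfigMap itself, so the following is only a sketch of what it presumably looks like: the name, key and namespace are taken from the lock name in the message above, and the limit of 2 is an assumption inferred from "Lock status: 0/2".

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-synchronization   # name referenced by configMapKeyRef in the workflow
  namespace: default               # namespace taken from the lock name default/configmap/...
data:
  template: "2"                    # assumed limit, inferred from "Lock status: 0/2"
```

With a limit of 2, two of the five expanded steps should hold the lock at any time, and the remaining three should start as earlier holders finish.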

Diagnostics

What version of Argo Workflows are you running?
v2.10.2

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  annotations:
    argo: workflows
  creationTimestamp: "2020-09-16T06:59:40Z"
  generateName: synchronization-tmpl-level-
  generation: 8
  labels:
    workflows.argoproj.io/phase: Running
  managedFields:
  - apiVersion: argoproj.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
      f:spec:
        .: {}
        f:arguments: {}
        f:entrypoint: {}
        f:templates: {}
      f:status:
        .: {}
        f:finishedAt: {}
    manager: argo
    operation: Update
    time: "2020-09-16T06:59:40Z"
  - apiVersion: argoproj.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:argo: {}
        f:labels:
          .: {}
          f:workflows.argoproj.io/phase: {}
      f:spec:
        f:parallelism: {}
        f:serviceAccountName: {}
        f:ttlStrategy:
          .: {}
          f:secondsAfterCompletion: {}
          f:secondsAfterFailure: {}
          f:secondsAfterSuccess: {}
      f:status:
        f:nodes:
          .: {}
          f:synchronization-tmpl-level-xxgc2:
            .: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:synchronization-tmpl-level-xxgc2-327139691:
            .: {}
            f:boundaryID: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:synchronization-tmpl-level-xxgc2-633772542:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:message: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:synchronization-tmpl-level-xxgc2-1878609776:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:message: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:synchronization-tmpl-level-xxgc2-2314512256:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:message: {}
            f:name: {}
            f:phase: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:synchronization-tmpl-level-xxgc2-2913002658:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:hostNodeName: {}
            f:id: {}
            f:name: {}
            f:outputs:
              .: {}
              f:artifacts: {}
              f:exitCode: {}
            f:phase: {}
            f:resourcesDuration:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:synchronization-tmpl-level-xxgc2-3085788296:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:hostNodeName: {}
            f:id: {}
            f:name: {}
            f:outputs:
              .: {}
              f:artifacts: {}
              f:exitCode: {}
            f:phase: {}
            f:resourcesDuration:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
        f:phase: {}
        f:startedAt: {}
        f:synchronization:
          .: {}
          f:semaphore:
            .: {}
            f:holding: {}
            f:waiting: {}
    manager: workflow-controller
    operation: Update
    time: "2020-09-16T07:00:01Z"
  name: synchronization-tmpl-level-xxgc2
  namespace: default
  resourceVersion: "39383194"
  selfLink: /apis/argoproj.io/v1alpha1/namespaces/default/workflows/synchronization-tmpl-level-xxgc2
  uid: 1630345b-c478-4013-b4e7-1435c5ba901c
spec:
  arguments: {}
  entrypoint: synchronization-tmpl-level-example
  parallelism: 3
  serviceAccountName: argo
  templates:
  - arguments: {}
    inputs: {}
    metadata: {}
    name: synchronization-tmpl-level-example
    outputs: {}
    steps:
    - - arguments:
          parameters:
          - name: seconds
            value: '{{item}}'
        name: synchronization-acquire-lock
        template: acquire-lock
        withParam: '["1","2","3","4","5"]'
  - arguments: {}
    container:
      args:
      - sleep 10; echo acquired lock
      command:
      - sh
      - -c
      image: alpine:latest
      name: ""
      resources: {}
    inputs: {}
    metadata: {}
    name: acquire-lock
    outputs: {}
    synchronization:
      semaphore:
        configMapKeyRef:
          key: template
          name: workflow-synchronization
  ttlStrategy:
    secondsAfterCompletion: 600
    secondsAfterFailure: 43200
    secondsAfterSuccess: 600
status:
  finishedAt: null
  nodes:
    synchronization-tmpl-level-xxgc2:
      children:
      - synchronization-tmpl-level-xxgc2-327139691
      displayName: synchronization-tmpl-level-xxgc2
      finishedAt: null
      id: synchronization-tmpl-level-xxgc2
      name: synchronization-tmpl-level-xxgc2
      phase: Running
      startedAt: "2020-09-16T06:59:40Z"
      templateName: synchronization-tmpl-level-example
      templateScope: local/synchronization-tmpl-level-xxgc2
      type: Steps
    synchronization-tmpl-level-xxgc2-327139691:
      boundaryID: synchronization-tmpl-level-xxgc2
      children:
      - synchronization-tmpl-level-xxgc2-3085788296
      - synchronization-tmpl-level-xxgc2-2913002658
      - synchronization-tmpl-level-xxgc2-1878609776
      - synchronization-tmpl-level-xxgc2-633772542
      - synchronization-tmpl-level-xxgc2-2314512256
      displayName: '[0]'
      finishedAt: null
      id: synchronization-tmpl-level-xxgc2-327139691
      name: synchronization-tmpl-level-xxgc2[0]
      phase: Running
      startedAt: "2020-09-16T06:59:40Z"
      templateName: synchronization-tmpl-level-example
      templateScope: local/synchronization-tmpl-level-xxgc2
      type: StepGroup
    synchronization-tmpl-level-xxgc2-633772542:
      boundaryID: synchronization-tmpl-level-xxgc2
      displayName: synchronization-acquire-lock(3:4)
      finishedAt: null
      id: synchronization-tmpl-level-xxgc2-633772542
      message: 'Waiting for default/configmap/workflow-synchronization/template
        lock. Lock status: 0/2 '
      name: synchronization-tmpl-level-xxgc2[0].synchronization-acquire-lock(3:4)
      phase: Pending
      startedAt: "2020-09-16T06:59:40Z"
      templateName: acquire-lock
      templateScope: local/synchronization-tmpl-level-xxgc2
      type: Pod
    synchronization-tmpl-level-xxgc2-1878609776:
      boundaryID: synchronization-tmpl-level-xxgc2
      displayName: synchronization-acquire-lock(2:3)
      finishedAt: null
      id: synchronization-tmpl-level-xxgc2-1878609776
      message: 'Waiting for default/configmap/workflow-synchronization/template
        lock. Lock status: 0/2 '
      name: synchronization-tmpl-level-xxgc2[0].synchronization-acquire-lock(2:3)
      phase: Pending
      startedAt: "2020-09-16T06:59:40Z"
      templateName: acquire-lock
      templateScope: local/synchronization-tmpl-level-xxgc2
      type: Pod
    synchronization-tmpl-level-xxgc2-2314512256:
      boundaryID: synchronization-tmpl-level-xxgc2
      displayName: synchronization-acquire-lock(4:5)
      finishedAt: null
      id: synchronization-tmpl-level-xxgc2-2314512256
      message: 'Waiting for default/configmap/workflow-synchronization/template
        lock. Lock status: 0/2 '
      name: synchronization-tmpl-level-xxgc2[0].synchronization-acquire-lock(4:5)
      phase: Pending
      startedAt: "2020-09-16T06:59:40Z"
      templateName: acquire-lock
      templateScope: local/synchronization-tmpl-level-xxgc2
      type: Pod
    synchronization-tmpl-level-xxgc2-2913002658:
      boundaryID: synchronization-tmpl-level-xxgc2
      displayName: synchronization-acquire-lock(1:2)
      finishedAt: "2020-09-16T06:59:56Z"
      hostNodeName: eoc-gzs-pn02-vm
      id: synchronization-tmpl-level-xxgc2-2913002658
      name: synchronization-tmpl-level-xxgc2[0].synchronization-acquire-lock(1:2)
      outputs:
        artifacts:
        - archiveLogs: true
          name: main-logs
          s3:
            accessKeySecret:
              key: accesskey
              name: artifact-s3-secret
            bucket: gzs-workflow-artifacts
            endpoint: artifact-minio-service:9000
            insecure: true
            key: default/synchronization-tmpl-level-xxgc2/synchronization-tmpl-level-xxgc2-2913002658/main.log
            secretKeySecret:
              key: secretkey
              name: artifact-s3-secret
        exitCode: "0"
      phase: Succeeded
      resourcesDuration:
        cpu: 23
        memory: 23
      startedAt: "2020-09-16T06:59:40Z"
      templateName: acquire-lock
      templateScope: local/synchronization-tmpl-level-xxgc2
      type: Pod
    synchronization-tmpl-level-xxgc2-3085788296:
      boundaryID: synchronization-tmpl-level-xxgc2
      displayName: synchronization-acquire-lock(0:1)
      finishedAt: "2020-09-16T06:59:59Z"
      hostNodeName: eoc-gzs-pn02-vm
      id: synchronization-tmpl-level-xxgc2-3085788296
      name: synchronization-tmpl-level-xxgc2[0].synchronization-acquire-lock(0:1)
      outputs:
        artifacts:
        - archiveLogs: true
          name: main-logs
          s3:
            accessKeySecret:
              key: accesskey
              name: artifact-s3-secret
            bucket: gzs-workflow-artifacts
            endpoint: artifact-minio-service:9000
            insecure: true
            key: default/synchronization-tmpl-level-xxgc2/synchronization-tmpl-level-xxgc2-3085788296/main.log
            secretKeySecret:
              key: secretkey
              name: artifact-s3-secret
        exitCode: "0"
      phase: Succeeded
      resourcesDuration:
        cpu: 26
        memory: 26
      startedAt: "2020-09-16T06:59:40Z"
      templateName: acquire-lock
      templateScope: local/synchronization-tmpl-level-xxgc2
      type: Pod
  phase: Running
  startedAt: "2020-09-16T06:59:40Z"
  synchronization:
    semaphore:
      holding:
      - holders:
        - synchronization-tmpl-level-xxgc2-3085788296
        - synchronization-tmpl-level-xxgc2-2913002658
        semaphore: default/configmap/workflow-synchronization/template
      waiting:
      - holders:
        - default/synchronization-tmpl-level-xxgc2/synchronization-tmpl-level-xxgc2-3085788296
        - default/synchronization-tmpl-level-xxgc2/synchronization-tmpl-level-xxgc2-2913002658
        semaphore: default/configmap/workflow-synchronization/template
Paste the logs from the workflow controller:
time="2020-09-16T06:59:40Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Updated phase  -> Running" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Steps node synchronization-tmpl-level-xxgc2 initialized Running" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="StepGroup node synchronization-tmpl-level-xxgc2-327139691 initialized Running" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="default/configmap/workflow-synchronization/template acquired by default/synchronization-tmpl-level-xxgc2/synchronization-tmpl-level-xxgc2-3085788296 " semaphore=default/configmap/workflow-synchronization/template
time="2020-09-16T06:59:40Z" level=info msg="Node synchronization-tmpl-level-xxgc2[0].synchronization-acquire-lock(0:1) acquired synchronization lock" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Pod node synchronization-tmpl-level-xxgc2-3085788296 initialized Pending" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Created pod: synchronization-tmpl-level-xxgc2[0].synchronization-acquire-lock(0:1) (synchronization-tmpl-level-xxgc2-3085788296)" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="default/configmap/workflow-synchronization/template acquired by default/synchronization-tmpl-level-xxgc2/synchronization-tmpl-level-xxgc2-2913002658 " semaphore=default/configmap/workflow-synchronization/template
time="2020-09-16T06:59:40Z" level=info msg="Node synchronization-tmpl-level-xxgc2[0].synchronization-acquire-lock(1:2) acquired synchronization lock" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Pod node synchronization-tmpl-level-xxgc2-2913002658 initialized Pending" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Created pod: synchronization-tmpl-level-xxgc2[0].synchronization-acquire-lock(1:2) (synchronization-tmpl-level-xxgc2-2913002658)" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Pod node synchronization-tmpl-level-xxgc2-1878609776 initialized Pending" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Pod node synchronization-tmpl-level-xxgc2-633772542 initialized Pending" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Pod node synchronization-tmpl-level-xxgc2-2314512256 initialized Pending" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Workflow step group node synchronization-tmpl-level-xxgc2-327139691 not yet completed" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:40Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=39383030 workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:41Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:41Z" level=info msg="Updating node synchronization-tmpl-level-xxgc2-3085788296 message: ContainerCreating"
time="2020-09-16T06:59:41Z" level=info msg="Updating node synchronization-tmpl-level-xxgc2-2913002658 message: ContainerCreating"
time="2020-09-16T06:59:41Z" level=info msg="workflow active pod spec parallelism reached 5/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:41Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:41Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=39383043 workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:42Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:42Z" level=info msg="workflow active pod spec parallelism reached 5/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:42Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:42Z" level=info msg="insignificant pod change" key=default/synchronization-tmpl-level-xxgc2-2913002658
time="2020-09-16T06:59:42Z" level=info msg="insignificant pod change" key=default/synchronization-tmpl-level-xxgc2-3085788296
time="2020-09-16T06:59:47Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:47Z" level=info msg="Updating node synchronization-tmpl-level-xxgc2-2913002658 status Pending -> Running"
time="2020-09-16T06:59:47Z" level=info msg="workflow active pod spec parallelism reached 5/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:47Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:47Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=39383086 workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:48Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:48Z" level=info msg="workflow active pod spec parallelism reached 5/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:48Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:51Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:51Z" level=info msg="Updating node synchronization-tmpl-level-xxgc2-3085788296 status Pending -> Running"
time="2020-09-16T06:59:51Z" level=info msg="workflow active pod spec parallelism reached 5/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:51Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:51Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=39383104 workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:52Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:52Z" level=info msg="workflow active pod spec parallelism reached 5/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:52Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:56Z" level=info msg="insignificant pod change" key=default/synchronization-tmpl-level-xxgc2-2913002658
time="2020-09-16T06:59:58Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:58Z" level=info msg="Setting node synchronization-tmpl-level-xxgc2-2913002658 outputs"
time="2020-09-16T06:59:58Z" level=info msg="Updating node synchronization-tmpl-level-xxgc2-2913002658 status Running -> Succeeded"
time="2020-09-16T06:59:58Z" level=info msg="workflow active pod spec parallelism reached 4/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:58Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:58Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=39383134 workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:59Z" level=info msg="insignificant pod change" key=default/synchronization-tmpl-level-xxgc2-3085788296
time="2020-09-16T06:59:59Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:59Z" level=info msg="Setting node synchronization-tmpl-level-xxgc2-3085788296 outputs"
time="2020-09-16T06:59:59Z" level=info msg="Labeled pod default/synchronization-tmpl-level-xxgc2-2913002658 completed"
time="2020-09-16T06:59:59Z" level=info msg="workflow active pod spec parallelism reached 4/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:59Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T06:59:59Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=39383142 workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:00Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:00Z" level=info msg="workflow active pod spec parallelism reached 4/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:00Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:00Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:00Z" level=info msg="workflow active pod spec parallelism reached 4/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:00Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:01Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:01Z" level=info msg="Updating node synchronization-tmpl-level-xxgc2-3085788296 status Running -> Succeeded"
time="2020-09-16T07:00:01Z" level=info msg="workflow active pod spec parallelism reached 3/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:01Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:01Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=39383194 workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:02Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:02Z" level=info msg="Labeled pod default/synchronization-tmpl-level-xxgc2-3085788296 completed"
time="2020-09-16T07:00:02Z" level=info msg="workflow active pod spec parallelism reached 3/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:02Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:22Z" level=info msg="Processing workflow" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:22Z" level=info msg="workflow active pod spec parallelism reached 3/3" namespace=default workflow=synchronization-tmpl-level-xxgc2
time="2020-09-16T07:00:22Z" level=error msg="error in entry template execution" error="Max parallelism reached" namespace=default workflow=synchronization-tmpl-level-xxgc2


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

Labels: bug, more-information-needed, workaround, works-for-me

All 16 comments

I believe the parallelism: 3 setting is conflicting with the semaphore code for some reason.

Found the bug, fixing

We've also seen this issue when a pod has acquired the lock and is running, but the workflow gets deleted during this phase. The lock is never released.

Probably some sort of cleanup is also required during workflow deletion.

Yup, this will be included as part of this bug fix

@simster7 @sarabala1979 this looks like an issue that makes semaphores unusable - how can we quickly get this fixed, back-ported and released?

I am currently working on a fix and refactor of the code, as I've found multiple issues with it.

To be clear, this does not make semaphores unusable – it only makes semaphores unusable _while using parallelism_ at the same time.

Dropping to P3, as the workaround is to either not use parallelism or not use semaphores.
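
As a rough sketch of the first workaround, the reporter's workflow can be submitted with spec.parallelism left out, so that the semaphore is the only thing limiting concurrency. This is the spec from the report with the generated fields stripped and that one field commented out. It has not been verified against a fixed or unfixed controller.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: synchronization-tmpl-level-
spec:
  entrypoint: synchronization-tmpl-level-example
  # parallelism: 3   # left out as the workaround; the semaphore alone limits concurrency
  serviceAccountName: argo
  templates:
  - name: synchronization-tmpl-level-example
    steps:
    - - name: synchronization-acquire-lock
        template: acquire-lock
        arguments:
          parameters:
          - name: seconds
            value: "{{item}}"
        withParam: '["1","2","3","4","5"]'
  - name: acquire-lock
    synchronization:
      semaphore:
        configMapKeyRef:
          name: workflow-synchronization
          key: template
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["sleep 10; echo acquired lock"]
```

The other workaround is the reverse: keep parallelism and drop the synchronization block from the acquire-lock template.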

@simster7 Is there any workaround for the issue of the lock staying acquired? Is there a way to manually reset the lock?

Restarting the controller seems like the only way, unfortunately. Will try to get a fix out soon.

We use Argo 2.7.1 and we also noticed this problem on our cluster, even though we had already removed parallelism from our workflow. Restarting the workflow-controller, as suggested above, does fix the issue for us.

We think that deleting a running workflow does not free the lock, but we are not 100% sure of that. We also use a workflow GC to delete completed workflows 5 minutes after completion, so maybe that can cause some issues?

@simster7 We are still facing the issue of the lock not getting released in a running workflow. The Argo version is 2.11.2.
All the *.publish templates use the same semaphore.
[Screenshot 2020-10-09 at 4 55 37 PM]

The semaphore limit is 2, so the first two publish steps succeed, but the third is stuck waiting for the lock to be released. Let me know if a sample workflow would make this easier to debug and I will create one.

@sarabala1979 can you investigate, please?

I will take a look

@firecast can you provide the workflow controller logs and a reproducible workflow?
I tried multiple examples and it works for me. The message and status of the last step may not have been updated yet, even though the step may already have started.

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  template: "3"

---

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: synchronization-tmpl-level-
spec:
  entrypoint: synchronization-tmpl-level-example
  templates:
  - name: synchronization-tmpl-level-example
    steps:
    - - name: synchronization-acquire-lock
        template: acquire-lock
        arguments:
          parameters:
          - name: seconds
            value: "{{item}}"
        withParam: '["1","2","3","4","5"]'

  - name: acquire-lock
    synchronization:
      semaphore:
        configMapKeyRef:
          name: my-config
          key: template
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["sleep 10; echo acquired lock"]

Sure. Will try to create a reproducible one from my end and share within a day @sarabala1979

apiVersion: v1
kind: ConfigMap
metadata:
  name: semaphore
data:
  template: "1"

---

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-whalesay-template
spec:
  entrypoint: whalesay-template
  templates:
  - name: whalesay-template
    synchronization:
      semaphore:
        configMapKeyRef:
          name: semaphore
          key: template
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

---

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-hello-world
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    dag:
      tasks:
        - name: call-whalesay-template-1
          templateRef:
            name: workflow-template-whalesay-template
            template: whalesay-template
          arguments:
            parameters:
            - name: message
              value: "hello world"
        - name: call-whalesay-template-2
          dependencies:
          - call-whalesay-template-1
          templateRef:
            name: workflow-template-whalesay-template
            template: whalesay-template
          arguments:
            parameters:
            - name: message
              value: "hello world 2"
        - name: call-whalesay-template-3
          dependencies:
          - call-whalesay-template-2
          templateRef:
            name: workflow-template-whalesay-template
            template: whalesay-template
          arguments:
            parameters:
            - name: message
              value: "hello world 3"

---

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: hello-world
spec:
  workflowTemplateRef:
    name: workflow-template-hello-world

@sarabala1979 This example replicates the issue.
[Screenshot 2020-10-12 at 6 02 00 PM]
