Pipelines: Escape sequence bug in func_to_container_op

Created on 24 Dec 2020  路  4Comments  路  Source: kubeflow/pipelines

What steps did you take:

I tried to run a pipeline that contains newline code which is one of the escape sequences in func_to_container_op as follows.

import kfp
from kfp import dsl
from kfp.components import func_to_container_op

@func_to_container_op
def test():
    print("hello\nworld")

@dsl.pipeline()
def pipeline():
    test()

if __name__ == '__main__':
    kfp.compiler.Compiler().compile(pipeline, __file__ + '.yaml')

This is the compiled yaml.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pipeline-
  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.2.0, pipelines.kubeflow.org/pipeline_compilation_time: '2020-12-25T00:18:32.938960',
    pipelines.kubeflow.org/pipeline_spec: '{"name": "Pipeline"}'}
  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.2.0}
spec:
  entrypoint: pipeline
  templates:
  - name: pipeline
    dag:
      tasks:
      - {name: test, template: test}
  - name: test
    container:
      args: []
      command:
      - sh
      - -ec
      - |
        program_path=$(mktemp)
        echo -n "$0" > "$program_path"
        python3 -u "$program_path" "$@"
      - |
        def test():
            print("hello\nworld")

        import argparse
        _parser = argparse.ArgumentParser(prog='Test', description='')
        _parsed_args = vars(_parser.parse_args())

        _outputs = test(**_parsed_args)
      image: python:3.7
    metadata:
      annotations: {pipelines.kubeflow.org/component_spec: '{"implementation": {"container":
          {"args": [], "command": ["sh", "-ec", "program_path=$(mktemp)\necho -n \"$0\"
          > \"$program_path\"\npython3 -u \"$program_path\" \"$@\"\n", "def test():\n    print(\"hello\\nworld\")\n\nimport
          argparse\n_parser = argparse.ArgumentParser(prog=''Test'', description='''')\n_parsed_args
          = vars(_parser.parse_args())\n\n_outputs = test(**_parsed_args)\n"], "image":
          "python:3.7"}}, "name": "Test"}', pipelines.kubeflow.org/component_ref: '{}'}
  arguments:
    parameters: []
  serviceAccountName: pipeline-runner

What happened:

Syntax error have been displayed.

  File "/tmp/tmp.ncPxm38FZc", line 2
    print("hello
               ^
SyntaxError: EOL while scanning string literal

What did you expect to happen:

Escape sequences are being interpreted when outputting python to a file. I want the newline code to be output to the file as is.

Environment:

AI Platform Pipeline in GCP
cluster version: 1.0.4
KFP version: 1.2.0

Anything else you would like to add:

https://github.com/kubeflow/pipelines/commit/7a66414cf72ba5c746cac7fc6c8ecad67fc5e885
This bug is caused by the above commit. By using echo in /bin/sh, the escape sequence is interpreted and output to a file. When I changed sh in yaml to bash as a test, this syntax error did not occur and the program worked fine.

fix yaml

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pipeline-
  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.2.0, pipelines.kubeflow.org/pipeline_compilation_time: '2020-12-24T23:54:02.062104',
    pipelines.kubeflow.org/pipeline_spec: '{"name": "Pipeline"}'}
  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.2.0}
spec:
  entrypoint: pipeline
  templates:
  - name: pipeline
    dag:
      tasks:
      - {name: test, template: test}
  - name: test
    container:
      args: []
      command:
      - bash
      - -ec
      - |
        program_path=$(mktemp)
        echo -n "$0" > "$program_path"
        python3 -u "$program_path" "$@"
      - |
        def test():
            print("hello\nworld")

        import argparse
        _parser = argparse.ArgumentParser(prog='Test', description='')
        _parsed_args = vars(_parser.parse_args())

        _outputs = test(**_parsed_args)
      image: python:3.7
    metadata:
      annotations: {pipelines.kubeflow.org/component_spec: '{"implementation": {"container":
          {"args": [], "command": ["sh", "-ec", "program_path=$(mktemp)\necho -n \"$0\"
          > \"$program_path\"\npython3 -u \"$program_path\" \"$@\"\n", "def test():\n    print(\"hello\\nworld\")\n\nimport
          argparse\n_parser = argparse.ArgumentParser(prog=''Test'', description='''')\n_parsed_args
          = vars(_parser.parse_args())\n\n_outputs = test(**_parsed_args)\n"], "image":
          "python:3.7"}}, "name": "Test"}', pipelines.kubeflow.org/component_ref: '{}'}
  arguments:
    parameters: []
  serviceAccountName: pipeline-runner

output

hello
world

/kind bug

kinbug

All 4 comments

I've run into this issue as well with escaping quoted characters. I feel like this ought to be a higher priority bug because this creates an unexpected difference in serialization of strings in lightweight components versus regular components.

For us this is a blocking issue for adopting 1.2/1.3 since it would require a lot of refactoring in our string manipulation components to work around this.

@Ark-kun do you think that the above workaround using bash would be an appropriate fix or is there a particular reason for using sh?

/assign @chensun @numerology
I think this is a P0, I tried Data passing sample pipeline, it's also failing with EOL while scanning string literal like these issues.

/assign @Ark-kun

I think this is a P0, I tried Data passing sample pipeline, it's also failing with EOL while scanning string literal like these issues.

Oh. I think I know what's happening then. It's echo interpreting the escapes. I'll fix this ASAP.

Was this page helpful?
0 / 5 - 0 ratings