Pipelines: Cannot create artifact when using func_to_container_op

Created on 15 Oct 2019 · 4 Comments · Source: kubeflow/pipelines

/kind bug

What steps did you take and what happened:
I'm trying to create a simple artifact and have it displayed in "Run output" for a pipeline.

The following code is used to create the pipeline:

import kfp.dsl as dsl
import kfp.gcp as gcp
import kfp.components as comp

def test(foo):
    import json

    source_str = 'Test text: %s' % foo

    metadata = {
        'outputs': [
            {
                'storage': 'inline',
                'source': '# Inline Markdown\n[A link](https://www.kubeflow.org/)',
                'type': 'markdown',
            },
            {
                'source': source_str,
                'type': 'markdown',
            }]
    }
    print(json.dumps(metadata))
    print(metadata)
    with open('/mlpipeline-ui-metadata.json', 'w') as f:
        json.dump(metadata, f)

@dsl.pipeline(
    name='Pipeline name',
    description='Debug...'
)
def pipeline(foo: str = "default value"):
    test_op = comp.func_to_container_op(test)
    test_task = test_op(foo)

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(pipeline, __file__ + '.tar.gz')

What did you expect to happen:
I expected the metadata to be displayed in "Run output". The pipeline job completes as expected, but there is nothing in "Run output".

Environment:

  • Kubeflow version: build commit 812ca7f
  • kfctl version: kfctl v0.6.2-0-g47a0e4c7
  • Kubernetes platform: gcp 1.12.10-gke.5
  • kubectl version: 1.6
  • OS: Ubuntu Bionic x64

Python module version:
kfp (0.1.31.2)
kfp-server-api (0.1.18.3)

All 4 comments

In the latest version, you have to define the container op's output artifacts yourself:

test_task.output_artifact_paths = {
    'mlpipeline-ui-metadata': '/tmp/mlpipeline-ui-metadata.json',
    'mlpipeline-metrics': '/tmp/mlpipeline-metrics.json',
}

Add the code above to the pipeline function; it should work once the paths match where your component actually writes the files (adjust the paths as needed).

If you're using func_to_container_op, you can also define a NamedTuple return type with the UI_metadata and Metrics types, as explained here:
https://github.com/kubeflow/pipelines/pull/2046/files#diff-0aec3bb4eee5b5b7a9b97fc30a516060

Related to https://github.com/kubeflow/pipelines/issues/2268 and PR https://github.com/kubeflow/pipelines/pull/2046.

@tanguycdls

test_task.output_artifact_paths = {

This might not be a good idea.

  1. ContainerOp().output_artifact_paths should not be used. It should have been made private a long time ago (https://github.com/kubeflow/pipelines/pull/1832). In general, setting any attribute of a ContainerOp instance (apart from .container.*) is not a good idea.

  2. ContainerOp(output_artifact_paths=...) will be deprecated soon since it's no longer needed: file_outputs now works just as well, because all outputs now produce artifacts (supporting big data).

@Toeplitz @tanguycdls

If you're using func_to_container_op, you can also define a NamedTuple return type with the UI_metadata and Metrics types, as explained here:
https://github.com/kubeflow/pipelines/pull/2046/files#diff-0aec3bb4eee5b5b7a9b97fc30a516060

Yes, that's the correct way.
The component author has to declare all outputs. Since #2046, the mlpipeline_ui_metadata and mlpipeline_metrics outputs also need to be declared, just like every other output. See the sample: https://github.com/kubeflow/pipelines/blob/7b6957a/samples/core/lightweight_component/lightweight_component.ipynb

from typing import NamedTuple

def test() -> NamedTuple('MyDivmodOutput', [('mlpipeline_ui_metadata', 'UI_metadata'), ('mlpipeline_metrics', 'Metrics')]):
    import json
    ...  # build the `metadata` and `metrics` dicts here
    return (json.dumps(metadata), json.dumps(metrics))
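Applied to the function from the original report, the pattern above might look like the following minimal sketch. It is a hedged reconstruction, not the maintainers' code: the function name `render_text`, the tuple name `Outputs`, and the `example-score` metric are hypothetical placeholders, and the metrics fields (`name`, `numberValue`, `format`) follow my understanding of the KFP metrics format. The sketch also adds `'storage': 'inline'` to the second markdown entry because, as far as I understand the output viewer, `source` is otherwise treated as a path to a file rather than as the markdown text itself.

```python
import json
from typing import NamedTuple


def render_text(foo: str) -> NamedTuple('Outputs', [('mlpipeline_ui_metadata', 'UI_metadata'),
                                                    ('mlpipeline_metrics', 'Metrics')]):
    # Imports go inside the function because func_to_container_op
    # packages the function source on its own into the container.
    import json

    metadata = {
        'outputs': [
            {
                'storage': 'inline',
                'source': '# Inline Markdown\n[A link](https://www.kubeflow.org/)',
                'type': 'markdown',
            },
            {
                # Assumption: without 'inline', 'source' would be read as a file path.
                'storage': 'inline',
                'source': 'Test text: %s' % foo,
                'type': 'markdown',
            },
        ]
    }
    # Hypothetical placeholder metric; replace with real values.
    metrics = {
        'metrics': [
            {'name': 'example-score', 'numberValue': 0.9, 'format': 'RAW'},
        ]
    }
    # Return the JSON strings in the declared output order instead of
    # writing /mlpipeline-ui-metadata.json by hand.
    return (json.dumps(metadata), json.dumps(metrics))
```

In the pipeline, the function would be wrapped the same way as in the report (`test_op = comp.func_to_container_op(render_text)` followed by `test_op(foo)`), with no `output_artifact_paths` assignment.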

In addition to declaring the mlpipeline_metrics output, what other steps do I need to take to accomplish the original goal of this thread, which is having artifacts appear under "Run output" when working with func_to_container_op? I can see the artifact saved as an Argo artifact with a .tgz extension in the MinIO storage server I have deployed, but KFP does not seem to display it from there.
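One way to confirm what the component actually stored is to download the .tgz object from MinIO and inspect it locally. The following is a minimal sketch using only the Python standard library; it assumes the archive holds a single JSON file (Argo stores output artifacts as gzipped tarballs by default), and `read_json_from_tgz` is a hypothetical helper name.

```python
import json
import tarfile


def read_json_from_tgz(path):
    """Return the parsed JSON of the first regular file inside a .tgz artifact."""
    with tarfile.open(path, 'r:gz') as archive:
        for member in archive.getmembers():
            if member.isfile():
                # extractfile() returns a readable file object for regular files.
                data = archive.extractfile(member).read()
                return json.loads(data.decode('utf-8'))
    raise ValueError('no regular file found in %s' % path)
```

For example, `read_json_from_tgz('mlpipeline-ui-metadata.tgz')` should return the same dict the component serialized. If its `outputs` entries look correct, that at least rules out the component's output as the cause.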
