Sagemaker-python-sdk: PyTorch 1.6.0 Inference packaging skips dependencies, other model artefacts

Created on 19 Sep 2020 · 11 Comments · Source: aws/sagemaker-python-sdk

Describe the bug
Model artefacts packaged in model.tar.gz are skipped when the model object is converted into a TorchServe model. Similarly, any dependencies included are also dropped.

To reproduce
Add any extra file to model.tar.gz that is not model.pth, and it won't show up in the container. Similarly, any extra dependencies specified when initializing a PyTorchModel object are dropped.
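The archive used for the reproduction can be built with a short script. This is a minimal sketch, not the reporter's actual code: the helper name `build_model_tarball` and the placeholder file contents are assumptions, and `another_model.pkl` simply stands in for any extra artefact.

```python
import os
import tarfile
import tempfile


def build_model_tarball(workdir, extra_files=("another_model.pkl",)):
    """Create model.tar.gz holding model.pth plus extra artefacts.

    File contents are placeholders; a real archive would hold the
    serialized model and its companion objects.
    """
    names = ["model.pth", *extra_files]
    for name in names:
        with open(os.path.join(workdir, name), "wb") as f:
            f.write(b"placeholder")

    tar_path = os.path.join(workdir, "model.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        for name in names:
            # arcname keeps each file at the root of the archive
            tar.add(os.path.join(workdir, name), arcname=name)
    return tar_path


with tempfile.TemporaryDirectory() as tmp:
    path = build_model_tarball(tmp)
    with tarfile.open(path) as tar:
        members = sorted(tar.getnames())

print(members)
```

Listing the members before uploading confirms that the extra file really is in the archive, so anything missing at runtime was dropped later in the pipeline.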

Expected behavior
Model package would include all of the extra files and dependencies, as described in the API documentation.

Screenshots or logs
Tried os.walk in the model_dir and realized artefacts I was expecting were missing. The below log shows the files in model_dir. I had an extra .pkl object in there which was not included.

2020-09-18 23:50:40,059 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 2020-09-18 23:50:40.058 | INFO | inference:model_fn:59 - ['inference.py', 'handler_service.py', 'model.pth']

Similarly, the directory listing is missing the extra dependencies specified when creating the PyTorchModel object. They appear neither under model_dir/lib (see #1832, another bug) nor under model_dir (where the API documentation says they should be).

2020-09-18 23:50:40,058 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 2020-09-18 23:50:40.058 | INFO | inference:model_fn:58 - ['__pycache__', 'MAR-INF']
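Listings like the two above can be produced with `os.walk`. The following is a sketch under assumptions: the helper name `list_model_dir` is hypothetical, and a temporary directory stands in for the container's model_dir.

```python
import os
import tempfile


def list_model_dir(model_dir):
    """Return (directories, files) found anywhere under model_dir."""
    dirs, files = [], []
    for _root, dirnames, filenames in os.walk(model_dir):
        dirs.extend(dirnames)
        files.extend(filenames)
    return sorted(dirs), sorted(files)


# Stand-in for the container's model_dir, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "MAR-INF"))
    for name in ("inference.py", "model.pth"):
        open(os.path.join(tmp, name), "w").close()
    found_dirs, found_files = list_model_dir(tmp)

print(found_dirs, found_files)
```

Inside a real `model_fn(model_dir)`, the same helper could be called on the `model_dir` argument and its result logged before any artefact is loaded.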

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.9.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 1.6.0
  • Python version: 3.6
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context

The problem arises from this line in the TorchServe packaging process, which was introduced by https://github.com/aws/sagemaker-pytorch-inference-toolkit/pull/79/. During repackaging, it drops every object except the inference script.

Clearly, that is a regression and unexpected.

bug

All 11 comments

Hi @setu4993,

Could you provide the Python SDK code and source_dir structure so we can root-cause the issue?

A short example to reproduce the issue (X and Y are other directories on the file system):

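from sagemaker.pytorch import PyTorchModel

# Note: the IAM `role` argument is not shown in this snippet.
sagemaker_model = PyTorchModel(
    model_data="model.tar.gz",
    source_dir="/root/",
    entry_point="inference.py",
    # X and Y are directories on the local file system
    dependencies=[
        "X",
        "Y",
    ],
    framework_version="1.6.0",
    py_version="py3",
)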

After the re-packaging step occurs and a model.tar.gz is created on S3 at the code_location:

├── code
│   ├── README.md
│   ├── inference.py
│   ├── lib
│   │   ├── X
│   │   │   ├── __init__.py
│   │   │   ├── ...
│   │   └── Y
│   │       ├── README.md
│   │       ├── __init__.py
│   │       ├── ...
│   └── requirements.txt
├── model.pth
└── another_model.pkl

But at runtime, the only files in the model_dir on the container are inference.py, model.pth and handler_service.py (see the logs in the opening comment).

The problem here is multi-fold:

  1. Why does an extra repackaging step occur, so that what is available on S3 at code_location differs from what ends up in the final inference service? This makes it harder to understand (and diagnose) what is happening on the container, since the model tarball that is actually served drifts away from the expected one (which was itself repackaged by the SDK!).
  2. Why is everything being dropped from the container?
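The drift between the repackaged tarball and the runtime directory can be measured directly. This is a minimal sketch, assuming a local copy of the tarball and a directory mirroring the container's model_dir; the helper name `missing_from_container` is hypothetical, and the comparison uses base names only because files may be relocated during repackaging.

```python
import io
import os
import tarfile
import tempfile


def missing_from_container(tar_path, model_dir):
    """Return archive paths with no same-named file under model_dir."""
    with tarfile.open(tar_path) as tar:
        packaged = [m.name for m in tar.getmembers() if m.isfile()]
    present = set()
    for _root, _dirs, filenames in os.walk(model_dir):
        present.update(filenames)
    return sorted(p for p in packaged if os.path.basename(p) not in present)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the tarball found on S3 at code_location.
    tar_path = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        for name in ("code/inference.py", "model.pth", "another_model.pkl"):
            info = tarfile.TarInfo(name)
            info.size = 1
            tar.addfile(info, io.BytesIO(b"x"))

    # Stand-in for what the container exposes at runtime.
    runtime_dir = os.path.join(tmp, "runtime")
    os.mkdir(runtime_dir)
    for name in ("inference.py", "handler_service.py", "model.pth"):
        open(os.path.join(runtime_dir, name), "w").close()

    missing = missing_from_container(tar_path, runtime_dir)

print(missing)
```

Because only base names are compared, two different files that share a name in different directories would be treated as the same file; a stricter check would compare relative paths or checksums.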

@ChuyangDeng : Any update on this?

@setu4993, I have experienced similar issues with that. From the source code, I believe the following line might be the one that's causing what you are experiencing (and me as well):

https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/9a6869ea3af1ebf9292da0a8d752b0c3389ecdec/src/sagemaker_pytorch_serving_container/torchserve.py#L125

This makes inference.py the only (custom) py file in the model directory. I believe this container should support adding extra/custom artifacts (e.g., model companion objects, model architecture).

Furthermore, Sagemaker Python SDK example with PyTorch (with the code here) does not work with this serving container (because the model has a companion object).

Thanks @jonsnowseven. +1, I think I found the same line earlier :).

In my case, it all works with 1.5.0, so I'm sticking with that version and upgrading to PyTorch 1.6 through requirements.txt.

After almost 4 months from the last comment, this bug still persists....

Yes @ldong87. Any news regarding this?

Same issue here. Have been trying to work around this for 2 days now with no luck.

Wanted to share that @amaharek found a potential solution and posted it on the supplementary issue on the sagemaker-pytorch-inference-toolkit repo here: https://github.com/aws/sagemaker-pytorch-inference-toolkit/issues/85#issuecomment-780621393

@dectl One (ugly) workaround is to copy/load the missing model files from /opt/ml/model/, which apparently is a documented decompressed model location: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-code-load-artifacts
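A sketch of that workaround, assuming the missing artefacts really are present under /opt/ml/model in the running container (this thread does not confirm that for every setup). The helper name `resolve_artifact` and the file names are hypothetical.

```python
import os
import tempfile

SAGEMAKER_MODEL_DIR = "/opt/ml/model"  # location named in the comment above


def resolve_artifact(name, model_dir, fallback_dir=SAGEMAKER_MODEL_DIR):
    """Return the path to `name`, preferring model_dir, else fallback_dir."""
    for base in (model_dir, fallback_dir):
        candidate = os.path.join(base, name)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(f"{name} not found in {model_dir} or {fallback_dir}")


# Temporary directories stand in for the two locations, for illustration.
with tempfile.TemporaryDirectory() as model_dir, \
        tempfile.TemporaryDirectory() as fallback:
    open(os.path.join(model_dir, "model.pth"), "w").close()
    open(os.path.join(fallback, "another_model.pkl"), "w").close()

    pth = resolve_artifact("model.pth", model_dir, fallback)
    pkl = resolve_artifact("another_model.pkl", model_dir, fallback)
    found_in_model_dir = os.path.dirname(pth) == model_dir
    found_in_fallback = os.path.dirname(pkl) == fallback

print(found_in_model_dir, found_in_fallback)
```

In a `model_fn(model_dir)`, each artefact path would be looked up through such a helper instead of being joined to `model_dir` directly, so the loading code keeps working if a later container version restores the files to `model_dir`.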
