We get the following error when trying to deploy a custom MXNet model with `MXNetModel.deploy`:
```
Python 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:07:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sagemaker.mxnet.model import MXNetModel
>>> mxnet_model = MXNetModel(model_data=s3_data_bucket,
... role=role_arn,
... entry_point='inference.py',
... dependencies=['path/to/package'],
... framework_version='1.4.1',
... py_version='py3')
>>>
>>> mxnet_model.deploy(instance_type='ml.c5.xlarge', initial_instance_count=1)
```
```
Traceback (most recent call last):
File "
File "/anaconda3/envs/xx/lib/python3.6/site-packages/sagemaker/model.py", line 426, in deploy
self._create_sagemaker_model(instance_type, accelerator_type, tags)
File "/anaconda3/envs/xx/lib/python3.6/site-packages/sagemaker/model.py", line 165, in _create_sagemaker_model
container_def = self.prepare_container_def(instance_type, accelerator_type=accelerator_type)
File "/anaconda3/envs/xx/lib/python3.6/site-packages/sagemaker/mxnet/model.py", line 148, in prepare_container_def
self._upload_code(deploy_key_prefix, is_mms_version)
File "/anaconda3/envs/xx/lib/python3.6/site-packages/sagemaker/model.py", line 788, in _upload_code
sagemaker_session=self.sagemaker_session,
File "/anaconda3/envs/xx/lib/python3.6/site-packages/sagemaker/utils.py", line 413, in repack_model
model_dir, inference_script, source_directory, dependencies, sagemaker_session, tmp
File "/anaconda3/envs/xx/lib/python3.6/site-packages/sagemaker/utils.py", line 472, in _create_or_update_code_dir
shutil.copytree(dependency, code_dir, os.path.basename(dependency))
File "/anaconda3/envs/xx/lib/python3.6/shutil.py", line 315, in copytree
os.makedirs(dst)
File "/anaconda3/envs/xx/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/var/folders/3z/kqrl01qd2vv6248zcs56q0wh0000gn/T/tmpqbeoxi0t/model/code'
```

We strongly suspect the problem to be related to [line 472](https://github.com/aws/sagemaker-python-sdk/blob/9765de68ad8b776740d800148c861ca0e4794716/src/sagemaker/utils.py#L472) in `sagemaker/utils.py`:

```python
shutil.copytree(dependency, code_dir)
```
The problem is that `code_dir` already exists at that point, so `shutil.copytree` raises an error: its destination directory must not already exist (see the `shutil.copytree` documentation).

This problem only occurs when `framework_version` is passed to the `MXNetModel` constructor, which triggers a call to `utils.repack_model` from `FrameworkModel._upload_code`.
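A minimal, SageMaker-free sketch of the failure, assuming the repack step has already created `model/code/` (with the entry point inside) before the dependency is copied. `build_layout` and `copy_dependency_buggy` are hypothetical helpers that only mimic the directory layout seen in the traceback; they are not the SDK's actual code:

```python
import os
import shutil
import tempfile


def build_layout(root):
    """Create a hypothetical layout resembling the repack step: the dependency
    path/to/package/utils.py and an already-populated model/code/ directory."""
    dependency = os.path.join(root, "path", "to", "package")
    os.makedirs(dependency)
    open(os.path.join(dependency, "utils.py"), "w").close()

    code_dir = os.path.join(root, "model", "code")
    os.makedirs(code_dir)  # code_dir exists before the dependency is copied
    open(os.path.join(code_dir, "inference.py"), "w").close()
    return dependency, code_dir


def copy_dependency_buggy(dependency, code_dir):
    # Same call as the suspected line 472: the destination is code_dir itself.
    # copytree tries to create the destination directory first, and since it
    # already exists, it raises FileExistsError before copying anything.
    shutil.copytree(dependency, code_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dependency, code_dir = build_layout(root)
        try:
            copy_dependency_buggy(dependency, code_dir)
        except FileExistsError as exc:
            print("FileExistsError:", exc)
```

This reproduces the same `FileExistsError` on a `.../model/code` path without needing AWS credentials.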
We believe the solution is to modify the aforementioned line to the following:
```python
shutil.copytree(dependency, os.path.join(code_dir, os.path.basename(dependency)))
```

so that the dependency folder is copied into `code_dir` instead of trying to replace it.
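The proposed line can be checked in isolation. This sketch assumes the same hypothetical layout as above; `build_layout` and `copy_dependency_fixed` are illustrative helpers containing only the proposed line, not necessarily what the SDK ended up merging:

```python
import os
import shutil
import tempfile


def build_layout(root):
    """Hypothetical layout: dependency path/to/package/utils.py and an
    already-existing model/code/ directory holding the entry point."""
    dependency = os.path.join(root, "path", "to", "package")
    os.makedirs(dependency)
    open(os.path.join(dependency, "utils.py"), "w").close()
    code_dir = os.path.join(root, "model", "code")
    os.makedirs(code_dir)
    open(os.path.join(code_dir, "inference.py"), "w").close()
    return dependency, code_dir


def copy_dependency_fixed(dependency, code_dir):
    # Proposed fix: copy into a new subdirectory of code_dir named after the
    # dependency (here code_dir/package), which does not exist yet.
    target = os.path.join(code_dir, os.path.basename(dependency))
    shutil.copytree(dependency, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dependency, code_dir = build_layout(root)
        copy_dependency_fixed(dependency, code_dir)
        # List every file now under code_dir, relative to code_dir.
        for dirpath, _, filenames in os.walk(code_dir):
            for name in sorted(filenames):
                print(os.path.relpath(os.path.join(dirpath, name), code_dir))
```

After the copy, `code_dir` holds both `inference.py` and `package/utils.py`. One caveat: if a dependency is given with a trailing slash (`'path/to/package/'`), `os.path.basename` returns an empty string and the target collapses back to `code_dir`, so normalizing with `os.path.normpath` first would avoid that. On Python 3.8+, `shutil.copytree(..., dirs_exist_ok=True)` would also avoid the error, but it merges the package's files directly into `code_dir` instead of nesting them under `package/`, and it is not available in the Python 3.6 environment shown here.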
To reproduce:
```bash
touch inference.py
mkdir -p path/to/package
touch path/to/package/utils.py
```
```python
from sagemaker.mxnet.model import MXNetModel

s3_data_bucket = 'bucket_with_model_artifacts'
role_arn = 'role_arn'

mxnet_model = MXNetModel(model_data=s3_data_bucket,
                         role=role_arn,
                         entry_point='inference.py',
                         dependencies=['path/to/package'],
                         framework_version='1.4.1',
                         py_version='py3')
mxnet_model.deploy(instance_type='ml.c5.xlarge', initial_instance_count=1)
```
+1. It seems using framework_version='1.3.0' is OK, and I only get the error when specifying 1.4.1.
Thank you for the super-detailed bug report! I've submitted a PR here: https://github.com/aws/sagemaker-python-sdk/pull/1021
v1.39.1 has been released with this change. Hopefully this solves your issue!