Sagemaker-python-sdk: Script mode project structure inflexibility.

Created on 14 Dec 2018 · 4 Comments · Source: aws/sagemaker-python-sdk

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): Tensorflow
  • Framework Version: 1.10
  • Python Version: 3.6
  • CPU or GPU: CPU
  • Python SDK Version: latest
  • Are you using a custom image: No

Describe the problem

I have found that a file structure like the following works best for the Tensorflow SDK.

- root
- - - my_project
- - - - - my_package
- - - - - - - code.py
- - - - - main.py
- - - - - setup.py
- - - run_training.py
- - - etc.

This layout works best for training, where run_training.py looks something like this:

from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlow

# IAM role that SageMaker assumes for the training job
role = get_execution_role()

train_instance_type = 'ml.p2.xlarge'

# source_dir is uploaded for the training job; entry_point is resolved relative to it
tf_estimator = TensorFlow(entry_point='main.py', role=role,
                          train_instance_count=1, train_instance_type=train_instance_type,
                          framework_version='1.11', py_version='py3',
                          source_dir='my_project')

# wait=True blocks until the training job finishes
tf_estimator.fit('s3://location...', wait=True)

However, most of my projects out in the wild would have the following structure:

- my_project
- - - data or docs or config etc...
- - - sagemaker
- - - - - run_training.py
- - - my_package
- - - - - code.py
- - - main.py
- - - setup.py

Could the SDK let dependencies and source_dir take arguments like "include" or "exclude"? I'd like to make the parent directory ('../') the source directory and then include only Python files (or certain folders).

One solution to this also could be to allow dependencies to be a map like so:

dependencies = {'my_package': '../my_package', 'main.py': '../main.py'}

Each key would be a path relative to the container root and each value the local path to copy there, so the container root effectively becomes your source dir. This would be the most flexible of all, IMO.
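Until something like this exists in the SDK, the mapping idea can be approximated locally by staging the files into a temporary directory and pointing source_dir at it. The following is only a sketch: stage_source_dir is a hypothetical helper, not an SDK function, and it assumes every mapped path exists on disk.

```python
import shutil
import tempfile
from pathlib import Path


def stage_source_dir(mapping, base_dir="."):
    """Copy each local path into a fresh temp dir under its mapped name.

    `mapping` maps a destination path (relative to the staged root) to a
    local file or directory (relative to `base_dir`).
    Hypothetical helper for illustration; not part of the SageMaker SDK.
    """
    base = Path(base_dir)
    staged_root = Path(tempfile.mkdtemp(prefix="sm_source_"))
    for dest_rel, local in mapping.items():
        src = (base / local).resolve()
        dest = staged_root / dest_rel
        # Create intermediate folders for nested destinations.
        dest.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dest)
        else:
            shutil.copy2(src, dest)
    return staged_root
```

Called from the sagemaker folder with the mapping above, the returned path could then be passed as source_dir=str(staged_root) with entry_point='main.py'.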

Lastly, as an aside, requirements.txt does not seem to work with the new TensorFlow script mode. Is there a reason for this? Will it be fixed soon?

bug feature request

All 4 comments

Hello @ryanpeach,

Thank you so much for the idea!

I think this idea/feature would be very useful and allow individuals to decide which folders, files and structures will be preserved while running in SageMaker or local mode.

I am not able to provide an ETA; however, I will speak to my team regarding this feature.

SageMaker is open source and we are always open to contributions and suggestions!

As for the requirements.txt, with script mode that will not be supported. It is recommended to install your dependencies as needed in your script.

Please let me know if there is anything I can clarify.

Thanks!

Thanks for your reply. First off, let me say that I tried putting a requirements.txt file in the source directory and it actually worked. A setup.py file in src also works.

As for your suggestion though, are you suggesting that for our script mode entry point we use a shell script instead of a Python file and pip install inside it?

Yeah, the suggestion in this case is to create a shell script that installs your dependencies and then launches your script. Here's a simple example of one we've written: https://github.com/aws/sagemaker-rl-container/blob/master/test/resources/gym/launcher.sh.

As you've pointed out, using a setup.py file also works, so feel free to do that instead if the shell script solution seems unnecessary.
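For anyone who would rather keep a Python entry point, the "install your dependencies as needed in your script" advice could look roughly like the sketch below. It is not the linked launcher.sh; the helper names are made up for illustration, and it assumes pip is available in the training container and that the container has network access.

```python
import subprocess
import sys


def pip_install_command(packages):
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install(packages, runner=subprocess.check_call):
    """Install `packages` before the rest of the script imports them.

    `runner` is injectable so the command can be inspected without running pip.
    Hypothetical helper for illustration; not part of the SageMaker SDK.
    """
    command = pip_install_command(packages)
    runner(command)
    return command


# At the top of main.py, before importing the packages (names are placeholders):
# install(["some-package==1.0"])
```

Using sys.executable rather than a bare pip keeps the install tied to the same Python that runs the training script.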

Hi,
Is there a way to exclude a specific folder when the working dir is uploaded to SageMaker?
I have a repository that includes both the code and the virtual env (which takes up a lot more space than the code itself).
I want to copy the source dir and exclude only the virtual env folder.
Is there any way to do it?
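One possible workaround, offered only as a sketch: copy the project to a temporary directory while skipping the unwanted folders, then use the copy as source_dir. stage_without and its default exclude list are assumptions made for illustration, not an SDK feature.

```python
import shutil
import tempfile
from pathlib import Path


def stage_without(project_dir, exclude=("venv", ".venv", ".git", "__pycache__")):
    """Copy `project_dir` to a temp dir, skipping entries matching `exclude`.

    Patterns are glob-style and are matched against file and folder names
    at every level of the tree.
    Hypothetical helper for illustration; not part of the SageMaker SDK.
    """
    staged = Path(tempfile.mkdtemp(prefix="sm_source_")) / "source"
    shutil.copytree(project_dir, staged, ignore=shutil.ignore_patterns(*exclude))
    return staged
```

The result could then be passed to the estimator as source_dir=str(staged), so only the filtered copy is uploaded.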
