Please add support for requirements.txt in ScriptProcessor, similar to other "Script Mode" parts of the SageMaker Python SDK where I can specify source_dir.
Hi Chris, thanks for your suggestion. I've added it to our backlog.
As a workaround, you can provide a shell script containing pip install commands. (You'll want to call your python script at the end of this shell script.)
@ajaykarpur The problem is that ScriptProcessor only takes a single file as its argument, not a source_dir, so you cannot include a directory alongside your Python source file. The workaround therefore does not really work around the problem.
As a workaround, we ended up using the SKLearnProcessor, which actually takes a Python script. That script gets access to a packaged version of our code, which is downloaded using the ProcessingInput mechanism; it then installs the package and runs the entrypoint. It works, but it was too much effort for something that should be built in IMHO.
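For readers who want to try this workaround, here is a minimal sketch of what such a bootstrap script might look like. It is not the commenter's actual code. The archive name, the `code` input sub-directory, the `main.py` entry point, and the helper names (`unpack`, `install_command`, `bootstrap`) are all hypothetical; the only assumption taken from SageMaker itself is that processing inputs are mounted somewhere under /opt/ml/processing/.

```python
import subprocess
import sys
import tarfile
from pathlib import Path

# Hypothetical locations: the archive is assumed to arrive via a ProcessingInput
# whose destination is /opt/ml/processing/input/code.
ARCHIVE = Path("/opt/ml/processing/input/code/sourcedir.tar.gz")
WORKDIR = Path("/opt/ml/processing/code")


def unpack(archive, workdir):
    """Extract the packaged source code into workdir and return workdir."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(workdir)
    return workdir


def install_command(workdir):
    """Return the pip command for requirements.txt, or None if the file is absent."""
    requirements = Path(workdir) / "requirements.txt"
    if not requirements.is_file():
        return None
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def bootstrap(archive, workdir, entrypoint, args=()):
    """Unpack the code, install its requirements, then run the entry point."""
    workdir = unpack(archive, workdir)
    cmd = install_command(workdir)
    if cmd is not None:
        subprocess.run(cmd, check=True)
    # Run from workdir so that sibling modules in the archive are importable.
    result = subprocess.run(
        [sys.executable, entrypoint, *args], cwd=workdir, check=True
    )
    return result.returncode
```

Inside the container, the script passed to the processor would end with a call such as `bootstrap(ARCHIVE, WORKDIR, "main.py", sys.argv[1:])`, so that any job arguments are forwarded to the real entry point.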
Hi @ajaykarpur, I completely agree with the prior comments about the importance and usefulness of allowing processing jobs to use a requirements file.
Thank you!
I think it is an important feature for SKLearnProcessor to accept multiple Python files.
Hi, I want to share an experimental / stop-gap work called FrameworkProcessor, to simplify submitting a Python processing job with requirements.txt, source_dir, dependencies, and git_config, using SageMaker framework training containers (i.e., tf, pytorch, mxnet, xgboost, and sklearn).
It aims to give you the familiar workflow of (1) instantiating a processor, then immediately (2) calling the run(...) method.
Here's an example of how to use this FrameworkProcessor class (right now as a Python script as opposed to .ipynb). To run that Python example, use this shell script, but you must first change the S3 prefix and execution role, and optionally choose your preferred container.
It slightly changes the processing API by adding a SageMaker Framework estimator, which serves two purposes: (1) auto-detecting the container URI, and (2) reusing the estimator's packaging mechanism to upload to s3://.../sourcedir.tar.gz.
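To make the packaging step concrete, the sketch below shows roughly what bundling a source_dir into sourcedir.tar.gz involves. It is an illustration using only the standard library, not the SDK's or the FrameworkProcessor's actual implementation. The function name `package_source_dir` and its parameters are hypothetical, and the upload to S3 is deliberately left out.

```python
import tarfile
from pathlib import Path


def package_source_dir(source_dir, output_dir, dependencies=()):
    """Bundle source_dir (plus optional dependency paths) into sourcedir.tar.gz.

    Hypothetical helper: output_dir is assumed to live outside source_dir.
    """
    source_dir = Path(source_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    archive = output_dir / "sourcedir.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Place the contents of source_dir at the archive root, so the entry
        # point and requirements.txt sit at the top level after extraction.
        for path in sorted(source_dir.iterdir()):
            tar.add(path, arcname=path.name)
        # Add each extra dependency (file or directory) under its base name.
        for dep in dependencies:
            dep = Path(dep)
            tar.add(dep, arcname=dep.name)
    return archive
```

Keeping requirements.txt at the archive root is a design choice made here for simplicity: whatever unpacks the archive in the container can then find it without extra configuration.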
So far it works for my cases, but more testing and bug reports are welcome.
HTH.
any news on this?