Sagemaker-python-sdk: Limited size of parameters

Created on 26 Jul 2018 · 12 comments · Source: aws/sagemaker-python-sdk

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans):
    Tensorflow
  • Python Version:
    2.7
  • Python SDK Version:
    1.7.0

Describe the problem

When running TensorFlow training through the SDK, we are limited in the size of the hyperparameters:

ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: 1 validation error detected: Value '{sagemaker_requirements="", batch_size=32, evaluation_steps=null, ... sagemaker_job_name="train-image-nature-2018-07-26-11-05-33-968", epochs=10, training_steps=3450}' at 'hyperParameters' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 256, Member must have length greater than or equal to 0]

A limit of 256 characters is small, particularly if you send a list of labels or have many parameters.

Minimal repro / logs

.../envs/sagemaker_tf_27/lib/python2.7/site-packages/sagemaker/session.pyc in train(self, image, input_mode, input_config, role, job_name, output_config, resource_config, hyperparameters, stop_condition, tags)
    262         LOGGER.info('Creating training-job with name: {}'.format(job_name))
    263         LOGGER.debug('train request: {}'.format(json.dumps(train_request, indent=4)))
--> 264         self.sagemaker_client.create_training_job(**train_request)
    265 
    266     def tune(self, job_name, strategy, objective_type, objective_metric_name,

.../envs/sagemaker_tf_27/lib/python2.7/site-packages/botocore/client.pyc in _api_call(self, *args, **kwargs)
    312                     "%s() only accepts keyword arguments." % py_operation_name)
    313             # The "self" in this scope is referring to the BaseClient.
--> 314             return self._make_api_call(operation_name, kwargs)
    315 
    316         _api_call.__name__ = str(py_operation_name)

.../envs/sagemaker_tf_27/lib/python2.7/site-packages/botocore/client.pyc in _make_api_call(self, operation_name, api_params)
    610             error_code = parsed_response.get("Error", {}).get("Code")
    611             error_class = self.exceptions.from_code(error_code)
--> 612             raise error_class(parsed_response, operation_name)
    613         else:
    614             return parsed_response

All 12 comments

Hi @PedroCardoso ,

The limit applies to each hyperparameter in the map individually: every key and every value may be at most 256 characters long.

So having many hyperparameters will not hit this limit, as long as each key and each value stays within 256 characters. If a single value is a long list, though, it can be a problem.
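
The per-entry limit can be checked locally before the CreateTrainingJob call is made. The sketch below is an illustration only, not part of the SDK: `find_oversized_hyperparameters` and `MAX_LEN` are hypothetical names, and it assumes that non-string values are JSON-encoded before submission, which is why a list of labels is measured as one long string.

```python
import json

# Limit taken from the ValidationException message in the report above.
MAX_LEN = 256


def find_oversized_hyperparameters(hyperparameters, max_len=MAX_LEN):
    """Return {key: length} for every entry whose key or value is too long.

    Assumption: non-string values are JSON-encoded before being sent,
    so a list is measured by the length of its JSON string.
    """
    oversized = {}
    for key, value in hyperparameters.items():
        encoded = value if isinstance(value, str) else json.dumps(value)
        longest = max(len(str(key)), len(encoded))
        if longest > max_len:
            oversized[key] = longest
    return oversized


# Many short hyperparameters are fine; one long list is not.
hyperparameters = {
    "batch_size": 32,
    "epochs": 10,
    "labels": ["label_{:02d}".format(i) for i in range(40)],
}
print(find_oversized_hyperparameters(hyperparameters))
```

Only the `labels` entry is reported here, which matches the explanation above: the number of hyperparameters does not matter, the length of each individual key and value does.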

So could you give me a specific example? Then we can either recommend a better practice or increase the limit to a more reasonable number.

Thanks

Hi @yangaws

I believe that my particular problem is with sending a list of labels as a parameter. I need those to build the Estimator.

As an example, think of a parameter that contains a list of 30 or 40 string objects.

@PedroCardoso

I am not confident that we will increase that limit soon. I can put in a feature request here. If we keep receiving such issues, we will definitely prioritize this feature.

For now my suggestion is, for your list of 30-40 labels, specify all the labels as a separate channel in some common format like JSON.
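
One way to follow this suggestion is sketched below. It assumes the labels file is uploaded to S3 and passed to `fit()` as an extra channel, here given the hypothetical name `labels`. In File input mode, SageMaker makes each channel available inside the training container under `/opt/ml/input/data/<channel_name>`. The helper names and the `labels.json` file name are illustrative only.

```python
import json
import os
import tempfile

# SageMaker mounts each input channel under this directory in the container.
CHANNEL_ROOT = "/opt/ml/input/data"


def write_labels(labels, directory, filename="labels.json"):
    """Client side: write the labels to a JSON file ready to upload to S3."""
    path = os.path.join(directory, filename)
    with open(path, "w") as f:
        json.dump(labels, f)
    return path


def read_labels(channel="labels", filename="labels.json", root=CHANNEL_ROOT):
    """Container side: load the labels from the (hypothetical) 'labels' channel."""
    with open(os.path.join(root, channel, filename)) as f:
        return json.load(f)


# Local round trip that imitates the channel layout in a temporary directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "labels"))
write_labels(["cat", "dog", "bird"], os.path.join(root, "labels"))
print(read_labels(root=root))
```

In a real job, the file written by `write_labels` would be uploaded to S3 and its location passed alongside the training data, for example `estimator.fit({'training': train_uri, 'labels': labels_uri})`, where both URIs are placeholders.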

Is the channel information available in the parameters passed to estimator_fn()?

Hello,

I don't think the channel information is exposed to estimator_fn(), as is evident here: https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/trainer.py#L92

I believe only the train_input_fn and eval_input_fn have access to the channels.
https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/trainer.py#L116
https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/trainer.py#L153

A workaround for this is to use the hyperparameters to store the channel metadata, like:
hp = {'my_channel': 's3://url/labels.json'}
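
A sketch of how the training code might consume such a hyperparameter is shown below, assuming the value is a well-formed `s3://bucket/key` URI. `split_s3_uri` and `load_json_from_s3` are hypothetical helpers, and the bucket and key are placeholders. The download uses boto3's `get_object` and needs AWS credentials with read access to the object.

```python
import json


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError("not an S3 URI: {!r}".format(uri))
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError("S3 URI needs a bucket and a key: {!r}".format(uri))
    return bucket, key


def load_json_from_s3(uri):
    """Fetch the object behind `uri` and parse it as JSON."""
    import boto3  # imported lazily so the URI parsing works without boto3

    bucket, key = split_s3_uri(uri)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body.decode("utf-8"))


# The hyperparameter stays short because it carries only the location.
hyperparameters = {"my_channel": "s3://my-bucket/metadata/labels.json"}
print(split_s3_uri(hyperparameters["my_channel"]))
```

The value sent to CreateTrainingJob is then a short string regardless of how many labels the file contains.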

Closing due to inactivity. Feel free to reopen if necessary.

Just hit this issue, using a custom docker container to train a model and I can't specify the features I want to train on. :-1:

Hitting the same thing too. It's odd that this notebook shows a value longer than 256 characters in the hyperparameters when that is actually not supported:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/autogluon-tabular/AutoGluon_Tabular_SageMaker.ipynb

For those hitting this: my solution was to pass the large parameters as a JSON file and have it sent to the job with a manifest file.

do you have a sample for that?

I too am interested in learning about this, since I'm currently using the hyperparams file for all my image annotation labels in an object recognition case, and there are too many labels apparently.
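
No sample was posted in the thread. As a rough sketch of what the manifest approach might look like: SageMaker's `ManifestFile` S3 data type expects a JSON array whose first element gives a common S3 prefix and whose remaining elements are object keys relative to that prefix. The bucket, prefix, and file names below are placeholders, and `build_manifest` is a hypothetical helper, not something the commenter shared.

```python
import json


def build_manifest(prefix, relative_keys):
    """Return manifest JSON text: a prefix entry followed by relative keys."""
    if not prefix.endswith("/"):
        prefix += "/"
    return json.dumps([{"prefix": prefix}] + list(relative_keys), indent=2)


# Placeholder location of the JSON file that holds the large parameters.
manifest = build_manifest("s3://my-bucket/job-config", ["params.json"])
print(manifest)
```

The manifest itself would be uploaded to S3 and used as the data source of a channel with the S3 data type set to `ManifestFile`; the training code can then read `params.json` from that channel's directory instead of from the hyperparameters.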

I'm also stuck here. My use case is that I need to set the SAGEMAKER_SPARKML_SCHEMA environment variable when using https://github.com/aws/sagemaker-sparkml-serving-container (required for CSV input), and I also have ~40 features to pass. I don't think this is an uncommon pattern.
