I'm a developer working on AWS SageMaker Studio. We have two environments, both with botocore==1.19.48 and sagemaker==2.23.1 installed. One system is able to call the SageMaker function fit on an estimator object, while the other system produces the following botocore error:
botocore.exceptions.ParamValidationError: Parameter validation failed:\nUnknown parameter in input: 'ProfilerRuleConfigurations', must be one of: TrainingJobName, HyperParameters, AlgorithmSpecification, RoleArn, InputDataConfig, OutputDataConfig, ResourceConfig, VpcConfig, StoppingCondition, Tags, EnableNetworkIsolation, EnableInterContainerTrafficEncryption, EnableManagedSpotTraining, CheckpointConfig, DebugHookConfig, DebugRuleConfigurations, TensorBoardOutputConfig, ExperimentConfig\nUnknown parameter in input: 'ProfilerConfig', must be one of: TrainingJobName, HyperParameters, AlgorithmSpecification, RoleArn, InputDataConfig, OutputDataConfig, ResourceConfig, VpcConfig, StoppingCondition, Tags, EnableNetworkIsolation, EnableInterContainerTrafficEncryption, EnableManagedSpotTraining, CheckpointConfig, DebugHookConfig, DebugRuleConfigurations, TensorBoardOutputConfig, ExperimentConfig
To solve this issue, we tried re-installing the libraries and confirmed that the versions are up to date, but the problem persists. Are there other dependencies that must be checked, or is there any way to identify why this problem occurs on just one system? We do not know how to reproduce this problem on another system.
Thanks!
This seems like a custom model override issue rather than a Python SDK issue. Resolving.
I am seeing the same error on an SMStudio environment I just restarted today (Data Science kernel), using the XGBoost algorithm and not manually configuring anything to do with ProfilerRuleConfigurations:
# Instantiate an XGBoost estimator object
estimator = sagemaker.estimator.Estimator(
    image_uri=training_image,      # XGBoost algorithm container
    instance_type="ml.m5.xlarge",  # type of training instance
    instance_count=1,              # number of instances to be used
    role=sgmk_role,                # IAM role to be used
    max_run=20*60,                 # Maximum allowed active runtime
    use_spot_instances=True,       # Use spot instances to reduce cost
    max_wait=30*60,                # Maximum clock time (including spot delays)
)
# define its hyperparameters
estimator.set_hyperparameters(
    num_round=150,
    max_depth=5,
    alpha=2.5,
    eta=0.5,
    objective="binary:logistic",
)
# start a training (fitting) job
estimator.fit({"train": s3_input_train, "validation": s3_input_validation})
Throws:
ParamValidationError: Parameter validation failed:
Unknown parameter in input: "ProfilerRuleConfigurations", must be one of: TrainingJobName, HyperParameters, AlgorithmSpecification, RoleArn, InputDataConfig, OutputDataConfig, ResourceConfig, VpcConfig, StoppingCondition, Tags, EnableNetworkIsolation, EnableInterContainerTrafficEncryption, EnableManagedSpotTraining, CheckpointConfig, DebugHookConfig, DebugRuleConfigurations, TensorBoardOutputConfig, ExperimentConfig, ProfilerConfig
I tried running a !pip install --upgrade awscli boto3 but no luck. Current versions are:
Upgrading to sagemaker @ 2.24.3 does not help either. Please re-open this issue - any help appreciated!
PS: @evakravi I guess you're probably observing this on just one system because SageMaker "apps" (kernel images) pull the latest container image only when they're restarted (e.g. "deleted" through the SMStudio AWS console, or stopped via the stop button tab in Studio). So maybe the other environments are still running on an older image where the installed libs didn't have the issue. If it does turn out to be a bug and updating the libs to the latest versions doesn't work, downgrading might be a solution instead...
An update, since I came across and investigated https://github.com/boto/botocore/issues/2260:
botocore loads the wrong shape for SageMaker.CreateTrainingJob, even though the parameter is present in the service JSON at 1.20.3. In Studio it loads the model from /root/.aws/models/sagemaker/2017-07-24/service-2.json instead, and the copy on my user profile is stale (it does not include the ProfilerRuleConfigurations key).
/root in this case is the SageMaker Studio base user directory (i.e. the top level one we see in JupyterLab).
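In case it helps anyone confirm they are hitting the same thing, here is a rough sketch of how the stale override could be detected from a notebook. It assumes the override file follows the usual botocore service model layout (an "operations" map pointing at an input shape, and a "shapes" map with "members"); the function name find_stale_sagemaker_models and its arguments are my own and not part of botocore or the SageMaker SDK.

```python
import json
from pathlib import Path


def find_stale_sagemaker_models(
    models_root,
    required=("ProfilerConfig", "ProfilerRuleConfigurations"),
):
    """Return {file path: [missing parameter names]} for local overrides.

    Hypothetical helper: scans <models_root>/sagemaker/<api-version>/service-2.json
    and reports files whose CreateTrainingJob input shape lacks `required` keys.
    """
    root = Path(models_root)
    stale = {}
    if not root.is_dir():
        return stale  # no local model overrides at all
    for path in sorted(root.glob("sagemaker/*/service-2.json")):
        model = json.loads(path.read_text())
        operation = model.get("operations", {}).get("CreateTrainingJob", {})
        shape_name = operation.get("input", {}).get("shape")
        members = model.get("shapes", {}).get(shape_name, {}).get("members", {})
        missing = [name for name in required if name not in members]
        if missing:
            stale[str(path)] = missing
    return stale


if __name__ == "__main__":
    # In Studio the home directory is /root, so this resolves to /root/.aws/models
    print(find_stale_sagemaker_models(Path.home() / ".aws" / "models"))
```

An empty result means no override file was found, or every override already knows about both profiler parameters. Any path that is reported is a candidate for the stale cache described above.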
I was able to fix this issue by:
Running !rm -rf /root/.aws/models/sagemaker from my notebook to clear this cache (check that the folder gets deleted)
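For anyone who prefers to do the same from Python rather than a shell escape, a minimal sketch follows. The helper name clear_model_override is hypothetical, and the default path assumes the Studio layout described above, where the home directory is /root.

```python
import shutil
from pathlib import Path


def clear_model_override(models_root, service="sagemaker"):
    """Delete <models_root>/<service> if present; return True once it is gone.

    Hypothetical helper, equivalent in intent to `rm -rf` on that folder.
    """
    target = Path(models_root) / service
    if target.is_dir():
        shutil.rmtree(target)
    return not target.exists()
```

Calling clear_model_override(Path.home() / ".aws" / "models") should return True when the folder has been removed (or was never there). Restarting the kernel afterwards is probably a good idea, since I assume a model already loaded in the running process would not be re-read.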