Sagemaker-python-sdk: Inconsistent SKLearn image lookup causing image pull permission error

Created on 8 Feb 2019 · 3 Comments · Source: aws/sagemaker-python-sdk

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): scikit-learn
  • Framework Version: 0.20.0
  • Python Version: 2.7
  • CPU or GPU: CPU
  • Python SDK Version: 1.18.2
  • Are you using a custom image: No

Describe the problem

I am trying to deploy a previously trained SKLearn model. Training works fine when using the SDK. However, when I use the SKLearnModel class, the base image is looked up under a different account id than the one used for training. Deployment then fails with the error:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Role <my_role_arn> cannot pull 520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3. Ensure that the role exists and the image was granted pull permission.

Minimal repro / logs

Train model

from sagemaker import s3_input
from sagemaker.sklearn.estimator import SKLearn

# s3_training_key, s3_output_prefix, role and job_name are defined elsewhere.
channel = {
    'training': s3_input(s3_training_key),
}

skl = SKLearn(
    entry_point='./model.py',
    role=role,
    train_instance_type='ml.c4.xlarge',
    train_instance_count=1,
    output_path=s3_output_prefix,
    base_job_name=job_name,
)

# Training completes without errors.
skl.fit(
    inputs=channel,
    job_name=job_name,
)

Create Model and deploy

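import sagemaker
from sagemaker.sklearn.model import SKLearnModel

# Re-attach to the finished training job to find the model artifact.
# role, job_name, instance_type and endpoint_name are defined elsewhere.
estim = sagemaker.estimator.Estimator.attach(job_name)
estim_data = str(estim.model_data)

model = SKLearnModel(
    model_data=estim_data,
    role=role,
    entry_point='./model.py',
    name=job_name,
)

# This call raises the ValidationException shown above.
model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
)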
  • Exact command to reproduce:
    The call to deploy is what raises the error. I think the problem comes from an account inconsistency between fw_utils and fw_registry: the Estimator class seems to use fw_registry.default_framework_uri, whereas the SKLearnModel class uses fw_utils.create_image_uri. The account id in the error message, 520713654638, is the default account used by create_image_uri.
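One way to check for the kind of account mismatch described above is to compare the registry account embedded in the training image URI with the one in the hosting image URI. The following is a minimal sketch that uses only the standard library, not any SDK call; `parse_ecr_uri` and `same_registry_account` are hypothetical helper names, and the training URI in the comparison uses a placeholder account id (111111111111) because the issue does not state the real one.

```python
import re

# General shape of an ECR image URI:
#   <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_uri(uri):
    """Split an ECR image URI into account, region, repository and tag."""
    match = ECR_URI.match(uri)
    if match is None:
        raise ValueError("not an ECR image URI: %r" % uri)
    return match.groupdict()


def same_registry_account(training_uri, hosting_uri):
    """Return True when both images live in the same registry account."""
    return parse_ecr_uri(training_uri)["account"] == parse_ecr_uri(hosting_uri)["account"]


# Hosting image URI copied from the error message in this issue.
hosting_uri = (
    "520713654638.dkr.ecr.us-east-1.amazonaws.com/"
    "sagemaker-scikit-learn:0.20.0-cpu-py3"
)
# Placeholder training image URI (hypothetical account id, for illustration only).
training_uri = (
    "111111111111.dkr.ecr.us-east-1.amazonaws.com/"
    "sagemaker-scikit-learn:0.20.0-cpu-py3"
)

hosting = parse_ecr_uri(hosting_uri)
print(hosting["account"], same_registry_account(training_uri, hosting_uri))
```

If the two accounts differ, the role may be able to pull the training image but not the hosting image, which would match the error reported here.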
Labels: pending release, bug

All 3 comments

Found #624

Apologies for the inconvenience. #624 has now been merged and will be released with the next version.

This has been released in https://github.com/aws/sagemaker-python-sdk/releases/tag/v1.18.3

Please update your library.

pip install --upgrade sagemaker
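After upgrading, it may be worth confirming that the installed version is at least v1.18.3, the release named above. A minimal sketch, assuming plain dotted version strings; `parse_version` and `has_fix` are hypothetical helper names, not part of the SDK.

```python
import re

# Release that contains the fix, according to this thread.
FIXED_IN = (1, 18, 3)


def parse_version(text):
    """Turn a string such as '1.18.3' into a tuple of up to three integers."""
    numbers = re.findall(r"\d+", text)
    return tuple(int(n) for n in numbers[:3])


def has_fix(installed_version):
    """Return True when the given SDK version is v1.18.3 or newer."""
    return parse_version(installed_version) >= FIXED_IN


print(has_fix("1.18.2"), has_fix("1.18.3"))
```

In a real environment, the argument would typically be sagemaker.__version__.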
