I am trying to sync a bucket hosted on a RadosGW instance to the local filesystem:
aws s3 sync s3://mybucket /tmp/mybucket --endpoint http://rgw:7480
and this appears to consider the first 1000 objects only.
sync should do paging automatically, correct?
$ aws --version
aws-cli/1.16.151 Python/2.7.12 Linux/4.4.0-145-generic botocore/1.12.141
For a bucket that has more than 1000 files I tried:
$ aws s3 ls s3://mybucket --endpoint http://rgw:7480 --summarize
...
Total Objects: 999
Total Size: 2441250092
To keep the debug output short, I manually limited the page size:
$ aws s3 ls s3://s3.data --endpoint http://rgw:7480 --summarize --debug --page-size=2
2019-05-03 20:21:53,790 - MainThread - awscli.clidriver - DEBUG - CLI version: aws-cli/1.16.151 Python/2.7.12 Linux/4.4.0-145-generic botocore/1.12.141
2019-05-03 20:21:53,791 - MainThread - awscli.clidriver - DEBUG - Arguments entered to CLI: ['s3', 'ls', 's3://s3.data', '--endpoint', 'http://rgw:7480', '--summarize', '--debug', '--page-size=2']
2019-05-03 20:21:53,791 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_scalar_parsers at 0x7f7d3a2aee60>
2019-05-03 20:21:53,791 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function register_uri_param_handler at 0x7f7d383a1140>
2019-05-03 20:21:53,791 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_assume_role_provider_cache at 0x7f7d383d7320>
2019-05-03 20:21:53,793 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function attach_history_handler at 0x7f7d37f8e140>
2019-05-03 20:21:53,794 - MainThread - botocore.hooks - DEBUG - Event building-command-table.s3: calling handler <function add_waiters at 0x7f7d3a2b9500>
2019-05-03 20:21:53,795 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.custom.s3.anonymous: calling handler <awscli.paramfile.URIArgumentHandler object at 0x7f7d37ab1650>
2019-05-03 20:21:53,795 - MainThread - botocore.hooks - DEBUG - Event building-command-table.ls: calling handler <function add_waiters at 0x7f7d3a2b9500>
2019-05-03 20:21:53,796 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.custom.ls.paths: calling handler <awscli.paramfile.URIArgumentHandler object at 0x7f7d37ab1650>
2019-05-03 20:21:53,797 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.custom.ls.human-readable: calling handler <awscli.paramfile.URIArgumentHandler object at 0x7f7d37ab1650>
2019-05-03 20:21:53,797 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.custom.ls: calling handler <awscli.argprocess.ParamShorthandParser object at 0x7f7d383aee90>
2019-05-03 20:21:53,798 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.custom.ls.page-size: calling handler <awscli.paramfile.URIArgumentHandler object at 0x7f7d37ab1650>
2019-05-03 20:21:53,798 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.custom.ls: calling handler <awscli.argprocess.ParamShorthandParser object at 0x7f7d383aee90>
2019-05-03 20:21:53,798 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.custom.ls.anonymous: calling handler <awscli.paramfile.URIArgumentHandler object at 0x7f7d37ab1650>
2019-05-03 20:21:53,798 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.custom.ls.request-payer: calling handler <awscli.paramfile.URIArgumentHandler object at 0x7f7d37ab1650>
2019-05-03 20:21:53,799 - MainThread - botocore.hooks - DEBUG - Event load-cli-arg.custom.ls.summarize: calling handler <awscli.paramfile.URIArgumentHandler object at 0x7f7d37ab1650>
2019-05-03 20:21:53,799 - MainThread - botocore.hooks - DEBUG - Event process-cli-arg.custom.ls: calling handler <awscli.argprocess.ParamShorthandParser object at 0x7f7d383aee90>
2019-05-03 20:21:53,799 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: env
2019-05-03 20:21:53,799 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: assume-role
2019-05-03 20:21:53,800 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: shared-credentials-file
2019-05-03 20:21:53,800 - MainThread - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2019-05-03 20:21:53,801 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /venv/local/lib/python2.7/site-packages/botocore/data/endpoints.json
2019-05-03 20:21:53,842 - MainThread - botocore.hooks - DEBUG - Event choose-service-name: calling handler <function handle_service_name_alias at 0x7f7d38cea0c8>
2019-05-03 20:21:53,844 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /venv/local/lib/python2.7/site-packages/botocore/data/s3/2006-03-01/service-2.json
2019-05-03 20:21:53,918 - MainThread - botocore.hooks - DEBUG - Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f7d38d177d0>
2019-05-03 20:21:53,918 - MainThread - botocore.hooks - DEBUG - Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f7d38d175f0>
2019-05-03 20:21:53,919 - MainThread - botocore.args - DEBUG - The s3 config key is not a dictionary type, ignoring its value of: None
2019-05-03 20:21:53,923 - MainThread - botocore.endpoint - DEBUG - Setting s3 timeout as (60, 60)
2019-05-03 20:21:53,925 - MainThread - botocore.client - DEBUG - Registering retry handlers for service: s3
2019-05-03 20:21:53,926 - MainThread - botocore.client - DEBUG - Using S3 path style addressing.
2019-05-03 20:21:53,936 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /venv/local/lib/python2.7/site-packages/botocore/data/s3/2006-03-01/paginators-1.json
2019-05-03 20:21:53,938 - MainThread - botocore.hooks - DEBUG - Event before-parameter-build.s3.ListObjectsV2: calling handler <function set_list_objects_encoding_type_url at 0x7f7d38ced7d0>
2019-05-03 20:21:53,938 - MainThread - botocore.hooks - DEBUG - Event before-parameter-build.s3.ListObjectsV2: calling handler <function validate_bucket_name at 0x7f7d38cea668>
2019-05-03 20:21:53,938 - MainThread - botocore.hooks - DEBUG - Event before-parameter-build.s3.ListObjectsV2: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f7d37717dd0>>
2019-05-03 20:21:53,938 - MainThread - botocore.hooks - DEBUG - Event before-parameter-build.s3.ListObjectsV2: calling handler <function generate_idempotent_uuid at 0x7f7d38cea320>
2019-05-03 20:21:53,939 - MainThread - botocore.hooks - DEBUG - Event before-call.s3.ListObjectsV2: calling handler <function add_expect_header at 0x7f7d38ceaaa0>
2019-05-03 20:21:53,939 - MainThread - botocore.hooks - DEBUG - Event before-call.s3.ListObjectsV2: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f7d37717dd0>>
2019-05-03 20:21:53,939 - MainThread - botocore.hooks - DEBUG - Event before-call.s3.ListObjectsV2: calling handler <function inject_api_version_header_if_needed at 0x7f7d38cedaa0>
2019-05-03 20:21:53,939 - MainThread - botocore.endpoint - DEBUG - Making request for OperationModel(name=ListObjectsV2) with params: {'body': '', 'url': u'http://rgw:7480/s3.data?list-type=2&delimiter=%2F&prefix=&max-keys=2&encoding-type=url', 'headers': {'User-Agent': 'aws-cli/1.16.151 Python/2.7.12 Linux/4.4.0-145-generic botocore/1.12.141'}, 'context': {'auth_type': None, 'client_region': u'us-east-1', 'signing': {'bucket': u's3.data'}, 'has_streaming_input': False, 'client_config': <botocore.config.Config object at 0x7f7d372c9ed0>, 'encoding_type_auto_set': True}, 'query_string': {u'delimiter': '/', u'prefix': '', u'max-keys': 2, u'encoding-type': 'url'}, 'url_path': u'/s3.data?list-type=2', 'method': u'GET'}
2019-05-03 20:21:53,940 - MainThread - botocore.hooks - DEBUG - Event request-created.s3.ListObjectsV2: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f7d372c9e90>>
2019-05-03 20:21:53,941 - MainThread - botocore.hooks - DEBUG - Event choose-signer.s3.ListObjectsV2: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x7f7d37ab1150>>
2019-05-03 20:21:53,941 - MainThread - botocore.hooks - DEBUG - Event choose-signer.s3.ListObjectsV2: calling handler <function set_operation_specific_signer at 0x7f7d38cea230>
2019-05-03 20:21:53,941 - MainThread - botocore.auth - DEBUG - Calculating signature using v4 auth.
2019-05-03 20:21:53,942 - MainThread - botocore.auth - DEBUG - CanonicalRequest:
GET
/s3.data
delimiter=%2F&encoding-type=url&list-type=2&max-keys=2&prefix=
host:rgw:7480
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20190503T202153Z
host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2019-05-03 20:21:53,942 - MainThread - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20190503T202153Z
20190503/us-east-1/s3/aws4_request
4a53df374d45f3c4fd3f23544eaf38052f0ffafc56887c2b810daef396a3c9a5
2019-05-03 20:21:53,942 - MainThread - botocore.auth - DEBUG - Signature:
f435d208ec85c441c5535da508cd83ed896314387cf87ce52b3c33276b7cdf07
2019-05-03 20:21:53,942 - MainThread - botocore.endpoint - DEBUG - Sending http request: <AWSPreparedRequest stream_output=False, method=GET, url=http://rgw:7480/s3.data?list-type=2&delimiter=%2F&prefix=&max-keys=2&encoding-type=url, headers={'X-Amz-Content-SHA256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': 'AWS4-HMAC-SHA256 Credential=dataserver/20190503/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=f435d208ec85c441c5535da508cd83ed896314387cf87ce52b3c33276b7cdf07', 'X-Amz-Date': '20190503T202153Z', 'User-Agent': 'aws-cli/1.16.151 Python/2.7.12 Linux/4.4.0-145-generic botocore/1.12.141'}>
2019-05-03 20:21:53,943 - MainThread - urllib3.util.retry - DEBUG - Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
2019-05-03 20:21:53,943 - MainThread - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): rgw:7480
2019-05-03 20:21:53,963 - MainThread - urllib3.connectionpool - DEBUG - http://rgw:7480 "GET /s3.data?list-type=2&delimiter=%2F&prefix=&max-keys=2&encoding-type=url HTTP/1.1" 200 960
2019-05-03 20:21:53,964 - MainThread - botocore.parsers - DEBUG - Response headers: {'Date': 'Fri, 03 May 2019 20:21:53 GMT', 'Content-Length': '960', 'x-amz-request-id': 'tx000000000000000000349-005ccca2e1-5e4a-default', 'Content-Type': 'application/xml'}
2019-05-03 20:21:53,964 - MainThread - botocore.parsers - DEBUG - Response body:
<?xml version="1.0" encoding="UTF-8"?><ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>s3.data</Name><Prefix></Prefix><Marker></Marker><NextMarker>0068643f1de06054b34ae82019342447</NextMarker><MaxKeys>2</MaxKeys><Delimiter>/</Delimiter><IsTruncated>true</IsTruncated><EncodingType>url</EncodingType><Contents><Key>005d5652323bee7374beac276426b247</Key><LastModified>2019-04-17T20:33:41.168Z</LastModified><ETag>"005d5652323bee7374beac276426b247"</ETag><Size>78010</Size><StorageClass>STANDARD</StorageClass><Owner><ID>dataserver</ID><DisplayName>RGW dataserver User</DisplayName></Owner></Contents><Contents><Key>0068643f1de06054b34ae82019342447</Key><LastModified>2019-04-17T20:33:45.184Z</LastModified><ETag>"0068643f1de06054b34ae82019342447"</ETag><Size>638622</Size><StorageClass>STANDARD</StorageClass><Owner><ID>dataserver</ID><DisplayName>RGW dataserver User</DisplayName></Owner></Contents></ListBucketResult>
2019-05-03 20:21:53,966 - MainThread - botocore.hooks - DEBUG - Event needs-retry.s3.ListObjectsV2: calling handler <botocore.retryhandler.RetryHandler object at 0x7f7d37717d90>
2019-05-03 20:21:53,966 - MainThread - botocore.retryhandler - DEBUG - No retry needed.
2019-05-03 20:21:53,966 - MainThread - botocore.hooks - DEBUG - Event needs-retry.s3.ListObjectsV2: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f7d37717dd0>>
2019-05-03 20:21:53,967 - MainThread - botocore.hooks - DEBUG - Event after-call.s3.ListObjectsV2: calling handler <function decode_list_object_v2 at 0x7f7d38ced8c0>
2019-05-03 20:21:53,967 - MainThread - botocore.hooks - DEBUG - Event after-call.s3.ListObjectsV2: calling handler <function enhance_error_msg at 0x7f7d3a2ae938>
2019-04-17 20:33:41 78010 005d5652323bee7374beac276426b247
2019-04-17 20:33:45 638622 0068643f1de06054b34ae82019342447
Total Objects: 2
Total Size: 716632
Any help appreciated!
Edit: I tried using s3cmd and it appears to behave differently, i.e., it syncs and lists all files.
Looks like this behaviour is caused by RadosGW supporting list_objects only, while AWS CLI now only supports list_objects_v2, introduced in https://github.com/aws/aws-cli/commit/f7fa060afeddf20ad7f44ea00516f900587dbd9f and first available in version 1.16.15.
I have checked ls, as described above, with version 1.16.14, and it appears to work.
I would be grateful if AWS CLI supported the old API as well.
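As a stopgap, the old API can be paged by hand from Python. The following is a minimal sketch, not something taken from the CLI: the helper name list_all_keys_v1 is hypothetical, and it assumes a client object that exposes list_objects the way a boto3 S3 client does (Bucket, Marker, MaxKeys in; Contents, IsTruncated, NextMarker out). Since NextMarker is only guaranteed when a delimiter is set, the loop falls back to the last key of the page.

```python
def list_all_keys_v1(client, bucket, page_size=1000):
    """Collect every key in `bucket` by paging through ListObjects (v1).

    `client` is assumed to behave like a boto3 S3 client; this helper is
    illustrative and not part of the AWS CLI or boto3.
    """
    keys = []
    marker = None
    while True:
        kwargs = {"Bucket": bucket, "MaxKeys": page_size}
        if marker:
            # Resume the listing strictly after the last key already seen.
            kwargs["Marker"] = marker
        resp = client.list_objects(**kwargs)
        contents = resp.get("Contents", [])
        keys.extend(obj["Key"] for obj in contents)
        if not resp.get("IsTruncated") or not contents:
            return keys
        # NextMarker may be absent when no delimiter is used, so fall back
        # to the last key of the current page.
        marker = resp.get("NextMarker") or contents[-1]["Key"]


# Possible usage against the endpoint from this issue (requires boto3):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="http://rgw:7480")
#   print(len(list_all_keys_v1(s3, "mybucket")))
```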
Anyway, IMHO it should return an error when the server does not support list_objects_v2 or does not return a valid list_objects_v2 response.
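To illustrate the kind of check I mean: the response body in the debug output above has IsTruncated set to true and carries Marker/NextMarker (the list_objects fields), but no NextContinuationToken, which is what a list_objects_v2 client would need to request the next page. My assumption is that the paginator finds no token and treats the listing as complete. A rough sketch of a stricter check, using only the standard library (the function name check_list_v2_response is made up for this example):

```python
import xml.etree.ElementTree as ET

# XML namespace used by the ListBucketResult document in the debug output.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def check_list_v2_response(xml_body):
    """Return True if more pages follow, False if the listing is complete.

    Raises ValueError when the response claims to be truncated but carries
    no NextContinuationToken, i.e. it cannot be continued via the v2 API.
    """
    root = ET.fromstring(xml_body)
    truncated_text = root.findtext(S3_NS + "IsTruncated", default="false")
    truncated = truncated_text.strip().lower() == "true"
    token = root.findtext(S3_NS + "NextContinuationToken")
    if truncated and not token:
        raise ValueError(
            "response is truncated but has no NextContinuationToken; "
            "the server may only implement list_objects (v1)"
        )
    return truncated
```

With a check like this, the truncated response shown above would raise an error instead of being reported as a complete listing of 2 objects.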
While the CLI may work with services from other vendors, we only guarantee it will work with official AWS services. If you need to use the older functionality then you will need to lock to an older version of the CLI.