Boto3: listing the top-level contents of an S3 bucket with Prefix and Delimiter

Created on 17 Jun 2015 · 15 Comments · Source: boto/boto3

Apologies for what sounds like a very basic question. In this example from the S3 docs, is there a way to list the continents? I was hoping this might work, but it doesn't seem to:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('edsu-test-bucket')

for o in bucket.objects.filter(Delimiter='/'):
    print(o.key)

However, the equivalent code using boto2 does seem to work the way I expect:

import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('edsu-test-bucket')

for o in bucket.list(delimiter='/'):
    print(o.name)

All 15 comments

What is the way you expect it to look? Can you give sample output from running the two snippets of code?

@kyleknap: the boto2 sample will list only the top-level “directories” using the unique portion before the delimiter – i.e. given files like North America/United States/California and South America/Brazil/Bahia it would return North America and South America.
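
To make the grouping described above concrete, here is a minimal pure-Python sketch of how keys are rolled up when a prefix and delimiter are given. It is an emulation for illustration only, not boto code: the function name `list_level` is hypothetical, and the keys are the four objects listed in this thread.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Emulate how keys are grouped for one "directory" level.

    Returns (common_prefixes, contents): keys that continue past the
    delimiter are rolled up into a prefix, the rest are plain objects.
    """
    common_prefixes = []
    contents = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter: report only the rolled-up prefix.
            rolled = prefix + head + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return common_prefixes, contents


keys = [
    "Europe/France/Acquitaine/Bordeaux",
    "North America/Canada/Quebec/Montreal",
    "North America/USA/Washington/Bellevue",
    "North America/USA/Washington/Seattle",
]

print(list_level(keys))                            # top level
print(list_level(keys, prefix="North America/"))   # one level down
```

Because every key in the bucket contains a delimiter, the objects list comes back empty at the top level and only the rolled-up prefixes remain.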

Thanks for taking a look @acdha & @kyleknap. Yes, if you assume the above snippets are a.py and b.py the output should look like this:

% ./a.py

% ./b.py
Europe/
North America/

I made s3://edsu-test-bucket public if you want to give it a try. This is what it looks like from aws-cli, but you can see for yourself since it is public.

% aws s3 ls s3://edsu-test-bucket
                           PRE Europe/
                           PRE North America/

% aws s3 ls --recursive s3://edsu-test-bucket
2015-06-18 14:41:31          0 Europe/France/Acquitaine/Bordeaux
2015-06-18 14:41:31          0 North America/Canada/Quebec/Montreal
2015-06-18 14:41:31          0 North America/USA/Washington/Bellevue
2015-06-18 14:41:31          0 North America/USA/Washington/Seattle

@edsu I ran into this as well. The "directories" to list aren't really objects (but substrings of object keys), so I do not expect them to show up in an objects collection. As a quick workaround, I list them via client.list_objects.

For non-public buckets (or buckets that you can explicitly access):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('edsu-test-bucket')
result = bucket.meta.client.list_objects(Bucket=bucket.name,
                                         Delimiter='/')
for o in result.get('CommonPrefixes', []):
    print(o.get('Prefix'))

Would print:

Europe/
North America/

This doesn't support anonymous calls, though. When I run this, I get an AccessDenied error.

For anonymous calls I haven't found a way to use an S3 resource at all (so far). But I can call list_objects on a low-level client:

import boto3
from botocore import UNSIGNED
from botocore.client import Config

# Unsigned (anonymous) requests only work against publicly readable buckets.
client = boto3.client('s3',  # region_name='us-east-1',
                      config=Config(signature_version=UNSIGNED))
result = client.list_objects(Bucket='edsu-test-bucket',
                             Prefix='North America/',
                             Delimiter='/')
# 'CommonPrefixes' is missing when nothing is rolled up, so default to [].
for o in result.get('CommonPrefixes', []):
    print(o.get('Prefix'))

Not very beautiful, but it prints what I wanted:

North America/Canada/
North America/USA/

@amatthies You raise a really good point – while the low-level interface is the same, this is definitely a different class of response. Maybe it's as simple as documenting list_objects as the best way to do this?

@amatthies is on the right track here. The reason it is not included in the list of objects returned is that the values you are expecting when you use the delimiter are prefixes (e.g. Europe/, North America/), and prefixes do not map into the object resource interface. If you want to know the prefixes of the objects in a bucket, you will have to use list_objects. However, I would suggest using the pagination interface for this, because it lets you iterate through all objects in the bucket without having to provide pagination tokens:

import boto3
client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
for result in paginator.paginate(Bucket='edsu-test-bucket', Delimiter='/'):
    for prefix in result.get('CommonPrefixes', []):
        print(prefix.get('Prefix'))
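
If you want the prefixes as a list rather than printed, the paginator loop can be wrapped in a small helper. This is only a sketch: `iter_common_prefixes` and `top_level_prefixes` are hypothetical names, and the second function assumes boto3 is installed and credentials are configured. The first function works on any iterable of response dicts, so it can be tried without touching S3.

```python
def iter_common_prefixes(pages):
    """Yield every 'Prefix' string from an iterable of list_objects pages."""
    for page in pages:
        # 'CommonPrefixes' is absent from a page that has no rolled-up keys.
        for entry in page.get("CommonPrefixes", []):
            yield entry["Prefix"]


def top_level_prefixes(bucket_name, delimiter="/"):
    """Return the top-level "directories" of a bucket as a list of strings."""
    import boto3  # imported lazily so the helper above works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects")
    pages = paginator.paginate(Bucket=bucket_name, Delimiter=delimiter)
    return list(iter_common_prefixes(pages))


# Example with hand-written pages instead of a real bucket:
fake_pages = [
    {"CommonPrefixes": [{"Prefix": "Europe/"}]},
    {"Contents": []},
    {"CommonPrefixes": [{"Prefix": "North America/"}]},
]
print(list(iter_common_prefixes(fake_pages)))
```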

As to your question about how to use anonymous clients for resources, try the following. You will have to hook into the event system to disable signing:

import boto3

from botocore.handlers import disable_signing

resource = boto3.resource('s3')

resource.meta.client.meta.events.register(
    'choose-signer.s3.*', disable_signing)

I realize that this is not documented anywhere. Documenting the event system and the things you can do with it is on our list of things that we want to do.

Based on the conversation, I see the following action items:

1) Improve the documentation to recommend the client when you are trying to get prefixes.
2) Document the event system and the things you can do with it.
3) Possibly add a way to disable signing upon instantiation of the resource.

Let me know what you all think or if there is anything else that should be added to this list.

@kyleknap Great, thanks. client.get_paginator('list_objects') answers this question.
Looking forward to some hints on the event system.

I have a folder structure in my S3 bucket, and I am not able to access the sub-folders where my files are located. I used boto3 and passed Delimiter='/', but I still cannot access them.

Delimiter='/' is not restricting the results to the top level only during pagination.

@SathishRavichandran You can also provide Prefix to the paginate method if you want a specific "subfolder"
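
A sketch of that suggestion, with one detail worth watching: the Prefix should end with the delimiter. With Prefix='North America' (no trailing slash) and Delimiter='/', everything is rolled up into the single common prefix North America/, so you only get the folder itself back. The helper names `folder_prefix` and `list_subfolders` are hypothetical, and `client` is assumed to be something like boto3.client('s3').

```python
def folder_prefix(path, delimiter="/"):
    """Return path as a Prefix value that ends with the delimiter."""
    if path and not path.endswith(delimiter):
        path += delimiter
    return path


def list_subfolders(client, bucket_name, path, delimiter="/"):
    """Return the "subfolders" directly under path, following all pages."""
    paginator = client.get_paginator("list_objects")
    pages = paginator.paginate(
        Bucket=bucket_name,
        Prefix=folder_prefix(path, delimiter),
        Delimiter=delimiter,
    )
    found = []
    for page in pages:
        for entry in page.get("CommonPrefixes", []):
            found.append(entry["Prefix"])
    return found


# Usage (needs boto3 and access to the bucket):
#   import boto3
#   print(list_subfolders(boto3.client("s3"), "edsu-test-bucket", "North America"))
```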

Hi @edsu,
I have a bucket named test, and it has 3 folders: test1, test2, test3.

I have more than 100 files in each folder.

In those folders I have 2 files named test10302019 (current date) and test10292019 (previous day's date).

Now I want to find out whether those files exist in all 3 folders or not.

Could you please help me achieve this?
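
One way to approach this, as a rough sketch rather than a definitive answer: ask S3 for keys that start with the full key you expect and check for an exact match. The function names are hypothetical, `client` is assumed to be boto3.client('s3'), and the keys are assumed to look exactly like "<folder>/<filename>" with no extension, which may not match the real bucket.

```python
def key_exists(client, bucket_name, key):
    """Return True if an object with exactly this key is present."""
    # Keys come back in sorted order, so an exact match would be listed first.
    response = client.list_objects_v2(Bucket=bucket_name, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in response.get("Contents", []))


def find_files(client, bucket_name, folders, filenames):
    """Map each filename to the list of folders that contain it."""
    return {
        name: [
            folder for folder in folders
            if key_exists(client, bucket_name, f"{folder}/{name}")
        ]
        for name in filenames
    }


# Usage (needs boto3 and access to the bucket):
#   import boto3
#   find_files(boto3.client("s3"), "test",
#              ["test1", "test2", "test3"],
#              ["test10302019", "test10292019"])
```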

@edsu You need to give the path that points to your bucket. Specify the format which points to your bucket.

eg:
# Assumption: `bucket` is a dict of settings and `date_obj` is a date string.
bucket_name = bucket.get('bucket_name')
bucket_path = bucket.get('bucket_path')
bucket_obj = bucket_path + '/' + date_obj + '/'

This fails for me and I can't find a version anywhere that works. I have no idea what's happening:

import boto3

func_s3 = boto3.resource('s3')
bucket = func_s3.Bucket('mybucket')

for object in bucket.objects.filter(Prefix='something', Delimiter='/'):
    srcKey = object.key

I'm getting:

Traceback (most recent call last):
  File "local_handle.py", line 7, in <module>
    ""
 line 195, in renameObjects
    for object in bucket.objects.filter(Prefix='2016-10-10', Delimiter='/'):
  File "/mydir/venv/lib/python3.6/site-packages/boto3/resources/collection.py", line 83, in __iter__
    for page in self.pages():
  File "/mydir/venv/lib/python3.6/site-packages/boto3/resources/collection.py", line 166, in pages
    for page in pages:
  File "/mydir/venv/lib/python3.6/site-packages/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/mydir/venv/lib/python3.6/site-packages/botocore/paginate.py", line 332, in _make_request
    return self._method(**current_kwargs)
  File "/mydir/venv/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/mydir/venv/lib/python3.6/site-packages/botocore/client.py", line 634, in _make_api_call
    api_params, operation_model, context=request_context)
  File "/mydir/venv/lib/python3.6/site-packages/botocore/client.py", line 680, in _convert_to_request_dict
    api_params, operation_model, context)
  File "/mydir/venv/lib/python3.6/site-packages/botocore/client.py", line 712, in _emit_api_params
    params=api_params, model=operation_model, context=context)
  File "/mydir/venv/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/mydir/venv/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/mydir/venv/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/mydir/venv/lib/python3.6/site-packages/botocore/handlers.py", line 219, in validate_bucket_name
    if VALID_BUCKET.search(bucket) is None:
TypeError: expected string or bytes-like object

This works for me in my local environment but fails when I move it to the AWS Lambda Python environment, and I have no idea what's happening:

from boto3.session import Session

session = Session(aws_access_key_id=DECRYPTED_FEN9LI_AWS_ACCESS_KEY_ID,
                  aws_secret_access_key=DECRYPTED_FEN9LI_AWS_SECRET_ACCESS_KEY)
s3 = session.resource('s3')

bucket = s3.Bucket('mybucket')
for obj in bucket.objects.all():
    print(obj.key)

I'm getting:

Response:
{
  "errorMessage": "must be str, not bytes",
  "errorType": "TypeError",
  "stackTrace": [
    [
      "/var/task/lambda-function.py",
      29,
      "lambda_handler",
      "for obj in bucket.objects.all():"
    ],
    [
      "/var/runtime/boto3/resources/collection.py",
      83,
      "__iter__",
      "for page in self.pages():"
    ],
    [
      "/var/runtime/boto3/resources/collection.py",
      166,
      "pages",
      "for page in pages:"
    ],
    [
      "/var/runtime/botocore/paginate.py",
      255,
      "__iter__",
      "response = self._make_request(current_kwargs)"
    ],
    [
      "/var/runtime/botocore/paginate.py",
      332,
      "_make_request",
      "return self._method(**current_kwargs)"
    ],
    [
      "/var/runtime/botocore/client.py",
      272,
      "_api_call",
      "return self._make_api_call(operation_name, kwargs)"
    ],
    [
      "/var/runtime/botocore/client.py",
      563,
      "_make_api_call",
      "operation_model, request_dict, request_context)"
    ],
    [
      "/var/runtime/botocore/client.py",
      582,
      "_make_request",
      "return self._endpoint.make_request(operation_model, request_dict)"
    ],
    [
      "/var/runtime/botocore/endpoint.py",
      102,
      "make_request",
      "return self._send_request(request_dict, operation_model)"
    ],
    [
      "/var/runtime/botocore/endpoint.py",
      132,
      "_send_request",
      "request = self.create_request(request_dict, operation_model)"
    ],
    [
      "/var/runtime/botocore/endpoint.py",
      116,
      "create_request",
      "operation_name=operation_model.name)"
    ],
    [
      "/var/runtime/botocore/hooks.py",
      356,
      "emit",
      "return self._emitter.emit(aliased_event_name, **kwargs)"
    ],
    [
      "/var/runtime/botocore/hooks.py",
      228,
      "emit",
      "return self._emit(event_name, kwargs)"
    ],
    [
      "/var/runtime/botocore/hooks.py",
      211,
      "_emit",
      "response = handler(**kwargs)"
    ],
    [
      "/var/runtime/botocore/signers.py",
      90,
      "handler",
      "return self.sign(operation_name, request)"
    ],
    [
      "/var/runtime/botocore/signers.py",
      160,
      "sign",
      "auth.add_auth(request)"
    ],
    [
      "/var/runtime/botocore/auth.py",
      368,
      "add_auth",
      "signature = self.signature(string_to_sign, request)"
    ],
    [
      "/var/runtime/botocore/auth.py",
      348,
      "signature",
      "k_date = self._sign(('AWS4' + key).encode('utf-8'),"
    ]
  ]
}

Request ID:
"a4d9a8db-052c-41d2-be53-07a1097d57f4"

Function Logs:
START RequestId: a4d9a8db-052c-41d2-be53-07a1097d57f4 Version: $LATEST
s3.Object(bucket_name='mybucket', key='lambda_function.zip')
must be str, not bytes: TypeError
Traceback (most recent call last):
  File "/var/task/provision-aws-resource-function.py", line 29, in lambda_handler
    for obj in bucket.objects.all():
  File "/var/runtime/boto3/resources/collection.py", line 83, in __iter__
    for page in self.pages():
  File "/var/runtime/boto3/resources/collection.py", line 166, in pages
    for page in pages:
  File "/var/runtime/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/var/runtime/botocore/paginate.py", line 332, in _make_request
    return self._method(**current_kwargs)
  File "/var/runtime/botocore/client.py", line 272, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 563, in _make_api_call
    operation_model, request_dict, request_context)
  File "/var/runtime/botocore/client.py", line 582, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/var/runtime/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/var/runtime/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/var/runtime/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/var/runtime/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/var/runtime/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/var/runtime/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/var/runtime/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/var/runtime/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/var/runtime/botocore/auth.py", line 368, in add_auth
    signature = self.signature(string_to_sign, request)
  File "/var/runtime/botocore/auth.py", line 348, in signature
    k_date = self._sign(('AWS4' + key).encode('utf-8'),
TypeError: must be str, not bytes

END RequestId: a4d9a8db-052c-41d2-be53-07a1097d57f4
REPORT RequestId: a4d9a8db-052c-41d2-be53-07a1097d57f4  Duration: 2803.29 ms    Billed Duration: 2900 ms    Memory Size: 128 MB Max Memory Used: 80 MB  Init Duration: 183.85 ms    
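
The last frame in that traceback concatenates 'AWS4' + key, so the TypeError suggests the secret key reached botocore as bytes rather than str. A possible cause, assuming the DECRYPTED_* values come straight from a decryption call that returns bytes under Python 3, is that they were never decoded. A minimal sketch of a guard (`as_text` is a hypothetical helper, not part of boto3):

```python
def as_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it arrived as bytes."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


print(type(as_text(b"example-secret")).__name__)
print(type(as_text("example-secret")).__name__)

# Usage with the snippet above (names taken from that comment):
#   session = Session(
#       aws_access_key_id=as_text(DECRYPTED_FEN9LI_AWS_ACCESS_KEY_ID),
#       aws_secret_access_key=as_text(DECRYPTED_FEN9LI_AWS_SECRET_ACCESS_KEY))
```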

The list_objects workaround from @amatthies above doesn't work for me. All I see is the equivalent of North America/
