Boto3: S3 list_objects documentation incorrectly uses NextMarker

Created on 3 Feb 2016 · 1 Comment · Source: boto/boto3

The documentation currently states that a NextMarker will be returned for s3.list_objects. However, the correct approach for iterating when you get a truncated response is to use the key of the last returned object as the marker for the next request. The official documentation for ListObjects (http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html) describes the Marker parameter as follows:

Specifies the key to start with when listing objects in a bucket. Amazon S3 returns object keys in alphabetical order, starting with key after the marker in order.
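To make the marker semantics concrete, here is a minimal in-memory sketch. `fake_list_objects` is a hypothetical stand-in written for illustration, not part of boto3 or S3; it only assumes that keys are returned in sorted order and that listing resumes strictly after the marker, as the quoted text describes.

```python
def fake_list_objects(keys, marker="", max_keys=1000):
    """Mimic the marker semantics of ListObjects on an in-memory key list.

    Hypothetical helper for illustration only; not a boto3 API.
    """
    ordered = sorted(keys)
    # Only keys that sort strictly after the marker are eligible.
    remaining = [k for k in ordered if k > marker]
    page = remaining[:max_keys]
    return {
        "Marker": marker,
        "IsTruncated": len(remaining) > len(page),
        "Contents": [{"Key": k} for k in page],
    }


keys = ["3.txt", "1.txt", "2.txt"]
first = fake_list_objects(keys, max_keys=1)
# Resume from the last key of the previous page.
second = fake_list_objects(keys, marker=first["Contents"][-1]["Key"], max_keys=1)
print(first["Contents"][0]["Key"], first["IsTruncated"])
print(second["Contents"][0]["Key"], second["IsTruncated"])
```

Passing the last key of one page as the marker of the next call resumes the listing immediately after that key, so no object is skipped or repeated.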

Example response from list_objects with MaxKeys=1:

{
    "Name": "mrflibbleisverycross",
    "ResponseMetadata": {
        "HTTPStatusCode": 200,
        "HostId": "...",
        "RequestId": "..."
    },
    "MaxKeys": 1,
    "Prefix": "",
    "Marker": "",
    "EncodingType": "url",
    "IsTruncated": true,
    "Contents": [
        {
            "LastModified": "2016-02-03T19:50:29+00:00",
            "ETag": "\"d3b07384d113edec49eaa6238ad5ff00\"",
            "StorageClass": "STANDARD",
            "Key": "1.txt",
            "Owner": {
                "DisplayName": "...",
                "ID": "..."
            },
            "Size": 4
        }
    ]
}

To correctly iterate over all objects in a bucket, you would need something like:

#!/usr/bin/env python
import boto3.session

s3 = boto3.session.Session(profile_name='...').client('s3')

# MaxKeys=1 forces a truncated response on every call, for demonstration.
kwargs = {'Bucket': 'mrflibbleisverycross', 'MaxKeys': 1}
while True:
    data = s3.list_objects(**kwargs)
    for obj in data.get('Contents', []):
        print(obj['Key'])
    if not data['IsTruncated']:
        break
    # No NextMarker is returned here, so continue from the last key seen.
    kwargs['Marker'] = data['Contents'][-1]['Key']

Most helpful comment

The S3 documentation for NextMarker includes this note: "This element is returned only if you have delimiter request parameter specified."

So you aren't seeing the token because you do not supply a delimiter. Thankfully we provide helpers so that you don't need to know the specifics of a service's pagination. For boto3, the simplest way to paginate with a client is to use the paginators:

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects')
for page in paginator.paginate(Bucket='bucket-name'):
    # 'Contents' is absent when a page holds no objects.
    for obj in page.get('Contents', []):
        print(obj['Key'])

An even easier method is to use the boto3 resource class, and its collections:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
for s3_object in bucket.objects.all():
    # Each item is an ObjectSummary; pagination is handled for you.
    print(s3_object.key)

Paginators are available for most services, and resource models are available for the most popular services. Also, if you ever need to drop down to the client from a resource, you can access the underlying client like so: s3.meta.client, bucket.meta.client, etc.
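If you do need to page by hand, the two observations in this thread can be combined: use NextMarker when the response includes it, and otherwise fall back to the last key on the page. The following is only a sketch under that assumption; `iter_keys` is a hypothetical helper, not a boto3 function, and `list_page` stands for any callable shaped like `s3.list_objects`.

```python
def iter_keys(list_page, **kwargs):
    """Yield every object key by following markers manually.

    `list_page` is any callable that accepts the same keyword arguments
    as `s3.list_objects` and returns a response-shaped dict.
    Hypothetical helper for illustration; not part of boto3.
    """
    marker = None
    while True:
        if marker is not None:
            kwargs["Marker"] = marker
        page = list_page(**kwargs)
        contents = page.get("Contents", [])
        for obj in contents:
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        # NextMarker is only present when a delimiter was supplied;
        # otherwise continue from the last key on this page.
        marker = page.get("NextMarker") or contents[-1]["Key"]
```

With a real client this would be called as `iter_keys(s3.list_objects, Bucket='bucket-name')`. The paginators shown above remain the simpler choice; this only spells out what they spare you from writing.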
