aws-cli: AWS S3 ls wildcard support

Created on 5 Dec 2018 · 15 Comments · Source: aws/aws-cli

Currently, it seems there is no way to search for file(s) using ls and a wildcard. For example:

aws s3 ls s3://bucket/folder/2018*.txt

This would return nothing, even if the file is present.

I have done some searching online, and it seems wildcards are supported for rm, mv & cp, but not for ls. The common workaround is to ls the entire directory and then grep for the files you are searching for:

aws s3 ls s3://bucket/folder/ | grep '2018.*\.txt'

While looking into this, I also found warnings that this won't work effectively if there are over 1,000 objects in a bucket.

To me, it would be nice to have the aws s3 ls command work with wildcards instead of handling this with grep and also having to deal with the 1,000-object limit.
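
As a point of reference, the grep workaround can be scripted reliably with boto3: a paginator transparently follows the 1,000-keys-per-page limit of the ListObjectsV2 API, and fnmatch applies the shell-style wildcard on the client side. This is only a sketch; ls_glob and the bucket/prefix names are placeholders, not part of the CLI.

import fnmatch

import boto3


def ls_glob(bucket, prefix, pattern):
    # List every key under `prefix` and keep the ones matching a shell-style
    # wildcard. The paginator follows the 1,000-keys-per-page limit of
    # ListObjectsV2 automatically, so results are not truncated.
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            if fnmatch.fnmatch(obj['Key'], pattern):
                yield obj['Key']


# Placeholder names, matching the example above:
# for key in ls_glob('bucket', 'folder/', 'folder/2018*.txt'):
#     print(key)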

Labels: guidance, s3

All 15 comments

@Bongani - Thank you for reaching out and reporting this. This would be a feature request for our service team. If the service team approves and adds this feature, it will be exposed from the API to the CLI. I have submitted an internal request to our service team, but I would recommend following up via a case with AWS Support or through the AWS Forum for S3.

@justnance Thanks for submitting the request.

no problem at all.

As per our S3 Service team, _Thanks for the feedback, your feature request will be prioritized with other features planned for S3._

Thanks for the update. It's greatly appreciated.

I would love this also.

In my case I have lots of files in S3 under date folders of the form

bucket/name/YYYY-MM/YYYY-MM-DD/filename.ext

I would love to say "aws s3 ls s3://mybucket/*/2016-03/*/" and list all files from all root prefixes for March 2016.
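
A minimal boto3 sketch of that kind of listing, assuming the mybucket name and key layout from the comment above: Delimiter='/' surfaces the top-level name/ prefixes as CommonPrefixes, and each one is then listed under its 2016-03/ partition.

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# Enumerate the top-level "name/" prefixes, then list only the 2016-03
# partition under each one.
for page in paginator.paginate(Bucket='mybucket', Delimiter='/'):
    for cp in page.get('CommonPrefixes', []):
        month_prefix = cp['Prefix'] + '2016-03/'
        for obj_page in paginator.paginate(Bucket='mybucket', Prefix=month_prefix):
            for obj in obj_page.get('Contents', []):
                print(obj['Key'])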

So how is this going now? Is there a roadmap that includes this feature?

Hi All. This was escalated internally and there is no other update at this time. This feature is controlled by the service team, not the CLI. I can escalate it again with the forum link if this feature request is posted on the S3 Service team's forum.

This is really essential. You can grep, but man... really?

I just had to do the following to rename a series of folders from .csv to .jsonl. This is ridiculous!

# Copy
for file in `aws s3 ls s3://stackoverflow-events/07-30-2019/|tr -s ' '|cut -d ' ' -f3`; 
do 
    aws s3 cp --recursive \
        s3://stackoverflow-events/07-30-2019/$file \
        `echo s3://stackoverflow-events/07-30-2019/$file|sed 's/csv/jsonl/'`;
done

# Now delete
for file in `aws s3 ls 's3://stackoverflow-events/07-30-2019/'|grep 'Questions.Stratified.*.csv/$'|tr -s ' '|cut -d ' ' -f3`;
do
    aws s3 rm --recursive s3://stackoverflow-events/07-30-2019/$file;
done

For shame!
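
For what it's worth, the same rename can be sketched with boto3 instead of parsing aws s3 ls output. The bucket and prefix are copied from the snippet above; everything else (the helper name, the first-occurrence replacement mirroring sed 's/csv/jsonl/', and the assumption that every object is under the 5 GB copy_object limit) is illustrative only.

import boto3


def rename_first_occurrence(bucket, prefix, old, new):
    # Copy each object under `prefix` to a key with the first occurrence of
    # `old` replaced by `new` (mirroring sed 's/old/new/'), then delete the
    # original. copy_object only handles objects up to 5 GB; larger objects
    # would need a multipart copy such as boto3's managed s3.copy().
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            key = obj['Key']
            if old not in key:
                continue
            new_key = key.replace(old, new, 1)
            s3.copy_object(Bucket=bucket,
                           CopySource={'Bucket': bucket, 'Key': key},
                           Key=new_key)
            s3.delete_object(Bucket=bucket, Key=key)


# rename_first_occurrence('stackoverflow-events', '07-30-2019/', 'csv', 'jsonl')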

Python implementation of s3 wildcard search:

import boto3
import re

def search_s3_regex(results, bucket, prefix, regex_path):
    """Recursively match one '/'-separated regex component per key level.

    Each call lists a single level (Delimiter='/') and either descends into
    matching CommonPrefixes or, at the last level, collects matching keys.
    """
    s3_client = boto3.client('s3')
    wc_parts = regex_path.split('/')
    if len(wc_parts) == 1 and len(wc_parts[0]) == 0:
        # No pattern components left: the whole path has matched.
        results.append(prefix)
        return
    else:
        regex = re.compile(wc_parts[0])
        next_regex_path = '/'.join(wc_parts[1:])
        paginator = s3_client.get_paginator('list_objects')
        result = paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=prefix)
        for pref in result.search('CommonPrefixes'):
            if pref is None:
                # No sub-"folders" at this level: check the files instead
                for k in result.search('Contents'):
                    if k is None:
                        continue
                    res = k.get('Key')
                    search_prefix = res if len(prefix) == 0 else res.split(prefix)[1]
                    if re.match(regex, search_prefix):
                        results.append(res)
            else:
                # Sub-"folder": recurse with the remaining pattern components
                res = pref.get('Prefix')
                search_prefix = res if len(prefix) == 0 else res.split(prefix)[1]
                if re.match(regex, search_prefix):
                    search_s3_regex(results, bucket, res, next_regex_path)

After that you can call the function like this:

res = []
search_s3_regex(res, 'my_bucket', 'initial_prefix/blah/', 'b.{2}h/[0-9]{2}-.*-2019/.*')
for p in res:
    print(p)

Just curious, about a year later: is there a plan for native support? The scripts are nice, but they don't help in cases where there are tons of files in the directory that still have to be traversed on the client side. Even the include/exclude filters that are already supported by cp/rm/mv would be nice.
https://docs.aws.amazon.com/cli/latest/reference/s3/index.html#use-of-exclude-and-include-filters

You can use sync with --dryrun option:

aws s3 sync --dryrun --exclude '*' --include '*/.DS_Store' s3://mybucket ./

This solution is such a great hack. You can distill it into single S3 urls by piping to awk:

aws s3 sync --dryrun --exclude '*' --include '*/.DS_Store' s3://mybucket ./ | awk '/s3:\/\//{print $3}'

This works well for me!
