Aws-sdk-java: AmazonS3Client doesBucketExist() should throw an exception if bucket access is forbidden

Created on 27 Jul 2017  路  13Comments  路  Source: aws/aws-sdk-java

I have an application that uploads data to S3, before uploading I check whether bucket exists in S3 or not, if it doesn't exist I throw an exception to notify the user to create one. To check if bucket exists I use doesBucketExist() of AmazonS3Client class. This check happens in the initialisation phase.

Today when I ran my application with incorrect AWSCredentials, to my surprise my application didn't fail in the initialisation phase rather it failed when my app tried to upload data to S3. Then I was wondering why it didn't fail in the initialisation phase where I check if bucket exists or not, then looking at the code of AmazonS3Client.doesBucketExist() I realised it was returning true when bucket access is forbidden or for a status code of 403 of AmazonServiceException

screen shot 2017-07-27 at 4 18 33 pm

The issue here is that code is making an incorrect assumption that bucket always exists when a client is unable to access a bucket or if the access is forbidden. In my test I tried to check if a bucket named "nirabafasfanjan-str" exists with incorrect AWS credentials.
These are the buckets available in my region

screen shot 2017-07-27 at 4 04 20 pm

As you can clearly see this is a false positive and in this case correct behaviour should be to throw exception rather than returning true when the client cannot ascertain bucket exists or not.

response-requested

Most helpful comment

Just a quick update. Due to some concerns about internal service teams being tightly coupled to the underlying API call we decided to rollback this change so that doesBucketExist continues to use headBucket. We've deprecated this method and introduced a doesBucketExistV2 which uses the new getBucketAcl approach. This will be changed in the next release of the SDK, version 1.11.201.

All 13 comments

I believe this would be a breaking change, we treat Forbidden as the bucket existing since that usually means it exists but you don't have access to call HeadBucket on it.

Hi Andrew,

we treat Forbidden as the bucket existing since that usually means it exists but you don't have access to call HeadBucket on it.

How does it usually mean bucket exists already? Can you expound if I'm missing anything obvious? I was able to produce a case where bucket doesn't exist for a forbidden exception.

I believe this would be a breaking change

Consider people using doesBucketExist() in their application, it already throws 2 checked exceptions AmazonClientException and AmazonServiceException. Since these are checked exceptions any application developer using this method would handle this or simply propagate it further. So I think it wouldn't be a breaking change after all or I couldn't think any other scenarios where this could really break.

I've a question with respect to current behaviour, is it possible to do any other client operation with an incorrect AWS credentials? Shouldn't any client operation fail when executed first?
Here not failing doesBucketExist() doesn't seem right to me and that too with an incorrect assumption that a bucket exists.

Regards,
Niranjan

How does it usually mean bucket exists already? Can you expound if I'm missing anything obvious? I was able to produce a case where bucket doesn't exist for a forbidden exception.

So right now, giving the precondition that you are using valid credentials, this method will tell you if a bucket exists regardless of whether you have access to it. If we change that to throw the forbidden exception then anyone relying on this method to return true for buckets they don't have access to are broken.

I'm not sure how we can differentiate between the bucket existing and not having access and just having invalid credentials. Would a suitable workaround be making another API call like list buckets to assert credentials are valid?

Hi Andrew,

No, I don't suggest giving a precondition that we are using valid credentials to test if a bucket exists or not, that's an incorrect assumption too.

If we change that to throw the forbidden exception then anyone relying on this method to return true for buckets they don't have access to are broken.

That's exactly I'm trying to emphasise here we shouldn't return true or false based on assumptions. Why would someone want to check a bucket exists if they don't have access for it? What would they do further with that bucket? I believe any client operation with incorrect credentials should fail. Let me iterate my question asked in my previous comment again, can any other client operation succeed with an incorrect AWS credentials?

I'm not sure how we can differentiate between the bucket existing and not having access and just having invalid credentials. Would a suitable workaround be making another API call like list buckets to assert credentials are valid?

Yes I'm thinking of such a workaround but I still believe doesBucketExist() could be made to return a consistent value always.

Regards,
Niranjan

Honestly, your best bet is to probably to deprecate doesBucketExist and encourage folks to just do whatever operation they're trying to do and catch the exception if the bucket does/doesn't exist. That avoids the ambiguity of does-it-exist-in-any-account (because I'd like to claim it if not) vs does-it-exist-in-my-account (because I'd like to read/write some data from it if it does), the lack of useful error code on the exception since it uses a HEAD request that doesn't return a payload, and race conditions where the bucket did/didn't exist when you called doesBucketExist, but gets created/deleted immediately afterwards. headBucket still exists if folks really want it, but without the implied semantics.

@niranjan-wa I'm exploring other implementations of doesBucketExists that avoids the invalid credentials issue. Below are some of my findings.

I'm having good luck with GetBucketAcl which returns an XML payload with more detailed error codes, scenarios I've run so far

1) Bucket exists, credentials valid, permissions to bucket
- Returns success response
2) Bucket exists, credentials valid, no permissions to bucket
- Throws 403 Access Denied
3) Bucket Exists, access key invalid
- Throws 403 InvalidAccessKeyId
4) Bucket does not exist, access key invalid
- Throws 403 InvalidAccessKeyId
5) Bucket exists, secret key invalid
- Throws 403 SignatureDoesNotMatch
6) Bucket does not exist, secret key invalid
- Throws 403 SignatureDoesNotMatch
7) Bucket Exists, anonymous credentials
- Throws 403 AccessDenied
8) Bucket does not exist, anonymous credentials
- Throws 404 NoSuchBucket

So given the above we can treat success response and access denied as the bucket existing, 404 NoSuchBucket as the bucket not existing, and everything else we can throw including the invalid credential exceptions.

I was also thinking it would be possible just to always use anonymous credentials for the HeadBucket call we make and treat 403 AccessDenied as bucket existing and 404 as not existing. I was getting a lot of SlowDown errors with this approach making me think throttling is based on identity and anonymous credentials puts you in a throttling bucket with other customers. If that's the case this is not viable.

HeadBucketRequest headBucketRequest = new HeadBucketRequest("no-such-bucket");
headBucketRequest.setRequestCredentials(new AnonymousAWSCredentials());
s3.headBucket(headBucketRequest);

Decided to go with the getBucketAcl approach. That has much more consistent behavior and will throw an exception when credentials are invalid. Hope to get this out next week.

Yes getBucketAcl approach makes sense and as you've listed it covers all or most of the scenarios.

Another potential consideration is that HEAD requests are free, but GET requests are not.

@fernomac that's a good point. I would be surprised if anyone is calling this method at a rate for that to be significant though and there is no contract on which API call(s) we are making when using this method.

@niranjan-wa just pushed this change, it will be included in our next release.

@shorea Thanks a lot.

Regards,
Niranjan

Just a quick update. Due to some concerns about internal service teams being tightly coupled to the underlying API call we decided to rollback this change so that doesBucketExist continues to use headBucket. We've deprecated this method and introduced a doesBucketExistV2 which uses the new getBucketAcl approach. This will be changed in the next release of the SDK, version 1.11.201.

Was this page helpful?
0 / 5 - 0 ratings