Azure-sdk-for-java: [bug] URL Encoding to support Blob names with special characters

Created on 9 Jul 2019  路  11Comments  路  Source: Azure/azure-sdk-for-java

Blobs are allowed to have special characters in their name which will need to be encoded when they are used as part of the service URL.

Questions stand on where the encoding should happen, there are a few places where a blob name is set

  • Setting the endpoint in the builder (this would likely be a URL)
  • Setting the blob name in the builder (this could be from a URL)
  • Getting a blob client from the container client (this could be from a URL)
Client Storage blocking-release

All 11 comments

A broader discussion will need to occur given the URL encoding issues found within Files.

Plan is to add documentation to the blobName setter methods in our builders to indicate that blob names need to be encoded.

If the encoding process is more complicated add in a helper method to perform the encoding and use that in the documentation.

Documentation should also go in the ContainerClient methods that construct any form of blob client.

We've used the internal Utility method in the past to special case /+/%20 to do the right thing. It has caused problems in the past with SAS signatures in the query parameters, but I can't say off hand whether this will be strictly necessary for the blob names. I tend to think not because we didn't internally encode in v8 and just let the customer do it themselves.

@vcolin7 please update this issue when a PR is ready so that the encoding related changes can be reviewed

I tested a few options including the Utility method Rick mentioned and that one seemed to be the best one for UTF-8 encoding while converting spaces into %20.

For testing I used all printable and extended ASCII characters. I also used a bunch of different Unicode characters for languages such as Chinese, Russian, Korean, Greek, Japanese and even more obscure ones like Esperanto. I found no issues when trying to upload or download blobs with names containing these characters while using Utility.urlEncode() to encode their names.

I also discovered a couple interesting things:

  • Azure Storage does not seem to encode the following ASCII characters when manually generating a SAS URL !'()*-./\_~.
  • Our utility method does not encode the following ASCII characters: *-._.
  • Forward slashes / and backslashes \ are both used for establishing virtual hierarchies by the Azure Storage service, which will finally parse them to /.
  • ASCII characters not allowed on blob names while using the web UI are: # and %.
  • Uploading files that contain an unencoded percentage symbol % is also not allowed through the Azure SDK.
  • However, you can use the Azure SDK to upload blob names that start with a pound/hashtag (#) with no encoding. The Azure Service would create a blob with no name in such a case.

Even though we incur in some over-encoding for 7 ASCII characters, I don't believe it should cause any issues and I think we could implicitly do the encoding for the user instead of leaving that responsibility to them. I have already discussed this possibility with @joshfree and @alzimmermsft and both seem to agree. What do you think @rickle-msft?

Thanks for doing such a thorough investigation! Since it seems quite safe to encode the names ourselves, it seems to make sense that we can just do it for the customer.

@vcolin7 can you give an update on this bug?

Hi @joshfree, I have addressed the comments and concerns raised by @rickle-msft, made sure all tests ran and added some of my own and I'm just waiting for his or @gapra-msft / @jaschrep-msft approval. In the meantime I'm just going to solve some merge conflicts. Also, I'm still not sure if we need to send this PR to adpship as well.

It would also be good if you or @alzimmermsft could review the latest changes.

Sent an email to adpship after getting Rick's approval.

Merged to master after adpship approval.

Thanks @vcolin7 for fixing this issue

Was this page helpful?
0 / 5 - 0 ratings