Blobs are allowed to have special characters in their names, which need to be encoded when the name is used as part of the service URL.
It is still an open question where the encoding should happen, since there are a few places where a blob name is set.
A broader discussion will need to occur given the URL encoding issues found within Files.
The plan is to add documentation to the blobName setter methods in our builders to indicate that blob names need to be encoded.
If the encoding process turns out to be more complicated, add a helper method that performs the encoding and reference it in the documentation.
Documentation should also go in the ContainerClient methods that construct any form of blob client.
We've used the internal Utility method in the past to special case
@vcolin7 please update this issue when a PR is ready so that the encoding-related changes can be reviewed.
I tested a few options including the Utility method Rick mentioned and that one seemed to be the best one for UTF-8 encoding while converting spaces into %20.
For testing I used all printable and extended ASCII characters. I also used a bunch of different Unicode characters for languages such as Chinese, Russian, Korean, Greek, Japanese and even more obscure ones like Esperanto. I found no issues when trying to upload or download blobs with names containing these characters while using Utility.urlEncode() to encode their names.
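As a rough illustration of the behavior described above (UTF-8 percent-encoding with spaces turned into %20), here is a minimal sketch built only on the JDK. It is not the actual Utility.urlEncode() implementation; the class name BlobNameEncoding and the method encode are hypothetical, and the assumption is that the internal helper behaves similarly to java.net.URLEncoder with the "+" output rewritten.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class BlobNameEncoding {
    // Hypothetical helper, NOT the SDK's Utility.urlEncode():
    // percent-encodes a blob name as UTF-8 and emits %20 instead of '+' for spaces.
    static String encode(String blobName) {
        try {
            // URLEncoder produces '+' for spaces; a literal '+' in the input
            // is already emitted as %2B, so replacing '+' only affects spaces.
            return URLEncoder.encode(blobName, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("my blob.txt"));    // my%20blob.txt
        System.out.println(encode("dir/file#1.txt")); // dir%2Ffile%231.txt
    }
}
```

Note that this sketch also encodes /, so a name that is meant to carry a virtual hierarchy would come out with %2F in place of each separator.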
I also discovered a couple of interesting things:
!'()*-./\_~
*-._
/ and backslashes \ are both used for establishing virtual hierarchies by the Azure Storage service, which will finally parse them to /.
# and %. % is also not allowed through the Azure SDK.
#) with no encoding. The Azure Service would create a blob with no name in such a case.

Even though we incur some over-encoding for 7 ASCII characters, I don't believe it should cause any issues and I think we could implicitly do the encoding for the user instead of leaving that responsibility to them. I have already discussed this possibility with @joshfree and @alzimmermsft and both seem to agree. What do you think @rickle-msft?
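The figure of 7 over-encoded ASCII characters can be reproduced with the JDK alone, under two assumptions: that the encoding helper behaves like java.net.URLEncoder, and that the character set in question is !'()*-./\_~. The class name OverEncodingCheck and its methods are hypothetical and only serve this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class OverEncodingCheck {
    // Returns the characters from 'candidates' that URLEncoder rewrites,
    // i.e. whose encoded form differs from the character itself.
    static String overEncoded(String candidates) {
        StringBuilder changed = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            String original = String.valueOf(c);
            if (!encode(original).equals(original)) {
                changed.append(c);
            }
        }
        return changed.toString();
    }

    // Assumed stand-in for the SDK helper: UTF-8 percent-encoding via the JDK.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String result = overEncoded("!'()*-./\\_~");
        // URLEncoder leaves only * - . _ untouched among these characters.
        System.out.println(result + " (" + result.length() + " characters)");
    }
}
```

Under these assumptions the rewritten characters are ! ' ( ) / \ and ~, which matches the count of 7.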
Thanks for doing such a thorough investigation! Since it seems quite safe to encode the names ourselves, it seems to make sense that we can just do it for the customer.
@vcolin7 can you give an update on this bug?
Hi @joshfree, I have addressed the comments and concerns raised by @rickle-msft, made sure all tests ran, and added some of my own. I'm now just waiting for approval from him or from @gapra-msft / @jaschrep-msft. In the meantime I'm going to resolve some merge conflicts. Also, I'm still not sure if we need to send this PR to adpship as well.
It would also be good if you or @alzimmermsft could review the latest changes.
Sent an email to adpship after getting Rick's approval.
Merged to master after adpship approval.
Thanks @vcolin7 for fixing this issue