Query/Question
I haven't found any particular recommendations or examples of how SDK should be used to do the blob copy.
What are the right ways to implement a copy blob using Java v12 SDK?
More details about an issue I'm having here
Hi, @vzhemevko. Thanks for your question. Could you try attaching a SAS token with appropriate permissions to the copy source URL? That should allow you to successfully complete the copy. For more information, you can look at the REST doc, which states: "The source blob must either be public or must be authorized via a shared access signature". Also note, "The value should be URL-encoded as it would appear in a request URI."
As far as the clarity of the error message, I believe that one is left intentionally vague for security reasons.
Please let me know if you have any more questions or concerns
Hi, @rickle-msft . Thank you for your reply.
I missed this statement "_The source blob must either be public or must be authorized via a shared access signature_" and the SAS part. I believe that it's because I'm not using the REST API directly. I'm using only the SDK and I would appreciate it if you could point me to some example of how I can generate and attach a SAS token to my source blob using only the SDK.
I also think that it would be very beneficial to have a separate item in the SDK example list for copy blob. It would be good to have an example of the recommended ways to do a blob copy, along with the limitations and requirements, like attaching a SAS token for private blobs.
Regarding the issue. I received a suggestion to use this code for SAS token generation. I have tried to use it in my case and now I have the following message which is also a bit vague to me.
c.a.storage.blob.BlobServiceAsyncClient : Service client cannot be accessed without credentials
@vzhemevko Thanks for the update! The code you linked to looks fine for SAS generation. The exception you're seeing is thrown when a service client is constructed without credentials (aka an anonymous client) and then a method is called which requires credentials. That leads me to believe that you are generating an account SAS? In which case, please double check that you set a sharedKeyCredential on the builder.
Thank you for your suggestion on adding samples to copy blob. We will look into adding more soon.
I also just created a separate issue to track adding the copy samples so this thread can focus on answering your questions.
@rickle-msft Thank you for the issue regarding updating the docs, I think that will be very useful.
Sorry for the delay in response, was a bit busy with other activities.
Thanks to your recommendations and this answer I finally managed to do the copy. My final solution looks something like this
String sasToken = sourceBlobClient.generateSasToken();
copyBlobClient.copyFromUrl(sourceBlobClient.getBlobUrl() + "?" + sasToken);
The sourceBlobClient.generateSasToken(); has the implementation based on the link I shared earlier.
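For anyone following along, the URL assembly in the snippet above can be sketched without the SDK. The helper below is hypothetical (the names `SasUrls` and `withSas` are made up for illustration and are not part of the Azure SDK). It assumes the SAS token is already a URL-encoded query string and only shows how it is appended to the blob URL, tolerating a leading `?` and a URL that already carries a query string.

```java
// Hypothetical helper, not part of the Azure SDK.
final class SasUrls {
    private SasUrls() {}

    // Appends an already URL-encoded SAS token to a blob URL.
    static String withSas(String blobUrl, String sasToken) {
        if (blobUrl == null || blobUrl.isEmpty()) {
            throw new IllegalArgumentException("Blob URL must not be empty");
        }
        if (sasToken == null) {
            throw new IllegalArgumentException("SAS token must not be null");
        }
        // Some tools emit the token with a leading '?'; strip it so it is not doubled.
        String token = sasToken.startsWith("?") ? sasToken.substring(1) : sasToken;
        if (token.isEmpty()) {
            throw new IllegalArgumentException("SAS token must not be empty");
        }
        // If the URL already has a query (for example a snapshot), join with '&'.
        String separator = blobUrl.contains("?") ? "&" : "?";
        return blobUrl + separator + token;
    }
}
```

The result would then be passed as the source URL to `copyFromUrl`, as in the two lines above.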
However, I'm still not sure if I'm doing everything correctly, since I'm doing the copy within the same account and I'm using MSI for the BlobServiceClient, so I was expecting it to work without a SAS. Could you confirm that this is a normal situation and provide details on the necessity of adding the token in the same account? Perhaps I have misconfigured something.
@vzhemevko I'm glad to hear you're having more success now. The docs for copyFromUrl state: "The source for a Copy Blob From URL operation can be any committed block blob in any Azure storage account which is either public or authorized with a shared access signature." So I think the behavior you're seeing is expected. If you start an async copy using the beginCopy method, you should be able to share authentication between the source and destination if they are in the same account.
@rickle-msft Thanks for the examples. I had a question but the samples posted had answered that.
On a somewhat side note, yet quite related to the CopyBlob, the async version has a SyncPoller which would be querying the Copy Status of the Blob to determine when the operation should end. The docs mention the following under the billing section here https://docs.microsoft.com/en-us/rest/api/storageservices/copy-blob#remarks
The destination account of a Copy Blob operation is charged for one transaction to initiate the copy, and also incurs one transaction for each request to abort or request the status of the copy operation.
Seems like this pretty much answers my question, but I still wanted to confirm that the SyncPoller would be making requests that fall under the request status operation and would hence be _charged_?
If yes, then the default poll interval of 1s seems a bit aggressive. I would think it would be better for me to see how long these copy operations take on average based on the blob size and maybe dynamically adjust the polling time to reduce the number of calls made. Thoughts?
@somanshreddy Glad to hear the samples answered your question :)
Your understanding is correct. We make a getProperties call to check the status, so billing will be standard for that api.
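To make the billing point concrete, here is a rough back-of-the-envelope sketch. It assumes one transaction to start the copy plus one status request per poll interval that elapses while the copy runs (at least one). The exact number of requests the SyncPoller issues may differ, and `CopyCostEstimate` is a hypothetical name, not an SDK class.

```java
import java.time.Duration;

// Hypothetical estimator; the real request count depends on the poller's behavior.
final class CopyCostEstimate {
    private CopyCostEstimate() {}

    // Estimates billed transactions: 1 to initiate + one per status poll.
    static long estimateTransactions(Duration copyDuration, Duration pollInterval) {
        long copyMs = copyDuration.toMillis();
        long pollMs = pollInterval.toMillis();
        if (pollMs <= 0) {
            throw new IllegalArgumentException("Poll interval must be at least 1 ms");
        }
        if (copyMs < 0) {
            throw new IllegalArgumentException("Copy duration must not be negative");
        }
        // Ceiling division, with at least one status check to observe completion.
        long polls = Math.max(1, (copyMs + pollMs - 1) / pollMs);
        return 1 + polls;
    }
}
```

Under these assumptions, a copy that takes 30 seconds costs roughly 31 transactions at a 1 second interval and roughly 4 at a 10 second interval.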
I think it is a fair claim that it is a rather aggressive default. The async copy only guarantees that the copy completes within two weeks, but most copies, especially for small blobs, complete very quickly, so it's frankly next to impossible to pick something that doesn't have major drawbacks in at least some scenarios.
Your idea is a good one. I'm not sure you'll be able to definitively predict how long a copy will take based on the size, as it's not quite so deterministic, but collecting some data and tailoring your configuration to your datasets is generally the right way to go. The tradeoff between the cost of more requests and the time wasted waiting for the next poll after the copy has actually completed is something you'll have to determine for your scenario. If we can assist in determining that in any way, do let us know.
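As a starting point for that kind of tuning, a size-based poll interval could look like the sketch below. The constants are placeholders, not measured values, and `PollIntervals` is a hypothetical class. The idea is simply to grow the interval with blob size and clamp it to a maximum. As far as I know, `beginCopy` has an overload that accepts the poll interval as a `Duration`, which is where the result would go.

```java
import java.time.Duration;

// Hypothetical heuristic; replace the constants with values from your own tests.
final class PollIntervals {
    static final Duration MIN_INTERVAL = Duration.ofSeconds(1);
    static final Duration MAX_INTERVAL = Duration.ofSeconds(30);
    // Assumed step: add one second of wait per 256 MiB of blob data.
    static final long BYTES_PER_STEP = 256L * 1024 * 1024;

    private PollIntervals() {}

    static Duration forBlobSize(long blobSizeBytes) {
        if (blobSizeBytes < 0) {
            throw new IllegalArgumentException("Blob size must not be negative");
        }
        long steps = blobSizeBytes / BYTES_PER_STEP;
        Duration candidate = MIN_INTERVAL.plusSeconds(steps);
        // Clamp so very large blobs are still checked at a reasonable rate.
        return candidate.compareTo(MAX_INTERVAL) > 0 ? MAX_INTERVAL : candidate;
    }
}
```

Small blobs keep the 1 second interval, so fast copies are not slowed down, while larger blobs poll less often and incur fewer status requests.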
@rickle-msft Thank you. I think I'll try out a few datasets and see how things go. Yes, I agree that it isn't deterministic but a small adjustment can be made based on size. Running a few tests would make things concrete. Thanks for offering help, I think I have understood the situation and I can keep this thread posted regarding the developments if you or anyone else might be interested.
@rickle-msft Also, I missed asking this. CopyFromUrl seems to have a restriction of 256 MB. Since my application needs to copy things synchronously, I think any copies below 256 MB can be done via CopyFromUrl to avoid the extra polling calls, right? And the ones above 256 MB can be done via beginCopy, with the poll time dynamically adjusted based on the results of some tests.
That is correct!
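The split described above could be sketched like this. It assumes the limit is 256 MiB and that a blob of exactly that size still qualifies for the synchronous path, so please check the current service limits before relying on it. `CopyMethodChooser` and its enum are hypothetical names, not SDK types.

```java
// Hypothetical dispatcher for picking a copy API by source blob size.
final class CopyMethodChooser {
    enum CopyMethod { COPY_FROM_URL, BEGIN_COPY }

    // Assumed synchronous copy limit of 256 MiB (inclusive).
    static final long SYNC_COPY_LIMIT_BYTES = 256L * 1024 * 1024;

    private CopyMethodChooser() {}

    static CopyMethod choose(long blobSizeBytes) {
        if (blobSizeBytes < 0) {
            throw new IllegalArgumentException("Blob size must not be negative");
        }
        // Small blobs: one synchronous call, no status polling.
        // Larger blobs: asynchronous copy that has to be polled for completion.
        return blobSizeBytes <= SYNC_COPY_LIMIT_BYTES
                ? CopyMethod.COPY_FROM_URL
                : CopyMethod.BEGIN_COPY;
    }
}
```

The caller would look up the source blob's size first and then invoke `copyFromUrl` or `beginCopy` based on the returned value.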
Closing since all questions have been resolved.