azcopy version 10.0.9-Preview
Ubuntu 18.04
azcopy cp "https://storage.cloud.google.com/mybucket-name/mybucket-prefix" "https://myazaccount.blob.core.windows.net/migracion-blobstorage?st=AAA&se=BBB&sp=CCC&sv=DDD&sr=c&sig=EEE" --recursive
Found error:
INFO: Cannot infer source location of https://storage.cloud.google.com/mybucket-name/mybucket-prefix. Please specify the --from-to switch
failed to parse user input due to error: the inferred source/destination combination is currently not supported. Please post an issue on Github if support for this scenario is desired
Try azcopy with a Google Cloud Storage source
According to the GCS documentation, it is possible to generate authenticated GCS requests in the same way as AWS S3 requests.
Thus, I created access and secret keys in Cloud Storage and set the AWS access key and secret key that azcopy expects to those values.
Then I got the azure-azcopy source and modified a couple of files (I created an IsGCSURL() function for a basic URL domain check and used it at validators.go:123).
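A basic domain check of that kind could look roughly like the sketch below. This is not the code from the modified build: the function body, the `gcsHosts` list, and the choice to also accept bucket subdomains are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// gcsHosts lists hostnames assumed (for this sketch) to identify
// Google Cloud Storage endpoints.
var gcsHosts = []string{"storage.cloud.google.com", "storage.googleapis.com"}

// IsGCSURL reports whether rawURL points at one of the assumed GCS hosts,
// either directly (path-style) or as a bucket subdomain of that host.
func IsGCSURL(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, h := range gcsHosts {
		if host == h || strings.HasSuffix(host, "."+h) {
			return true
		}
	}
	return false
}

func main() {
	// The source URL from the command above, followed by an S3-style URL.
	fmt.Println(IsGCSURL("https://storage.cloud.google.com/mybucket-name/mybucket-prefix"))
	fmt.Println(IsGCSURL("https://bucket.s3.amazonaws.com"))
}
```

A check like this only gets the URL past location inference; it does nothing about later code that parses the URL as an S3 URL.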
Finally I generated a new binary file and ran it. It worked at the beginning, but after that I got a new error:
"failed to perform copy command due to error: cannot start job due to error: Invalid S3 URL. AzCopy supports standard virtual-hosted-style or path-style URLs defined by AWS, E.g: https://bucket.s3.amazonaws.com or https://s3.amazonaws.com/bucket".
Browsing the source code, I realized it's not as easy as I expected, and I have very little experience with Go.
So it would be nice to support GCS import in azcopy.
Thank you in advance
Hi @efcorpa, thanks for reaching out!
I'll add this item to our backlog and discuss with the Team to prioritize it accordingly.
@efcorpa - what is your scenario, if we may ask? Are you trying to move all your data? Or are you trying to use two clouds? For now I would recommend that you check out Azure Data Factory: https://docs.microsoft.com/en-us/azure/data-factory/connector-google-cloud-storage
Hi @seguler.
Thanks for your suggestion.
We need to move about 30 TB, nearly 3 million PDF documents, from GCS to Azure Blob Storage.
There will be an initial massive copy, and after it both cloud storages will be kept synchronized (always from GCS to Azure Blob Storage) for about one to two months, so an rsync-style tool looks suitable for this scenario. Finally, GCS will be shut down.
Yes, we considered Azure Data Factory, but our first price forecast came to more than €5,000 (it's hard to estimate prices based on "€0.844 per 1,000 runs"), which looks unaffordable at first sight.
We've explored other approaches:
Thank you @efcorpa - I understand you need not only a simple copy option, but also a sync option to do diff copies between clouds. Sync is not supported for AWS to Azure copy today, so that will be another feature request.
We'll consider both of these and update this thread.
@seguler Any update on this? We need GCS -> Azure blob storage (and corresponding metadata) asap.
As described here: https://cloud.google.com/storage/docs/migrating, apps that know how to talk to AWS should also be able to talk to GCS. Here on the AzCopy v10 team, we have not had time to test that yet. When tested, I expect that either it will just work, or else only very small changes will be needed.
@jtlz2, there's nothing to stop you doing your own testing of this. It might just work. But note that you'd be using the tool in a scenario which we don't currently test or support. I'll make some enquiries within the team to see what plans, if any, we have for future support in that area.
Hi @JohnRusk: AFAIK, there is an explicit regex validation that the source URL is a valid "s3" URL. I tried to bypass it, but there are similar validations later in the code (func NewS3URLParts at s3URLParts.go). These validations seem to be necessary in order to use the minio S3 client.
Hence, GCS interoperability won't work as things stand, and it seems that more than "very small changes" will be needed.
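To illustrate why a GCS URL fails this kind of validation, here is a minimal sketch. The pattern below is a simplified assumption that accepts only the two URL styles named in the error message; it is not the regex used in s3URLParts.go, and `looksLikeS3Host` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// s3HostPattern is a simplified, assumed pattern for AWS-defined hosts:
// an optional bucket subdomain, then "s3", an optional region part,
// then ".amazonaws.com". It is NOT AzCopy's actual regex.
var s3HostPattern = regexp.MustCompile(
	`^(?:[a-z0-9][a-z0-9.-]*\.)?s3(?:[.-][a-z0-9-]+)?\.amazonaws\.com$`)

// looksLikeS3Host (hypothetical helper) reports whether the host of rawURL
// matches the assumed virtual-hosted-style or path-style S3 host pattern.
func looksLikeS3Host(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	return s3HostPattern.MatchString(strings.ToLower(u.Hostname()))
}

func main() {
	for _, raw := range []string{
		"https://bucket.s3.amazonaws.com",
		"https://s3.amazonaws.com/bucket",
		"https://storage.cloud.google.com/mybucket-name/mybucket-prefix",
	} {
		fmt.Printf("%-65s S3-style host: %v\n", raw, looksLikeS3Host(raw))
	}
}
```

Any validation tied to the amazonaws.com domain in this way rejects a GCS host regardless of whether the underlying client could talk to it, which is why relaxing a single check is not enough.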
Thank you in advance
Thanks @efcorpa. I believe the minio client itself is supposed to be compatible with any cloud provider that supports the S3 API, so I'd be surprised if there's any issue inside minio.
As for the issues in the AzCopy code, thanks for pointing those out. I'm sorry, but we can't currently resource that change within the AzCopy team. If you want to, and are comfortable(ish) with Go coding, you're welcome to make the changes in your own fork of the code, use it, and contribute the changes back as a PR to this repo. Sorry we can't do it for you in the near future.
Hi @JohnRusk. I'll give it a try and work on it during my next vacation.
Thanks anyway
@efcorpa did you try azcopy for GCS? Did it work for you?
@Fahadsaadullahkhan, as commented before, azcopy does not work with GCS. It's necessary to make several changes in order to bypass the S3 bucket URL validations or (better) to handle the GCS URL structure as well. I'll try to do something by the end of August.
Is there any update on this thread? Can we use AzCopy to move data from GCS to Azure Blob Storage?
@rishabpoh ?