I have some subdirectories a, b, and c under directory "d".
How can I exclude them all at once?
Hi @zffocussss !
You can use the -x flag to exclude many directories or files using a regex pattern. There's some more info in this doc: https://cloud.google.com/storage/docs/gsutil/commands/rsync
Here are more examples from the doc linked above:
-x pattern
Causes files/objects matching pattern to be excluded, i.e., any matching files/objects will not be copied or deleted. Note that the pattern is a Python regular expression, not a wildcard (so, matching any string ending in "abc" would be specified using ".*abc$" rather than "*abc"). Note also that the exclude path is always relative (similar to Unix rsync or tar exclude options). For example, if you run the command:
gsutil rsync -x "data./.*\.txt$" dir gs://my-bucket
it will skip the file dir/data1/a.txt.
You can use regex alternation to specify multiple exclusions, for example:
gsutil rsync -x ".*\.txt$|.*\.jpg$" dir gs://my-bucket
NOTE: When using this on the Windows command line, use ^ as an escape character instead of \ and escape the | character.
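For the original question (excluding subdirectories a, b, and c of d in one pass), the alternation idea can be sketched in Python, since -x takes a Python regex. This is only a minimal sketch: it assumes the pattern is matched from the start of each path relative to the source directory and that "/" is the separator. `build_exclude_pattern` is a hypothetical helper for illustration, not part of gsutil.

```python
import re

def build_exclude_pattern(dir_names):
    """Return a regex matching the named top-level directories and anything under them.

    Hypothetical helper for illustration; not part of gsutil.
    """
    alternatives = "|".join(re.escape(name) for name in dir_names)
    # (/.*)?$ requires the name to be a whole path component,
    # so "a" does not also match "ab".
    return "^(" + alternatives + ")(/.*)?$"

pattern = build_exclude_pattern(["a", "b", "c"])
regex = re.compile(pattern)
print(pattern)

# Paths are assumed to be relative to the source directory "d".
for path in ["a/x.txt", "b", "c/sub/y.txt", "ab/z.txt", "keep/a.txt"]:
    status = "excluded" if regex.match(path) else "kept"
    print(path + ": " + status)
```

The printed pattern could then be tried as the -x argument, for example `gsutil rsync -n -r -x '^(a|b|c)(/.*)?$' d gs://my-bucket`, where -n keeps it a dry run while you check the result.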
Please let me know if that helps or if you have any other questions!
Updated the comment above with a few more details specific to your question. :)
Hi @catleeball, I tried it:
gsutil -d -x "a/|b/|c/" -r d gs://my-bucket
but it does not work. I checked my bucket in the GCP console, and a, b, and c are still there.
I think -x can only exclude files, not directories.
Hi @zffocussss ! It looks like the issue might be with your regex. Here's an example I just tested:
Given this local directory structure rsync-test
cball@cball:~$ tree rsync-test/
rsync-test/
├── dirA
│   └── bar.txt
├── dirB
│   └── baz.txt
├── dirC
│   ├── baq.txt
│   └── dirCA
│       └── bat.txt
└── foo.txt
Let's say we want to upload everything except dirA and dirCA. We can do that by writing a regex to say "check the path string for substring 'dirA' or substring 'dirCA'". Here's one way to do that:
cball@cball:~$ gsutil rsync -r -x '^.*dirA.*$|^.*dirCA.*$' rsync-test gs://rsync-test-cball
Building synchronization state...
Starting synchronization...
Copying file://rsync-test/dirB/baz.txt [Content-Type=text/plain]...
Copying file://rsync-test/dirC/baq.txt [Content-Type=text/plain]...
Copying file://rsync-test/foo.txt [Content-Type=text/plain]...
/ [3 files][ 0.0 B/ 0.0 B]
Operation completed over 3 objects.
Now let's check and make sure the bucket looks like we want it to:
cball@cball:~$ gsutil ls gs://rsync-test-cball
gs://rsync-test-cball/foo.txt
gs://rsync-test-cball/dirB/
gs://rsync-test-cball/dirC/
cball@cball:~$ gsutil ls gs://rsync-test-cball/dirB/
gs://rsync-test-cball/dirB/baz.txt
cball@cball:~$ gsutil ls gs://rsync-test-cball/dirC
gs://rsync-test-cball/dirC/baq.txt
If it's helpful to you in writing your regex, I've found https://regex101.com/ to be a handy website for testing regexes. You can mouse over each part of the regex and it tells you what it does.
Oh my god, thanks for your help. Now I know it is a Python regex; I had been using PCRE and shell-style patterns.
You are right. I need to check my regex for gsutil.
By the way, how do you test this regex, given that it is matched against Linux paths rather than plain strings?
Hi @zffocussss ! When gsutil rsync runs, it walks the directory tree of the source directory. If you include an exclusion pattern, each file / directory gets matched against your provided regex:
https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/commands/rsync.py#L745
If you open the Python REPL, you can test your regex with something like this:
cball@cball:~$ python
Python 3.7.3 (default, Apr 25 2019, 13:07:15)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile('^.*dirA.*$|^.*dirCA.*$')
>>> dirs = ['rsync-test/dirA', 'rsync-test/dirB', 'rsync-test/dirC', 'rsync-test/dirC/dirCA']
>>> for d in dirs:
...     if r.match(d):
...         print('Regex matches: ' + d)
...     else:
...         print('Regex does not match: ' + d)
...
Regex matches: rsync-test/dirA
Regex does not match: rsync-test/dirB
Regex does not match: rsync-test/dirC
Regex matches: rsync-test/dirC/dirCA
Or if you're using the online regex tester, you can plug in different directories and see which ones match or don't. :slightly_smiling_face:
I hope that helps! Please let me know if you have any other questions @zffocussss !
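To check a pattern against a real directory rather than a hand-typed list, the same idea can be wrapped around os.walk. This is a rough sketch that only approximates the behaviour described above: it does not call gsutil, it assumes the regex is matched from the start of each "/"-separated path relative to the source directory, and `preview_exclusions` is a hypothetical helper name.

```python
import os
import re
import tempfile

def preview_exclusions(source_dir, exclude_regex):
    """Split files under source_dir into (kept, excluded) lists of relative paths.

    Illustrative only; this approximates the exclusion check and does not call gsutil.
    """
    pattern = re.compile(exclude_regex)
    kept, excluded = [], []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), source_dir)
            rel = rel.replace(os.sep, "/")  # normalise separators for the regex
            (excluded if pattern.match(rel) else kept).append(rel)
    return sorted(kept), sorted(excluded)

# Recreate the example tree from this thread in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["dirA/bar.txt", "dirB/baz.txt", "dirC/baq.txt",
                "dirC/dirCA/bat.txt", "foo.txt"]:
        full = os.path.join(tmp, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    kept, excluded = preview_exclusions(tmp, r"^.*dirA.*$|^.*dirCA.*$")
    print("kept:", kept)
    print("excluded:", excluded)
```

With the pattern from the example, the kept list contains the same three files that the rsync run above copied.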
It's also worth making use of the rsync command's -n flag to run in dry-run mode. This will let you see if you would have copied files you didn't intend to.
Smart thinking, @houglum ! :bulb:
Okay, I see. Thanks.
Such nice advice. I can use this to see what will happen.
r = re.compile('^.*/dirA/.*$|^.*/dirA$|^dirA')
dirs = ['rsync-test/dirA', 'rsync-test/dirB', 'rsync-test/dirC', 'rsync-test/dirC/dirCA', 'a/dirAk/b', 'a/dirA/b','dirA/A/B/C']
In [18]: for d in dirs:
    ...:     if r.match(d):
    ...:         print('Regex matches: ' + d)
    ...:     else:
    ...:         print('Regex does not match: ' + d)
    ...:
Regex matches: rsync-test/dirA
Regex does not match: rsync-test/dirB
Regex does not match: rsync-test/dirC
Regex does not match: rsync-test/dirC/dirCA
Regex does not match: a/dirAk/b
Regex matches: a/dirA/b
Regex matches: dirA/A/B/C
I think I have found what I want. I need to account for "/", since dirA is a subdirectory.
I also suggest that the GCP gsutil team provide more regex examples, since the feature is a little complex but is actually used in practice.
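The point about "/" can be illustrated by comparing the substring-style pattern from earlier in the thread with a variant that treats dirA as a whole path component. This is just a sketch; the `strict` pattern and the extra sample paths are assumptions made for illustration, not something taken from the gsutil docs.

```python
import re

# Substring-style pattern from earlier in the thread.
loose = re.compile(r"^.*dirA.*$")
# Component-aware variant (illustrative assumption): "dirA" must be a whole
# path component, either at the start of the path or right after a "/".
strict = re.compile(r"^(.*/)?dirA(/.*)?$")

paths = [
    "rsync-test/dirA",
    "a/dirA/b",
    "dirA/A/B/C",
    "a/dirAk/b",        # different directory that merely starts with "dirA"
    "mydirA/file.txt",  # different directory that merely ends with "dirA"
]
for p in paths:
    print(p, "loose:", bool(loose.match(p)), "strict:", bool(strict.match(p)))
```

The loose pattern matches all five paths, while the strict one matches only the first three, so directories such as dirAk or mydirA would not be excluded by accident.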