While trying to move directories recursively to GCS, I've run into this error on almost every directory:
CommandException: arg (./guix-backup) does not name a directory, bucket, or bucket subdir.
I'm guessing the tool detects these as files instead of directories for some reason. Here's the output from two commands: one that works and one that doesn't.
build-staging ~ # ls -lsa /var/log
total 244
8 drwxr-xr-x. 4 root root 4096 Jun 13 08:49 .
8 drwxr-xr-x. 11 root root 4096 Aug 2 16:48 ..
4 -rw-------. 1 root utmp 0 Jun 13 08:49 btmp
4 -rw-r--r--. 1 root root 0 Jun 13 08:49 faillog
8 drwxr-sr-x. 4 root systemd-journal 4096 Jun 13 08:49 journal
8 -rw-r--r--. 1 root root 146292 Aug 10 18:46 lastlog
8 drwx------. 2 root root 4096 Jun 7 23:19 sssd
196 -rw-rw-r--. 1 root utmp 190080 Aug 10 18:46 wtmp
build-staging ~ # gsutil rsync -r /var/log gs://guix/logs
Building synchronization state...
Starting synchronization
build-staging ~ # ls -lsa /gnu
total 2884
8 drwxr-xr-x. 5 root root 4096 Aug 2 16:09 .
4 drwxr-xr-x. 19 root root 4096 Jul 12 08:50 ..
0 drwxrwxr-t. 1304 root hamann 0 Aug 9 21:44 store
488 drwxrwxr-t. 1053 root hamann 491520 Aug 2 14:25 store-old
2384 drwxr-xr-x. 35 root root 2433024 Jul 31 18:28 store2
build-staging ~ # gsutil rsync -r /gnu/store2 gs://guix/store-test
CommandException: arg (/gnu/store2) does not name a directory, bucket, or bucket subdir.
If you open up a Python interpreter and run os.path.isdir('/gnu/store2'), does it return True or False? The only thing I can think of off the top of my head that might cause this is IsDirectory() (defined in gslib/storage_url.py) returning False.
>>> os.path.isdir('/gnu/store2')
True
The code branch here causes the failure: https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/commands/rsync.py#L1365
Mhmm. I assume that if we follow that up the stack to copy_helper.ExpandUrlToSingleBlr, then these two lines are relevant:
if storage_url.IsFileUrl():
  return (storage_url, storage_url.IsDirectory())
(unless IsFileUrl() returns false, in which case something is borked).
...and finally to storage_url.py's IsDirectory() method. The things that can cause that to fail are: we think the file is a stream, we think it's a FIFO, or we don't think it's a directory. You could throw a breakpoint in (via import pdb; pdb.set_trace()) before the return statement in _FileUrl's IsDirectory() method and see if either self.IsStream() or self.IsFifo() returns True for some reason (I assume the os.path.isdir() check will pass, given the output you posted above).
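To make the failure modes above concrete, here is a minimal sketch of the decision (a simplification for illustration, not gsutil's actual implementation; the real method lives in gslib/storage_url.py, and the `is_stream` parameter here is hypothetical):

```python
import os
import stat

def looks_like_directory(path, is_stream=False):
    # A path only counts as a directory if it is not a stream,
    # not a FIFO, and the OS agrees it is a directory -- any one
    # of these checks failing produces the CommandException above.
    if is_stream:
        return False
    if stat.S_ISFIFO(os.stat(path).st_mode):
        return False
    return os.path.isdir(path)
```

If this predicate returns True on the host for a path that gsutil still rejects, the interesting question becomes whether gsutil is actually seeing the same filesystem.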
IsStream() should only be True if you passed - in as an argument. IsFifo() returns True if this statement is truthy: stat.S_ISFIFO(os.stat(path).st_mode).
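You can exercise that FIFO check directly from a Python prompt (a quick standalone demo, assuming a POSIX system since it uses os.mkfifo; the helper name is mine, not gsutil's):

```python
import os
import stat
import tempfile

def is_fifo(path):
    # The same truthiness check quoted above.
    return stat.S_ISFIFO(os.stat(path).st_mode)

tmpdir = tempfile.mkdtemp()
fifo_path = os.path.join(tmpdir, "demo.fifo")
os.mkfifo(fifo_path)  # create a named pipe to compare against

print(is_fifo(fifo_path))  # True for the named pipe
print(is_fifo(tmpdir))     # False for a plain directory
```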
IsFifo() is false here as well. So I'm confused. I'm using gsutil 4.26, by the way.
Does this also happen with a fresh installation of v4.27? If so, can you paste the output of gsutil version -l? This should display some info about your Python version, OS, checksum, and gsutil path (you may want to redact personal information in any local FS paths that might be in the output).
Without knowing what OS you're using, I do find it odd that this directory doesn't show up as taking up 4KiB (I assume this is your FS's block size, since . and .. take up 4KiB), or slightly more in the edge case that it might have contained lots of files at one time (this rarely occurs -- usually even something like /var/log is only a bit above block size, say 12K, but I don't think I've ever seen a directory file take up over 2 MiB).
This is on CoreOS. I'll try running gsutil inside a Docker container to compare. I'll also try with an install of 4.27.
@jsierles Did you ever figure this out? Someone else recently notified me of this same thing happening on a CoreOS system.
Nope - haven't revisited it lately, but will try again this week.
Finally got around to hunting this down; I think I've figured it out. Looks like GCE VMs have a _nifty_ alias set up for gsutil:
$ type gsutil
gsutil is aliased to `(docker images google/cloud-sdk || docker pull google/cloud-sdk) > /dev/null;docker run -t -i --net=host -v /home/<USER>/.config:/root/.config google/cloud-sdk gsutil'
In the above invocation, gsutil will run in a docker container, meaning it won't have access to the same file system hierarchy as the host system. So, it really isn't lying when it says "arg (./guix-backup) does not name a directory, bucket, or bucket subdir." -- that directory really doesn't exist within the container :)
To suggest a workaround: On my CoreOS instance, I cloned the gsutil repo, installed Python via some instructions I found here, and ran the local copy of gsutil rather than running it in a docker container -- that worked. You may want to give that a shot.
Thanks! This is easy enough to fix by creating another alias that mounts your home directory into the container. Maybe easier and more CoreOS-friendly than installing Python directly on the host.
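For example, a replacement alias along these lines should work (a sketch only; the image name matches the stock GCE alias above, but the mount and working-directory choices are mine; adjust to your setup):

```shell
# Hypothetical alias: like the default GCE alias, but also bind-mounts the
# home directory at the same path and makes it the working directory, so
# host paths like ./guix-backup resolve inside the container.
alias gsutil='docker run -t -i --net=host \
  -v "$HOME/.config":/root/.config \
  -v "$HOME":"$HOME" -w "$HOME" \
  google/cloud-sdk gsutil'
```

Mounting the directory at the same path on both sides keeps both relative and absolute host paths valid inside the container.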
Sounds sane :)
A major use case for using gsutil on CoreOS would be downloading some files (onto the host file system) needed to launch a container, IIUC. Given this, it doesn't seem like gsutil should default to running within a container and not having access to the host file system being referenced in the original arguments to gsutil. I've passed along this feedback to our GCE team.
For documentation's sake:
The issue being tracked with the GCE team is https://issuetracker.google.com/issues/70082703 (although visibility is limited to Google engineers only).