For example,
gsutil -m mv gs://bucket/dir/subdir1 gs://bucket/dir/subdir2
sometimes correctly renames to gs://bucket/dir/subdir2, but other times to gs://bucket/dir/subdir2/subdir1.
Couldn't pinpoint the exact conditions, but it seems to have nothing to do with trailing slashes. In any case, there is nothing about such behaviour in the docs.
If there's an existing subdirectory called gs://bucket/dir/subdir2 before you run that mv command, it will put subdir1 under subdir2. That is correct behavior: it emulates the similar behavior of Unix directory renames. Please see https://cloud.google.com/storage/docs/gsutil/commands/cp#how-names-are-constructed for more details.
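To make the rule concrete, here is a small sketch that models it in plain shell. It is a simplified illustration, not gsutil's actual code: the function name resolve_dest and its yes/no third argument are made up for this example.

```shell
# Simplified model of the naming rule described above (NOT gsutil's implementation).
# resolve_dest SRC DST DST_EXISTS
#   SRC, DST:    source and destination URLs
#   DST_EXISTS:  "yes" if DST already exists as a subdirectory (hypothetical flag)
resolve_dest() {
  local src="${1%/}" dst="${2%/}" dst_exists="$3"
  if [ "$dst_exists" = "yes" ]; then
    # Destination exists: the source lands under it, keeping its last name component.
    echo "${dst}/${src##*/}"
  else
    # Destination does not exist: the source is renamed to it.
    echo "${dst}"
  fi
}

resolve_dest gs://bucket/dir/subdir1 gs://bucket/dir/subdir2 no
resolve_dest gs://bucket/dir/subdir1 gs://bucket/dir/subdir2 yes
```

The first call prints gs://bucket/dir/subdir2 and the second prints gs://bucket/dir/subdir2/subdir1, which are the two outcomes reported at the top of this issue.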
@mfschwartz I understand but this was not the case. I tried it again just to make sure.
$ mkdir dir
$ touch dir/file
$ gsutil cp -r dir gs://bucket/
$ gsutil mv gs://bucket/dir gs://bucket/dir2
$ gsutil ls gs://bucket/dir2
gs://bucket/dir2/dir/
Unfortunately, it seems to be hard to reproduce, though: I ran this once and it misbehaved, but then 3 times it worked as expected (with different names every time).
gsutil version: 4.19
Did the gsutil mv command fail partway through (or get interrupted, e.g., by ^C) and then you restarted, by chance? Please see https://cloud.google.com/storage/docs/gsutil/addlhelp/HowSubdirectoriesWork#potential-for-surprising-destination-subdirectory-naming
@mfschwartz no, action was successfully completed every time... But thanks for the link, makes things a little bit more clear. Will use rsync from now on.
Ok. If you can provide a way to reproduce the problem you saw please let us know.
@mfschwartz Please reopen, here's how to reproduce 100% of the time.
Basically when you rename a folder to something else, then you rename it back to the original name (which should not exist), it treats the original name as "already exists", therefore creating a subdirectory inside the incorrectly "existing" folder.
⟫ gsutil --version
gsutil version: 4.19
ceefour@cron:~⟫ mkdir test
ceefour@cron:~⟫ echo hello > test/hello
ceefour@cron:~⟫ gsutil cp -r test gs://snapshot.bippo.co.id/
Copying file://test/hello [Content-Type=application/octet-stream]...
Uploading gs://snapshot.bippo.co.id/test/hello: 6 B/6 B
ceefour@cron:~⟫ gsutil ls -r gs://snapshot.bippo.co.id/test
gs://snapshot.bippo.co.id/test/:
gs://snapshot.bippo.co.id/test/hello
ceefour@cron:~⟫ gsutil mv gs://snapshot.bippo.co.id/test gs://snapshot.bippo.co.id/test2
Copying gs://snapshot.bippo.co.id/test/hello [Content-Type=application/octet-stream]...
Copying gs://snapshot.bippo.co.id/test2/hello: 6 B/6 B
Removing gs://snapshot.bippo.co.id/test/hello...
ceefour@cron:~⟫ gsutil ls -r gs://snapshot.bippo.co.id/test2
gs://snapshot.bippo.co.id/test2/:
gs://snapshot.bippo.co.id/test2/hello
ceefour@cron:~⟫ gsutil mv gs://snapshot.bippo.co.id/test2 gs://snapshot.bippo.co.id/test
Copying gs://snapshot.bippo.co.id/test2/hello [Content-Type=application/octet-stream]...
Copying gs://snapshot.bippo.co.id/test/test2/hello: 6 B/6 B
Removing gs://snapshot.bippo.co.id/test2/hello...
ceefour@cron:~⟫ gsutil ls -r gs://snapshot.bippo.co.id/test
gs://snapshot.bippo.co.id/test/:
gs://snapshot.bippo.co.id/test/test2/:
gs://snapshot.bippo.co.id/test/test2/hello
Reopened. It's possible there is a bug that causes this. However, it's unlikely we can come up with a bulletproof fix, as due to eventual listing consistency it is possible that test/hello will appear in the listing even after it has been deleted.
@thobrla Thanks for acknowledging this. I think the way to solve this is to not use listing at all. Note that my gsutil mv command is completely within the GS cloud, no inter-cloud or local.
When renaming test2 to test, instead of listing the bucket or parent directory, it can check the presence of test directly, which I believe is consistent (rather than the eventually consistent listing operation). Done completely in-cloud, this should be as efficient as the listing approach.
One might ask why anyone would want to rename to a just-deleted directory. We do it for backup rotation:
1. Create daily_next
2. Delete daily_prev
3. Rename daily to daily_prev (which was just deleted)
4. Rename daily_next to daily (which, again, was just deleted in step 3)
While the alternative is to use dated directories, we prefer this approach.
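As far as the rotation can be read from the comment above (remove daily_prev, move daily to daily_prev, move daily_next to daily), it could be sketched like this. The BASE location, the run helper and the DRY_RUN switch are assumptions added for illustration; only the gsutil rm and mv commands come from this thread.

```shell
# Sketch of the backup rotation; BASE, run() and DRY_RUN are hypothetical.
BASE="${BASE:-gs://example-bucket/backups}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

rotate() {
  # Remove the oldest copy so that daily_prev does not exist before the rename.
  run gsutil -m rm -r "${BASE}/daily_prev"
  # daily -> daily_prev, then daily_next -> daily.
  run gsutil mv "${BASE}/daily" "${BASE}/daily_prev"
  run gsutil mv "${BASE}/daily_next" "${BASE}/daily"
}

DRY_RUN=1   # print only; set to 0 to actually run the commands
rotate
```

With DRY_RUN=1 this only prints the three commands, which makes it easy to check the order before pointing it at a real bucket. After a real run, gsutil ls -r on daily and daily_prev would show whether anything ended up nested one level too deep.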
Checking for the presence of the prefix test is not strongly consistent. Reading a single object is strongly consistent, but there is no object named test in your use case.
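To illustrate the distinction, a single-object check can be done with gsutil stat, which exits with a non-zero status when the object does not exist. The sketch below wraps that in a helper; the name object_exists and the example URLs are made up for illustration.

```shell
# object_exists URL: succeeds if a single object with exactly this name exists.
# Uses `gsutil -q stat`, which reads one object rather than listing a prefix.
object_exists() {
  gsutil -q stat "$1"
}

# Example usage (hypothetical bucket name):
#   object_exists gs://example-bucket/test/hello && echo "object is there"
#   object_exists gs://example-bucket/test       || echo "no object named test"
```

In the use case discussed here the second check is expected to fail even while test/hello exists, because test is only a name prefix and not an object.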
Hello,
We faced this issue today and found out that this bug exists. Any news on this?
GCS listing has since become strongly consistent, so I confirmed there is a bug here (using similar steps to @ceefour). @houglum can comment on the likelihood of the bug being fixed soon.
I've been unable to reproduce this with gsutil 4.28, running the script below:
#!/bin/bash
[ -z "$BUCKET" ] && echo "Need to set the BUCKET env var first!" && exit 1;
SCRIPT_DIR="$(dirname "$0")"
SCRIPT_DIR="$( (cd "$SCRIPT_DIR" && pwd) )"
# Create bucket if it doesn't already exist.
gsutil mb gs://${BUCKET}
# In case bucket already existed, remove all contents.
gsutil -m rm -r gs://${BUCKET}/*
# Create local file under "dir1" directory.
mkdir -p "${SCRIPT_DIR}/dir1"
touch "${SCRIPT_DIR}/dir1/file"
for i in {1..50}; do
  # Move dir1 "directory" into bucket.
  gsutil cp -r "${SCRIPT_DIR}/dir1" gs://${BUCKET}/
  # "dir1" prefix is gone after `mv` -- only "dir2" prefix should exist.
  echo "Moving dir1/ to dir2/"
  debug_output="$(gsutil -D mv gs://${BUCKET}/dir1 gs://${BUCKET}/dir2 2>&1)"
  output="$(gsutil ls gs://${BUCKET}/dir2)"
  if [[ "$output" =~ "dir2/dir1/" ]]; then
    echo "Reproduced the error! Output of ls:"
    echo "$output"
    echo
    echo "Debug output:"
    echo "$debug_output"
    # Stop and leave bucket state as-is.
    exit
  fi
  # Move objects starting with "dir2/" back to "dir1/". "dir2" prefix should be
  # gone after `mv`.
  echo "Moving dir2/ to dir1/"
  debug_output="$(gsutil -D mv gs://${BUCKET}/dir2 gs://${BUCKET}/dir1 2>&1)"
  output="$(gsutil ls gs://${BUCKET}/dir1)"
  if [[ "$output" =~ "dir1/dir2/" ]]; then
    echo "Reproduced the error! Output of ls:"
    echo "$output"
    echo
    echo "Debug output:"
    echo "$debug_output"
    # Stop and leave bucket state as-is.
    exit
  fi
  # Clean up "dir1" for next iteration.
  gsutil -m rm -r gs://${BUCKET}/dir1
done
# Cleanup
gsutil -m rm -r gs://${BUCKET}
@thobrla pointed out that this is reproducible if one prefix is a substring of the other. I changed "dir1" to "dir" (a substring of "dir2") in my example above and got an instant repro.
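For anyone wanting to check whether a given rename pair falls under that condition, a rough helper might look like the sketch below. The function name names_at_risk is made up, and it tests only the condition as stated in this comment (one name being a substring of the other); it says nothing about the underlying cause in gsutil.

```shell
# names_at_risk A B: succeeds if A is a substring of B or B is a substring of A.
# Hypothetical helper; assumes both names are non-empty.
names_at_risk() {
  case "$2" in *"$1"*) return 0 ;; esac
  case "$1" in *"$2"*) return 0 ;; esac
  return 1
}

report() {
  if names_at_risk "$1" "$2"; then
    echo "$1 -> $2: matches the condition"
  else
    echo "$1 -> $2: does not match"
  fi
}

report dir1 dir2    # the names in the script that did not reproduce
report dir dir2     # the names that gave an instant repro
report test2 test   # the names from the earlier repro transcript
```

The backup-rotation names mentioned earlier (daily, daily_prev, daily_next) also match, since daily is a substring of the other two.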
Changed the name of this issue to clarify under which conditions it happens.
There was also a duplicate report of this where a user arrived at the same conclusion in https://issuetracker.google.com/issues/112817360
I see that this ticket is still open. I run into this issue on a daily basis. Is there any progress to report?
I'm working on this now. I've tracked down the issue, and if all goes well, I expect to have a fix committed by next week.
This is fixed in gsutil v4.43, which is now available in the pypi repo (https://pypi.org/project/gsutil/). We missed the cutoff for this week's Cloud SDK, but it should be in 265.0.0, scheduled for Tues, Oct 1.
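A quick way to tell whether an installed gsutil is at or past 4.43 is to compare version strings. The sketch below assumes the version line has the form "gsutil version: 4.19" shown earlier in this thread and that sort supports -V (GNU coreutils); both helper names are made up.

```shell
# version_ge A B: succeeds if dotted version A >= B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# parse_gsutil_version: reads stdin, prints "4.19" for "gsutil version: 4.19".
parse_gsutil_version() {
  sed -n 's/^gsutil version: *//p' | head -n 1
}

# Sample input taken from this thread; replace `echo ...` with `gsutil version`.
v="$(echo 'gsutil version: 4.19' | parse_gsutil_version)"
if version_ge "$v" 4.43; then
  echo "$v is at or after 4.43"
else
  echo "$v predates 4.43"
fi
```

A plain string or float comparison would get this wrong for versions such as 4.5 versus 4.43, which is why the sketch sorts by version order instead.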
I still run into this issue with gsutil version 4.53.
@al-dann It's working as expected for me. Please provide steps to reproduce if this is still an issue.