Trying to run the command
dvc run -d s3://mybucket/data.txt -o s3://mybucket/data_copy.txt aws s3 cp s3://mybucket/data.txt s3://mybucket/data_copy.txt
Using AWS S3 remote cache.
It fails with the error message ERROR: unexpected error - Parameter validation failed: Invalid length for parameter Key, value: 0, valid range: 1-inf
Attached are the log with -v turned on and the output of pip freeze.
@helger Could you also show us your dvc version output, please?
dvc version
DVC version: 0.90.2
Python version: 3.7.4
Platform: Linux-4.4.0-1104-aws-x86_64-with-debian-stretch-sid
Binary: False
Package: pip
Supported remotes: azure, gs, hdfs, http, https, s3, ssh, oss
Filesystem type (workspace): ('ext4', '/dev/xvda1')
Output of pip freeze
This happens because we try to create the parent when linking from cache to workspace (and here, the parent of s3://<bucket>/<name> is "", i.e. empty).
https://github.com/iterative/dvc/blob/ee26afe65e8f9ebf11d7236fd3ee16a47f9b4fc6/dvc/remote/base.py#L380-L385
If the file is in s3://<bucket>/<namespaced>/<name>, this won't fail.
I don't think we need to create parent directories for object-based storages, except when checking out a directory.
Google Cloud Storage should equally be affected here as it's the same logic.
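To illustrate the corner case described above, here is a minimal sketch (not DVC's actual code) of how the parent of an object key ends up empty for a file at the bucket root. The helper names parent_key and dir_marker_to_create are hypothetical, and the guard that skips the empty parent is only one possible way to handle it.

```python
import posixpath
from urllib.parse import urlparse


def parent_key(url):
    """Return the parent "directory" key of an object URL ("" at bucket root)."""
    # For s3://mybucket/data.txt the path is "/data.txt", so the key is "data.txt".
    key = urlparse(url).path.lstrip("/")
    return posixpath.dirname(key)


def dir_marker_to_create(url):
    """Return the directory-marker key to create, or None if there is none.

    Hypothetical guard: an empty parent means the object sits at the bucket
    root, so there is nothing to create and no request with Key="" is made.
    """
    parent = parent_key(url)
    if not parent:
        return None
    return parent + "/"


if __name__ == "__main__":
    for url in (
        "s3://mybucket/data_copy.txt",
        "s3://mybucket/namespaced/data_copy.txt",
    ):
        print(url, "->", repr(parent_key(url)), repr(dir_marker_to_create(url)))
```

With the first URL the parent key is the empty string, which is what S3 rejects with "Invalid length for parameter Key, value: 0"; with the second URL the parent is non-empty, which matches the observation that adding another level avoids the failure.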
@skshetry Indeed, adding another level fixes the issue.
@skshetry Great catch! :pray: In general we do need to create those, because S3 and S3 tools sometimes create dirs like that; this was discussed in https://github.com/iterative/dvc/pull/2683 . Strictly speaking we don't have to do that, but it is a compromise to handle both approaches. But we definitely need to handle this corner case better.
@efiop, it depends on how you see it. We can handle this corner case, but there's no need for us to create a directory in most cases (except for cases like #2678?).
i.e. dvc add s3://dvc-bucket/data/file should not even try to create s3://dvc-bucket/data/ as a directory.
@skshetry It does that as a part of linking, so it is fine. What is actually wrong here is that corner case handling, indeed.