Dvc: HDFS command used is hardcoded and not compatible with newer versions

Created on 29 Jun 2019  路  6Comments  路  Source: iterative/dvc

hdfs fs -<command> is deprecated, however this command is hardcoded here: https://github.com/iterative/dvc/blob/832b834b65ec2665ece0e8d1cf756f8ed460441b/dvc/remote/hdfs.py#L37

I am running hadoop 2.7 on which hdfs fs is not supported. This stops me from being able to deploy a dvc pipeline, unless i work around it by mapping every hdfs fs to hdfs dfs command.

DVC: 0.50.1
Installation: apt-get
Platform:
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Hadoop version: 2.7.0

I was able to fix the issue by reverting my hadoop installation to v. 1.2.1.

But this is really annoying for users that use an hadoop version that does not support hadoop fs.

bug p1-important

All 6 comments

Hi @falcowinkler !

Does changing that to hdfs dfs - work for you as expected? You could try that by forking dvc repo, cloning it, adjusting and installing it on your machine. hdfs seems to have been available for a long time already (e.g. at least since 1.2), so we could just adjust to hdfs dfs -. Would you be so kind to contribute a PR with that change, please? :slightly_smiling_face:

For the record: related to https://github.com/iterative/dvc/issues/1629 , but it is much easier to adjust the command for now than re-implement HDFS driver with webhdfs.

@efiop i found that hadoop fs is actually supposed to work on 2.7.0. Was a local issue. hadoop fs should be fine.

if you want to change the command for other reasons, i can still create the pr, i tested it in various versions and hdfs dfs works just fine as well.

@falcowinkler Glad it worked for you! Does using hadoop fs show the deprecation warning? If so, hdfs dfs would still be a useful small patch, which we would be extremely happy to accept :slightly_smiling_face:

it doesn't show a deprecation warning and i believe hadoop fs is available in hadoop 1.x while hdfs dfs is not. therefore i would advise, we keep the command as it is

Was this page helpful?
0 / 5 - 0 ratings

Related issues

shcheklein picture shcheklein  路  3Comments

ghost picture ghost  路  3Comments

ghost picture ghost  路  3Comments

prihoda picture prihoda  路  3Comments

anotherbugmaster picture anotherbugmaster  路  3Comments