Dvc: Unable to add files with unicode pathnames

Created on 22 Sep 2019 · 9 comments · Source: iterative/dvc

System environment

dvc version

DVC version: 0.59.2
Python version: 2.7.10
Platform: Darwin-18.7.0-x86_64-i386-64bit
Binary: False
Cache: hardlink - True, reflink - True, symlink - True

brew info dvc

iterative/dvc/dvc: stable 0.59.2
Git for data science projects
https://dataversioncontrol.com/
/usr/local/Cellar/dvc/0.59.2 (10,442 files, 258MB) *
  Built from source on 2019-09-16 at 00:58:51
From: https://github.com/iterative/homebrew-dvc/blob/master/Formula/dvc.rb
==> Dependencies
Build: pkg-config ✔
Required: python ✔

Steps to reproduce

Set up

$ mkdir test_dvc_unicode_filename
$ cd test_dvc_unicode_filename
$ git init
Initialized empty Git repository in /Users/bossliaw/test_dvc_unicode_filename/.git/

$ dvc init
Adding '.dvc/lock' to '.dvc/.gitignore'.
Adding '.dvc/config.local' to '.dvc/.gitignore'.
Adding '.dvc/updater' to '.dvc/.gitignore'.
Adding '.dvc/updater.lock' to '.dvc/.gitignore'.
Adding '.dvc/state-journal' to '.dvc/.gitignore'.
Adding '.dvc/state-wal' to '.dvc/.gitignore'.
Adding '.dvc/state' to '.dvc/.gitignore'.
Adding '.dvc/cache' to '.dvc/.gitignore'.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|              https://dvc.org/doc/user-guide/analytics               |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: https://dvc.org/doc
- Get help and share ideas: https://dvc.org/chat
- Star us on GitHub: https://github.com/iterative/dvc

Create a file with a unicode pathname

$ mkdir 貓
$ echo "資料版本控制檔名測試" > 貓/貓貓貓.txt
$ git config core.quotePath false
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   .dvc/.gitignore
        new file:   .dvc/config

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        貓/
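For a regression test, the same fixture can be built from Python. This is a minimal sketch: `make_unicode_fixture` is a hypothetical helper (not part of DVC's test suite), and it only creates the files, mirroring the `mkdir` and `echo` commands above; it does not run `dvc add`.

```python
import tempfile
from pathlib import Path


def make_unicode_fixture(root: Path) -> Path:
    """Recreate the reporter's layout, 貓/貓貓貓.txt, under `root`.

    Hypothetical helper for illustration; it does not invoke DVC.
    """
    data_dir = root / "貓"
    data_dir.mkdir()
    data_file = data_dir / "貓貓貓.txt"
    # `echo` appends a trailing newline, so write one here as well.
    data_file.write_text("資料版本控制檔名測試\n", encoding="utf-8")
    return data_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = make_unicode_fixture(Path(tmp))
        print(created.relative_to(tmp))
```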

Error when running dvc add

$ dvc add 貓/
Adding '貓' to '.gitignore'.
ERROR: unexpected error - 'ascii' codec can't decode byte 0xe8 in position 42: ordinal not in range(128)

Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!
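The byte in the message is a clue: 0xe8 is the first byte of the UTF-8 encoding of 貓 (U+8C93). Python 2 silently decodes a byte string as ASCII whenever it is combined with a `unicode` string, so any non-ASCII byte in a path fails exactly like this. The sketch below reproduces the message in Python 3 by doing that decode explicitly. The path is an assumption: if the failing string was the repository root from the `git init` output joined with 貓, the reported offset of 42 is exactly where 貓 starts.

```python
# Reproduce the Python 2 failure mode explicitly in Python 3.
# Assumption: the string being handled was the repo root from the
# `git init` output joined with the directory name 貓.
repo_root = "/Users/bossliaw/test_dvc_unicode_filename/"
raw_path = (repo_root + "貓").encode("utf-8")  # bytes, like a Python 2 str

print("貓".encode("utf-8"))  # b'\xe8\xb2\x93': the first byte is 0xe8

try:
    # Python 2 performs this ASCII decode implicitly when mixing str and unicode.
    raw_path.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)
    # 'ascii' codec can't decode byte 0xe8 in position 42: ordinal not in range(128)
```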

After this step, all DVC commands fail:

$ dvc status
ERROR: failed to obtain data status - 'ascii' codec can't decode byte 0xe8 in position 42: ordinal not in range(128)

Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!
$ dvc destroy
ERROR: failed to obtain data status - 'ascii' codec can't decode byte 0xe8 in position 42: ordinal not in range(128)

Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!

I couldn't find any information about limitations on pathname encoding, so I decided to report this issue.

Labels: bug, c5-half-a-day, p0-critical, research

All 9 comments

Hello, @Bossliaw! Thanks for reporting this :)

However, I'm unable to reproduce it right away:

dvc init --no-scm
mkdir 貓
echo "資料版本控制檔名測試" > 貓/貓貓貓.txt
dvc add 貓/
What's next?
------------
- Check out the documentation: https://dvc.org/doc
- Get help and share ideas: https://dvc.org/chat
- Star us on GitHub: https://github.com/iterative/dvc

Saving '貓' to '.dvc/cache/59/33a5fbf2c4c1f394e9d316f8ace042.dir'.
Saving information to '貓.dvc'.

dvc pipeline show --ascii 貓.dvc

+-------+
| 貓.dvc |
+-------+

What's the output of the following command on your machine?

python -c 'import locale; print(locale.getdefaultlocale())'

Mine is: ('en_US', 'UTF-8')
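A locale of ('en_US', 'UTF-8') does not rule this error out on its own. A broader check, sketched below, also prints the interpreter version and `sys.getdefaultencoding()`, which is the codec Python 2 uses when it silently converts between byte strings and unicode ('ascii' there, 'utf-8' on Python 3). This is a diagnostic sketch, not a DVC command, and `encoding_report` is a hypothetical name; it is written to run under both Python 2 and 3.

```python
import locale
import sys


def encoding_report():
    """Collect the settings that decide how non-ASCII path names are handled."""
    return {
        # Python 2 vs 3 matters most here: Python 2's str is a byte string.
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
        # Codec for implicit bytes <-> text conversion ('ascii' on Python 2).
        "default_encoding": sys.getdefaultencoding(),
        # Encoding used to convert file names between bytes and text.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding used by default for text files and pipes.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("%s: %s" % (key, value))
```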

Oh, my bad! I'm running Python 3 :see_no_evil: Let me try to reproduce it again.

Indeed! We have a problem somewhere:

virtualenv -p python2 .venv
source .venv/bin/activate
pip install dvc

dvc init --no-scm
mkdir 貓
echo "資料版本控制檔名測試" > 貓/貓貓貓.txt
dvc add 貓/
What's next?
------------
- Check out the documentation: https://dvc.org/doc
- Get help and share ideas: https://dvc.org/chat
- Star us on GitHub: https://github.com/iterative/dvc

ERROR: unexpected error - 'ascii' codec can't decode byte 0xe8 in position 18: ordinal not in range(128)

Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!

@Bossliaw, I'll take a deeper look! :)

Thank you @mroutis !

@Bossliaw It is strange that you say you installed dvc from brew, yet Python 2 is clearly being used. Our brew formula depends on the brew python package, which has pointed to Python 3 for quite a while. So either you have an old brew installation, or you have also installed dvc into your system Python, which is Python 2. Could you please show us the output of these commands:

brew info python
which python
which pip
pip freeze | grep dvc
which dvc

Just to be clear, the workaround for this issue is to use dvc with Python 3.
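To see which interpreter a `dvc` launcher actually runs under, one option is to read the shebang line of whatever `which dvc` resolves to. This is a sketch under assumptions: it expects a script-style launcher with a `#!` line (pip-generated console scripts have one; other installers may use a wrapper or a binary, in which case it returns None), and `shebang_interpreter` is a hypothetical helper.

```python
import shutil
from typing import Optional


def shebang_interpreter(command: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the interpreter named on the `#!` line of `command` as found on PATH.

    Returns None when the command is not found or has no shebang line
    (for example, a compiled binary). `search_path` overrides $PATH.
    """
    location = shutil.which(command, path=search_path)
    if location is None:
        return None
    with open(location, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    return first_line[2:].strip().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(shebang_interpreter("dvc"))
```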

Great that you pointed that out, @efiop :)

Sorry for the late reply, and thanks for taking a look at this issue.

This is the output requested by @efiop:

brew info python

$ brew info python
python: stable 3.7.4 (bottled), HEAD
Interpreted, interactive, object-oriented programming language
https://www.python.org/
/usr/local/Cellar/python/3.7.4 (3,922 files, 60.8MB)
  Poured from bottle on 2019-08-10 at 19:54:59
/usr/local/Cellar/python/3.7.4_1 (3,866 files, 60MB) *
  Poured from bottle on 2019-09-16 at 00:55:54
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/python.rb
==> Dependencies
Build: pkg-config ✔
Required: gdbm ✔, openssl@1.1 ✔, readline ✔, sqlite ✔, xz ✔
==> Options
--HEAD
    Install HEAD version
==> Caveats
Python has been installed as
  /usr/local/bin/python3

Unversioned symlinks `python`, `python-config`, `pip` etc. pointing to
`python3`, `python3-config`, `pip3` etc., respectively, have been installed into
  /usr/local/opt/python/libexec/bin

If you need Homebrew's Python 2.7 run
  brew install python@2

You can install Python packages with
  pip3 install <package>
They will install into the site-package directory
  /usr/local/lib/python3.7/site-packages

See: https://docs.brew.sh/Homebrew-and-Python
==> Analytics
install: 491,028 (30 days), 1,368,470 (90 days), 4,911,054 (365 days)
install_on_request: 247,901 (30 days), 677,811 (90 days), 2,594,892 (365 days)
build_error: 0 (30 days)

The rest of the commands

$ which python
/usr/local/bin/python

$ which pip
/usr/local/bin/pip

$ pip freeze | grep dvc
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.

$ which dvc
/usr/local/bin/dvc

I think I got bitten by the python2 from the Jurassic age:

$ python -c 'import locale; print(locale.getdefaultlocale())'
('en_US', 'UTF-8')

$ pip --version
pip 19.1.1 from /usr/local/lib/python2.7/site-packages/pip (python 2.7)

$ python --version
Python 2.7.16

$ brew info python@2
python@2: stable 2.7.16 (bottled), HEAD
Interpreted, interactive, object-oriented programming language
https://www.python.org/
/usr/local/Cellar/python@2/2.7.16_1 (3,745 files, 51.4MB) *
  Poured from bottle on 2019-09-04 at 00:01:52
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/python@2.rb
==> Dependencies
Build: pkg-config ✔
Required: gdbm ✔, openssl@1.1 ✔, readline ✔, sqlite ✔
==> Options
--HEAD
    Install HEAD version
==> Caveats
Pip and setuptools have been installed. To update them
  pip install --upgrade pip setuptools

You can install Python packages with
  pip install <package>

They will install into the site-package directory
  /usr/local/lib/python2.7/site-packages

See: https://docs.brew.sh/Homebrew-and-Python
==> Analytics
install: 135,572 (30 days), 308,287 (90 days), 1,675,907 (365 days)
install_on_request: 43,848 (30 days), 94,154 (90 days), 433,435 (365 days)
build_error: 0 (30 days)

I have both python2 and python3 installed!!

The reason I have both installed:

$ brew uses --installed python@2
gdal                             numpy                            opencv

A long time ago, I was confused by the messy Python environment on OS X.

The official Homebrew documentation has a note on this:

Python 3.x or Python 2.x

Homebrew provides one formula for Python 3.x (python) and another for Python 2.7.x (python@2).

The executables are organized as follows so that Python 2 and Python 3 can both be installed without conflict:

  • python3 points to Homebrew's Python 3.x (if installed)
  • python2 points to Homebrew's Python 2.7.x (if installed)
  • python points to Homebrew's Python 2.7.x (if installed) otherwise the macOS system Python. Check out brew info python if you wish to add Homebrew's 3.x python to your PATH.
  • pip3 points to Homebrew's Python 3.x's pip (if installed)
  • pip and pip2 point to Homebrew's Python 2.7.x's pip (if installed)

(Wondering which one to choose?)

To summarize, when both are installed, the default Python path /usr/local/bin/python points to python2.
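Because `python` may resolve to either interpreter depending on what is installed, a quick way to confirm this on a given machine is to ask each name on PATH for its major version. A minimal sketch: `major_version` and `survey` are hypothetical helpers, and the default list of names is an assumption based on the Homebrew note. The probe uses `print(...)` with a single argument so it runs under both Python 2 and 3.

```python
import shutil
import subprocess
from typing import Dict, Optional, Sequence

# Valid in both Python 2 and 3, so it is safe to hand to any interpreter.
PROBE = "import sys; print(sys.version_info[0])"


def major_version(interpreter: str) -> Optional[int]:
    """Return the major version of `interpreter`, or None if it is not on PATH."""
    location = shutil.which(interpreter)
    if location is None:
        return None
    output = subprocess.check_output([location, "-c", PROBE])
    return int(output.decode("ascii").strip())


def survey(names: Sequence[str] = ("python", "python2", "python3")) -> Dict[str, Optional[int]]:
    """Map each interpreter name to the major version it runs, if found."""
    return {name: major_version(name) for name in names}


if __name__ == "__main__":
    for name, major in survey().items():
        print(name, "->", major if major is not None else "not found")
```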

Hope that helps!

In the meantime, I might use pip3 to install dvc instead...

It does, @Bossliaw , thanks a lot!

For the record:

ERROR: unexpected error - 'ascii' codec can't decode byte 0xe8 in position 27: ordinal not in range(128)
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/main.py", line 40, in main
    ret = cmd.run_cmd()
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/command/base.py", line 63, in run_cmd
    return self.run()
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/command/add.py", line 24, in run
    fname=self.args.file,
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/repo/add.py", line 34, in add
    stages = _create_stages(repo, targets, fname, no_commit)
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/repo/add.py", line 66, in _create_stages
    stage.save()
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/stage.py", line 711, in save
    out.save()
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/output/base.py", line 234, in save
    if not self.changed():
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/output/base.py", line 190, in changed
    status = self.status()
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/output/base.py", line 181, in status
    if self.changed_checksum():
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/output/base.py", line 163, in changed_checksum
    != self.remote.save_info(self.path_info)[
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/remote/base.py", line 302, in save_info
    return {self.PARAM_CHECKSUM: self.get_checksum(path_info)}
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/remote/base.py", line 275, in get_checksum
    checksum = self.state.get(path_info)
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/state.py", line 404, in get
    path, self.repo.dvcignore
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/repo/__init__.py", line 473, in dvcignore
    return DvcIgnoreFilter(self.root_dir)
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/ignore.py", line 69, in __init__
    self._update(os.path.join(root, d))
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/local/lib/python2.7/site-packages/dvc/ignore.py", line 72, in _update
    ignore_file_path = os.path.join(dirname, DvcIgnore.DVCIGNORE_FILE)
  File "/home/efiop/.pyenv/versions/2.7.15/envs/2.7.15-dvc/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 27: ordinal not in range(128)
------------------------------------------------------------

Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!

Thanks, @Bossliaw! That makes sense now :) The underlying py2 issue is on our side anyway, so we'll fix it.
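The traceback pins the failure on `os.path.join` in `dvc/ignore.py`: under Python 2, a byte string holding the UTF-8 name 貓 was joined with a `unicode` string, so `posixpath.join` tried to decode the bytes as ASCII. The usual cure is to keep every path component the same type. The sketch below shows that idea in Python 3 terms, normalizing with `os.fsdecode`; `join_as_text` is a hypothetical helper, not the patch DVC shipped, and ".dvcignore" is assumed to be the value of `DvcIgnore.DVCIGNORE_FILE`.

```python
import os
from typing import Union

PathPart = Union[str, bytes]


def join_as_text(*parts: PathPart) -> str:
    """Join path components after decoding any bytes with the filesystem encoding.

    Mixing bytes and str is what broke Python 2 (implicit ASCII decode);
    Python 3 refuses the mix outright with a TypeError. Decoding first
    avoids both problems.
    """
    return os.path.join(*(os.fsdecode(part) for part in parts))


if __name__ == "__main__":
    dirname = "貓".encode("utf-8")  # bytes, like a path from a bytes-based os.walk
    try:
        os.path.join(dirname, ".dvcignore")  # assumed DVCIGNORE_FILE value
    except TypeError as exc:
        print("mixed types:", exc)
    print(join_as_text(dirname, ".dvcignore"))
```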
