Dvc: logging breaks in PowerShell

Created on 9 Oct 2019  路  19Comments  路  Source: iterative/dvc

DVC version: 0.61.2 as installed by dvc-0.61.2.exe on Windows 10.

When I type dvc add "Training Set" in a Powershell Window I get the error report below.

The Training Set directory is very large (8.22GB, 14186 files). Is this too large for the 0.61.2 version? Suggestions?

Thanks,

Larry

PS C:\Projects-Git\MasuTrainingAutoFocusTraining> dvc add "Training Set"
Computing md5 for a large number of files. This is only done once.
--- Logging error ---
Traceback (most recent call last):
File "dvc\logger.py", line 158, in emit
File "site-packages\tqdm\std.py", line 569, in write
File "contextlib.py", line 119, in __exit__
File "site-packages\tqdm\std.py", line 597, in external_write_mode
File "site-packages\tqdm\std.py", line 94, in release
AssertionError: attempt to release recursive lock not owned by thread
Call stack:
File "dvc__main__.py", line 10, in
File "dvc\main.py", line 42, in main
File "dvccommandadd.py", line 24, in run
File "dvc\repo__init__.py", line 33, in wrapper
File "dvc\repo\scm_context.py", line 4, in run
File "dvc\repoadd.py", line 45, in add
File "dvc\stage.py", line 711, in save
File "dvcoutputbase.py", line 243, in save
File "dvcoutputbase.py", line 199, in changed
File "dvcoutputbase.py", line 190, in status
File "dvcoutputbase.py", line 172, in changed_checksum
File "dvc\remotebase.py", line 329, in save_info
File "dvc\remotebase.py", line 318, in get_checksum
File "dvc\remotebase.py", line 234, in get_dir_checksum
File "dvc\remotebase.py", line 210, in _collect_dir
File "dvc\remotebase.py", line 187, in _calculate_checksums
File "logging__init__.py", line 1378, in info
File "logging__init__.py", line 1514, in _log
File "logging__init__.py", line 1524, in handle
File "logging__init__.py", line 1586, in callHandlers
File "logging__init__.py", line 894, in handle
File "dvc\logger.py", line 164, in emit
Message: 'Computing md5 for a large number of files. This is only done once.'
Arguments: ()
ERROR: unexpected error - failed to log

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

bug p1-important research

Most helpful comment

Lol when you try to debug an issue and can't reproduce it because you accidentally just fixed it...

On one hand I've wasted time. On the other hand, I don't have to waste more time!

Definitely in the "?" region because I ~have no idea how I got there~ am awesome

All 19 comments

Hi @LarryThermo !

Looks like a bug in our progress/logging subsystem. We'll take a look and will try to reproduce it soon.

In the meantime, could you try running that command from a regular cmd terminal, just to check if that one is also affected? Also, have you considered using terminal that comes with git? Or are you usually using PowerShell?

@efiop,

Thanks for the quick response. Its working from a cmd terminal, yea.

I also tried it from git bash as brought up from a Windows explorer context menu. Unfortunately git bash gives a "bash: dvc: command not found" error, so it is apparently not honoring my path variable which does have "C:\Program Files (x86)\Data Version Control" as its last entry.

I don't normally use Powershell but instead prefer TortoiseGit.

@LarryThermo Sorry for the delay. Let me clarify, so it doesn't work from Powershell but works fine from a regular windows cmd terminal, right?

I also tried it from git bash as brought up from a Windows explorer context menu. Unfortunately git bash gives a "bash: dvc: command not found" error, so it is apparently not honoring my path variable which does have "C:\Program Files (x86)\Data Version Control" as its last entry.

That is right. You could try calling it directly, though, to see if it works and then adjust PATH in it, if you decide to use it in the meantime.

I don't normally use Powershell but instead prefer TortoiseGit.

I'm not familiar with TortoiseGit, is it a GUI wrapper? Are you using it with some IDE?

Hi @efiop,

Yes, it did not work from powershell due to the logging error. But it did work with no logging error from the cmd prompt. It also did not work initially in either Git Bash or WSL linux due to pathing problems with finding dvc.exe when "dvc" is typed at the prompt.

I did discover that if I created a symbolic link inside C:/Program Files (x86)/Data Version Control both Git Bash and WSL linux started working. Precisely I typed "mklink dvc dvc.exe" inside a Powershell window with administrative permission. This also enabled the "dvs install" extension to start working.

TortoiseGit is a GUI wrapper. Many on our team use it, some prefer the command line. Others use the AtlAssian SourceTree product.

@LarryThermo

@LarryThermo Thanks for clarifying! :slightly_smiling_face: As a workaround, consider using git bash or wsl, if you are used to *nix workflow. We'll try to reproduce this issue in powershell.

For the record, this only breaks with binary installs. pip works fine. If you are expriencing this, try uninstalling the binary package and installing from pip or conda.

@casperdcl Looks like this is tqdm again:

Traceback (most recent call last):
File "dvc\logger.py", line 158, in emit
File "site-packages\tqdm\std.py", line 569, in write
File "contextlib.py", line 119, in exit
File "site-packages\tqdm\std.py", line 597, in external_write_mode
File "site-packages\tqdm\std.py", line 94, in release
AssertionError: attempt to release recursive lock not owned by thread

maybe you have any ideas what's wrong? 馃檪

@casperdcl To be clear, this only reproduces with our pyinstaller-built binary, so it is definitely something on their side, I'm just wondering if you've seen this previously.

@efiop yes looks like some odd implementation of threading by pyinstaller. I've seen issues like AWS Lambda and Jython not implementing multiprocessing properly. Perhaps we could catch and ignore the error in this edge case?

@casperdcl Actually thinking about dropping pyinstaller-built binaries, since they bring quite a bit of hustle. I will take a closer look once more soon. Thanks! 馃檪

Hello, I'm also getting this on my installation in Ubuntu 19.04.

DVC installation information

$ dvc version
CEST
DVC version: 0.63.3
Python version: 3.7.3
Platform: Linux-5.0.0-31-generic-x86_64-with-debian-buster-sid
Binary: False
Cache: reflink - False, hardlink - True, symlink - True

I installed DVC using PIP, but my tqdm installation is:

$ conda list tqdm
tqdm                      4.36.1                     py_0    conda-forge

error message

$ dvc add datasets/soccer-teams-shirts-santa-catarina/figueirense/
Mo 21 Okt 2019 22:18:07 CEST
Computing md5 for a large number of files. This is only done once.
--- Logging error ---
Traceback (most recent call last):
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/logger.py", line 158, in emit
    msg, file=self.stream, end=getattr(self, "terminator", "\n")
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/tqdm/std.py", line 569, in write
    fp.write(end)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/tqdm/std.py", line 597, in external_write_mode
    cls._lock.release()
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/tqdm/std.py", line 94, in release
    lock.release()
AssertionError: attempt to release recursive lock not owned by thread
Call stack:
  File "/home/edisongustavo/miniconda3/envs/fastai/bin/dvc", line 10, in <module>
    sys.exit(main())
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/main.py", line 43, in main
    ret = cmd.run()
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/command/add.py", line 24, in run
    fname=self.args.file,
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/repo/__init__.py", line 33, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/repo/add.py", line 44, in add
    stage.save()
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/stage.py", line 711, in save
    out.save()
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/output/base.py", line 243, in save
    if not self.changed():
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/output/base.py", line 199, in changed
    status = self.status()
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/output/base.py", line 190, in status
    if self.changed_checksum():
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/output/base.py", line 172, in changed_checksum
    != self.remote.save_info(self.path_info)[
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/remote/base.py", line 326, in save_info
    return {self.PARAM_CHECKSUM: self.get_checksum(path_info)}
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/remote/base.py", line 315, in get_checksum
    checksum = self.get_dir_checksum(path_info)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/remote/base.py", line 231, in get_dir_checksum
    dir_info = self._collect_dir(path_info)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/remote/base.py", line 207, in _collect_dir
    new_checksums = self._calculate_checksums(not_in_state)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/remote/base.py", line 184, in _calculate_checksums
    "Computing md5 for a large number of files. "
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/logging/__init__.py", line 1383, in info
    self._log(INFO, msg, args, **kwargs)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/logging/__init__.py", line 1519, in _log
    self.handle(record)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/logging/__init__.py", line 1529, in handle
    self.callHandlers(record)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/logging/__init__.py", line 1591, in callHandlers
    hdlr.handle(record)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/logging/__init__.py", line 905, in handle
    self.emit(record)
  File "/home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/logger.py", line 164, in emit
    self.handleError(record)
Message: 'Computing md5 for a large number of files. This is only done once.'
Arguments: ()
ERROR: unexpected error - failed to log <LogRecord: dvc.remote.base, 20, /home/edisongustavo/miniconda3/envs/fastai/lib/python3.7/site-packages/dvc/remote/base.py, 184, "Computing md5 for a large number of files. This is only done once.">

hmm this looks like it's a logging handler error. Perhaps the handler is using a lock while the thread has finished? the problem would be

https://github.com/iterative/dvc/blob/22742618081c6b8bf8ac672459dc59654f2d15f0/dvc/logger.py#L157-L159

which internally uses Tqdm.get_lock().

is there any way we can insist on a thread not dying before the log messages get flushed?

@casperdcl So sorry for such a late response. Didn't dive into it yet, maybe you could re-assign it and take a closer look, please?

I wonder, is this still an issue after the latest release? We've changed the locking mechanism a bit...

@casperdcl Indeed, fixed now! Thank you! :tada:

@LarryThermo @edisongustavo Guys, please try 0.67.1 (you can find the exe installer here https://github.com/iterative/dvc/releases/tag/0.67.1 ) and let us know if you are still experiencing that issue. Thanks for the feedback! :slightly_smiling_face:

Lol when you try to debug an issue and can't reproduce it because you accidentally just fixed it...

On one hand I've wasted time. On the other hand, I don't have to waste more time!

Definitely in the "?" region because I ~have no idea how I got there~ am awesome

It looks like it is fixed. I've added a large dataset with dvc add and had no issues.

This is my version:

```
$ dvc version
DVC version: 0.68.1
Python version: 3.7.5
Platform: Linux-5.3.0-21-generic-x86_64-with-Ubuntu-19.10-eoan
Binary: False
Package: pip
Cache: reflink - False, hardlink - True, symlink - True

Was this page helpful?
0 / 5 - 0 ratings

Related issues

danfischetti picture danfischetti  路  41Comments

Casyfill picture Casyfill  路  56Comments

ynop picture ynop  路  41Comments

dmpetrov picture dmpetrov  路  64Comments

kevin-hanselman picture kevin-hanselman  路  37Comments