Dvc: ui / logger: rethink verbosity

Created on 27 Jul 2019  路  8Comments  路  Source: iterative/dvc

Here's an old idea that I wanted to experiment with but never made time for it :sweat_smile: , now feel like sharing and spark some discussion :)

The output of some commands _narrate_ what is happening underneath, this is awesome for someone curious that wants to take a peek behind the curtain, but might result tiring such verbosity when you already know (or just don't care) what the command did. Let's take dvc add for example:

$ dvc add hello

Saving 'hello' to '.dvc/cache/b1/946ac92492d2347c6235b4d2611184'.
Saving information to 'hello.dvc'.

My suggestion would be to create different verbose levels, -vvv (INFO, DEBUG, DATABASE?) and make the output of the first level serve the purpose to know what's going on:

==> Adding 'hello' file...
  -> computing checksum: b1946ac92492d2347c6235b4d261118
  -> moving file to cache '.dvc/cache/b1/946ac92492d2347c6235b4d261118'
  -> linking '.dvc/cache/b1/946ac92492d2347c6235b4d261118' to 'hello'
  -> generate stage file 'hello.dvc'

There are different levels to group operations:

:: H1
==> H2
  -> Content
  ---> SQL queries in case that we need those? Maybe information about inodes?

>> prompt

Creating some flexibility on how to organize the output so it make sense

:: Add (command / operation)
==> foo (file / target)
  -> Stuff about adding foo
==> bar
  -> Stuff about adding bar

Arrows might be a little bit confusing but if you add some color they look really great:
(the idea is taken from the way pacman/yay organizes its output).

Notice the bold on the headers that represent operations occurring during an update:
2019-07-27-13:52:39

Also notice that they are not masking the outputs of the download, probably was a curl)
2019-07-27-13:57:40

We can keep the color code for warnings/errors:
2019-07-27-14:01:27.

This can integrate very well with progress bar.

Possible implementation

  • Keep track of the indentation level on the Logger itself
  • Modify the -v so you can express the level of verbosity with several ones -vvv

The cool part of it is that it would make it clearer (even for the ones working with the code) to inspect what's going on when running a command, the downside is that we will need to organize the code in such way that these operations are possible :sweat_smile:

ui

Most helpful comment

We could use -vv for verbose-verbose logs, kinda like many other tools do (e.g. pytest). And so state and other noisy debug messages would be hidden under it.

All 8 comments

The current --verbose output for an add is flood with SQL statements without any context of where are they ocurring:

DEBUG: Trying to spawn '['/home/mroutis/src/iterative/dvc/.venv/bin/python', '-m', 'dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['/home/mroutis/src/iterative/dvc/.venv/bin/python', '-m', 'dvc', 'daemon', '-q', 'updater']'
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Skipping copying for '/home/mroutis/tmp/hello', since it is not a symlink or a hardlink.
DEBUG: cache '.dvc/cache/b1/946ac92492d2347c6235b4d2611184' expected 'b1946ac92492d2347c6235b4d2611184' actual 'None'
WARNING: Output 'hello' of 'hello.dvc' changed because it is 'not in cache'
DEBUG: Path hello inode 806523223
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: Path hello inode 806523223
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
DEBUG: {'hello': 'modified'}
DEBUG: Path hello inode 806523223
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1564263056952132096', '6', 'b1946ac92492d2347c6235b4d2611184', '1564263071017960960')]
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
DEBUG: Computed stage 'hello.dvc' md5: 'cb2b804ef4e1b8ef1eb47cadab2a0be5'
DEBUG: cache '.dvc/cache/b1/946ac92492d2347c6235b4d2611184' expected 'b1946ac92492d2347c6235b4d2611184' actual 'None'
Saving 'hello' to '.dvc/cache/b1/946ac92492d2347c6235b4d2611184'.
DEBUG: cache '.dvc/cache/b1/946ac92492d2347c6235b4d2611184' expected 'b1946ac92492d2347c6235b4d2611184' actual 'None'
DEBUG: Created 'reflink': .dvc/cache/b1/946ac92492d2347c6235b4d2611184 -> hello
DEBUG: Path hello inode 806461538
DEBUG: REPLACE INTO link_state(path, inode, mtime) VALUES (?, ?, ?)
DEBUG: Path hello inode 806461538
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
DEBUG: Path .dvc/cache/b1/946ac92492d2347c6235b4d2611184 inode 806523223
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1564263056952132096', '6', 'b1946ac92492d2347c6235b4d2611184', '1564263071018625536')]
DEBUG: UPDATE state SET mtime = ?, size = ?, md5 = ?, timestamp = ? WHERE inode = ?
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
Saving information to 'hello.dvc'.

Maybe we can prepend the SQL statements with a [state] message; I would question the usefulness of knowing the statements itself instead of the file information, for example, I would find more easy to parse something like the following:

==> Adding 'hello' file...
  -> computing file statistics:
      -> checksum: b1946ac92492d2347c6235b4d261118
      -> inode: 806523223
      -> mtime: xxx
      -> ctime: xxx
  -> querying state database:
      -> No result.
  -> moving file to cache '.dvc/cache/b1/946ac92492d2347c6235b4d261118'
  -> linking '.dvc/cache/b1/946ac92492d2347c6235b4d261118' to 'hello'
  -> generate stage file 'hello.dvc'

If we need to know the user_version I would add it to the version command instead

@mroutis Good points! But it is important to note that much of is debug output(or should be moved to debug, dvc is currently too verbose), for which there is no need in introducing so much complexity for it, it would be extremely hard to write a code for that. Also, there is no reason to do stuff like

  -> computing file statistics:
      -> checksum: b1946ac92492d2347c6235b4d261118
      -> inode: 806523223
      -> mtime: xxx
      -> ctime: xxx

in debug messages, since it only bloats them. Putting all of those in a single line is not making it too long :) What we should do with state is to introduce a separate logger for it and disable by default, so it could only be enabled with something like DVC_STATE_DEBUG=1 env var.

Agree, @efiop , I think a good first step could be going through all the commands, making notes about which output should be treated as debug, info, warning, or error.

As for the complexity of introducing such change (the one with the suggested syntax) would turn out to be quite problematic:

the downside is that we will need to organize the code in such way that these operations are possible

I like the idea of disabling the state database by default :+1: (created an issue for it ^ https://github.com/iterative/dvc/issues/2332)

(cc: @shcheklein , talked offline about this one, what's your take?))

I tend to agree with @efiop (at least the way I understand hit point). I think it's too early (and unnecessary) for us to have multiple levels of verbosity, especially considering that we already have a few of those INFO, DEBUG, etc.

To actually solve the issue, I think we should think and redesign the way we handle commands UI (output) in the first place. Ideally, it should be a separate layer, probably not logger-based, etc. This should help us isolate presentation logic (UI) for the commands and think about it separately from the levels of loggers. While in debug mode you can keep all these different levels and use them to adjust level of verbosity is needed.

We could use -vv for verbose-verbose logs, kinda like many other tools do (e.g. pytest). And so state and other noisy debug messages would be hidden under it.

@shcheklein @efiop has this been reasonably fixed by #3806?

@casperdcl Yes, let's close it for now. Thanks!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dnabanita7 picture dnabanita7  路  3Comments

gregfriedland picture gregfriedland  路  3Comments

ghost picture ghost  路  3Comments

shcheklein picture shcheklein  路  3Comments

siddygups picture siddygups  路  3Comments