dvc: split logger and stdout/stderr

Created on 25 Apr 2019  路  8Comments  路  Source: iterative/dvc

This is weird in itself using logging to communicate with user, this can also cause issues. When we manipulate log level we also change the tool output, which can lead to surprising behavior.

p3-nice-to-have refactoring

Most helpful comment

Use print() to communicate to user, logger for logging ) One can propose using two loggers, but this has some of the same issues.

I can't see any upsides of using logging for user communication, do you have several levels of messages? Do you need centrally planned orchestration of what goes where?

All 8 comments

@Suor , there was a refactor a few weeks ago #1753 it can give you a little bit more context of why we decided to implement it that way. What are your suggestion, tho?

Use print() to communicate to user, logger for logging ) One can propose using two loggers, but this has some of the same issues.

I can't see any upsides of using logging for user communication, do you have several levels of messages? Do you need centrally planned orchestration of what goes where?

I'm not sure that the raw print is the only possible solution. I would take a look how other CLI tools deal with it - npm, hg for example.

But this thing has been bothering me a lot. It's weird indeed to use loggers to print progress bars and all other user output.

@Suor , there's a when to use logging section in the Python documentation:

Logging provides a set of convenience functions for simple logging usage. These
are :func:`debug`, :func:`info`, :func:`warning`, :func:`error` and
:func:`critical`. To determine when to use logging, see the table below, which
states, for each of a set of common tasks, the best tool to use for it.

+-------------------------------------+--------------------------------------+
| Task you want to perform            | The best tool for the task           |
+=====================================+======================================+
| Display console output for ordinary | :func:`print`                        |
| usage of a command line script or   |                                      |
| program                             |                                      |
+-------------------------------------+--------------------------------------+
| Report events that occur during     | :func:`logging.info` (or             |
| normal operation of a program (e.g. | :func:`logging.debug` for very       |
| for status monitoring or fault      | detailed output for diagnostic       |
| investigation)                      | purposes)                            |
+-------------------------------------+--------------------------------------+
| Issue a warning regarding a         | :func:`warnings.warn` in library     |
| particular runtime event            | code if the issue is avoidable and   |
|                                     | the client application should be     |
|                                     | modified to eliminate the warning    |
|                                     |                                      |
|                                     | :func:`logging.warning` if there is  |
|                                     | nothing the client application can do|
|                                     | about the situation, but the event   |
|                                     | should still be noted                |
+-------------------------------------+--------------------------------------+
| Report an error regarding a         | Raise an exception                   |
| particular runtime event            |                                      |
+-------------------------------------+--------------------------------------+
| Report suppression of an error      | :func:`logging.error`,               |
| without raising an exception (e.g.  | :func:`logging.exception` or         |
| error handler in a long-running     | :func:`logging.critical` as          |
| server process)                     | appropriate for the specific error   |
|                                     | and application domain               |
+-------------------------------------+--------------------------------------+

I don't think that it is _that_ weird to use logging to communicate with user.

I can't see any upsides of using logging for user communication

Actually, we are currently benefiting from using logging instead of print by not flooding the code with if not quiet: print(); all the custom logic is easy to find because it is located on dvc/logger.py. We have a custom formatter for exceptions and it is adapted to co-exist with our progress bar implementation.

Do you have several levels of messages?

Currently using info for reporting what's going on and things that the user should know (like suggesting optimizations or letting the user know that their version of DVC is outdated)

Do you need centrally planned orchestration of what goes where?

Could you elaborate more, I don't get what you mean by _centrally planned orchestration_.
If you mean having different logger instances for each module, then it is a nice thing to have.
A lot of the noise when you use dvc --verbose is produced by SQL operations triggered on the state module, we could introduce another verbosity level, like -vv when we need them or maybe another formatter that make sense for logging SQL operations, I don't know.

But this thing has been bothering me a lot. It's weird indeed to use loggers to print progress bars and all other user output.

@shcheklein , what has been bothering you exactly?

The logger is really flexible, you have the notion of handlers and formatters that are really useful, we are currently only using the standard stream handler for stdout and stderr.

@mroutis from docs you quoting - we should use print :)

I see the argument with quiet/verbosity, we can make our wrapper to not use ifs everywhere.

The practical reason why I am against it is that I lost significant amount of time when I removed logLevel("DEBUG") in tests and got failed them occasionally because some message to the user hadn't been printed by logger.info(). Sure that could be fixed, but this shows the entanglement that shouldn't be there from the start.

@Suor , got it. I agree that the tests are flawed, I mean, if they are relying on an implicit state (a _debug_ logger level) they are not _sound_ enough.
We started using this caplog thingy to assert that a message was being displayed during certain operations, maybe it was not the correct approach.

I guess there's not a consensus about what should be the correct approach, for example, docker-compose uses print for the cli/main.py module but the rest of the library uses logging.
bup uses only print while pip uses logger extensively.
yum is also a mess with print / log :confused:

You can even check this example from conda: https://github.com/conda/conda/blob/1cade00ff57b403e9c7a5ac2e4b61f4175721c18/conda/exports.py#L323-L324

def symlink_conda(prefix, root_dir, shell=None):  # pragma: no cover
    print("WARNING: symlink_conda() is deprecated.", file=sys.stderr)

They are doing a deprecation warning to stderr when the stdlib provides a warnings module with a DeprecationWarning.


That being said, I don't think moving from logger to print would improve the source code in any way (maybe a code example about how it would look like could give me more stuff to consider)

Don't get me wrong, @Suor, your concerns are totally valid, and we need to address those faulty tests; also, our source code is far from perfect and any suggestions to improve it are more than welcome!
I don't want you to get discouraged about proposing new ideas! :)

It's just a matter of clear separation of concepts. I think this post summarized it pretty well:

https://softwareengineering.stackexchange.com/questions/270352/printing-to-out-vs-logging-given-the-nature-of-this-app

Basically, you need to separate UI from the business logic (I think it's more or less done in our case). But then you need also to have two separate ways to output (or read) stuff that is related to UI and output stuff that is related to tracing the business logic (this is done via logging usually).

The last part is not done very well and isolation is broken. It most likely creates problems with testing, it most likely makes one thing to affect another (format is shared and the necessity to omit INFO makes logs non uniform). I can't even add an INFO message as far as I understand into logs without it being printed to the UI. Logs also look ugly - they are not uniformly formatted. I'm not sure about colorization - does it create any problems if we log stuff into a file?

Another thing that should be moved into some "UI" module is dvc.prompt (it's not using logger btw no to print the input). When we get to the point when we are ready to release an API we will have to move it into some wrapper most likely.

And I think, @Suor 's point was not to remove the logger and replace it with print. It's just use a separate abstraction (that internally can use even a second logger if there are reasons for that) to interact with user. An example would be Mercurial's code - they have a repo.ui class that has a lot of stuff, but among them log and write. log outputs stuff into registered loggers (and you can register many with different formatting), and write outputs stuff to the screen. I'm oversimplifying, but idea is clear.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

siddygups picture siddygups  路  3Comments

anotherbugmaster picture anotherbugmaster  路  3Comments

nik123 picture nik123  路  3Comments

jorgeorpinel picture jorgeorpinel  路  3Comments

TezRomacH picture TezRomacH  路  3Comments