Openlibrary: Remove calls to pprint() from in use catalog endpoints

Created on 29 Oct 2018  路  13Comments  路  Source: internetarchive/openlibrary

Most helpful comment

Thanks @YBthebest for looking into these! Much appreciated.

I agree with your comment to remove the find_works.py#L601 assertion failures pprint.

Also agree with @cclauss on the catalog/marc/simple.py pprint is OK since that file looks to be a stand alone script with command line args and its output is information to stdout.

In the load.py example above where pprint() is used to output some data then raise is used, TBH I would remove the whole try except block since any exceptions are raised anyway, and that's how we would discover them. If it had been a problem, these exceptions would have been handled properly. As it stands, the try is not currently handling the exception in any useful way, so it may as well not be there.

In general, calls to pprint() are only ok if they are part of some standalone script that is used from the command line and is supposed to send output to stdout. The main Open Library application has no reason to send information to stdout (except maybe some brief output on startup) as in normal usage it is not collected or monitored.

All 13 comments

Looks like there are still a few instances remaining: https://github.com/internetarchive/openlibrary/search?q=pprint&unscoped_q=pprint

uses of pprint() in the main OL application will not be captured anywhere useful. They look like leftovers from debugging, or form when code had an independent existence as a standalone script.

If the output is important, it should go to logging

This seems like a great easy first issue for me, can I get this one done? Thanks !

So from what I have seen, it is being used mainly in four cases:

  • in 'except' constructions
  • before asserts, with an if that causes it to only print if the assert will failed
  • in impossible places (i.e after a continue() statement, or a sys.exit() call)
  • or just in general for debugging purposes

should I remove the last two cases and move the two first cases to logging?

The second form sounds like a debugging case to me as well and could probably be addressed by converting to a 2 argument assert statement

assert 1==2,"One didn't equal Two"

Should I simply remove the pprint statement in those cases then, after conversion? Or change them to a logging?

Assertion failures are presumably going to get logged, so I think the pprints can be removed, but if there are lots of them, perhaps you want to point at an example before doing a lot of work.

It feels to me that in this case, if the assertion failures are being logged anyway, the if statement with the pprint can be entirely removed, right?

I'm trying to get this done at the moment. I was wondering about such an example:
https://github.com/internetarchive/openlibrary/blob/d3f63c1d6f8864d1a57f7e439bc2e02ad0a8411f/openlibrary/catalog/marc/simple.py#L74
The simple.py script seems to be used only for testing purposes. What should be done in such a case with the pprint() call?

Secondly, in cases as the following:
https://github.com/internetarchive/openlibrary/blob/d3f63c1d6f8864d1a57f7e439bc2e02ad0a8411f/openlibrary/catalog/importer/load.py#L183
Where the pprint is use in the "except" of a try, should it be logged instead? If so, how?
Thanks for your help @tfmorris and others !

My vote would be that __pprint()__ is OK for this use case.

Thanks @cclauss. What would be a more general heuristic of the cases in which the pprint() calls are not OK then? When they are clearly used for debugging, etc.?

Thanks @YBthebest for looking into these! Much appreciated.

I agree with your comment to remove the find_works.py#L601 assertion failures pprint.

Also agree with @cclauss on the catalog/marc/simple.py pprint is OK since that file looks to be a stand alone script with command line args and its output is information to stdout.

In the load.py example above where pprint() is used to output some data then raise is used, TBH I would remove the whole try except block since any exceptions are raised anyway, and that's how we would discover them. If it had been a problem, these exceptions would have been handled properly. As it stands, the try is not currently handling the exception in any useful way, so it may as well not be there.

In general, calls to pprint() are only ok if they are part of some standalone script that is used from the command line and is supposed to send output to stdout. The main Open Library application has no reason to send information to stdout (except maybe some brief output on startup) as in normal usage it is not collected or monitored.

To get the ball rolling I submitted a PR. There are a couple of instances of pprint left in places that seem to satisfy the criteria you mentioned @hornc. Let me know if it looks decent to you ! Thanks for all the help, appreciate it 馃憤

Was this page helpful?
0 / 5 - 0 ratings