Originally reported by: BitBucket: hayne, GitHub: @hayne?
Pylint (1.2.1) gives a W1401 warning ("anomalous backslash in string ...") about the following code
where the backslash is inside a docstring.
I don't think this is desirable (hence I'm listing this as a bug).
There should be at least an option to disable the warning in this specific context.
def getCurrentResultFolderPath():
"""
Return the full path to the current result folder.
Example of use:
currResultFolderPath = getCurrentResultFolderPath()
# gives something like: "C:\MyStuff\TestA\Results\Run_2013-11-08_2159_59"
"""
pass
_Original comment by_ BitBucket: rneu31, GitHub: @rneu31?:
Could you re-post your issue using the Code button so we can see the problem more clearly? The problem is hard to identify as is.
_Original comment by_ BitBucket: hayne, GitHub: @hayne?:
Here's the code from above but using the Code button (didn't know about it before):
#!python
def getCurrentResultFolderPath():
    """
    Return the full path to the current result folder.
    Example of use:
    currResultFolderPath = getCurrentResultFolderPath()
    # gives something like: "C:\MyStuff\TestA\Results\Run_2013-11-08_2159_59"
    """
    pass
_Original comment by_ BitBucket: rneu31, GitHub: @rneu31?:
An easy solution (have not tested it) would be to make the docstring a raw string by adding 'r' to the front.
#!python
def getCurrentResultFolderPath():
    r"""
    Return the full path to the current result folder.
    Example of use:
    currResultFolderPath = getCurrentResultFolderPath()
    # gives something like: "C:\MyStuff\TestA\Results\Run_2013-11-08_2159_59"
    """
    pass
_Original comment by_ BitBucket: hayne, GitHub: @hayne?:
That "solution" (making the docstring a raw string by prefacing it with 'r') does work in the sense of preventing PyLint from warning about the backslashes.
I knew about that possibility before.
But putting an 'r' in front of the docstring destroys the aesthetics - it would make every reader of the code pause and wonder - "why is that 'r' there" when skimming through the code.
As such it is a worse cure than the disease.
I'm hoping that others will agree that PyLint shouldn't be warning about things that happen in the "privacy" of a docstring - which of course means that PyLint needs to be able to recognize docstrings as special.
_Original comment by_ Mike Frysinger (BitBucket: vapier, GitHub: @vapier?):
i don't think docstrings should be granted blanket immunity. if you do introspection on the code, then it might render incorrectly:
>>> s = """You need to use \n\r sequences when you want ..."""
>>> def func(): pass
>>> func.__doc__ = s
>>> help(func)
Help on function func in module __main__:
func()
You need to use
sequences when you want ...
surely in that case you want to use a r prefix else any tooling will show it wrong.
the posted case though doesn't use backslashes with any known sequences (\P will get you a \P), so perhaps split the warning into known & unknown escape sequences ? that way people can filter out the issue when the case is truly harmless while still getting warnings when there might be a surprising difference.
possible bucketing:
_Original comment by_ BitBucket: hayne, GitHub: @hayne?:
But in Mike Frysinger's example above, the warning would (and should) occur on the line that assigns to the variable 's'.
There is no need to also have a warning on any other line.
I.e. we are in agreement that there should be a warning for that line.
But I don't see any problem in not warning about docstrings as defined as:
_Original comment by_ Mike Frysinger (BitBucket: vapier, GitHub: @vapier?):
mmm, that's incorrect. sorry if my example is confusing if you aren't familiar with Python internals. a docstring placed immediately after a definition _is_ assigned to something -- __doc__. thus my example using the python interp is equivalent to this:
$ cat test.py
def func():
    """You need to use \n\r sequences when you want ..."""
$ python
>>> import test
>>> test.func.__doc__
'You need to use \n\r sequences when you want ...'
>>> help(test.func)
<output is wrong>
it's basically a shorthand for this:
def func():
    __doc__ = """..."""
so docstrings should not be immune to this check.
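A minimal sketch of the point above, in plain Python: escape sequences in a docstring are interpreted before the text is stored in `__doc__`, while an `r` prefix keeps the backslashes as typed. The function names `cooked` and `raw` are made up for illustration.

```python
def cooked():
    """You need to use \n\r sequences when you want ..."""


def raw():
    r"""You need to use \n\r sequences when you want ..."""


# The plain docstring holds a real newline and a real carriage return.
print(repr(cooked.__doc__))
# The raw docstring holds backslash characters, exactly as in the source.
print(repr(raw.__doc__))

has_real_newline = "\n" in cooked.__doc__
keeps_backslash = "\\n" in raw.__doc__ and "\n" not in raw.__doc__
```

Anything that reads `__doc__` (help(), documentation generators, a debugger) sees the first form, not the text in the source file.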
_Original comment by_ BitBucket: hayne, GitHub: @hayne?:
What I was suggesting is that PyLint should pick out docstrings via a special parsing of the source code - not rely on the internal representation of that code.
I don't know if PyLint makes use of the internal representation or not.
But this is precisely what I meant by PyLint needing to recognize docstrings as special.
But in your example, I would venture to suggest that what is wrong is the behaviour of the 'help' function.
I submit that the 'help' function should display the docstring precisely the way it appears in the source code.
In any case, my opinion is that PyLint should not warn (ever) about docstrings. It is a much lesser evil to have incorrect interactive help for those few cases where the docstring has special escape sequences than to have spurious warnings for docstrings. I guess my position could be summed up by saying that I consider PyLint's job to warn about problems that affect execution of the code, not interactive introspection of the code.
_Original comment by_ Mike Frysinger (BitBucket: vapier, GitHub: @vapier?):
i disagree that python's behavior is wrong, but in all likelihood, it's never going to change. fundamentally, a docstring is merely an assignment of a string to __doc__ (usually, but not required to be, triple quotes). which means it should behave that way.
some code utilizes __doc__ normally during runtime too. it's not uncommon for setting up argparse module like so:
parser = argparse.ArgumentParser(description=__doc__)
in which case, the execution of the code is wrong and pylint should be warning.
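To make the argparse case concrete, here is a small sketch. `PLAIN_DOC` and `RAW_DOC` are hypothetical stand-ins for a module's `__doc__`, and `prog="demo"` is only there to keep the output independent of how the script is invoked.

```python
import argparse

# Hypothetical module docstrings: one plain, one with an r prefix.
PLAIN_DOC = "Split each record on \t characters."
RAW_DOC = r"Split each record on \t characters."

plain_help = argparse.ArgumentParser(prog="demo", description=PLAIN_DOC).format_help()
raw_help = argparse.ArgumentParser(prog="demo", description=RAW_DOC).format_help()

# The plain version has no backslash left: \t was turned into a tab
# character before argparse ever saw the text.
print("\\t" in plain_help)
# The raw version still shows the two characters backslash and t.
print("\\t" in raw_help)
```

So the user-facing `--help` text differs depending on whether the docstring was raw, which is a runtime effect rather than an introspection-only one.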
_Original comment by_ Steven Myint (BitBucket: myint, GitHub: @myint?):
@hayne, the warning is correct. If you print your docstring, you will get garbage out if your Windows-style path happens to contain a \n in it. I suggest you use an r prefix as @vapier suggested or use normal slashes rather than Windows-specific backslashes.
_Original comment by_ Pavel Roskin (BitBucket: pavel_roskin):
I agree that we cannot expect help() to suddenly change and start escaping special characters. However, I think pylint should not warn about unknown backslash sequences in docstrings. That would limit the need to add 'r' to docstrings only to those docstrings that are shown by help() incorrectly.
_Original comment by_ Mike Frysinger (BitBucket: vapier, GitHub: @vapier?):
as i mentioned, help() is merely one example -- anything that uses introspection will be wrong. that includes runtime debugging or automatically generated docs. i don't think we'll find a default that satisfies everyone -- imo the current behavior is correct and i want pylint warning about it. thus the only way to make it work for everyone would be to split the warnings, perhaps as outlined in my earlier comment. that'd allow people who are fine with their docstrings being broken to disable the warning in pylintrc.
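The known/unknown split could look roughly like the sketch below. This is not pylint's implementation; `bucket_escapes` and `KNOWN_ESCAPE_CHARS` are hypothetical names, and the set of recognized escapes is the one for ordinary (non-bytes) string literals.

```python
import re

# Characters that may follow a backslash in a valid str escape sequence.
# The trailing "\n" covers backslash-newline (line continuation).
KNOWN_ESCAPE_CHARS = set("\\'\"abfnrtv01234567xNuU\n")


def bucket_escapes(literal_body):
    """Split backslash sequences in raw source text into known and unknown.

    `literal_body` is the text between the quotes exactly as typed in the
    source file, not the evaluated string.
    """
    known, unknown = [], []
    # Consuming the character after each backslash means a doubled
    # backslash is treated as one unit and never misread.
    for match in re.finditer(r"\\(.)", literal_body, re.DOTALL):
        char = match.group(1)
        bucket = known if char in KNOWN_ESCAPE_CHARS else unknown
        bucket.append("\\" + char)
    return known, unknown


# The path from the original report only contains unknown sequences.
print(bucket_escapes(r"C:\MyStuff\TestA"))
# A path that happens to contain \n and \t falls in the risky bucket.
print(bucket_escapes(r"C:\new\table"))
```

With two buckets, a project could silence the harmless case while still being told about sequences that change the string's contents.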
_Original comment by_ Pavel Roskin (BitBucket: pavel_roskin):
I understand that help() is just one example and that "known backslashes" are interpreted in docstrings. But I had to read this discussion to fully realize the implications of having backslashes in docstrings. I'm fine with doing nothing, but I feel that this issue will annoy users and they will need to find this information to understand why they should add 'r' to their docstrings. Maybe there should be a separate warning for docstrings that users would be able to look up.
_Original comment by_ Mike Frysinger (BitBucket: vapier, GitHub: @vapier?):
i think that's a general problem with pylint messages. there's plenty of warnings that users wrangle with. there should be an official wiki like http://pylint-messages.wikidot.com/ that is referenced by pylint itself and is better curated :).
_Original comment by_ Pavel Roskin (BitBucket: pavel_roskin):
Yes. If a program is telling people what they do wrong, it can be expected that they would complain. I feel enlightened after reading this bug, but there should be a way to enlighten others. If pylint showed a separate message for docstrings, hopefully the users won't complain that pylint doesn't recognize docstrings. They would stop for a moment and look for information online.
_Original comment by_ Claudiu Popa (BitBucket: PCManticore, GitHub: @PCManticore):
Hey, guys, sorry for not chiming in earlier. I don't want Pylint to stop emitting this warning for docstrings, but I agree that a separate warning would benefit the users more, since they can disable this particular warning without losing other possible occurrences. So, if any of you wants to propose a patch for this, I'll be glad to review it and integrate it (I don't have that much time to tackle this right now).
_Original comment by_ Pavel Roskin (BitBucket: pavel_roskin):
Claudiu, thank you for your support. I looked at the code. It looks like StringConstantChecker would need to implement IAstroidChecker. ITokenChecker supplies tokens to process_tokens() that don't contain information about which strings are docstrings.
SpellingChecker could serve as an example of how to work with docstrings. On the other hand, SpellingChecker doesn't check every string. StringConstantChecker would need to suppress duplicate messages for docstrings. I see two approaches. One would be to record all docstrings with positions and skip them when iterating over tokens. Another approach would be to rely on IAstroidChecker only (no idea if it's possible to get all strings for a node).
I have no idea if I'll be able to do that. Just recording where I'm leaving for now.
_Original comment by_ Pavel Roskin (BitBucket: pavel_roskin):
It turns out there are two problems with IAstroidChecker. It gives us processed strings without the prefixes. I don't see any way to get the original text for the node (if there is one, we are all set). So then the idea is to record the locations of the docstrings to classify the strings later. But I don't see how to get those locations. For visit_function(), node.doc is just a string, without location information. visit_const() provides locations, but it actually skips docstrings (not sure if it's safe to assume that all unreported string constants are docstrings). Another problem - IAstroidChecker slots are called after process_tokens(). I could make a list of all strings and delay processing until the docstrings are known, but that would be ugly.
_Original comment by_ Mike Frysinger (BitBucket: vapier, GitHub: @vapier?):
do you have details about the variable/node it's assigned to ? i.e. whether it's assigned to "__doc__" or something else.
_Original comment by_ Pavel Roskin (BitBucket: pavel_roskin):
When we get tokens, there is no information about what is a docstring and what is not. When we get IAstroidChecker calls, we know the docstrings (i.e. what was implicitly assigned to __doc__), but I haven't figured out how to find their original form in the source code.
_Original comment by_ Claudiu Popa (BitBucket: PCManticore, GitHub: @PCManticore):
You can't find the original form in the source code, since we are using the ast module for the low level parsing part and the parsing with this module is lossy. You lose a lot of information about the code, such as comments or the original form of the docstrings. This most likely needs to be fixed using a token checker and maintaining a state of the docstrings.
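For illustration only, the "record docstring positions, then classify tokens" idea can be sketched with the standard library instead of pylint's checker interfaces. It assumes Python 3.8 or later, where the stdlib `ast` reports the start position of string literals (which the tooling discussed here did not offer at the time). `docstring_positions` and `classify_strings` are hypothetical names.

```python
import ast
import io
import tokenize
import warnings

DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_positions(source):
    """Return the (line, column) start of every docstring literal."""
    with warnings.catch_warnings():
        # Parsing may itself warn about invalid escapes; not relevant here.
        warnings.simplefilter("ignore")
        tree = ast.parse(source)
    positions = set()
    for node in ast.walk(tree):
        if not isinstance(node, DOC_OWNERS) or not node.body:
            continue
        first = node.body[0]
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            # Columns match tokenize as long as the text before the
            # literal on that line is ASCII (normally just indentation).
            positions.add((first.value.lineno, first.value.col_offset))
    return positions


def classify_strings(source):
    """Yield (raw_token_text, is_docstring) for every string token."""
    doc_starts = docstring_positions(source)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            yield tok.string, tok.start in doc_starts


SAMPLE = r'''
def get_path():
    """Gives something like C:\MyStuff\Results"""
    return "C:\\MyStuff"
'''

results = list(classify_strings(SAMPLE))
for text, is_doc in results:
    print(is_doc, text)
```

The token text still carries the prefix and the backslashes as typed, so a checker built this way could report docstrings under a separate message. Implicitly concatenated docstrings are not handled in this sketch.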
_Original comment by_ Pavel Roskin (BitBucket: pavel_roskin):
It surprises me how pylint even works without a non-lossy parser! To find potential problems, we want to know both the original code and its interpretation. OK, it works nevertheless, defying all logic, but my petty patch cannot be written without that feature :)
Unfortunately, we learn which strings are docstrings way too late. It can be worked around, but I'm afraid the result would be too ugly for my taste.
Providing another example of why this rule does not work in my case. I'm using a regular expression to search for these special characters in my script and remove them.
characters_to_be_removed = re.compile('\[|\]|\(|\)|\.|"')
Pylint gives the 'anomalous backslash' warning, but I can not use 'r' here to remove the backslashes (as then re would not understand my pattern correctly).
not sure why you think r doesn't work, but it does, and you should be using it here.
>>> import re
>>> characters_to_be_removed = re.compile('\[|\]|\(|\)|\.|"')
>>> characters_to_be_removed.sub('', """this string: [ ] ( ) . " stuff""")
'this string:       stuff'
>>> characters_to_be_removed = re.compile(r'\[|\]|\(|\)|\.|"')
>>> characters_to_be_removed.sub('', """this string: [ ] ( ) . " stuff""")
'this string:       stuff'
if you have a concrete example, please post it.
Maybe we can use a double backslash \\ to avoid the W1401 warning. Since \\ is widely used (e.g., C:\\Documents), I think it would not hurt the docstring's aesthetics much, unlike putting an 'r' in front of it.
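A quick check, with made-up variable names, that the doubled-backslash spelling and the `r` prefix give the same string, next to the kind of path the warning is meant to catch:

```python
# Two spellings that leave nothing to chance.
raw_path = r"C:\MyStuff\TestA\Results"
doubled_path = "C:\\MyStuff\\TestA\\Results"

# Both evaluate to the same text.
print(raw_path == doubled_path)

# The hazard: \n and \t are real escapes, so this is not a usable path.
trap = "C:\new\table"
print(repr(trap))
```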
_Original comment by_ Steven Myint (BitBucket: myint, GitHub: @myint?):
@hayne, the warning is correct. If you print your docstring, you will get garbage out if your Windows-style path happens to contain a \n in it. I suggest you use an r prefix as @vapier suggested or use normal slashes rather than Windows-specific backslashes.
What if it is already an f-string?
@nurettin
What if it is already an f-string?
It can be both:
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: fr'hello \n {3/2} world'
Out[1]: 'hello \\n 1.5 world'
In [2]: print(_)
hello \n 1.5 world
This issue should be closed. Anomalous backslash escapes have been officially deprecated since Python 3.6 and are on a path to be an error in the future.
I completely disagree about closing this issue.
The comments seem to conflate two separate things:
1) what is appropriate for strings in general
2) what is appropriate for docstrings
To my mind, the whole point of this issue is that docstrings should be treated differently than other strings.
Docstrings should be treated like comments - anything that doesn't interfere with the delimiters should be allowed (without warnings).
It might be better if there were a separate syntax for docstrings, but in the absence of that, PyLint should simply treat them differently.
Python only knows strings. when it makes it an error, docstrings will fail too. hence @willsALMANJ's point that this request is moot.
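This can be observed directly by compiling source text and recording what the interpreter reports. A sketch, assuming Python 3.6 or later; `escape_diagnostics` is a hypothetical helper, and the exact warning category depends on the Python version.

```python
import warnings


def escape_diagnostics(source):
    """Compile `source` and return the names of the warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            compile(source, "<demo>", "exec")
        except SyntaxError:
            # Possible in a future Python that rejects invalid escapes outright.
            return ["SyntaxError"]
    return [w.category.__name__ for w in caught]


# Source text of a function whose docstring contains backslash-M.
DOCSTRING_SOURCE = 'def f():\n    """C:\\MyStuff"""\n'
# The same function with an r prefix on the docstring.
RAW_SOURCE = 'def f():\n    r"""C:\\MyStuff"""\n'

print(escape_diagnostics(DOCSTRING_SOURCE))
print(escape_diagnostics(RAW_SOURCE))
```

The compiler makes no distinction between a docstring and any other string literal: the plain docstring is flagged and the raw one is not.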
Yes, I am sympathetic to the idea of having a different message for docstrings, but it seems that that would require a major architectural change to pylint based on what is said in this comment above. Also, note this question about whether the deprecation should apply to docstrings and Guido's one word answer following it.