Pylint: Detect when f-string-syntax is used in a string, but not marked as an f-string

Created on 21 Sep 2018  路  5Comments  路  Source: PyCQA/pylint

Problem statement

Consider the following codeblock

name = "Katie"

stmt1 = f"Hello {name}"
stmt2 = "Hello {name}"

While the block is correct syntax-wise, the content is almost certainly logically invalid, as seen below:

>>> print(stmt1)
Hello Katie
>>> print(stmt2)
Hello {name}

stmt2 is a valid string, but declares string interpolation formatting that is compatible with f-strings, but the string does not have the f prefix to invoke the interpolation.

Describe the solution you'd like

PyLint should detect strings that use f-string-like syntax, but have not been declared as f-strings, in Python 3.6+

This could either be by:

  • attempting to evaluate the content within {}, and if valid Python, return a warning
  • attempting to evaluate the entire string as an f-string, and if the evaluation does not equal the original string, return a warning

Edge Case One: Having { and } characters are perfectly valid, but on the edge case that the content within them is a reference to an in-scope variable, or some other expression, it is likely the string was meant to be an f-string. This would be desirable to catch as a linting error.

Edge Case Two: f'...' is effectively shorthand for '...'.format(locals()); so this:

mystring = 'hello {name}'
print(mystring.format(name='Katie'))

is entirely legal, and isn't an error. There is a possibly design decision about whether this linting check should be

  • enforced by default
  • enforceable as an option
  • enforceable if the string that triggers the problem doesn't have '.format' invoked on it in the same method.

Additional context

I understand PyLint already warns against f-strings in logging, but I can't locate the base functionality to detect f-strings in the first place. I'd be willing to help implement this feature, but would require some mentoring.

checkers enhancement

All 5 comments

Hey @glasnt !

This makes sense, but will be tricky due to the string potentially being used as a formatting string later on down the line. But we can definitely try to look for this pattern as it might become more prevalent now with the adoption of f-strings.

Here's what we can do. Let me know if something of the following doesn't make sense.

  • f-strings are internally marked as astroid.JoinedStr (because we use typed_ast to parse the source code, there's currently a bug where some f-strings are marked as FormattedValue, but we can check just JoinedStr for now until that gets fixed)

  • being a new check regarding strings, this can go in pylint/checkers/strings.py, where you only have to implement a def visit_str(self, node) method to visit every string in the analyzed program. This does not include JoinedStr/f-strings, but it's exactly what we want here.

  • next we'll probably need to figure out that a given string was intended as an f-string in the first place. As you mentioned, we can take a hint from the local scope, which you can grab with node.scope(). If you have the variables names extracted from the string, you can grab the relevant nodes that they represent with node.scope().locals.get(name). .locals is just a dictionary from strings to nodes. Once we know that all the names are there (or in the global scope, which you can access with node.root(), there's only the matter of figuring out if the said string was formatted later on via .format(**locals()) or something akin to that construct. You might be able to check by looking for Attribute nodes in the same scope where the left hand side of the attribute is our string, something along the lines of:

# the skip_class is to make sure we don't go into inner scopes, but might not be needed per se
for attr in scope.nodes_of_class(astroid.Attribute, skip_class=(astroid.FunctionDef, )):
    if isinstance(attr.expr, astroid.Name): # LHS of `name.format`
      # Check if it's using `.format`
       if attr.expr.name == 'the name you are looking for' and attr.expr.attrname == 'format':
           # then the string was being used to be formatted after all so stop checking

  • as mentioned early, you'll need most likely to extract the variables that the said string contains. You can use pylint.checkers.strings.parse_format_method_string for that, which is already being used in the mentioned file.

  • another note is that node.parent and node._astroid_fields are your friends. .parent walks up the tree to the parent, so if you need to grab the name of the string that was supposed to be used as a f-string, just call .parent until you get to an Assign node. If it has a name, it might be possible though that it was intended for .format() to be called on it, but you can follow the above instructions to make sure that's the case. And _astroid_fields is usually a sequence that shows what AST children a node can have. There's also _other_fields which is useful as well, as it contains non-AST members of a node. For instance Attribute.attrname, which is the name of the attribute being accessed, will be found in _other_fields. Another useful utility is node.repr_tree() which shows exactly how the subtree looks for that node.

That would be all that's needed to get this going. Let me know if there's anything else I can do to help.

I would also like this check. Missing those dastardly little 'f's is a common occurrence in my experience, and while the edge cases noted in this thread are hypothetically valid, as best I can tell, not a single one of them apply to the code bases I worked in day to day. So I'd be more than happy to have at least a way to opt-in to this check (or, of course, having this be the default).

I think this would be a very useful addition to have - +1

Though it's worth noting that there are edge cases where { and } exist in a string that is not an f string, and that's a very valid case - i.e. s = "awk '{print $1}'"

I still think some analysis on these would be an improvement

The analysis would need to check if the contents are

  1. valid python and
  2. any referenced variables are in locals/globals

Example of relevance: Many examples on https://thedailywtf.com/series/errord

One of the most common errors people run into is interpolation code not being interpolated, resulting in things like

Dear {customer.displayname},
...

or frustrating things like

An error has occurred: {errormessage}

Such examples have been posted from many sources, including some of the largest IT companies, so missing interpolation seems to be a widespread and very user-facing bug type.

Was this page helpful?
0 / 5 - 0 ratings