Operating system: Ubuntu 16.04
Python version: 3.7.0
Black version: 18.6b4
Does also happen on master: yes
For this minimal example, the syntax error is reported on the second line instead of the first:
print(f"spam", ham="eggs")
pancakes = [,,]
It happens with at least several other syntax errors too (eg pancakes = [..], pancakes =).
It happens no matter how many valid statements are added before, after, or between the lines.
It only happens with the print function, and only when passing it a keyword argument.
What are you doing exactly? Black is not meant for reporting syntax errors, it's a code formatter.
This is because of how black tries multiple possible python grammars when encountering a syntax error. It can't tell the "real" cause of the syntax error, because the last grammar it tries doesn't accept print as a function.
What are you doing exactly?
Just trying to format some code, and getting confusing error messages.
Black is not meant for reporting syntax errors
Black reports syntax errors, and not by accident. I would argue that it is meant to report syntax errors, and if it reports them incorrectly that's a bug, even if reporting syntax errors is not listed in the mission statement.
black tries multiple possible python grammars ... the last grammar it tries doesn't accept print as a function
Interesting. Although it apparently does accept print as a function, just not when it is given keyword arguments. Is that a bug in the last grammar?
Is that a bug in the last grammar?
No. Python 2 and Python 3 are different in this regard as you might now. Black can be used to format both Python 2 and Python 3 code. We default to the Python 3 grammar because we're not Luddites. However, for users who are still on Python 2 we try to be helpful to them, too, and when the Python 3 grammar fails to parse, we attempt Python 2 and see what happens.
The problem here is a bit tricky to solve because out of many syntax errors that we see, we don't know which syntax error is "the worst", in other words which is the one we should report to the users.
Let me give you an example:
async def g():
print 'Hey'
This is clearly an invalid file but Black can't know what the user meant, e.g. which part is wrong and should be addressed by the user. It can be either.
You might think, maybe a sensible heuristic would be to choose the syntax error out of all available errors that comes from the farthest line in the file? That means the parser was able to get through all the previous "errors" that other grammars found problematic! But in this case, if somebody puts a Python 2-only piece of code somewhere high in their file, we will report valid Python 3 code below as invalid syntax. So, there's no silver bullet.
I think Jelle is right here. If your code cannot be compiled by Python, you need to address this.
I will leave this open if anybody has a better idea, especially that I want to add more grammars (Cython!) so this is only going to be a bigger problem in the future.
Ok thank you for the explanation. I think Black's behavior makes sense here, and maybe a slightly more detailed error message would be enough to avoid confusion.
For example, the error and the grammar could both be printed for each grammar tried.
$ echo "print(f'spam', ham='eggs'); pancakes = [,,]" > test.py
$ black test.py
error: cannot format test.py:
Failed parsing as python 3: 1:30: pancakes = [,,]
Failed parsing as python 2: 1:18: print(f"spam", ham="eggs")
All done! 馃挜 馃挃 馃挜
1 file failed to reformat.
@ambv @zsol : Isn't the fix obvious?
Allow the user to restrict black regarding which grammars it will try (can be restricted to: pick one, default any)?
The current behaviour is really problematic, especially when:
Then you can basically forget about getting any indication on why your code is not formatted...
... and this even if you try get black have that last py2 grammar check not fail via from __future__ import print_function - and also even if you try have black do py36 syntax only:
# echo -e 'from __future__ import print_function\nlambda: print()\nif i:\ni=1' | black --py36 -
from __future__ import print_function
lambda: print()
if i:
i=1
error: cannot format -: Cannot parse: 2:8: lambda: print() # <--- aaaargh
Yes, it is a formatter and not a syntax checker - but it was just so cool to have both steps done in one and it was a low hanging fruit for you to deliver also that, considering the advanced way you do things. It may be not intended by you but it is one big selling point.... So taking the handwaving route (of @JelleZijlstra) on that one would be really sad...
Now:
__future__ import would be hard to see and apply for you.--py36=> So, why not give the user the chance to say: Hey, do apply grammar xyz (e.g. python3.7) only?
Ok, I had a quick check in the code and it would be pretty simple, unless I overlooked sth.
Not digging into click/options, just use an env var (same testfile than above):
# black r.py
error: cannot format r.py: Cannot parse: 2:8: lambda: print() # <-- bad
All done! 馃挜 馃挃 馃挜
1 file failed to reformat.
# export BLACK_GRAMMAR=py3
# black r.py
error: cannot format r.py: Cannot parse: 7:4: i = 1 # <---- good
All done! 馃挜 馃挃 馃挜
1 file failed to reformat.
really easy, too small for a PR - basically give the grammars a name, so that the user can pick one:
GRAMMARS = [
- pygram.python_grammar_no_print_statement_no_exec_statement,
- pygram.python_grammar_no_print_statement,
- pygram.python_grammar,
+ ["py3", pygram.python_grammar_no_print_statement_no_exec_statement],
+ ["py2_with_futures", pygram.python_grammar_no_print_statement],
+ ["py2", pygram.python_grammar],
]
+# TODO: should be click option:
+user_req_grammar = os.environ.get("BLACK_GRAMMAR")
+if user_req_grammar and not [g for g in GRAMMARS if g[0] == user_req_grammar]:
+ user_req_grammar = None # fall back to checking all
def lib2to3_parse(src_txt: str) -> Node:
"""Given a string with source, return the lib2to3 Node."""
- grammar = pygram.python_grammar_no_print_statement
if src_txt[-1:] != "\n":
src_txt += "\n"
- for grammar in GRAMMARS:
+
+ for grammar_name, grammar in GRAMMARS:
+ if user_req_grammar and grammar_name != user_req_grammar:
+ continue
drv = driver.Driver(grammar, pytree.convert)
try:
At least you should, imho, use pygram.python_grammar_no_print_statement last - can't see a reason why you let errors seen by this one be hidden by errors seen by the "dumbest" of them all (dumb=ignores future print_function imports), which is pygram.python_grammar in your current version.
Ok, I did make a PR: https://github.com/ambv/black/pull/528 now, really would like to have it in if possible...
Since #618 --target-version allows you to choose the grammar, so the original minimal example reports a syntax error on the first line with -t py27, and the second line with -t py36
Most helpful comment
No. Python 2 and Python 3 are different in this regard as you might now. Black can be used to format both Python 2 and Python 3 code. We default to the Python 3 grammar because we're not Luddites. However, for users who are still on Python 2 we try to be helpful to them, too, and when the Python 3 grammar fails to parse, we attempt Python 2 and see what happens.
The problem here is a bit tricky to solve because out of many syntax errors that we see, we don't know which syntax error is "the worst", in other words which is the one we should report to the users.
Let me give you an example:
This is clearly an invalid file but Black can't know what the user meant, e.g. which part is wrong and should be addressed by the user. It can be either.
You might think, maybe a sensible heuristic would be to choose the syntax error out of all available errors that comes from the farthest line in the file? That means the parser was able to get through all the previous "errors" that other grammars found problematic! But in this case, if somebody puts a Python 2-only piece of code somewhere high in their file, we will report valid Python 3 code below as invalid syntax. So, there's no silver bullet.
I think Jelle is right here. If your code cannot be compiled by Python, you need to address this.
I will leave this open if anybody has a better idea, especially that I want to add more grammars (Cython!) so this is only going to be a bigger problem in the future.