for example (play)
let foo = r##"bar"###;
results in
error: expected one of `.`, `;`, `?`, or an operator, found `#`
--> src/main.rs:2:25
|
2 | let foo = r##"bar"###;
| ^ expected one of `.`, `;`, `?`, or an operator here
while too few has a much nicer error:
error: unterminated raw string
--> src/main.rs:3:15
|
3 | let baz = r##"quxx"#;
| ^ unterminated raw string
|
= note: this raw string should be terminated with `"##`
This issue has been assigned to @rcoh via this comment.
Gonna work on it (my first work on the compiler internals, so please bear with me :) )
Feel free to ask for help!
I feel both of these errors could be tweaked slightly:
error: expected one of `.`, `;`, `?`, or an operator, found `#`
--> src/main.rs:2:25
|
2 | let foo = r##"bar"###;
| -- ^
| | |
| | the raw string needs 2 trailing `#`, but found 3
| | help: remove this `#` (this should be a hidden suggestion for rustfix' sake)
| the raw string has 2 leading `#`
and
error: unterminated raw string
--> src/main.rs:3:15
|
3 | let baz = r##"quxx"#;
| ^ -
| | |
| | the raw string needs two trailing `#`, but found 1
| | help: close the raw string: `##`
| unterminated raw string
The latter might be harder to accomplish, needing to keep track of _possible_ but incorrect closing spans (probably just looking for `"#` in the string being processed). This seems like a different enough task, and hard enough, to be worthy of a separate PR.
I indeed have a question @estebank. The fatal error itself is emitted at https://github.com/rust-lang/rust/blob/80e7cde2238e837a9d6a240af9a3253f469bb2cf/src/libsyntax/parse/parser.rs#L729 but my logic is currently somewhere around https://github.com/rust-lang/rust/blob/80e7cde2238e837a9d6a240af9a3253f469bb2cf/src/libsyntax/parse/lexer/mod.rs#L1163-L1240
I don't see how I can add labels to that error message without creating my own fatal error, which would most likely end up in code duplication.
Solutions I see:
This is what my solution currently looks like.
error: The raw string needs 2 trailing `#`, but found 3
--> /home/op/me/rust2/src/test/ui/parser/raw/raw-literal-too-many.rs:2:26
|
LL | let _foo = r##"bar"###;
| ^ remove this `#`
error: expected one of `.`, `;`, `?`, or an operator, found `#`
--> /home/op/me/rust2/src/test/ui/parser/raw/raw-literal-too-many.rs:2:26
|
LL | let _foo = r##"bar"###;
| ^ expected one of `.`, `;`, `?`, or an operator here
error: aborting due to 2 previous errors
In addition to too many `#`, I was looking at the case where there are fewer `#` than expected, not on the same line but spread over many lines (my newly added test case now has 18 lines).
I would like to improve that error message as well, something like:
error: unterminated raw string
--> $DIR/raw-string-long.rs:2:13
|
LL | let a = r##"This
| ^
| started here with 2 `#`
...
|
LL | ends"#;
| ^ ended here with only 1 `#`
| help: close the raw string with `"##`
error: aborting due to previous error
I'm not sure how to do that: can I just use a span from the very beginning to the end, and will it leave out the intermediate lines automatically, or do I have to handle that myself?
On the other hand, is that even possible to detect? I don't know if one can find the end of the string, except when there is an EOF. What do you think about this?
I looked at the first implementation of this cc #48546 @GuillaumeGomez . What do you think about this?
That might make things more clear, indeed. :+1:
I'm not sure how to do that: can I just use a span from the very beginning to the end, and will it leave out the intermediate lines automatically, or do I have to handle that myself?
@hellow554, you should be able to synthesize a span that covers the entire closing sigil (`"####`) where we look for it and break if we don't find it. We should hold a vec of spans that we add to every time we find one, because it is possible that we indeed have a raw string that contains `"#` multiple times and we can't assume that the first one we find is correct. We could have some extra smarts in case we were writing something like `r###"r##"r#""#"##"###` or `r###"r##"r#""#"#"#` to avoid suggesting all three closing spans, but for now the naïve way should be fine.
When supplying a span that covers multiple lines, the cli output will show the first 5(?) and last two lines of the covered span and draw the ascii art pointing at the start and end. The output you envision would happen by providing independent spans, and it's what I would prefer.
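The "vec of spans" idea can be sketched outside the compiler. This is only an illustration under stated assumptions: `possible_terminators` is a hypothetical helper, not a rustc function, and it works on byte offsets into the string body rather than on real `Span` values.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of rustc): given the body of an
/// unterminated raw string (the text after the opening `"`) and the
/// number of `#` the literal was opened with, collect the byte range of
/// every `"` that is followed by at least one but too few `#`.
/// Each range is a candidate for "the user probably meant to close here".
pub fn possible_terminators(body: &str, expected: usize) -> Vec<Range<usize>> {
    let bytes = body.as_bytes();
    let mut spans = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'"' {
            // Count the `#` that directly follow this quote.
            let hashes = bytes[i + 1..].iter().take_while(|&&b| b == b'#').count();
            if hashes > 0 && hashes < expected {
                spans.push(i..i + 1 + hashes);
            }
            // Skip the quote and the hashes we just looked at.
            i += 1 + hashes;
        } else {
            i += 1;
        }
    }
    spans
}
```

For the body of `r##"quxx"#;` this yields the single range `4..6`, and for the body of `r###"r##"r#""#"#"#` it yields three candidates, which is the case where the extra smarts mentioned above would help.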
For the too many `#` at the end, you could proactively look for them before the `break`, by following case 1 and using something like
let lo = self.span;
let mut too_many = false;
// Consume every surplus `#` so the lexer can resume after them.
while self.ch_is('#') {
    too_many = true;
    self.bump();
}
if too_many {
    // Span covering only the surplus `#`.
    let sp = lo.to(self.span);
    let mut err = self.sess.span_diagnostic.struct_span_err(sp, "too many `#` when terminating raw string");
    err.span_label(sp, "too many `#` for raw string");
    err.hidden_span_suggestion(
        sp,
        "remove the unneeded `#`",
        String::new(),
        Applicability::MachineApplicable,
    );
    err.emit();
}
By doing this, the parser (the lexer, rather) will continue on its merry way and parse the rest of the file as if nothing had happened.
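To make the recovery idea concrete, here is a standalone sketch that does not use any rustc internals. `RawStr` and `scan_raw_str` are hypothetical names, and the scanner only handles a literal that starts at byte 0 of its input. It counts the opening `#`, finds the terminator, and then consumes surplus `#` so a caller could keep lexing after them.

```rust
use std::ops::Range;

/// Hypothetical result type for this sketch (not a rustc type).
#[derive(Debug, PartialEq)]
pub enum RawStr {
    /// Properly terminated; the literal occupies `src[..end]`.
    Ok { end: usize },
    /// Terminated, but followed by surplus `#` at `extra`.
    /// A lexer would report the error and resume at `extra.end`.
    TooManyHashes { expected: usize, extra: Range<usize> },
    /// No `"` followed by enough `#` was found.
    Unterminated { expected: usize },
    /// The input does not start with `r`, optional `#`s, and `"`.
    NotRaw,
}

pub fn scan_raw_str(src: &str) -> RawStr {
    let bytes = src.as_bytes();
    if bytes.first() != Some(&b'r') {
        return RawStr::NotRaw;
    }
    // Number of `#` between `r` and the opening quote.
    let expected = bytes[1..].iter().take_while(|&&b| b == b'#').count();
    let mut i = 1 + expected;
    if bytes.get(i) != Some(&b'"') {
        return RawStr::NotRaw;
    }
    i += 1;
    while i < bytes.len() {
        if bytes[i] == b'"' {
            let found = bytes[i + 1..].iter().take_while(|&&b| b == b'#').count();
            if found >= expected {
                let end = i + 1 + expected;
                let surplus = found - expected;
                return if surplus == 0 {
                    RawStr::Ok { end }
                } else {
                    // Swallow the surplus hashes instead of leaving them
                    // behind for the parser to trip over.
                    RawStr::TooManyHashes { expected, extra: end..end + surplus }
                };
            }
        }
        i += 1;
    }
    RawStr::Unterminated { expected }
}
```

For `r##"bar"###;` this returns `TooManyHashes { expected: 2, extra: 10..11 }`, so the remaining `;` would be lexed normally and the second "expected one of ..." error would not appear.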
Thanks estebank. That was very helpful indeed. I hadn't thought of consuming the bad characters so that the lexer can do the rest for me. Very nice.
I'm a bit undecided where the journey goes. Currently I have this:
op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
--> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
|
2 | let foo = r##"bar"####;
| ^^^^^^^^^^--
| | |
| | help: remove the unneeded `#`
| The raw string needs 2 trailing `#`, but found 4
op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs
error: too many `#` when terminating raw string
--> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
|
2 | let a = r##"This
| _____________^
3 | | is
4 | | a
5 | | very
... |
19 | | lines
20 | | "###;
| | ^
| | |
| |________help: remove the unneeded `#`
| The raw string needs 2 trailing `#`, but found 3
This is nice and looks very neat :) This is not a hidden suggestion as you see, is that okay for you as well? I mean, yes, it's kind of obvious which ones to remove, but I also like the expressiveness here.
I also tried to add the text "The raw string has {} leading `#`", but it looks very crowded to me when the raw string is only one line.
op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs
error: too many `#` when terminating raw string
--> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
|
2 | let a = r##"This
| ^-- The raw string has 2 leading `#`
| _____________|
| |
3 | | is
4 | | a
5 | | very
... |
19 | | lines
20 | | "###;
| | ^
| | |
| |________help: remove the unneeded `#`
| The raw string needs 2 trailing `#`, but found 3
op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
--> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
|
2 | let foo = r##"bar"####;
| ^--^^^^^^^--
| || |
| || help: remove the unneeded `#`
| |The raw string has 2 leading `#`
| The raw string needs 2 trailing `#`, but found 4
Should I add a check for whether the raw string spans only one line? If yes, how do I do that? ^^
There are two options: either we don't use the full string's span (showing only the start and the end), or we use `is_multiline` to provide different output depending on the visual result.
I think a good output using the former approach would be:
error: too many `#` when terminating raw string
--> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
|
2 | let foo = r##"bar"####;
| -- ^^^^
| | |
| | ...but it's closed with 4
| the raw string has 2 leading `#`...
With a hidden suggestion pointing at the last two `#` to avoid clutter.
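The three spans involved in that output (leading hashes, trailing hashes, and the surplus hashes for the suggestion) can be sketched as plain byte ranges. This is a rough illustration only: `HashLabels` and `hash_labels` are hypothetical names, and the function assumes it is handed the exact text of the literal including the surplus `#`.

```rust
use std::ops::Range;

/// Hypothetical bundle of label positions (byte offsets into the literal).
#[derive(Debug, PartialEq)]
pub struct HashLabels {
    /// The `#` between `r` and the opening quote.
    pub leading: Range<usize>,
    /// Every `#` after the closing quote.
    pub trailing: Range<usize>,
    /// Only the `#` that should be removed (target of the suggestion).
    pub surplus: Range<usize>,
}

/// Returns `None` unless `literal` looks like a raw string that is closed
/// with more `#` than it was opened with, e.g. `r##"bar"####`.
pub fn hash_labels(literal: &str) -> Option<HashLabels> {
    let bytes = literal.as_bytes();
    if bytes.first() != Some(&b'r') {
        return None;
    }
    let n_lead = bytes[1..].iter().take_while(|&&b| b == b'#').count();
    if bytes.get(1 + n_lead) != Some(&b'"') {
        return None;
    }
    let n_trail = bytes.iter().rev().take_while(|&&b| b == b'#').count();
    let trail_start = bytes.len() - n_trail;
    // The closing quote must sit right before the trailing hashes and must
    // not be the opening quote itself.
    if trail_start < n_lead + 3 || bytes[trail_start - 1] != b'"' {
        return None;
    }
    if n_trail <= n_lead {
        return None;
    }
    Some(HashLabels {
        leading: 1..1 + n_lead,
        trailing: trail_start..bytes.len(),
        surplus: trail_start + n_lead..bytes.len(),
    })
}
```

For `r##"bar"####` the leading label covers `1..3`, the trailing label covers `8..12`, and the hidden suggestion would delete `10..12`. Because the labels are independent ranges, the same data works whether or not the literal spans several lines.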
This is not a hidden suggestion as you see, is that okay for you as well? I mean, yes, it's kind of obvious which ones to remove, but I also like the expressiveness here.
I'm ok with using visible suggestions when possible, but for borderline cases where the course of action is obvious and the output would be too cluttered, I'd lean towards hiding the suggestion. They will still be visible in VSCode and actionable by rustfix.
That looks very promising!
op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs
error: too many `#` when terminating raw string
--> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
|
2 | let a = r##"This
| ^-- The raw string has 2 leading `#`...
| _____________|
| |
3 | | is
4 | | a
5 | | very
... |
19 | | lines
20 | | "###;
| |______--^
| |
| ...but is closed with 3.
= help: remove the unneeded `#`
op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
--> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
|
2 | let foo = r##"bar"####;
| ^--^^^^^----
| | |
| | ...but is closed with 4.
| The raw string has 2 leading `#`...
= help: remove the unneeded `#`
The only thing that bugs me a little bit are the `^^^` between the `----`, but I haven't found a way to get rid of them. I think it's okay, but they are not needed IMHO. Time to work on the other suggestion!
Reducing the error span to only the ending `#` symbols does not look nice for multi-line string literals.
op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
--> src/test/ui/parser/raw/raw-literal-too-many.rs:2:23
|
2 | let foo = r##"bar"####; //~ERROR too many `#` when terminating raw string
| -- ^^^^ ...but is closed with 4.
| |
| The raw string has 2 leading `#`...
= help: remove the unneeded `#`
error: aborting due to previous error
op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs
error: too many `#` when terminating raw string
--> src/test/ui/parser/raw/raw-literal-too-many-long.rs:20:6
|
2 | let a = r##"This
| -- The raw string has 2 leading `#`...
...
20 | "###; //~ERROR too many `#` when terminating raw string"
| ^^^ ...but is closed with 3.
= help: remove the unneeded `#`
error: aborting due to previous error
Therefore I'm afraid that's not an option. Any other way to get a nice result?
Wow.. TIL the code for a raw string, aka `r#"foo"#`, and a raw byte string, aka `br#"foo"#`, are actually two different code paths... Let me try to merge them :/ (https://github.com/rust-lang/rust/blob/1962adea6ad9b992516ae56ad7f8c5bc33b951cb/src/libsyntax/parse/lexer/mod.rs#L1369-L1427 vs https://github.com/rust-lang/rust/blob/1962adea6ad9b992516ae56ad7f8c5bc33b951cb/src/libsyntax/parse/lexer/mod.rs#L1139-L1216)
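One way to picture such a merge, as a sketch only and not how the lexer is actually structured: treat the `r` / `br` prefix as data and share everything else. `RawKind`, `raw_prefix`, and `body_is_valid` are hypothetical names. The one rule that differs here is that raw byte strings may only contain ASCII.

```rust
/// Hypothetical tag for the two literal kinds.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum RawKind {
    Str,
    ByteStr,
}

/// Parses the shared prefix of `r#"..."#` and `br#"..."#`.
/// Returns the kind, the number of opening `#`, and the byte offset of
/// the first body character, or `None` if `src` is not a raw literal.
pub fn raw_prefix(src: &str) -> Option<(RawKind, usize, usize)> {
    let (kind, start) = if src.starts_with("br") {
        (RawKind::ByteStr, 2)
    } else if src.starts_with('r') {
        (RawKind::Str, 1)
    } else {
        return None;
    };
    let bytes = src.as_bytes();
    let hashes = bytes[start..].iter().take_while(|&&b| b == b'#').count();
    let quote = start + hashes;
    if bytes.get(quote) != Some(&b'"') {
        return None;
    }
    Some((kind, hashes, quote + 1))
}

/// The only kind-specific check in this sketch: byte strings are ASCII-only.
pub fn body_is_valid(kind: RawKind, body: &str) -> bool {
    match kind {
        RawKind::Str => true,
        RawKind::ByteStr => body.is_ascii(),
    }
}
```

With the prefix handled up front, the hash counting and terminator search (and therefore the new diagnostics) would only need to exist once.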
I'm really making progress on this issue and I really like the architecture of the compiler (although it's confusing sometimes, because sometimes you need to pass a `&str` but sometimes a `String`, which is kind of weird and has no reason, but meh... 🙄)
If you could assist me further with the error messages I would be happy.
@hellow554 Of course! Feel free to reach out with whatever problem you might face.
As for the output, I think the following should be ok for all cases (even though it hides a bit of content in between that isn't strictly relevant):
error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
   |
2  |     let a = r##"This
   |              -- this raw string has 2 leading `#`
...
20 |     "###;
   |      ^^^ this raw string is closed with 3 trailing `#`, but should be 2
   = help: remove the unneeded `#`
@rustbot claim
Ummm.... not what I expected, but okay.... @pietroalbini can you take a look at this? ^^
@hellow554 that's the correct behavior: you can't be assigned to the issue directly because GitHub only allows organization members to be assigned to issues. The only way we have to get around that is to assign the bot itself and add a note in the top message that you're actually the one assigned.
Thanks for the clarification. I haven't seen the "This issue has been assigned to @hellow554 via this comment." in the top post. Thanks for that.
@rustbot claim
Working on this now, should have a diff in a few days