Curly quotes “ (U+201C) and ” (U+201D) should be supported for opening and closing quotes respectively, as should Japanese quotes 「 (U+300C) and 」 (U+300D), and so on.
I'm not quite sure how to handle all the differences without allowing weird things (opening with a closing quote, closing with an opening quote), but there's no need to support nesting. All Unicode quotes should be backslash-escapable.
I'm only asking this because I had my fair share of end users complaining that their config files and stuff stopped working after they edited them with notepad.exe and the OSX text editor.
should be supported
Why?
@SimonSapin the motivation was given:
I'm only asking this because I had my fair share of end users complaining that their config files and stuff stopped working after they edited them with notepad.exe and the OSX text editor.
This suggestion was met with ridicule on IRC as well, but I do think it's a reasonable suggestion. At least we shouldn't pretend there is no possible motivation. Other mainstream languages don't support smart quotes, but other mainstream languages also don't support statically assured data race freedom!
Real world example: the code examples in this blog post (from reddit) will not compile if simply copy-pasted, because they contain smart quotes. This is arguably the blog author's fault, but hey.
Now, there are disadvantages too, like adding complexity and making it hard to read someone else's code (if your editor can't display the smart quotes) or edit it (if you don't know how to type them).
Now, there are disadvantages too, like adding complexity and making it hard to read someone else's code (if your editor can't display the smart quotes) or edit it (if you don't know how to type them).
To avoid too much complexity we should just have an opening list and a closing list. Things that can be used as both would be in both, such as ASCII ", and according to Wikipedia also “ (U+201C) (German uses this as a closing quote), and so on. To avoid not being able to type them, we shouldn't force the opening script to match the closing script.
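The two-list idea can be sketched roughly as follows. This is only an illustration of the proposal, not anything rustc does: the names `OPENING`, `CLOSING` and `lex_string` are made up, the list membership is a small sample, and escapes are handled in the simplest possible way (a backslash keeps the next character verbatim, so `\n` is not interpreted).

```rust
// Hypothetical quote tables; the membership is illustrative, not exhaustive.
// ASCII " and U+201C appear in both lists because they can open or close.
const OPENING: &[char] = &['"', '\u{201C}', '\u{201E}', '\u{300C}'];
const CLOSING: &[char] = &['"', '\u{201C}', '\u{201D}', '\u{300D}'];

/// Lexes one string literal from the start of `src`.
/// Returns the literal's contents and the number of bytes consumed,
/// or `None` if `src` does not start with a complete literal.
fn lex_string(src: &str) -> Option<(String, usize)> {
    let mut chars = src.char_indices();
    let (_, open) = chars.next()?;
    if !OPENING.contains(&open) {
        return None;
    }
    let mut out = String::new();
    while let Some((i, c)) = chars.next() {
        if c == '\\' {
            // A backslash escapes the next character, including any quote.
            let (_, escaped) = chars.next()?;
            out.push(escaped);
        } else if CLOSING.contains(&c) {
            // The closing quote need not match the opening script.
            return Some((out, i + c.len_utf8()));
        } else {
            out.push(c);
        }
    }
    None // ran out of input before a closing quote
}
```

Note that because U+201C is in the closing list, an unescaped “ inside a literal ends it early; that is the cost of not tracking which quote opened the string.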
@SimonSapin the motivation was given
(That paragraph was added after I wrote my comment.)
Is this actually a common problem? Worth mentioning that we already display help text when one uses 'smart' quotes instead of 'dumb' quotes:
fn main() {
let a = “foo”;
}
<anon>:2:13: 2:14 error: unknown start of token: \u{201c}
<anon>:2 let a = “foo”;
^
<anon>:2:13: 2:14 help: unicode character '“' (Left Double Quotation Mark) looks much like '"' (Quotation Mark), but it's not
<anon>:2 let a = “foo”;
^
(playpen)
@SimonSapin ok, sorry for jumping to conclusions!
One major disadvantage to supporting multiple codepoints for opening/closing characters: bulk edits using CLI tools could be significantly more difficult. Granted, Rust shouldn't be stringly-typed, but if I have a &str literal (or pattern of literals) in various places and I want to grep/sed around for them, fancy quotes in the codebase could cause that edit to miss instances of the string. And fancy quotes are more likely to creep in, even unintentionally, if they compile successfully.
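To make the concern concrete, here is a small sketch comparing a plain search for an ASCII-quoted literal with one that accepts several quote characters. It assumes a hypothetical world where curly quotes are valid delimiters; `QUOTES`, `count_ascii_quoted` and `count_any_quoted` are invented names, and the search is purely textual (it does not parse Rust).

```rust
// Hypothetical set of accepted delimiters, for illustration only.
const QUOTES: &[char] = &['"', '\u{201C}', '\u{201D}'];

/// Counts occurrences of `text` wrapped in ASCII double quotes,
/// which is what a typical grep/sed pattern would look for.
fn count_ascii_quoted(src: &str, text: &str) -> usize {
    let needle = format!("\"{}\"", text);
    src.matches(needle.as_str()).count()
}

/// Counts occurrences of `text` wrapped in any character from `QUOTES`.
fn count_any_quoted(src: &str, text: &str) -> usize {
    src.match_indices(text)
        .filter(|(i, m)| {
            // Look at the characters immediately before and after the match.
            let before = src[..*i].chars().next_back();
            let after = src[i + m.len()..].chars().next();
            matches!(
                (before, after),
                (Some(b), Some(a)) if QUOTES.contains(&b) && QUOTES.contains(&a)
            )
        })
        .count()
}
```

On a source containing both `"foo"` and `“foo”`, the first function finds one occurrence and the second finds two, which is exactly the instance a simple bulk edit would miss.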
@dikaiosune \q or [[:quotation_mark:]] or something? (does POSIX have a "quotation mark" character class yet?)
Only laziness has prevented me from making such an RFC before.
My suggestion, upon consideration: support curly quotes (at least “” and ‘’, others I personally don’t mind about) with a warning. And then don’t worry too much about syntax highlighters (they can add it if they want, or not as the case may be).
See, the main time this comes up is with newcomers; this is a feature to make things a bit easier for people when they’re copying from slideshows. Most cases where it occurs will be one-off code which is immediately thrown away after the point is demonstrated. If we can make it easier for people at low cost, that’s good. Public opinion seems to be against supporting curly quotes as a first-class syntactic element (though personally I’d be rather interested to see a language which _only_ supported curly quotes!); perhaps this form (accepted as a discouraged form with a warning to match) might be acceptable?
@SoniEx2 Quite possibly, but I don't think I've ever seen anything other than s/\"foo\"/\"bar\"/ or similar in use before. Supporting non-ASCII quotes seems like a bit of a footgun to me, and only has a tiny marginal benefit.
@dikaiosune It would however make Rust stand out. First language to break strings or something. :stuck_out_tongue: Probably not the best way to stand out...
What about having a compiler mode/flag (e.g. --convert-quotes, or default mode) that silently converts UTF-8 quotes to dumb quotes whenever it feels like it’s necessary?
@phoenixenero People that already know how to add a compiler flag don’t need curly quote support. Think of the children (in Rust experience)!
That string literals always use the same delimiter (with not even a choice of double or single quotes like some languages) makes them much easier to grep.
Grepping for strings that you see in a program’s output is an important technique for navigating a large or unfamiliar code base. I’ve done it for example when reporting a Cargo bug, to find the relevant code.
It's not like adding these different quotes to Rust can solve
end users complaining that their config files and stuff stopped working
since it is those config languages that do not support other quotes.
@louy2 It would set precedent.
@SoniEx2
It is too irrelevant a precedent to be set. You are much better off changing the parser of whatever config language you are using, or writing a settings UI for your users instead of requiring them to edit the config files by hand. If you are facing true end-users, that is. If you are facing semi-technical people, it is better that they learn the differences between different quotes.
@chris-morgan I think children generally have no problem learning about different quotes...... Think of the seniors!
Switching quoting styles could be used to avoid escaping quotes (think alternation of double- and single-quotes in English nested quotations).
@Ericson2314 In English, nested quotes and surrounding quotes are visually different but semantically identical. In programming languages, escaped quotes and surrounding quotes are visually identical but semantically different. Escaping is not replaceable by switching quoting style.
We have raw strings for the quote nesting problem already: r#"<as "many" as you want unescaped "quotes" go "here">"#
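For reference, a short sketch of how raw strings compare with escaping. The function names are just for illustration; the raw string syntax itself is standard Rust.

```rust
/// Returns the same text written three ways: escaped, and as raw
/// strings with one and two `#` characters around the quotes.
fn same_text_three_ways() -> [&'static str; 3] {
    [
        "as \"many\" as you want",
        r#"as "many" as you want"#,
        r##"as "many" as you want"##,
    ]
}

/// Adding more `#` characters lets the literal contain `"#` itself.
fn contains_quote_hash() -> &'static str {
    r##"contains "# inside"##
}
```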
While I definitely don't want to see Unicode curly-quotes allowed in code, I'd suggest that we could do a bit more to help with the original motivation here:
I'm only asking this because I had my fair share of end users complaining that their config files and stuff stopped working after they edited them with notepad.exe and the OSX text editor.
To make it easier to catch this kind of error, how about having the compiler detect Unicode open and close quotes, and produce a more specific error telling the user to use " instead?
@joshtriplett that already happens:
error: unknown start of token: \u{201c}
--> a.rs:2:13
|
2 | let a = “a”;
| ^
|
help: unicode character '“' (Left Double Quotation Mark) looks much like '"' (Quotation Mark), but it's not
--> a.rs:2:13
|
2 | let a = “a”;
| ^
Would "smart english quotes do string interpolation" be a better idea?
@SimonSapin Perfect, thanks.
@SoniEx2 Having curly quotes delineate a _different_ kind of string seems even worse and more surprising behavior than having them delineate a standard quoted string.
@joshtriplett What about "japanese quotes do string interpolation"?
But, why? I mean, why would you do this, and is it worth the loss? What do you get from fancy interpolation when you've got format!?
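For comparison, this is roughly what format! already offers with positional and named arguments. The function `greet` and its parameters are invented for the example.

```rust
/// Builds a message using one positional and one named argument.
fn greet(name: &str, unread: usize) -> String {
    format!(
        "Hello, {}! You have {count} unread messages.",
        name,
        count = unread
    )
}
```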
I just tested the bracket-quotes mentioned in the initial issue, and they produce the following:
test.rs:2:14: 2:15 error: unknown start of token: \u{300c}
test.rs:2 println!(「Hello world!」);
^
test.rs:2:14: 2:15 help: unicode character '「' (Left Corner Bracket) looks much like '[' (Left Square Bracket), but it's not
test.rs:2 println!(「Hello world!」);
^
If people commonly use these to quote strings, it might help to improve the help string here, to not just mention square brackets but also mention string quoting.
@ticki Well... When I mentioned it on IRC (a while ago) I came up with being able to call macros in strings...
It's nicer/less messy than format, altho more magicky...
I do think it'd be nice to have a shorthand for string interpolation, similar to Python's new format-strings (f""); I'd love to see an RFC for that. But I don't think that shorthand should use Unicode delimiters that most people won't have their keyboard configured to type, and that many people will not recognize as string delimiters. Nor should it use Unicode curly quotes that look very similar to the standard doublequote character; that seems far too subtle.
In addition, those particular bracket delimiters, as double-width characters, take up multiple columns in a fixed-width terminal, which further increases confusion and inconvenience.
I'd prefer this problem be solved with tooling, rather than by changing the language.
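One shape such tooling could take is a pre-pass that rewrites typographic quotes to their ASCII counterparts before the source reaches the compiler. The sketch below is an assumption about what that might look like, not an existing tool; `normalize_quotes` is a made-up name, and it is deliberately naive in that it also rewrites curly quotes that appear inside string literals or comments.

```rust
/// Replaces common typographic quotes with their ASCII counterparts.
/// Naive on purpose: it does not distinguish code from literal contents.
fn normalize_quotes(src: &str) -> String {
    src.chars()
        .map(|c| match c {
            '\u{201C}' | '\u{201D}' => '"',  // curly double quotes
            '\u{2018}' | '\u{2019}' => '\'', // curly single quotes
            other => other,
        })
        .collect()
}
```

A real tool would probably want to leave the contents of existing literals alone, since changing them changes the program's data.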
@rust-lang/lang I'd like to propose putting this RFC in FCP, with disposition to close.
Based on the extensive discussion of this already, and the existing diagnostics in rustc to help detect the copy/paste scenario, I don't see much support for this proposal.
@joshtriplett Just a procedural note: since this is an issue rather than an RFC PR, we don't have any FCP procedure; the issue can just be closed.
Eh, I still think it'd be neat, but alright. Farewell Unicode.