I would like to propose that Go add support for a HEREDOC syntax to make writing literals of particularly precarious strings easier.
A common syntax in many programming languages is <<< (boundary) to open and a line containing just said boundary to close.
I would propose something along the lines of:
sql := <<< SQL
SELECT `foo` FROM `bar` WHERE `baz` = "qux"
SQL
My personal reasoning is for MySQL queries.
My company and I work with MySQL a great deal. Backticks are used to quote tables and fields in MySQL. Our queries often contain numerous quotes and backticks, particularly queries generated by tooling.
There is no way to escape a backtick in a backtick string in Go, so we end up either using a double-quoted string and escaping all the quotes within, or using backticks and breaking out of the string on each backtick (a la `x` + "`" + `y`).
Currently we end up with something like
sql := "SELECT `foo` FROM `bar` WHERE `baz` = \"qux\""
or in cases with massively more quotes than backticks I'll do something like
sql := `SELECT foo FROM bar WHERE `+"`baz`"+` = "qux"`
These examples are toys, obviously, but this becomes much more of an issue on large 30+ line report queries, and more importantly it makes copying queries out of code and into a MySQL client a real pain.
One possible approach that doesn't require changing the language is something like
sql := strings.ReplaceAll(`string with doubled quotes where ""bar"" == ""quuz""`, `""`, "`")
This is not a compelling use case, one should not be constructing sql query strings by hand. ~dragons~vulnerabilities lurk there
@davecheney writing the query with bind parameters doesn't mean there aren't backticks, right?
SELECT `foo` FROM `bar` WHERE `baz` = ?
Or do you literally mean that developers shouldn't write SQL at all and should construct queries via ORMs or other tools? (In which case that sounds quite subjective.)
Number one of the top ten OWASP security vulnerabilities is SQL injection. I think the OP's case would be strengthened by choosing a different example.
@davecheney I don't believe that addresses my questions. Avoiding SQL injections doesn't imply that you never write SQL text in Go code. OWASP's own SQL injection cheat sheet has plenty of SQL text including column names.
This example is perhaps not ideal because foo, bar, and baz don't actually need any backtick quoting in MySQL as far as I know.
But I am sympathetic to the problem that the OP has text they would like to copy-paste into Go code and cannot simply do it without some kind of O(n) text transformation. I have run into this in the past when copy-pasting text into a Go program (I believe it was Go source code that included backticks). The copy-paste problem isn't helped much by @ianlancetaylor's workaround, either.
Perhaps, as in some other languages, double backticks could become an escape for a backtick? This would likely be a backwards compatible approach that doesn't require a new string quoting type. Eg:
const query = `SELECT ``foo`` FROM ``bar`` WHERE ``baz`` = "qux"`
Maybe use double back quotes?
sql := `` SELECT `foo` FROM `bar` WHERE `baz` = "qux" ``
But I am sympathetic to the problem that the OP has text they would like to copy-paste into Go code and cannot simply do it without some kind of O(n) text transformation
This to me is the crux of the problem. I can't paste from my tooling into Go, and I subsequently can't paste from Go into my tooling. Suggested solutions that still involve some form of escaping don't solve that.
Further, I don't think the SQL injection argument is relevant here. My example is not injecting any user data into the string. There are plenty of reasons one would want literals in their queries.
As someone who has dealt with Shell syntax for years, please don't add heredoc strings to Go. They are a can of worms.
For example, a number of questions start popping up:
I don't see any of those questions as actual problems per se with heredoc.
Some of them, like the internal whitespace, aren't really in question at all. Some are up to the language designer as implementation details. None are problems.
- What happens to any tabs indenting the heredoc body? Are they included as part of the string?
- What about other whitespace, like leading or trailing spaces?
All whitespace within the boundaries is part of the string. That's not a hard question, that's not in question. That's the point of heredoc. It's a literal "here be the document", WYSIWYG.
- Can leading or trailing whitespace be used in the line that finishes the heredoc?
Depends on the language rules. I've used languages that allow it, and I've used languages that don't. I'd personally vote no; I think allowing leading whitespace on the closing boundary just adds trouble.
func main() {
sql := <<< SQL
Everything between the boundaries taken
As literal
String
Content
SQL
}
- What if the heredoc is never finished?
Same as if you don't close a quoted or backtick string? Syntax error. I don't think that's a legitimate question. Is there a language where this is not the case?
- What is a valid heredoc delimiter? Any valid identifier? What if the identifier is already declared in the scope?
Another implementation detail. I'd vote for any series of runes that contains no whitespace. Maybe limit it to Unicode letter and number runes? It'd be nice for internationalization to allow non-ASCII, for sure.
It's just a boundary for the string. Its scope begins and ends with the string.
I am not suggesting heredoc syntax as the only solution. But sometimes the inability to escape a backtick in a raw string literal becomes very painful depending on the type of application one is writing.
As a general rule, I do not like to use concatenation to write string literals. So whenever I have a situation where a string has a mix of backticks and double quotes, I have to jump through some mental hoops working out whether the string has more double quotes or more backticks, and which escaping scheme makes the code more readable.
There have been past proposals which all have been declined https://github.com/golang/go/issues/24475, https://github.com/golang/go/issues/23228, https://github.com/golang/go/issues/18221. The suggestion throughout seems to be to use + to join the strings. But readability-wise, I personally think it takes more mental effort to read a + concatenated multi-line string than a single \ escaped string (or any other non-concatenating mechanism).
Better SQL example where quoting is actually required (on MySQL at least):
SELECT `group` FROM my_table;
Y'all need to find a better justification for this change.
https://dev.mysql.com/doc/refman/8.0/en/string-literals.html
I think it's good enough. There are a few practical cases where the lack of the ability to use both quotes and backticks in the strings is a real pain point. Working around it is currently possible, but ugly. The lack of a sound and complete way to specify string literals in the language is a real issue, ignoring it is a poor response. I don't think it requires justification, because to me it's obvious. Is it not to everyone else?
It boils down to the fact that using the following for coding backtick really sucks:
`+"`"+`
This is what we have to use right now, and it derives naturally from the language rules.
However, it's too long to type and is extremely ugly.
Yes, it's possible to use string replace instead, but it hurts readability too, cause you still can't put actual backticks where they're supposed to be. This sucks, because not only the runtime has to do additional work to evaluate the actual value, but humans reading and writing the code too have to do additional processing in mind. This immediately skyrockets the difficulty of working with a particular code base that uses those tricks, compared to regular Go code.
There are other workarounds, but all of them have serious drawbacks if you think about them. That is why I think it's really a language flaw.
Now, the natural solution to this issue would be heredoc syntax. I don't see any argument on why you wouldn't want that to be in the language, except the "it's not justified" one. Is it hard to implement, does it violate some design constraints on the lexer? Why not just add it?
First of all, I think many of us can agree that having a nicer way to have multiline string literals without worrying about quotes would be nice. All I'm saying is that I don't think heredocs is a good solution.
Can we please have a civil discussion about it without resorting to "this sucks", "not a legitimate question", or dismissals like "to me it's obvious" and "why not just add it"?
That's not a hard question, that's not in question. That's the point of heredoc.
Well, that's not how heredocs work in shell. See https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/V3_chap02.html#tag_18_07_04, in particular <<-, which has support for stripping tab indentation.
It's fine if you just want the equivalent of <<, but then clarify that in your proposal. If you borrow the language from POSIX shell, I'm going to wonder if the equivalent of <<- is also supported.
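As a rough illustration of the difference, the tab stripping that the shell's <<- form performs on a here-document body can be approximated in today's Go with a small helper. This is only a sketch; `stripLeadingTabs` is a hypothetical name, not an existing standard-library function.

```go
package main

import (
	"fmt"
	"strings"
)

// stripLeadingTabs removes every leading tab character from each line
// of s. Spaces and tabs after the first non-tab character are kept.
func stripLeadingTabs(s string) string {
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimLeft(line, "\t")
	}
	return strings.Join(lines, "\n")
}

func main() {
	// A raw string indented with tabs to line up with surrounding code.
	body := "\t\tSELECT 1\n\t\tFROM dual\n"
	fmt.Printf("%q\n", stripLeadingTabs(body))
}
```

With plain << semantics the tabs would stay in the string, so a proposal needs to say which behavior it wants.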
I think allowing leading whitespace on the closing boundary just adds trouble.
Not allowing any leading indentation is a possible outcome, but it means that heredocs within indented code would look out of place. It's a tradeoff, and the proposal should be clear about what side it decides on.
Is there a language where this is not the case?
Yes, bash, which I presume is your point of reference.
Another implementation detail. [...] It's just a boundary for the string. Its scope begins and ends with the string.
Sorry, but I disagree. A proposal to make such a large change to the language spec should be very clearly defined. This includes what can be a valid delimiter token/word/identifier/etc.
Please excuse me, bad day :)
I'd prefer to borrow heredoc syntax from Ruby, it has very nice properties: https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Here+Documents
Also, here's an alternative from @rogpeppe almost ten years ago: https://groups.google.com/forum/#!msg/golang-nuts/IVyT2ovIljQ/KJggKkrYGCMJ
var render = Parsetemplate("|
|<html>
|<body>
|<h1 $header>
|$text $etc
|</body>
|</html>
")
It wasn't implemented, and it hasn't been formally proposed since, but it has the nice properties that it supports indentation and nesting.
You still can't use ` and " in the same string with that, can you?
EDIT: nvm, it doesn't use ` at all.
I still think a better solution would simply be to figure out a way to allow backticks in a raw string rather than add yet another way to declare strings. I feel that improving the current features is a better idea than introducing new, non-orthogonal features.
The pipe format doesn't solve one of my biggest wishes: that format still would not fulfill my goal of being able to paste text to and from a literal string without transformation.
@mvdan I apologize. I didn't mean offense and have never used heredoc in bash, only PHP, Perl and Ruby. I wrongly assumed some things were constants of the category that apparently are not.
The pipe format doesn't solve one of my biggest wishes: that format still would not fulfill my goal of being able to paste text to and from a literal string without transformation.
This is program text. If you want to be able to include arbitrary text in your program, you're always going to need some transformation unless you put the text in some external file.
That said, the pipe format does make it trivial to take arbitrary text and put it in your program: take your text and pipe it through:
sed 's/^/|/'
(alternatively use whatever "indent-by-a character" functionality you might have available in your editor of choice).
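For comparison, the same transformation and its inverse can be sketched in Go. `pipeQuote` and `pipeUnquote` are hypothetical helper names for this illustration; the pipe syntax itself is not part of Go.

```go
package main

import (
	"fmt"
	"strings"
)

// pipeQuote prefixes every line of text with '|', like sed 's/^/|/'.
func pipeQuote(text string) string {
	var b strings.Builder
	for _, line := range strings.SplitAfter(text, "\n") {
		if line == "" {
			continue // SplitAfter yields a trailing empty piece after the last newline
		}
		b.WriteByte('|')
		b.WriteString(line)
	}
	return b.String()
}

// pipeUnquote removes exactly one leading '|' from every line and
// reports an error if a line does not start with one.
func pipeUnquote(quoted string) (string, error) {
	var b strings.Builder
	for _, line := range strings.SplitAfter(quoted, "\n") {
		if line == "" {
			continue
		}
		if !strings.HasPrefix(line, "|") {
			return "", fmt.Errorf("line %q does not start with |", line)
		}
		b.WriteString(line[1:])
	}
	return b.String(), nil
}

func main() {
	text := "SELECT `foo`\nFROM `bar` WHERE `baz` = \"qux\"\n"
	quoted := pipeQuote(text)
	fmt.Print(quoted)
	back, err := pipeUnquote(quoted)
	fmt.Println(back == text, err)
}
```

Because only the first character of each line is touched, text that already contains |, backticks, or quotes round-trips without any escaping.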
The transformation can be applied to the surrounding code though, and not to the actual text you want to embed in the program text. Heredoc is one particular implementation of the idea that for any finite text we can build a sequence of codepoints that won't appear in that text, and use that as a boundary on both ends of the text. This has one interesting property: the text between the boundaries can be left intact. Heredoc is not the only implementation of this. Rust, for example, uses the same idea for raw string literals: https://rahul-thakoor.github.io/rust-raw-string-literals/. Another example of this is the MIME format: boundaries are exactly this. The downside of this approach is that the boundary string is non-universal, and in theory it's difficult to come up with a unique character sequence. In practice though, it's often trivial when the code is hand-written.
Escaping is another approach. It's more compact, but it actually requires transformations to be applied to the literal you're embedding.
Prefixing every line with a char (like |) is yet another approach. It still requires transformations applied to the string, but those are more consistent and only applied at the beginning of the line. This is similar to how we usually format code comments, actually. Despite it looking very odd, we may want to give it a shot.
Since Go is UTF-8 native, perhaps there is a "rare, very unlikely to ever be in any naked text symbol" that we could add as a synonym for back-tick. I recently used...
const (
sCotangentL = "[" // "\u005b" left square bracket
sCotangentR = "]" // "\u005d" right square bracket
sMeasureL = "〔" // "\u3014" left tortoise shell bracket
sMeasureR = "〕" // "\u3015" right tortoise shell bracket
sPrimeL = "{" // "\u007b" left curly bracket
sPrimeR = "}" // "\u007d" right curly bracket
sHistoryL = "«" // "\u00ab" left-pointing double angle quotation mark
sHistoryR = "»" // "\u00bb" right-pointing double angle quotation mark
)
...the left and right tortoise shell brackets and double angle quotation marks, for example, in a textual data format. Double angle quotation marks appear too frequently in German text to be used here, but I bet collisions between tortoise shell brackets and things people want to quote in Go code is slight. (My const declaration above being the one counterexample. ;-)
But we can do better. Choose a symbol or symbol pair that "nobody" uses. Here is my candidate for a pair:
sql := ⇶SELECT `foo` FROM `bar` WHERE `baz` = "qux"⬱
Is it not abundantly clear that something special is happening between the THREE RIGHTWARDS ARROWS and the THREE LEFTWARDS ARROWS? They have a solidity and suggest to my mind "everything between _here_ and _here_."
If that makes you think too much of Neptune's trident, we could look to math. Here is what "is identical to" looks like...
sql := ≡SELECT `foo` FROM `bar` WHERE `baz` = "qux"≡
...which is nicer from the meaning standpoint and suits the single-marker nature of back tick. This single-character raw text UTF synonym is my vote:
_proposed: add U+2261 IDENTICAL TO as an alternative to back tick in marking raw text._
A further general comment is to recall a subtle feature Ken Thompson built into ed, sed, etc. We all know the basic syntax for substitutions -- "s/old/new/" -- but how do you deal with the awkwardness of slash in the pattern or replacement text? People seem quick to reach for the '\' to escape the special meaning, but ken built a simpler way, the '/' is whatever character follows the s. (Sed is just as happy with "s5old5new5" and "s|old|new|".) The result is that so long as you know a safe character, it can be your marker. Perhaps there could be some way to say "here comes raw text with the next code point as my marker." (Still, though, in the days of UTF let's just use a really-low-conflict symbol or symbol pair, even though ken is a genius.)
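A minimal sketch of that sed-style idea, written as an ordinary Go function rather than as a lexer change: the first code point of the input is taken as the marker, and the value runs up to its next occurrence. `parseMarked` is a hypothetical name, and treating the very first rune as the marker is an assumption made only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// parseMarked treats the first code point of src as the delimiter and
// returns the text between it and its next occurrence.
func parseMarked(src string) (string, error) {
	marker, size := utf8.DecodeRuneInString(src)
	if marker == utf8.RuneError {
		// Empty or invalid input. This sketch also rejects U+FFFD as a marker.
		return "", errors.New("missing or invalid marker")
	}
	rest := src[size:]
	end := strings.IndexRune(rest, marker)
	if end < 0 {
		return "", errors.New("missing closing marker")
	}
	return rest[:end], nil
}

func main() {
	for _, src := range []string{
		"≡SELECT `foo` FROM `bar` WHERE `baz` = \"qux\"≡",
		"|no / escaping \\ needed|",
	} {
		value, err := parseMarked(src)
		fmt.Printf("%q %v\n", value, err)
	}
}
```

The writer only needs to know one code point that is absent from the text, which is the property both the sed trick and the rare-symbol idea rely on.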
I personally don't like the idea of just picking an arbitrary UTF8 symbol and using that as a quotation. I think the idea of letting the user define a symbol is an okay idea, however I still believe that figuring out a way to improve what we already have with raw strings is a better idea. Go already has two ways to define a string literal, and it doesn't need a third.
Perhaps if any identifier appears directly before a raw string, the raw string must also terminate with the identifier. So that way you would do something like:
oldRawString := `this
still
works
`
moreConvenient := SQL`
SELECT `foo` FROM `bar` WHERE `baz` = "qux"
`SQL
newRawString := × `this
also
works
you can also use × in the string as long as it isn't preceded by a backtick!
`×
I think this brings the best of both worlds, making it so that we aren't introducing an entirely new way to define a string. It's similar to heredoc, but in my opinion it works a bit better since it's improving on existing features rather than introducing yet another way to define a string
Although it's now been withdrawn (or perhaps postponed), JEP 326 is still a good read about the problems of adding raw string literals to Java. Heredocs get a mention towards the end but not in a particularly favorable light.
As far as Go is concerned, it seems to me that the only really satisfactory solution to embedded back-ticks is to have an alternative syntax for raw string literals. Introducing a form of escaping for back-ticks simply doesn't cut it as far as copy/pasting is concerned and would compromise the verbatim nature of raw strings.
That means you need a different kind of delimiter which preferably doesn't involve single quotes, double quotes or back-ticks as they're often used together. Possible candidates are #, $ or ~ which I don't think are currently used for anything else.
The question, of course, is whether it's really worthwhile introducing an alternative syntax to solve what for many people isn't a serious problem. The answer to that is that it probably isn't worthwhile but that I think is the reality of the situation.
We don't have to use a new character. We could, for example, emulate C++ and write
sql := R"delimiter(SELECT `foo` FROM `bar` WHERE `baz` = "qux")delimiter"
where delimiter can be any string that does not appear in the string constant itself.
I'd like to say that one thing I find awkward about backquote-quoted strings is that you can't indent the contents along with the rest of the code. Reading code with multiline backquoted strings in it can be confusing.
That's why I have a soft spot for my idea outlined above, because with that, you can easily indent without worrying about messing with the string contents.
I feel like that could be made into a separate proposal, maybe a function in the strings package. I definitely like the idea, I've used it quite a bit in Kotlin.
Well, I suppose you could mimic the C++11 way of doing things but, for a simple language like Go, that really would be using a sledgehammer to crack a nut :)
@rogpeppe - IIUC, your proposal does not address the escaping concerns voiced in this thread.
Especially from this line -
Other than that, all the rules are the same as for the
other kinds of string literals.
Have I missed something ?
@ianlancetaylor - Any thoughts on @deanveloper's proposal ? That does nearly what you say but extending the backtick syntax and using a user-picked identifier as the terminating string.
I agree that we don't need a third way to define a string. Backticks are the ones most commonly used for raw string literals. I am leaning towards extending the backtick syntax to allow `, ", ' anywhere. But I think allowing a user-picked identifier to terminate things will make codebases non-uniform. Different developers will choose different identifiers depending on their use-cases.
How about we choose a fixed identifier to start the string, but stop only if we encounter a single backtick as the last line, immediately followed by a newline ?
Something like -
raw`
hi this is `a cool text
with back`ticks or "" allowed anywhere
even ' quotes
closing with
`
I think this takes care of the most common cases where backticks are usually mixed with other characters in a single line. This does have a glaring issue that strings will always end with a newline, which I don't like. But maybe this can be improved upon.
To me, one of the key features would be to allow encoding of the arbitrary text, as long as boundary identifiers are not in that text. The reason it's so important to me is because it would be a generic solution.
This means that the codebases will become non-uniform, indeed, but I feel that functionality is more important here. To make code more uniform with user-defined identifiers, we could apply certain restrictions to the identifiers. For example, similar to how it's done in Rust, we could only allow the use of # in the identifier. This would make identifiers configurable, yet user-specified, which is enough to cover all possible cases.
Example:
str := ##`
My value!
Can contain ` and # (and other stuff like " ' etc...)
`# is even allowed, since it doesn't have two #-s
... string literal still continues
`##
It is always possible to add more # if there's a need - therefore every possible string can be encoded.
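To illustrate why adding more # always works, here is a rough sketch that computes how many # characters a given text would need under this scheme. The closing delimiter is assumed to be a backtick followed by n # characters, as in the example above; `hashesNeeded` and `quoteHashed` are hypothetical names, and the output is not valid Go today.

```go
package main

import (
	"fmt"
	"strings"
)

// hashesNeeded returns the smallest n such that a backtick followed by
// n '#' characters never occurs inside s. Text without backticks needs
// no '#' at all, i.e. a plain raw string literal.
func hashesNeeded(s string) int {
	need := 0
	for i := 0; i < len(s); i++ {
		if s[i] != '`' {
			continue
		}
		run := 0
		for j := i + 1; j < len(s) && s[j] == '#'; j++ {
			run++
		}
		if run+1 > need {
			need = run + 1
		}
	}
	return need
}

// quoteHashed wraps s in the proposed delimiters.
func quoteHashed(s string) string {
	hashes := strings.Repeat("#", hashesNeeded(s))
	return hashes + "`" + s + "`" + hashes
}

func main() {
	fmt.Println(quoteHashed("plain text"))
	fmt.Println(quoteHashed("SELECT `foo` FROM `bar`"))
	fmt.Println(quoteHashed("contains `# already"))
}
```

The count depends only on the longest run of # directly after a backtick, so a tool or editor could pick the delimiter mechanically when pasting text.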
@agnivade I like that idea too, however it means that all of those raw strings' data will always end in a newline. This also may not be intuitive:
worksFine := raw`
Hey look, this works!
`
compileError := raw`my test string`
What if one needs to encode a string that contains a single backtick on a line followed by an empty line? Or a string with multiples of those? I don't think it'd be very convenient to escape all such cases.
@agnivade @deanveloper Yes, I think
delimiter`string`delimiter
also works.
@agnivade
@rogpeppe - IIUC, your proposal does not address the escaping concerns voiced in this thread.
Especially from this line -
Other than that, all the rules are the same as for the
other kinds of string literals.
Have I missed something ?
Yeah, I think that sentence is misleading. I believe I intended to mean that they behave just like normal string constants from a language point of view. There is no quoting necessary between the start-of-text character | and the terminating newline. Specifically, it's OK to have any number of other |, \, or any other unicode characters except newline without further interpretation.
So you can take any text that holds valid UTF-8 where every line is terminated with a newline character, and quote it by inserting a | character at the start of each line. This includes text that's already quoted that way (so quoting Go programs that contain this kind of string constant is easy, which isn't the case for backquotes).
@MOZGIII -
What if one needs to encode a string that contains a single backtick on a line followed by an empty line? Or a string with multiples of those?
Yes I know. I believe that is a corner case, and since it is a single character in the entire line, concatenating doesn't hurt that bad. Anyways, I don't like it. I like the Rust approach better. Have a single identifier, yet make it configurable. Strikes a good balance IMO.
@rogpeppe - Got it. While I think that is a neat solution, it involves writing an extra character for every newline and does not allow for copy-pasting text easily. I think the tradeoffs are a bit skewed towards drawbacks. I would also like to address your point (a) (if it spans more than a page, it's not easy to see where string stops and program starts). I think with syntax highlighting, it is not very hard. At least for me.
I think
anyIdent`string`anyIdent
or
configurableIdent`string`configurableIdent
both are good candidates with varying tradeoffs.
I am very slightly leaning towards configurable ident because while writing the string, I don't have to think much on what identifier to choose. I can start off with a normal backtick literal, and if I see that I need to escape a backtick, just add a configurableIdent on both ends. If that does not suffice, add more. While in the case of anyIdent, there is a slight mental overhead of scanning the string and coming up with the best identifier. Especially while copy-pasting text.
I think perhaps allowing any identifier, but having convention on what identifier to pick, would be good.
For instance, something like "raw" would be conventional:
raw`contents`raw
However, you are _allowed_ to change it to something else if you want. Sorta like for-loop indices - people _typically_ use i, but can of course change it if they want.
Perhaps some other convention such as using all caps should be put in place too? That way it's much easier to see where the string starts and ends. I'm not entirely sure about this one, but I think it could be a good idea. So instead the above example would become:
RAW`contents`RAW
However this choice was mainly targeted towards long strings:
RAW`
invoice: 34843
date : 2001-01-23
bill-to: &id001
given : Chris
family : Dumars
address:
lines: |
458 Walkman Dr.
Suite #292
city : Royal Oak
state : MI
postal : 48046
ship-to: *id001
product:
- sku : BL394D
quantity : 4
description : Basketball
price : 450.00
- sku : BL4438H
quantity : 1
description : Super Hoop
price : 2392.00
tax : 251.42
total: 4443.52
comments: >
Late afternoon is best.
Backup contact is Nancy
Billsmer @ 338-4338.
`RAW
Implementation-wise, the problem with user-specified delimiters is that they have to be handled during lexing, which adds complexity to a simple, though still somewhat involved, stage and would need a lot of explanation in the language spec.
Using doubled-up backticks to escape the backtick in a raw string is much easier to handle during lexing and specification.
This is not meant as endorsement or to say that one way is better than the other; it is just to note the differences in complexity.
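To give a feel for how small the doubled-backtick rule is at the lexing level, here is a rough sketch of a scanner for it. This is not the real go/scanner code; `scanDoubled` is a hypothetical function written only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// scanDoubled reads a raw string literal at the start of src in which
// two consecutive backticks stand for one literal backtick. It returns
// the decoded value and the number of bytes the literal occupies.
func scanDoubled(src string) (value string, n int, err error) {
	if len(src) == 0 || src[0] != '`' {
		return "", 0, errors.New("literal must start with a backtick")
	}
	var b strings.Builder
	for i := 1; i < len(src); i++ {
		if src[i] != '`' {
			b.WriteByte(src[i])
			continue
		}
		if i+1 < len(src) && src[i+1] == '`' {
			b.WriteByte('`') // doubled backtick: one literal backtick
			i++
			continue
		}
		return b.String(), i + 1, nil // lone backtick closes the literal
	}
	return "", 0, errors.New("unterminated raw string literal")
}

func main() {
	value, n, err := scanDoubled("`SELECT ``foo`` FROM ``bar``` + rest")
	fmt.Printf("%q %d %v\n", value, n, err)
}
```

The scanner needs only one byte of lookahead and no state beyond its position, whereas a user-specified delimiter would have to be remembered and matched at the closing end.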
still pleased at the simplicity of using a unicode replacement/alternative
to back tick (as in my previous comment. the "is identical to" is pretty,
has logical meaning, and i bet would never have a conflict in the future
history of go.)
i bet would never have a conflict in the future history of go.
Until you're embedding Go code in Go code more than two levels deep...
fair enough.
We are not going to adopt "here documents". That is not a Go-like syntax.
If this is a problem worth addressing, it would be nice to adopt an approach that makes the current raw string literals a simple case of the overall approach. For example, perhaps an odd number of backquotes could start a raw string literal that must end with the same number of backquotes.
```This raw string can contain ` characters```
We permit any odd number so that the string can contain any sequence of backquotes.
We do not permit an even number because `` is a valid string literal today (the empty string).
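As a sketch of how a tool might choose the delimiter under this rule, the function below picks the smallest odd count that is longer than any run of backquotes in the text. `fenceLen` is a hypothetical name. The sketch assumes the text does not itself begin or end with a backquote, since such a backquote would sit directly against the delimiter.

```go
package main

import (
	"fmt"
	"strings"
)

// fenceLen returns the smallest odd number that is strictly greater
// than the longest run of consecutive backquotes in s.
func fenceLen(s string) int {
	longest, run := 0, 0
	for i := 0; i < len(s); i++ {
		if s[i] == '`' {
			run++
			if run > longest {
				longest = run
			}
		} else {
			run = 0
		}
	}
	n := longest + 1
	if n%2 == 0 {
		n++ // even counts are not allowed by the rule
	}
	return n
}

func main() {
	for _, s := range []string{
		"no backquotes at all",
		"This raw string can contain ` characters",
		"even a ``` run fits",
	} {
		fence := strings.Repeat("`", fenceLen(s))
		fmt.Println(fence + s + fence)
	}
}
```

Text without any backquote gets a count of one, so today's raw string literal falls out as the simplest case of the same rule.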
We are not going to adopt "here documents". That is not a Go-like syntax.
I don't personally feel
foo := <<< title
This is my string
title
is any more out of place in the language than
title:
for {
break title
}
That said, I relent.
Also re: the ```foo``` proposal, what if the string I want to encapsulate starts or ends with a backtick? Ending as such is likely common in ~SQL~ MySQL / MariaDB.
Fair point. (Although SQL doesn't seem like a good example since it also permits double quotes.)
How about > 2 backticks instead of just odd number?
Four backticks are a syntax error currently, so it's safe to introduce.
@MOZGIII i don't like it personally, because allowing any even numbers of backticks introduces ambiguities:
var str1 = ``+`` // Go 1 says this is "", but under Go 2 this might be "+"
var str2 = ````+```` // Increase to 4 backticks, but still could be "" or "+"
So Go would have two options:
- str1 would be "+"
- ``+`` is "", which would make even numbers of backticks unusable because they would just be parsed as empty strings
Example of what I mean by the second option:
var str3 = ````
this is my raw string
````
There would be a syntax error on line 2. This is because str3 would be set equal to an empty string, and then the compiler would see this is my raw string, which is not a valid line of Go code, and fail compilation.
@deanveloper you used two backticks in the example, but I proposed more than two. Two backticks clearly does not fit for the reasons you gave. However, with 3, 4 and any greater number of backticks there are no ambiguities - in Go 1 all of them are syntax errors.
The edge case here would probably be the ability to encode an empty string with this notation. I'd just prohibit that altogether - it'd probably be easier for the lexer that way, and it doesn't seem like a big issue to me. At least I could live with that.
What I don't like is the collision with the Markdown notation for code decoration. It may complicate using Go code in Markdown code sections. Maybe I can live with that too - but for me personally it kind of matters more than the ability to represent an empty string...
What do you think? I'm ok even with the must-be-odd number of backticks proposal - it's better than nothing, and it has its own advantages.
Sorry for all of the edits to my comment which probably looks confusing post-edits.
I'm personally still a fan of my initial idea with requiring an identifier immediately before/after the respective opening/closing backtick. I don't know how parsing/lexing/compiling/etc works however, so I'm not sure of the severity about how much it would complicate the compiler. But it definitely doesn't collide with Markdown code fences :wink:
I'm not upset with the odd number of backticks rule however, it just seems "odd" to only allow odd numbers.
Actually, there is another pretty serious downside with the backticks - and that's collision with the backticks inside the string itself:
"`test`" != ````test````
This is a serious problem for me, because it suffers from an issue similar to the one regular ` literals have: no support for representing backticks inside the string itself.
This brings me back to how well thought out this is in Ruby. It has enough string literal forms to cover every case I can think of.
@MOZGIII Yeah, that was brought up by @donatj a few posts ago.
Also speaking of which, @ianlancetaylor, backticks and quotes are not synonymous in SQL. Backticks are used for quoting identifiers in order to make sure that you can select tables/columns named after keywords (or contain strange characters such as spaces and commas), while quotes are used for string literals.
For instance:
-- valid (MariaDB)
SELECT * FROM `database`.`table`
-- invalid (MariaDB)
SELECT * FROM "database"."table"
It would be great to have "is identical to" (or another Unicode marker) as a way to ultra-backtick ASCII text.
sql := ≡SELECT foo
FROM bar
WHERE baz
= "qux"≡
...feels categorically simpler than a gang of backticks or any other in-band signaling. That's the core issue, that using an ASCII delimiter for arbitrary ASCII text always has exceptions by definition, and a surprisingly high incidence of them in cases like this SQL example (and Markup and ...) where you're quoting something that is likely to already be quoting things. Recursive quoting tempts fate because "smart people just like us" had the same ideas for quoting their thing, so when Go wants to quote it, the probability of collisions is very high.
OTOH, if Go used "mango" at each end, and SQL used "rose", then the space would be huge and collisions rare. It would look dumb, of course, but would not have the collisions of everyone using the same four quotation delimiters.
This is why I propose adding U+2261 IDENTICAL TO as an alternative to back tick in marking raw text.
I don't like rarely-used single rune unicode sequences mainly for the following reasons:
A further possibility which I don't think has been mentioned so far is to introduce a new single-character escape \` which would only be valid within raw string literals.
This would be analogous to the existing escapes \' (only valid in rune literals) and \" (only valid in 'ordinary' string literals).
The new escape wouldn't be ideal from a 'cut and paste' perspective as you'd need to go through and prepend each back-tick with a slash. However, this would be easier than having to split each back-tick out into an ordinary string literal and (at least to my eye) would stand out more than simply doubling each back-tick as well as being a rarer combination of symbols.
Compared to solutions which involve using an odd number of back-ticks as a delimiter, it also has the advantage that leading and trailing back-ticks are easier to read.
@alanfo Then you would have to escape \ as well, which is not backwards compatible.
Perhaps it would be best to continue the discussion elsewhere, as it's drifting further apart from the original heredoc proposal. It's always simple to file another, separate proposal once another idea has fully formed.
@mibk I don't follow why you would have to escape \ as well. Unless it was followed by a back-tick, a slash would be treated literally as it is now.
Also it would be backwards compatible as, at present, a raw string literal can't include a back-tick at all.
@alanfo Consider this example:
`\`
Is it an unterminated raw string with an escaped backslash, or a raw string containing a single backslash?
It's a raw string containing a single backslash as it is now.
I'll admit it's an awkward case for the parser to deal with but `\`` would be fine (a raw string containing a single back-tick) whereas ```` might be problematic.
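For reference, here is a small sketch of how current Go treats the literal under discussion. The constant names are made up for illustration. It only demonstrates today's behavior, where nothing inside a raw string is an escape sequence. It does not implement the proposed escape.

```go
package main

import "fmt"

// rawBackslash is a raw string literal holding exactly one byte: a backslash.
// Today the backtick right after the backslash closes the literal.
const rawBackslash = `\`

// rawSequence shows that \n stays two separate characters in a raw string.
const rawSequence = `\n`

func main() {
	fmt.Println(len(rawBackslash), rawBackslash == "\\") // 1 true
	fmt.Println(len(rawSequence), rawSequence == "\\n")  // 2 true
}
```

Any existing literal that ends in a backslash, like the first one here, is the case a new backslash-backtick escape would have to disambiguate.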
@mvdan
Given that @ianlancetaylor said we're not going to adopt "here documents" but didn't close the issue and indeed came up with a suggestion on how else to deal with the same problem, I don't see why we shouldn't continue the discussion here. Otherwise the same points will have to be made all over again and there doesn't seem to be a consensus on an alternative proposal in any case.
@mvdan what @alanfo said, and also note that @ianlancetaylor retitled the issue to be more generic.
Of the ideas listed here, @deanveloper's original idea (now unfortunately hidden in the fold) seems by far the best to me.
All of @MichaelTJones's Unicode suggestions don't really work for quoting Go code itself, and are awkward for many of us to type.
The "more backticks" ideas discussed by @ianlancetaylor and others have the problem that they do not work for text that begins or ends with a backtick.
@deanveloper's idea doesn't have these issues. Really, the only one I see is the one pointed out by @jimmyfrasche: it adds a certain complexity to lexing that's different from anything in the language today. But I think that might be fundamental to any syntax which allows quoting arbitrary text.
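To make the lexing concern concrete, here is a rough sketch of what scanning such a literal could involve. scanDelimited is a hypothetical helper written for this thread, not part of the Go toolchain, and it simplifies things: it accepts any run of letters, digits and underscores as the delimiter and does not check what follows the closing delimiter.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// scanDelimited is a hypothetical helper, not part of any Go tooling.
// It reads a literal of the form  ident`text`ident  from the start of src
// and returns the text plus the number of bytes consumed.
func scanDelimited(src string) (text string, n int, err error) {
	// Find the end of the opening identifier (letters, digits, underscore).
	i := strings.IndexFunc(src, func(r rune) bool {
		return !(unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_')
	})
	if i <= 0 || src[i] != '`' {
		return "", 0, errors.New("expected an identifier followed by a backtick")
	}
	delim := src[:i]
	body := src[i+1:]

	// The literal ends at the first backtick that is directly followed by
	// the same identifier. Backticks elsewhere in the body are plain text.
	end := strings.Index(body, "`"+delim)
	if end < 0 {
		return "", 0, fmt.Errorf("unterminated literal, missing closing %q", "`"+delim)
	}
	return body[:end], i + 1 + end + 1 + len(delim), nil
}

func main() {
	// The text both contains backticks and ends with one.
	text, n, err := scanDelimited("SQL`SELECT `foo` FROM `bar``SQL + rest")
	fmt.Printf("%q %d %v\n", text, n, err)
}
```

The part that differs from today's lexer is the closing rule: the scanner has to remember the opening identifier and compare against it, instead of stopping at a fixed character.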
I personally think that these syntaxes are quite a bit different from the original proposal which was asking for a feature that other languages implement, while the current discussion is simply about improving raw strings rather than implementing HEREDOC. I'll start a new proposal, which will include a lot of the discussion from this post.
@deanveloper It seems to me that most of these suggestions are things that other languages implement, or similar to them. Most current languages have some form of raw string literal these days.
My only concern with
delimiter`raw string with ` characters`delimiter
is that it doesn't lead with the fact that it is a string. C++ (R"delim( string )delim") and Rust (r#" string "#) and Swift (#" string "#) are more clear as to when a string is starting.
My only concern with delimiter`raw string with ` characters`delimiter is that it doesn't lead with the fact that it is a string.
That's a valid concern, it's a bit hard to see where the string starts and ends with long delimiters. However, with short delimiters, it seems to be much less of a problem:
// keep it with short delimiters
var x = raw`this is a string with ` characters`raw
// or all-capital letters? not previously seen as convention anywhere else in Go?
// this would make it much easier to see that it is representing a raw string.
var y = RAW`this is a string with ` characters`RAW
Perhaps establishing some sort of convention to use brief delimiters, maybe all-capital as well (such as SQL and RAW) is a good idea. Maybe golint should enforce something like this? I'm not 100% sure it's a good idea to enforce it with golint, but I do think that having a convention to use short and possibly capital delimiters would help with that aspect significantly.
I brought up the idea of this convention in this comment: https://github.com/golang/go/issues/32190#issuecomment-497315188 although it was for a different reason.
By the way I think I can partially revive my broken earlier suggestion by saying that writing N backquotes (N >= 2) followed by a double quote is a raw string literal that is terminated by a double quote followed by N backquotes.
s := ``"this is a `raw` "string" literal "``
fmt.Print(s)
prints
this is a `raw` "string" literal
It doesn't collapse nicely to the current raw string literals, but it does have the advantage of sticking to existing string quotation characters. Unless I've missed something again.
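As a rough illustration of the rule described above, here is a sketch of a scanner for it. scanQuotedRaw is a hypothetical helper, not anything in the Go toolchain, and it assumes the literal ends at the first double quote followed by N backquotes.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// scanQuotedRaw is a hypothetical helper illustrating the suggestion above.
// It expects src to start with N backquotes (N >= 2) followed by a double
// quote, and returns the text up to the first double quote that is followed
// by N backquotes.
func scanQuotedRaw(src string) (string, error) {
	// Count the leading backquotes.
	n := 0
	for n < len(src) && src[n] == '`' {
		n++
	}
	if n < 2 || n >= len(src) || src[n] != '"' {
		return "", errors.New("expected at least two backquotes followed by a double quote")
	}

	// The closer mirrors the opener: a double quote, then N backquotes.
	closer := `"` + strings.Repeat("`", n)
	body := src[n+1:]
	end := strings.Index(body, closer)
	if end < 0 {
		return "", errors.New("unterminated literal")
	}
	return body[:end], nil
}

func main() {
	// The same literal as in the example above, written as an ordinary string.
	src := "``\"this is a `raw` \"string\" literal \"``"
	s, err := scanQuotedRaw(src)
	fmt.Printf("%q %v\n", s, err)
}
```

Since the opener and closer are built only from backquotes and a double quote, single backticks and double quotes inside the text do not end the literal early.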
I actually like that idea. My only real issue with the original N backticks idea was that the "N is odd-only" restriction made it seem very inconsistent. It also fixes the issues with how badly the other syntax played with Markdown. I'll make sure to bring up this one in the proposal that I am working on (along with others that were in this thread).
I think the only real concern is that it would make current raw strings that start or end with quotes (i.e. `"my string"`) confusing to look at for future learners of Go who do not know the history of raw strings.
I just bumped on a non-SQL use case regarding this, which I wanted to add as a datapoint.
I have an html template which I am storing as a backtick-quoted string. Now in that template, I have
@davecheney I don't believe that addresses my questions. Avoiding SQL injections doesn't imply that you never write SQL text in Go code. OWASP's own SQL injection cheat sheet has plenty of SQL text including column names.