This proposal was branched off of #32190, which was a proposed HEREDOC syntax for Go. It was concluded that HEREDOC was not the correct syntax for Go to use, however the proposal did point out a large problem that Go currently has:
There is only one option to use raw strings, which is the backtick. The nature of how raw strings work means that raw strings themselves cannot contain backticks, meaning that the current workaround for including a backtick in a raw string is:
var str = `My backtick is `+"`"+` hard to use`
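As a minimal runnable sketch of that workaround (the function name `withBacktick` is made up for illustration), the backtick has to be spliced in as an interpreted string literal between two raw string pieces:

```go
package main

import "fmt"

// withBacktick builds a string containing a backtick by splicing an
// interpreted string literal ("`") between two raw string literals.
// The name is hypothetical and only used for this illustration.
func withBacktick() string {
	return `My backtick is ` + "`" + ` hard to use`
}

func main() {
	fmt.Println(withBacktick())
	// prints: My backtick is ` hard to use
}
```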
Raw strings are often used for storing large strings, such as strings containing other languages, or Go code itself. In many languages, the backtick has significant meaning. For instance:
SELECT * FROM `database`.`table`
fun `a method with spaces`() { ... }
let str = `HELLO ${name.toLocaleUpperCase()}`
Of course there are far more examples of languages where the backtick is a significant character in the language. This makes embedding these languages in Go very hard.
If there were a fixed number of ways to declare raw strings, the problem would, no matter what, arise that you would be unable to put Go code inside of Go code without some kind of need to transform the code. This means that there needs to be a variable way to create raw strings.
This proposal highlights one brought up here. It essentially improves on the current way to declare raw strings, allowing the following syntax:
var stmt = SQL`
SELECT `foo` FROM `bar` WHERE `baz` = "qux"
`SQL
var old = `
this, of course, still works
`
var new = 고`this
also works
you can also use 고 AND `backticks` (separately) in the string!
`고
Essentially, raw strings can be prefixed with a delimiter, and the string is then terminated with a backtick followed by the same delimiter.
Strings which are densely populated with words and backticks may make it hard to pick a word to use as the delimiter for the raw string, as the word may appear inside the string, which would end the string early and cause a syntax error. Allowing _any identifier_ to be used as a delimiter would allow non-ascii characters to be used as well, meaning that in special cases, when it's _really_ needed, one can use a non-ascii character as their delimiter.
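To make the rule concrete, here is a rough sketch of how such a literal could be scanned. It is not taken from any Go tool: `scanDelimited` is a hypothetical helper, and it assumes the delimiter is an identifier immediately followed by a backtick, with the string ending at the first backtick that is followed by the same identifier.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// scanDelimited is a hypothetical helper, not part of any Go tool. Given
// source text that starts with an optional identifier followed by a
// backtick, it returns the delimiter, the raw string body, and the number
// of bytes consumed.
func scanDelimited(src string) (delim, body string, n int, err error) {
	// Read the identifier: a letter or underscore first, then letters,
	// digits, or underscores.
	i := 0
	for i < len(src) {
		r, size := utf8.DecodeRuneInString(src[i:])
		if r == '_' || unicode.IsLetter(r) || (i > 0 && unicode.IsDigit(r)) {
			i += size
			continue
		}
		break
	}
	delim = src[:i]
	if i >= len(src) || src[i] != '`' {
		return "", "", 0, fmt.Errorf("expected backtick after delimiter %q", delim)
	}
	rest := src[i+1:]
	// The string ends at the first backtick that is followed by the delimiter.
	// With an empty delimiter this is simply the first backtick, which matches
	// how raw strings behave today.
	end := strings.Index(rest, "`"+delim)
	if end < 0 {
		return "", "", 0, fmt.Errorf("unterminated raw string with delimiter %q", delim)
	}
	n = i + 1 + end + 1 + len(delim)
	return delim, rest[:end], n, nil
}

func main() {
	src := "SQL`SELECT `foo` FROM `bar` WHERE x = 1`SQL"
	delim, body, n, err := scanDelimited(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("delim=%q body=%q consumed=%d of %d bytes\n", delim, body, n, len(src))
}
```

The sketch does not check whether the closing delimiter is followed by further identifier characters, so a body containing a backtick directly followed by the delimiter word would still end the string early. That is exactly the situation where a different delimiter would have to be chosen.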
@jimmyfrasche https://github.com/golang/go/issues/32190#issuecomment-497415469
Implementation-wise, the problem with user-specified delimiters is that they have to be handled during lexing, which adds complexity to a simple, though still somewhat involved, stage and would need a lot of explanation in the language spec.
I don't like the idea of complicating the language. I do not work with the internals of the language, so I am unsure of the magnitude of complication to the lexer that this change would bring. If it is too much, I don't think that it would at all be worth it, and maybe one of the alternatives below would be a better fit.
@ianlancetaylor https://github.com/golang/go/issues/32190#issuecomment-501508497
My only concern with [syntax] is that it doesn't lead with the fact that it is a string. C++ (R"delim( string )delim")) and Rust (r#" string "#) and Swift (#" string "#) are more clear as to when a string is starting.
I share this sentiment. My response to this here was that establishing a convention to use short, noticeable identifiers (i.e. RAW, JS, SQL, etc.) helps with noticing where the string starts and ends. This could (possibly) be enforced by golint, but I'm not sure whether that is a good idea.
In #32190, there were several other alternatives that tried to achieve the same goal:
Essentially, you could start the raw string with a certain number of backticks, and it would have to end with the same number of backticks.
`````
A raw string which can contain up to 4 ```` backticks in a row inside of it
`````
This solution still had problems though. Strings cannot start with an even number of backticks, because any even number of backticks could also be interpreted as an empty string, introducing ambiguities. It also causes developers a bit of fuss when trying to get it to work inside of markdown, as markdown uses multiple backticks in order to signify a block of code.
Also, the strings could not start or end with backticks, which would be an unfortunate consequence.
This one is a breaking change, however I think it is my favorite solution out of all of the alternatives. It's the exact same as the previous one, but Go also introduces a breaking change to disallow empty raw strings. There is no need for raw strings to be used to represent an empty string, since the normal "" can do that, and is much more preferable. The _only_ code this would break is code that uses a raw string to define an empty string, such as x := `` or funcCall(``, ...). It may be good to do some research on whether empty raw strings are ever used in real code.
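A rough sketch of how this variant could be scanned, assuming the rule is "the opening run of N backticks is closed by the next run of N backticks, and an empty body is an error". The helper name `scanBacktickRun` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// scanBacktickRun is a hypothetical sketch. It treats the leading run of N
// backticks as the opening delimiter and ends the string at the next run of
// N backticks. Because the whole leading run is consumed as the delimiter,
// an input such as two backticks (today's empty raw string) has no closing
// run and is rejected.
func scanBacktickRun(src string) (body string, err error) {
	n := 0
	for n < len(src) && src[n] == '`' {
		n++
	}
	if n == 0 {
		return "", fmt.Errorf("raw string must start with a backtick")
	}
	rest := src[n:]
	end := strings.Index(rest, strings.Repeat("`", n))
	if end < 0 {
		return "", fmt.Errorf("no closing run of %d backticks (empty raw strings are not allowed)", n)
	}
	return rest[:end], nil
}

func main() {
	fence := strings.Repeat("`", 5)
	body, err := scanBacktickRun(fence + "up to 4 backticks in a row" + fence)
	fmt.Printf("%q %v\n", body, err)

	_, err = scanBacktickRun("``") // formerly an empty raw string
	fmt.Println(err)
}
```

As noted for the previous alternative, a body that itself starts or ends with a backtick cannot be expressed this way, since those backticks would be counted as part of the delimiter run.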
This solution still has the issue of being annoying to use with markdown's code fences. The argument was used that we shouldn't make language decisions based on other languages, however I personally do not like this argument. Sharing code is part of what a programmer does, and Markdown is a very widely used markup language that uses multiple backticks in a row to define a code fence. This feature may make it a bit difficult to share Go code over anything that uses Markdown (slack, github, discord, and other services).
Despite making it difficult to share code via markdown-enabled chats, it is still easy to share code via something like gist.github.com or play.golang.org. If my original proposal proves not to work very well (doesn't feel Go-like, too difficult to implement, etc.) I would love for this solution to be accepted in its place.
This proposal is actually pretty nice. It's similar to the previous proposal. Essentially, the starting is N backticks (N >= 2) followed by a quotation mark, and the ending delimiter is a quotation mark followed by the same number of backticks. Example:
s := ``"this is a `raw` "string" literal"``
fmt.Print(s)
// prints:
// this is a `raw` "string" literal
This syntax is actually very nice in my opinion. It fixes the "odd-number-only" ambiguity from the previous example, as well as fixing the Markdown issue (as code fences must occur on their own line). It also fixes the "strings starting/ending with backticks" issue.
The only issue with this syntax is that it doesn't seem to work well with existing raw strings. I don't personally have data about how often this occurs, but I'd imagine that there are several times where raw strings are used to describe strings with quotes in them, making code like x := `"this is a string"` common. Newcomers to Go may see this and think that the `" is the delimiter to the raw string, when in reality the ` is the delimiter and the " is part of the string.
However that critique may be a bit nitpicky. I do like this syntax a lot.
This alternative stated that Go should add another symbol to use to declare raw strings in Go. For instance, ⇶ to start the string and ⬱ to end the string. Go code is defined to be UTF-8, so file formatting issues should not happen. Another proposed idea was ≡ (U+2261 IDENTICAL TO).
This solution also has problems. What if our string has both backticks AND strange symbols (for instance if you were defining a list of mathematical symbols)? Or, what if you were trying to embed Go syntax inside of your strings? Also, the symbol is hard to type and not easy to find, so it may not be a good fit as a string delimiter.
In https://github.com/golang/go/issues/32590#issuecomment-687854491, another solution that I quite like was brought up, using a variable number of special characters. They propose using ^, and then the delimiters for the string become ^` and `^, where the number of ^ symbols is variable. They also created an implementation of it here.
For example:
s := ^^`
func main() {
sql := ^`SELECT `foo` FROM `bar` WHERE `baz` = "qux"`^
fmt.Println(sql)
}
`^^
fmt.Print(s)
// prints:
//
// func main() {
// sql := ^`SELECT `foo` FROM `bar` WHERE `baz` = "qux"`^
// fmt.Println(sql)
// }
//
R"delim(string)delim"
r#"string"#
#"string"#
It's important that we have _some kind_ of variable delimiter, as that way if the string we are embedding somehow contains it, it is easy to change the string's delimiter in order to avoid the issue.
The delimiter doesn't have to be an identifier like it is in this main proposal, it could also be varying the number of backticks like the one a few paragraphs up.
Raw strings in Go are often used to be able to copy-paste text to be used as strings, or to embed code from other languages (such as JS, SQL, or even Go) into Go. However, if that text contains backticks, we need some way to make sure that those backticks do not terminate the string early.
I believe that the way to do this is allowing an identifier to precede the string, and to make sure that the terminating backtick must be followed by the same identifier in order to terminate the string.
var markdown = MD`
### Thank you for reading :)
`MD
Just found this as well: #24475
I did not see this one while making this proposal, my bad. But it does contain another interesting suggestion by @bcmills https://github.com/golang/go/issues/24475#issuecomment-377963413
Specifically, we could treat as a raw string any sequence of characters which:
- begins with a QUOTATION MARK character in Unicode category Pi, and
- ends with the corresponding QUOTATION MARK character in Unicode category Pf.
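For reference, Go's standard unicode package already exposes these two categories as the range tables unicode.Pi and unicode.Pf, so a check along these lines is possible. The helper names below are made up for illustration, and the standard library has no table that maps an opening quote to its "corresponding" closing quote, so that pairing would still have to be defined separately.

```go
package main

import (
	"fmt"
	"unicode"
)

// isOpeningQuote reports whether r is in Unicode category Pi
// (initial quote punctuation). Hypothetical helper for illustration.
func isOpeningQuote(r rune) bool { return unicode.Is(unicode.Pi, r) }

// isClosingQuote reports whether r is in Unicode category Pf
// (final quote punctuation). Hypothetical helper for illustration.
func isClosingQuote(r rune) bool { return unicode.Is(unicode.Pf, r) }

func main() {
	for _, r := range []rune{'«', '»', '‹', '›', '`', '"'} {
		fmt.Printf("%q opening=%v closing=%v\n", r, isOpeningQuote(r), isClosingQuote(r))
	}
}
```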
SQL uses backticks to signify a string that represents an identifier, such as a database name or table name.
I do not believe this is correct. Certainly for databases like MySQL and Postgres their quoting character is ', ASCII 0x27.
Ref: https://dev.mysql.com/doc/refman/8.0/en/string-literals.html Section 9.1.1
Ref: https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS Section 4.1.2.1
@davecheney these databases use single quotes for strings, yes. (Some allow " as well.)
Backticks are used by MySQL (and sqlite? any others?) if you need to quote identifiers: https://dev.mysql.com/doc/refman/8.0/en/identifiers.html (search for "backtick").
(Postgres uses " for this purpose.)
I like @deanveloper's main proposal, but on reflection I like @ianlancetaylor's idea (``" a `raw` "string" "``) even more. It is less flexible than using an arbitrary identifier, but that extra power seems largely pointless and introduces another place where people will have to choose/argue about a name.
@cespare Thank you for the correction.
I continue to assert that back ticks for SQL string construction is not a valid argument for a language change proposal -- users should not be constructing SQL query strings by hand. The fact that
a. Alternative syntax like ' (which I believe is part of the SQL standard, but I couldn't find a linkable reference) is supported
b. You only need to use back tick in cases where you have to quote reserved words, and the alternative exists now: `+"`"+`. It's not pretty, but that's because people shouldn't be constructing SQL strings by hand.
gives weight to this position.
@davecheney I gave other examples, such as JavaScript and Go, which are both languages that I can see being put into raw strings. JavaScript, of course because Go is used a lot for web backends, so, depending on the circumstance, it could make more sense to embed a small script into a raw string rather than create an entire separate file for it. Go may make sense for code generation purposes if you wish to detect a raw string in a file.
Also, there is the possibility that someone may just need to encode a string that has many backticks in it.
I still personally believe that SQL is a valid use-case for this. SQL injection attacks are caused by people using string concatenation in SQL queries, not necessarily because they are crafting queries by hand. The database/sql library actually takes care of SQL injection with the ? syntax, which allows people to pass in user-generated arguments with automatic sanitization of inputs. It's a pretty well-known feature as far as I'm aware. Asking all users to use third-party libraries in order to query tables is unreasonable IMO, especially when Go already comes with standard library support for database/sql.
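For illustration, a minimal sketch of what that looks like with database/sql. The function name, the query, and the table and column names are assumptions made up for this example. The ? placeholder form is what MySQL drivers accept; other drivers use different placeholder syntax (Postgres drivers commonly use $1). The query is written as an interpreted string here precisely because a raw string could not hold the backticks.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// couponQuery keeps the backtick-quoted identifiers, so it has to be an
// interpreted string literal today. The coupon value is not part of the
// text; it is bound through the ? placeholder.
const couponQuery = "SELECT `user` FROM `users` JOIN `transactions` USING (`uid`) WHERE coupon_applied = ?"

// usersByCoupon is a hypothetical helper: it passes the coupon as a bound
// argument instead of concatenating it into the query string.
func usersByCoupon(ctx context.Context, db *sql.DB, coupon string) ([]string, error) {
	rows, err := db.QueryContext(ctx, couponQuery, coupon)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var users []string
	for rows.Next() {
		var u string
		if err := rows.Scan(&u); err != nil {
			return nil, err
		}
		users = append(users, u)
	}
	return users, rows.Err()
}

func main() {
	// No database driver is registered in this sketch, so only the query
	// text is printed.
	fmt.Println(couponQuery)
}
```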
a. Alternative syntax like ' (which I believe is part of the SQL standard, but I couldn't find a linkable reference) is supported
Double quotes are technically supported by MySQL, but only if ANSI_QUOTES mode is enabled (as shown by the page linked by @cespare). Enabling this mode also requires that string literals no longer use double quotes.
Also, while backticks are not technically required in SQL languages, they are still common practice. If a DBA sends me a complex query to use, I don't want to go through the whole thing and remove each backtick, or need to put in a `+"`"+` instead.
@davecheney, backticks are not in the SQL standard, they're database specific. For example MSSQL doesn't use backticks, but MySQL and Postgres do. With those DBs there's no alternative to using backticks sometimes.
Often you just need to encode an SQL string literal without any dynamic construction (i.e. no variable substitution or conditional logic). This is not considered "constructing SQL by hand" and is, on the contrary, very recommended, while usage of query builders and ORMs is somewhat discouraged in Go.
while usage of query builders and ORMs is somewhat discouraged in Go.
Could you please cite a source for this position. Thank you
To clarify my position. If you’re using SQL you should be using prepared statements. To the best of my knowledge when that is done the DB drivers take care of quoting for identifiers and values. This is why I assert SQL queries are not a good premise for this proposal.
Could you please cite a source for this position. Thank you
While this is not a direct source (I don't have one, but I also am not the one who claimed this fact so I think I get a bit of leeway):
The lack of support for generics is a huge bummer for SQL builders, because they cannot have type safety, which is really the only advantage they had over ? syntax. They also typically rely heavily on chaining syntax, which Go also discourages with its semicolon placement rules. This was also referenced in the design doc for error handling here (scroll to the ? operator question). While not stating that chaining is discouraged, it definitely states that chaining is uncommon. Again, this is because of semicolon placement rules, which cause chaining (aka builder syntax) to become a bit ugly:
var builder myBuilder
x := builder.
AddThis().
AddThat().
SetFoo(bar).
Baz().
Build()
This is not a problem for simple queries, however as queries get more complex, it gets exponentially worse.
To clarify my position. If you’re using SQL you should be using prepared statements. To the best of my knowledge when that is done the DB drivers take care of quoting for identifiers and values. This is why I assert SQL queries are not a good premise for this proposal.
Literals are common in SQL queries, in which case I need to worry about quoting. I don't want to replace every single one of my literals with a ?. For instance, a simple query for selecting every user who used a certain coupon:
SELECT `user` FROM `users` JOIN `transactions` USING (`uid`) WHERE coupon_applied = "someCoupon"
There are definitely times where I want string literals embedded in my queries, and again as queries get more complex I definitely don't want to be moving string literals into ?.
Well, the way the Go standard library (database/sql) is implemented is clearly in favor of writing the SQL literals. The reason for that is if you use some form of SQL builder or ORM you often don't pay enough attention to what's going on under the hood. This, in turn, leads to nasty performance and code logic issues.
Prepared or not, you still need to communicate your statement (aka query) to the database server, and this usually requires writing your own SQL literal string. Sure, it'd be a big mistake to use string concatenation instead of binding the query params. Binding works for both regular and prepared statements. Btw, prepared statements have their own issues - for example they're very hard to manage with transaction-based connection pooling (i.e. via pgbouncer) - so I tend to not use them that much. Performance-wise, in many typical cases prepared statements slow down the overall system performance, rather than speed it up - so use them with caution and measure the effect.
@MOZGIII I disagree with most of your advice. Yes, prepared statements have a cost, but also a very clear benefit, and you don't need to quote the ?'s, which is my point.
There must be a better use case for this proposal than SQL strings.
@davecheney I've mentioned both in the original proposal and my first reply to you other possible use cases, but you never seem to address them. Even if they're relatively minor, there's just no good way to encode a multiline string that contains backticks. Is that really too much to ask, especially out of a general-purpose programming language?
@deanveloper if we take SQL quoting out of the proposal, then the use cases you're suggesting are writing snippets of other languages in Go string literals? How common is this in the wild?
@davecheney When variable binding is used, the actual variables and the query are typically passed as separate units to the database engine. I can assure you that this is the case for Postgres, though less advanced databases may actually do something different. Nonetheless, in Postgres, the data packet sent to the database over the wire can be represented as the following tuple: (string, Value[]). If you use the SQL literal SELECT ?::text it will be sent over the wire as is (pretty much). Along with it, the bound variable will be sent, for example "hello world". Moreover, the bound value will be encoded with the proper wire format for strings. On the server side, the database will first run the received statement through the parser, then match the ?s to the values and then execute the actual query. Nowhere in this sequence is the value quoted or put in place of the ? in the actual query. This is what we call variable binding. One of the additional bonuses of this approach is that the database parser layer typically caches the parsing for statements (the statements tend to repeat a lot with this approach), and it boosts the performance quite significantly.
Now, prepared statements. They can also use variable binding, but they don't have to, the same way as the regular, non-prepared statements. The problem with prepared statements is that it's quite difficult to manage them on a per-session level, and nobody does that. The alternative is preparing the statement for every operation independently. This is slow, because it often causes multiple round trips. This may become a critical bottleneck, as it easily doubles the execution time in a well optimized system. As I said, variable binding is possible not only with prepared statements. Under the hood it is implemented even for regular SQL queries: https://github.com/jackc/pgx/blob/762e68533f0090ecb6bb1166d51966b326597ec7/query.go#L410-L453
This code completes in only one round trip, as long as all value types (OIDs) are known upfront.
Anyhow, let's get back to literals - I don't want this to be an SQL discussion. I would say that the lack of support for just copying and pasting an arbitrary string (SQL with quotes in particular) is a design flaw of the existing implementation. Especially while iterating on whatever thing you need to represent as a literal - it's much easier to set the boundaries once rather than escaping the backticks on every iteration.
@davecheney Embedding markdown is another example. I feel like pretty much every other language that has backticks as part of the syntax may cause issues if we attempt using it in the current raw string literals implementation. Those are bash, ruby, perl, TeX and many more. You never know all the ways the feature that's not in the language could've been used. Frankly, even for features that are in the language it's very hard to tell all possible uses.
I think we should not accept "SQL is not a good example" as an argument. The rationale is a) the lack of a better example doesn't prove there's no problem, and b) SQL is not just a practical example, it is an actual pain point that I've encountered while solving my day-to-day tasks with Go.
Surely, more examples can shape the way we address the problem, but, unless we address the SQL pain point separately (which I don't see proposed here), I don't see why would we want to ignore it.
@davecheney I'm going to come at this from a different angle, actually. Regardless of if you want to use Go for code generation, which I personally do quite a bit, raw strings the way they are now are just poorly designed.
Raw strings are designed to store large amounts of text, or text which contains special characters (" and \), or even both, into a variable. However, if the string contains a different special character, then the entire purpose behind raw strings falls apart. At _least_ in other languages they have safeguards such as using two different characters in a row as a delimiter to a string, making it quite a bit harder for the string to be terminated early. However, even this solution still has the problem with code generators being a bit messier.
The more argument (and lack of agreement) I see on this matter, the more convinced I become that it's not worth doing anything here and we should just live with what we have.
The present syntax for raw strings has the desirable properties (for a simple language such as Go) of being very easy to type, parse, read, explain, understand and remember. The only problem it has is not being able to include the back-tick character and opinions differ on how acute this problem actually is in the wild.
All the proposed solutions (including those suggested by myself in #32190) either seriously compromise one or more of these properties and/or deal with the back-tick problem at the expense of creating a similar problem with some other (more rarely used) character.
It's also worth remembering that, apart from the workaround mentioned by @deanveloper in his opening post, there are (at least) two other ways of dealing with the back-tick problem at the present time:
- Changing the back-ticks in the raw string to some other unused character and then applying strings.Replace on the resulting string to change them back to back-ticks.
- Using fmt.Sprintf in the following manner:
s := fmt.Sprintf(`This is a "raw" string including %cback-ticks%[1]c.`, '`')
fmt.Println(s)
// This is a "raw" string including `back-ticks`.
Admittedly none of these solutions is exactly "nice" but, on balance, I think they are better than introducing some new (and probably contentious) syntax to deal with the back-tick problem which I don't think everyone regards as being particularly serious in any case.
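For completeness, the strings.Replace workaround mentioned above might look like the following sketch. The placeholder character ¬ and the helper name are arbitrary choices for this example; the placeholder must not occur anywhere else in the text.

```go
package main

import (
	"fmt"
	"strings"
)

// restoreBackticks swaps a placeholder character back to backticks.
// The placeholder "¬" is an arbitrary choice for this sketch and must
// not appear in the text for any other reason.
func restoreBackticks(s string) string {
	return strings.Replace(s, "¬", "`", -1)
}

func main() {
	q := restoreBackticks(`SELECT ¬user¬ FROM ¬users¬`)
	fmt.Println(q)
	// prints: SELECT `user` FROM `users`
}
```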
This has my support. I'm a voice for Unicode as an out of (ASCII) bounds signalling means, but admit that while it solves the easy case of quoting all normal situations, it cannot alone address recursive self quoting; for that we must have a case-specific delimiter, and identifier-backtick is a clean, simple, universal mechanism.
It is trivial to parse so I dispute any pushback on inconvenience of implementation. In addition to the likely uses...
`say, "plugh"`
xxx`say, "plugh"`xxx
Δ`say, "plugh"`Δ
...it also allows an extreme case: generate a huge random integer (256 bits, say), encode that in identifier form (letter or underscore followed by (letter|underscore|digit)*), and use that to blind quote text without looking at it and knowing it will work. This may never come to pass, but I like that it could:
randomQuoteTag2f4d87ef38eb88c21a00104006339a15e9cbe43f2610c247314cef54ab5f5db8`unlikely to have collisions`randomQuoteTag2f4d87ef38eb88c21a00104006339a15e9cbe43f2610c247314cef54ab5f5db8
This is something you could rely on when, say, packaging files in strings, like a go generate tool that pastes a copy of a source file into that source file.
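A small sketch of how such a tag could be generated, assuming the same shape as the example above: a fixed letter prefix followed by 256 random bits in hex, which keeps the result a valid identifier. The function name is made up for illustration.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomQuoteTag returns an identifier-shaped tag built from 256 random
// bits. The "randomQuoteTag" prefix guarantees that the tag starts with a
// letter, and the hex digits that follow are all letters or digits.
func randomQuoteTag() (string, error) {
	var b [32]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return "randomQuoteTag" + hex.EncodeToString(b[:]), nil
}

func main() {
	tag, err := randomQuoteTag()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Under the proposed syntax, a generator would emit the text to be
	// quoted between tag+"`" and "`"+tag.
	fmt.Println(tag)
}
```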
Admittedly none of these solutions is exactly "nice" but, on balance, I think they are better than introducing some new (and probably contentious) syntax to deal with the back-tick problem which I don't think everyone regards as being particularly serious in any case.
My personal issue with this solution is that you then must apply transformations to the string before the code can compile. This is especially undesirable if you do not have syntax highlighting, which is a crutch that Go should not depend on. It is also annoying to write a program to apply the string transformation for you, as you cannot put the original string into a Go program.
@deanveloper I write a lot of SQL and SQL system and I write a lot of Go. I'm neutral on the core of this proposal. However the motivation for a change is important.
When building SQL strings, you may want to quote identifiers and quote text/time/(a few other cases) values. It is assumed that you will use value parameters where appropriate. Also of note, a prepared query and a query that uses value parameters are not directly correlated in most systems. Different database systems support different ways to quote identifiers and values. SQL Server uses square brackets [] to quote identifiers and single quote marks for values only. PostgreSQL uses double quote marks for identifiers and single tick marks for values.
Let's take your SQL problem and address it first. For this problem, the solution you have will work, but is not one I would recommend. It is too ad-hoc for a system. Spend a half a day and we can make a better solution.
Spend a little bit more time in discovery to find other motivating pain points. The example you gave just isn't that motivating to me.
package sqls
import (
"fmt"
"testing"

"github.com/golang-sql/sqlexp"
)
// Replacer sets up replacement classes for SQL strings.
type Replacer struct {
Quoter sqlexp.Quoter // If nil, ANSI SQL is assumed.
IdentifierReader rune // If zero value, ':' is used.
VariableReader rune // If zero value, '@' is used.
}
// Replace reads a format string and replaces identifier names that start with the IdentifierReader
// followed by at least one letter class rune and zero or more letter or number class runes.
// Similar for VariableReader names. Values may be either a map[string]interface{} or a typed
// struct. If a struct is used reflection will be used.
func (r Replacer) Replace(f string, values interface{}) (string, error) {
//...
}
func TestReplacer(t *testing.T) {
r := &Replacer{}
out, err := r.Replace(`
select
:MyColumns/t1,
:OtherColumns/t2
from
:MyFirstTable t1
join :MySecondTable t2 on t1.ID = t2.:LinkColumn
where
1=1
and :Where1/t1 = @MyValue
`,
map[string]interface{} {
"MyColumns": []string{"ID", "Name"},
"OtherColumns": []string{"BookName=Book"}, // Identifiers are inspected for "=" for alias.
"MyFirstTable": "Author",
"MySecondTable": "Book",
"LinkColumn": "AuthorID",
"Where1": "Location",
"MyValue": "US",
})
if err != nil {
t.Fatal(err)
}
fmt.Println(out)
// OUTPUT
/*
select
t1."ID", t1."Name",
t2."Book" as "BookName"
from
"Author" t1
join "Book" t2 on t1.ID = t2."AuthorID"
where
1=1
and t1."Location" = 'US'
*/
}
There are lots of ways to solve the problem you presented. This would be one of them. I would argue it would do a better job of what you are wanting than what raw strings (current or improved) can give you. You could implement this as a runtime step or as a pre-compile step. I've also used text/template + some custom template functions to construct SQL.
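As a rough idea of what the text/template approach could look like (this is not the commenter's actual code; the `ident` template function, `quoteIdent`, and `buildQuery` are hypothetical names, and ANSI-style double-quote quoting is assumed):

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// quoteIdent is a hypothetical helper that quotes an identifier ANSI-style,
// doubling any embedded double quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// buildQuery renders a select statement from a template, quoting every
// identifier through the custom "ident" template function.
func buildQuery(table string, columns []string) (string, error) {
	tmpl, err := template.New("q").Funcs(template.FuncMap{"ident": quoteIdent}).Parse(
		`select {{range $i, $c := .Columns}}{{if $i}}, {{end}}{{ident $c}}{{end}} from {{ident .Table}}`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = tmpl.Execute(&sb, struct {
		Table   string
		Columns []string
	}{table, columns})
	return sb.String(), err
}

func main() {
	q, err := buildQuery("Author", []string{"ID", "Name"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(q)
	// prints: select "ID", "Name" from "Author"
}
```

Values would still be passed separately as query arguments; the template only deals with identifiers.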
Constructing SQL is fine. Carefully including values in SQL text can be okay if done through a system (like above) to prevent silly mistakes. I don't care if you use parameters or not, I don't care if you prepare your SQL or not (with modern SQL engines a prepare is useless).
Again, SQL is not the only target of this. Raw strings as they are now, are poorly designed as illustrated in https://github.com/golang/go/issues/32590#issuecomment-501566356. That comment is a better illustration of the problem that we currently face than all of this arguing on SQL Queries and how they should be done.
@kardianos that, or there could've just been a string literal. It is a beautiful workaround for a problem that shouldn't exist in the first place. With this you can either do a compile-time codegen - in which case just having a .sql file wrapped with Go literals with proper escaping is sufficient, and the end result can be a statically allocated string from the binary image - or it can be applied at runtime - probably during the system initialization, because actually constructing an SQL query like that for every operation would result in lost cycles (unless you actually generate different strings - which is possible, but comes close to what I'd avoid doing in the first place, as for the dynamic values, either from the user or from the system itself, we always use the standard API to pass them to the driver via the Args list).
The problem with the example above is that, while it solves one part of the problem - and that is allowing you to get a program that invokes the query you want - it doesn't address other parts - among those is the ability to author the same literal that the app will use. This is more important for a discussion about literals than the ability to somehow make the string value that you want appear at runtime. Go already addresses this problem, it just does it poorly (we have the ` notation in the language).
As a counter example of why we actually need to have good literals for raw strings: any byte array (even non-unicode) we can represent in a Base64 encoding, embed into the regular " string (no need for even `), decode it at runtime and be happy with it. The downsides of this solution should be obvious.
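To spell the counter example out, a tiny sketch (the constant and function names are made up for illustration): the three-character text consisting of a backtick, the letter a, and another backtick becomes an unreadable literal that has to be decoded at runtime.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encoded is the standard Base64 form of the three bytes: backtick, 'a',
// backtick. Nothing about the literal tells the reader what it contains.
const encoded = "YGFg"

// decode turns the embedded Base64 text back into the original string.
func decode(s string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(s)
	return string(b), err
}

func main() {
	s, err := decode(encoded)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
	// prints: `a`
}
```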
Spend a little bit more time in discovery to find other motivating pain points.
@MOZGIII Any string literal by itself won't automatically escape value or identifiers, nor will it expand an array of columns nor will it escape and join together a list of strings or other values. I personally find these things important. You may not.
@kardianos so, what you mean is your code does something valuable that's not directly correlated to representing literals, and as a bonus it solves the issues with `. That's great, but what if you don't need all that other stuff? I personally sometimes just need to put whatever SQL I have agreed on with my peers into the code - and with that workflow it's important that the query is actually the same one that was used in prototyping so other people are familiar with it. Especially under the conditions where time is money and the SQL is typically 20+ lines. That said - those are all second order issues, and the key thing here is to improve the literals the language provides.
Are we still in the "why?", or are we in the "how?" already? If we're in the "how?" then I don't see the point of looking into the workarounds to the SQL use case - let's use that example to see what literals we can offer to solve the pain point.
PS: just noticed markdown supports multiple backticks for verbatim substrings. This means we can play around with this idea easily!
Example: (working)
1) `a`
2) ``a``
3) ```a```
4) ````a````
5) `` ` ``
6) ``` ` ```
7) ```` ` ````
Result:
1) a
2) a
3) a
4) a
5) `
6) `
7) `
Example: (not working)
1) ``
2) ```
3) `````
1) ``
2) ```
3) `````
I must say I like qwe`value`qwe-style the most currently. Something like qwe"value"qwe or ##`value`## works too, as well as ```"value"```. To choose one among all the possible options, I guess the best way is to try writing some code pretending the change is already there and compare the solutions.
Strings cannot start with an even number of backticks, because any even number of backticks could also be interpreted as an empty string, introducing ambiguities.
I think this could be solved in a non-backwards-compatible way by simply disallowing empty raw strings. There's never any particular need for an empty string to be a raw string, as it's completely identical in terms of functionality to "". Just define a raw string to be started and ended by a matching amount of any number of backticks in a row and don't try to figure out if the user was trying to create an empty string or not.
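A rough sketch of how such a rule could be scanned, purely as an illustration: the function name scanRawString is hypothetical, and the choice that the literal closes at the first run of exactly the same number of backticks is an assumption of this sketch, not something the thread settles.

```go
package main

import "fmt"

// scanRawString reads a raw string literal from the start of src under an
// assumed rule: the literal opens with a run of n >= 1 backticks and closes
// at the first later run of exactly n backticks. It returns the body and the
// number of bytes consumed. A lone run such as `` is never an empty string
// here; it is simply an unterminated literal.
func scanRawString(src string) (body string, size int, ok bool) {
	// Count the opening run of backticks.
	n := 0
	for n < len(src) && src[n] == '`' {
		n++
	}
	if n == 0 {
		return "", 0, false
	}
	// Search for a closing run of exactly n backticks.
	for i := n; i < len(src); {
		if src[i] != '`' {
			i++
			continue
		}
		j := i
		for j < len(src) && src[j] == '`' {
			j++
		}
		if j-i == n {
			return src[n:i], j, true
		}
		i = j // a shorter or longer run is treated as part of the body
	}
	return "", 0, false
}

func main() {
	body, size, ok := scanRawString("``SELECT `foo` FROM `bar` LIMIT 1`` + x")
	fmt.Printf("%q %d %v\n", body, size, ok)
}
```

Under this assumed rule a body cannot begin or end with a backtick, since the extra backtick would merge into the delimiter run.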
The Markdown point definitely stands, although it would be possible to solve it for a given string by simply using more than three backticks as the delimiter. Could be kind of annoying if you have to do that, but it's certainly better than something like
const src = `
To create a blockquote, prefix the lines with ` + "`>`" + `. For example:
` + "```markdown" + `
> This line is part of a blockquote.
> This line is also part of the same blockquote.
` + "```"
I don't think we should let the syntax of other languages rule over Go's syntax.
The main problem with a variable opening and closing sequence of a string is that it would complicate the Go language lexer significantly, and would slow down the compiler. As you can see here https://github.com/golang/go/blob/master/src/go/scanner/scanner.go, the lexer has a single-character lookahead, and normally decides what to do based on the current and the next character only. If the string prefix and suffix are variable then the Go lexer will have to keep track of that prefix and the expected suffix, not to mention that a syntax like FOO`BAR`FOO...
var stmt = SQL`
SELECT `foo` FROM `bar` WHERE `baz` = "qux"
`SQL
... makes it hard for the lexer to distinguish between a variable SQL and a string that starts with SQL.
The proposal of @bcmills #24475 on the other hand would be relatively easy to add, and would have the same characteristics as a "variable" opening and closing sequence without too many performance downsides. A hard-coded lookup function for the 10 pairs would be sufficient. This would lead to the following all becoming strings:
«string»
‛string’
‟string”
‹string›
⸂string⸃
⸄string⸅
⸉string⸊
⸌string⸍
⸜string⸝
⸠string⸡
Then it's a matter of convention to use, say ⸠string⸡ for SQL, etc.
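To make the "hard-coded lookup" idea concrete, here is a minimal sketch. The names closers and scanPaired are hypothetical, and treating the first matching closing rune as the end of the string (with no nesting or escaping) is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// closers maps each opening quote from the list above to its closing partner.
var closers = map[rune]rune{
	'«': '»', '‛': '’', '‟': '”', '‹': '›', '⸂': '⸃',
	'⸄': '⸅', '⸉': '⸊', '⸌': '⸍', '⸜': '⸝', '⸠': '⸡',
}

// scanPaired reads a string that starts with one of the opening quotes and
// ends at the first occurrence of the matching closing quote. It returns the
// body and the number of bytes consumed.
func scanPaired(src string) (body string, size int, ok bool) {
	open, w := utf8.DecodeRuneInString(src)
	closer, found := closers[open]
	if !found {
		return "", 0, false
	}
	end := strings.IndexRune(src[w:], closer)
	if end < 0 {
		return "", 0, false // unterminated
	}
	return src[w : w+end], w + end + utf8.RuneLen(closer), true
}

func main() {
	body, size, ok := scanPaired("⸠SELECT `foo` FROM `bar`⸡ + x")
	fmt.Printf("%q %d %v\n", body, size, ok)
}
```

Only the first rune is inspected to decide whether a string starts, so the single-character lookahead stays intact.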
I disagree. Limited look ahead is a tenet of the underlying theory in
lexical analysis and parsing. The proposal at hand says “an unapostrophe
optionally preceded by an identifier.” That implies the identifier matching
code would change to mean “read chars until an unsuitable identifier char,
if it is an unapostrophe then this is a raw string opening, else an
identifier.” It will keep reading the raw string. On the closing side, the
lexer will see the unapostrophe and peek one character ahead to see if that
is a valid first character of an identifier, if so, it keeps reading until
the first non-identifier character. It then has the end of raw string
symbol. If the opening and closing identifiers match, the raw string is
matched and is the next token. If not, raw string reading continues.
Simple change.
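The procedure described above can be sketched roughly as follows. This is only an illustration, not the actual go/scanner code: the names isIdentRune, identLen and scanDelimited are hypothetical, and the identifier rule is simplified to letters, digits and underscore.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isIdentRune reports whether r may appear in an identifier at byte offset
// pos; digits are not allowed at the start (pos == 0).
func isIdentRune(r rune, pos int) bool {
	return r == '_' || unicode.IsLetter(r) || (pos > 0 && unicode.IsDigit(r))
}

// identLen returns the byte length of the identifier at the start of s,
// or 0 if s does not start with one.
func identLen(s string) int {
	n := 0
	for n < len(s) {
		r, w := utf8.DecodeRuneInString(s[n:])
		if !isIdentRune(r, n) {
			break
		}
		n += w
	}
	return n
}

// scanDelimited tries to read delim`body`delim from the start of src.
// An empty delim corresponds to today's plain raw string.
func scanDelimited(src string) (delim, body string, size int, ok bool) {
	// Opening side: read identifier characters; if the next character is a
	// backtick this is a raw string opening, otherwise a plain identifier.
	open := identLen(src)
	if open >= len(src) || src[open] != '`' {
		return "", "", 0, false
	}
	delim = src[:open]
	rest := src[open+1:]
	// Closing side: at each backtick, read the identifier that follows and
	// compare it with the opening one; on a mismatch keep reading.
	for i := 0; i < len(rest); i++ {
		if rest[i] != '`' {
			continue
		}
		end := i + 1 + identLen(rest[i+1:])
		if rest[i+1:end] == delim {
			return delim, rest[:i], open + 1 + end, true
		}
	}
	return "", "", 0, false
}

func main() {
	src := "SQL`SELECT `foo` FROM `bar``SQL;"
	delim, body, size, ok := scanDelimited(src)
	fmt.Printf("%q %q %q %v\n", delim, body, src[size:], ok)
}
```

One consequence of comparing the whole trailing identifier is that a backtick followed by a longer word that merely starts with the delimiter does not close the string.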
@hosewiejacke it's not about language syntax. It's about the fact that the language concept of "everything you put between these two delimiters is part of the string" can't handle a backtick character.
Backtick characters end up being more prevalent because of languages such as Markdown and SQL, but that doesn't mean that the syntax of those other languages is the cause of the problem.
@beoran I used to think it was a good idea, but looking at it now on a different device, half of them don't render properly. I'm not sure if it's a consequence of the characters you chose or the device I'm using yet, however.
Edit - it was caused by my phone, it renders fine on my work computer.
@DeedleFake I'd actually be fine with this solution as well.
I've updated the original proposal to include "variable number of backticks + no empty raw strings" as an alternative solution.
I just thought I'd throw in my 2¢ with the case of a real world sql query I actually want to put in Go.
This one is particularly apt because it contains a number of columns that are otherwise reserved words, like end, as well as double-quoted strings.
Table names have been redacted.
This query is written by tool rather than by hand to begin with, which for better or worse auto-backticks every identifier. The query itself could also likely use some optimization:
Click for Big Ol' Query
SELECT `uu_udh`.`location_id`, `uu_udh`.`redacted_level_id`, SUM( `uu_udh`.`redacted` ) `redacted_sum`,
COUNT( * ) `redacted_count`
FROM (
SELECT `ubh`.`location_id`, `uglh`.`redacted_level_id`,
(
SELECT `redacted_ability`
FROM `ulh_redacted`
WHERE `ulh_redacted`.`user_id` = `u_udh`.`user_id` AND `created` <= "2019-01-01"
ORDER BY `ulh_redacted_id` DESC
LIMIT 1
) `redacted`
FROM (
SELECT DISTINCT `udh`.`user_id`
FROM `udh_redacted` `udh`
WHERE ( "2019-01-01" > `udh`.`start` AND ( "2019-01-01" > `udh`.`end` OR `udh`.`end` IS NULL ) ) AND
`udh`.`deleted` = 0
) `u_udh`
INNER JOIN `ubh_redacted` `ubh` ON `u_udh`.`user_id` = `ubh`.`user_id`
AND ( "2019-01-01" > `ubh`.`start` AND ( "2019-01-01" > `ubh`.`end` OR `ubh`.`end` IS NULL ) )
INNER JOIN `ugh_redacted` `uglh` ON `u_udh`.`user_id` = `uglh`.`user_id`
AND ( "2019-01-01" > `uglh`.`start` AND ( "2019-01-01" > `uglh`.`end` OR `uglh`.`end` IS NULL ) )
) `uu_udh`
WHERE `uu_udh`.`redacted` IS NOT NULL
GROUP BY `uu_udh`.`location_id`, `uu_udh`.`redacted_level_id`
Note that the actual query I'm working with ends in a backtick, making the ``` … ``` methodology troublesome.
Converting that to a string looks something like this on one line.
Using a multi-line string with this number of backticks, as the code stands, would be outrageous.
"SELECT `uu_udh`.`location_id`, `uu_udh`.`redacted_level_id`, SUM( `uu_udh`.`redacted` ) `redacted_sum`,\n\tCOUNT( * ) `redacted_count`\nFROM (\n\tSELECT `ubh`.`location_id`, `uglh`.`redacted_level_id`,\n\t\t(\n\t\t\tSELECT `redacted_ability`\n\t\t\tFROM `ulh_redacted`\n\t\t\tWHERE `ulh_redacted`.`user_id` = `u_udh`.`user_id` AND `created` <= \"2019-01-01\"\n\t\t\tORDER BY `ulh_redacted_id` DESC\n\t\t\tLIMIT 1\n\t\t) `redacted`\n\n\tFROM (\n\t\tSELECT DISTINCT `udh`.`user_id`\n\t\tFROM `udh_redacted` `udh`\n\t\tWHERE ( \"2019-01-01\" > `udh`.`start` AND ( \"2019-01-01\" > `udh`.`end` OR `udh`.`end` IS NULL ) ) AND\n\t\t\t`udh`.`deleted` = 0\n\t) `u_udh`\n\t\t\t INNER JOIN `ubh_redacted` `ubh` ON `u_udh`.`user_id` = `ubh`.`user_id`\n\t\tAND ( \"2019-01-01\" > `ubh`.`start` AND ( \"2019-01-01\" > `ubh`.`end` OR `ubh`.`end` IS NULL ) )\n\n\t\t\t INNER JOIN `ugh_redacted` `uglh` ON `u_udh`.`user_id` = `uglh`.`user_id`\n\t\tAND ( \"2019-01-01\" > `uglh`.`start` AND ( \"2019-01-01\" > `uglh`.`end` OR `uglh`.`end` IS NULL ) )\n) `uu_udh`\nWHERE `uu_udh`.`redacted` IS NOT NULL\nGROUP BY `uu_udh`.`location_id`, `uu_udh`.`redacted_level_id`"
This is why I personally would adore a solution to multiline strings that allows backticks.
What I actually ended up doing was manually and carefully removing all the backticks I could on non-reserved words. It's a remarkable PITA.
Nicely put. Sometimes the strongest argument is simply a clear image of the
problem.
It's important that we have some kind of variable delimiter, as that way if the string we are embedding somehow contains it, it is easy to change the string's delimiter in order to avoid the issue.
I don't believe this is true.
As it wasn't mentioned as one of the alternatives in the description above, I'd like to mention this possibility again.
Putting a quote character at the start of each line has some nice properties:
The down side is that you can't use this syntax to represent strings that are not newline-terminated, but I don't believe that's a huge down side - I believe that the strings we're talking about are overwhelmingly multiline strings, and it's not hard to strip a trailing newline if needed.
Putting a quote character at the start of each line has some nice properties
As the opener of the previous ticket, one of my primary goals is to be able to paste a large string without having to do ideally any manual reformatting / reencoding, hence my heredoc recommendation. The quote per line solution doesn’t really solve this.
I don’t think quote per line would add much actual help over a bunch of concatenation of double quoted strings.
The Java folks are looking to introduce something called Text Blocks (JEP 355) which replaces the previous proposal for Raw String Literals (JEP 326). This is worth reading because (AFAIK) it represents some fresh thinking on the matter.
One thing I like about text blocks is that the first line of text is automatically aligned with the remaining lines - you can just paste the text straight in.
One thing I don't like is that they still permit certain escape sequences to be included in the text.
I'm on the fence about whether it's a good idea for the compiler to remove 'incidental' leading and trailing white space.
If something like this were to be entertained for Go, there appear to be at least two drawbacks:
1) You can't write them on one line - they'll always need at least two.
2) Assuming no escape sequences were permitted in the text (surely a no-no for Go), you wouldn't be able to include triple double quotes therein as this would be interpreted as the closing delimiter.
I don't really think (1) is much of a problem as the use cases for such a feature would probably involve multi-line text in any case.
(2) could be solved (if it's worth solving at all, as we'd still have back-tick delimited strings and it would rarely occur anyway) by allowing the delimiter to contain a variable number of double quotes.
As I said earlier, I'm dubious about whether it's worthwhile doing anything here, though if something is to be done, the text block idea may be worth considering.
As the opener of the previous ticket, one of my primary goals is to be able to paste a large string without having to do ideally any manual reformatting / reencoding
I see that that's nice to have, but ISTM that this is directly opposed to the property of being able to see whether a given section of text is quoted or not.
The convention of putting a character at the start of each line is widespread (think shell comment blocks, markdown ">"-quoted sections), and it seems to work well - it is very readable.
You say manual reformatting, but any decent text editor will make it trivial to indent each line with a character, so I don't think that my suggestion is incompatible with your goals.
@rogpeppe One of the main goals of this proposal is to make it so that raw strings can contain backticks. Raw strings are meant to be "exactly what is between the backticks is what appears in the string", yet a raw string that contains backticks is impossible to represent without pretty annoying workarounds. I think that adding a way to fix the indentation problem to the strings package would be a good idea, though.
@alanfo Another one of the goals was to improve current features rather than introduce new ones. It seems to be a cool idea though, however Go already has 2 "flavors" of strings, I'm not sure if introducing a third is a good idea.
@rogpeppe One of the main goals of this proposal is to make it so that raw strings can contain backticks.
I guess it depends whether "raw strings" are a goal in themselves, or whether we're really after a better way to represent arbitrary multiline text that can contain arbitrary characters including backticks.
It seems to be a cool idea though, however Go already has 2 "flavors" of strings, I'm not sure if introducing a third is a good idea.
It seems to me that the current proposal is in effect proposing the introduction of a third kind of string, though admittedly it would be less of a departure from the current raw strings than text blocks would be.
@rogpeppe I like the basic idea of indented strings, but I note that they don't have to be in the language itself. People could write
s := strings.Indented(`|
|this is a long
|string with an indent
|character`)
I think these come up rarely enough that adding the call to strings.Indented (or preferably a better name) doesn't seem too onerous to me.
This differs from the concept of a raw string literal that contains a backquote, which I think does have to be in the language itself.
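As a sketch of what such a helper might look like: strings.Indented does not exist in the standard library, so the function below is a hypothetical stand-in named trimMargin, and the convention that the first line of the input holds only the margin marker is an assumption taken from the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// trimMargin treats the first line of s as the margin marker (for example
// "|"). On every following line it drops leading spaces and tabs and then the
// marker itself. A line without the marker is reported as an error.
func trimMargin(s string) (string, error) {
	lines := strings.Split(s, "\n")
	margin := lines[0]
	if margin == "" {
		return "", fmt.Errorf("missing margin marker on first line")
	}
	out := make([]string, 0, len(lines)-1)
	for i, line := range lines[1:] {
		trimmed := strings.TrimLeft(line, " \t")
		if !strings.HasPrefix(trimmed, margin) {
			return "", fmt.Errorf("line %d does not start with %q", i+2, margin)
		}
		out = append(out, strings.TrimPrefix(trimmed, margin))
	}
	return strings.Join(out, "\n"), nil
}

func main() {
	s, err := trimMargin(`|
		|this is a long
		|string with an indent
		|character`)
	fmt.Printf("%q %v\n", s, err)
}
```

Returning an error is just one way to deal with a malformed line; a panic or a more lenient rule would be equally possible design choices.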
@ianlancetaylor This is a similar idea to Kotlin's String.trimMargin method which works well in that language and may be a better name for it.
They also have a String.trimIndent method which automatically detects the minimal indent without the need for a special character.
@ianlancetaylor Though it's relatively minor most of the time, the problem with making it a function call is that it can't be a const.
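A small illustration of that limitation, using the real strings.TrimSpace as a stand-in for any string-processing helper (the names query and trimmed are made up for this example):

```go
package main

import (
	"fmt"
	"strings"
)

// A raw string literal is a constant expression, so it can be a const.
const query = `SELECT 1`

// The result of a function call is not a constant expression, so the
// commented-out declaration below would be rejected by the compiler:
//
//	const trimmed = strings.TrimSpace(query)
//
// The value has to be a package-level var instead.
var trimmed = strings.TrimSpace(`
	SELECT 1
`)

func main() {
	fmt.Println(query == trimmed) // prints true
}
```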
@ianlancetaylor Yes, I've used that technique in the past, but it's not entirely satisfactory.
As @DeedleFake points out, you can't make const strings that way.
Secondly, unless special knowledge of the strings package were baked into gofmt, it wouldn't be able to indent such strings appropriately to the surrounding code.
Thirdly, that API isn't great because such a string can be malformed, in which case you'd need either an error return, a runtime panic, or some string that ends up sensitive to indentation, none of which is particularly palatable.
Most importantly though, that technique doesn't address the fundamental problem this issue is trying to solve, which is that you still can't put a backquote in there, something that is addressed if this kind of string representation is in the language itself.
Understood about the backquote issue; I was thinking of the function as a way to address your issue while the backquote issue would be addressed separately. I think that @donatj 's comment above is compelling: it's desirable to be able to simply paste a raw string into a Go program and just add appropriate quoting at the beginning and end. So I don't think your suggestion, though useful in various cases, addresses the real problem of this issue.
The discussions before resulted in a very workable, general proposal: an optional identifier before and after the backtick: a := "hello" ; b := `hello` ; c := iant`hello`iant. You commented at the time that it did not "look quoty" to you. It looks perfect to me.
@MichaelTJones I don't hate it. But when reading the code I'd somewhat prefer to see immediately that I am looking at a string, rather than seeing iant, which looks like an identifier, and then seeing ", at which point I see that it is not an identifier, but is actually a string.
I think that most other languages have raw string literals that preserve that property: it's clearly a string from the start. (C++: R"delim( string )delim"; Rust: r#" string "#; Swift: #" string "#).
@ianlancetaylor I can agree a bit with that, I think that @DeedleFake's suggestion may be a better syntax.
Empty raw strings are pretty useless and rarely used [citation needed], it might be a good idea to break a small amount of compatibility and disallow empty raw strings, and allow a variable number of backticks instead.
maybe the idiomatic usage would be quote`text`quote
I am sympathetic to @donatj's comment about pasting a raw string as I have encountered something very similar. In my case, it was editing an existing string rather than just copying it. I will just copy over my comment from https://github.com/golang/go/issues/32190#issuecomment-504782097 again since this proposal is for general raw string improvement.
Comment copied below---
I had an html template as a raw string. Now in that template, I have
Here is another real life situation that I don't believe has been mentioned yet.
The language defined by text/template uses " and ` for strings, just like Go, with the same semantics.
Inside a template, I want to write a string literal that includes many " characters. To avoid excessive escaping, I wrote the literal using a raw string with backticks. However, the template text is itself inside Go code, so this doesn't work. I ended up doing something like this:
whereas with one of these proposals, I could just directly write the text I want to use. For example, using my favored syntax (@ianlancetaylor's):