Julia: `\(` string interpolation: "inter\(pol)ate"

Created on 19 Dec 2017 · 25 Comments · Source: JuliaLang/julia

An alternate syntax for string interpolation has been brought up by @ScottPJones on Discourse:

"Hello, \(person)."

The expression within parens is evaluated and interpolated. This syntax is used by Swift and has some significant advantages:

  1. The sequence \( is invalid in string literals and currently a syntax error, so this interpolation syntax doesn't give a different meaning to any otherwise useful sequence of characters like using $ does. This is particularly nice because of some of the common uses of $:

    • embedded LaTeX math mode, e.g. "$z = 2x + 1$"

    • monetary values such as USD, e.g. "$10.23".

  2. The ( and ) are part of the syntax so the extent of the interpolated expression is clear cut. In particular, there is a concern that if we want to expand the set of valid identifier characters in the future, the meaning of an interpolated string could change. For example, suppose x was not a valid identifier character in Julia v1.2. In code that's not a problem since abx would simply be a syntax error. In strings, however, since any character is allowed, you could write "pre$abxyz" and it would parse as "pre$(ab)xyz". If, in Julia 1.3, we started to allow x in identifiers, then that same interpolated string would parse as "pre$(abxyz)", changing the meaning of the program. In effect, the $identifier string interpolation syntax would force us to freeze the set of valid identifiers until a new major version of Julia.
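Both points can be checked against the parser. The following is a minimal sketch, assuming a Julia 1.x session; `parses` is a hypothetical helper introduced only for this illustration. It shows that a bare `$` takes the longest valid identifier, that parentheses fix the extent explicitly, and that `\(` is currently rejected inside a string literal.

```julia
# Hypothetical helper: does this source text parse without error?
function parses(src)
    try
        Meta.parse(src)
        return true
    catch
        return false
    end
end

# raw"..." keeps `$` and `\` literal, so these are source texts, not interpolations.
bare   = Meta.parse(raw"\"pre$abxyz\"")     # bare identifier: extent chosen by the parser
parens = Meta.parse(raw"\"pre$(ab)xyz\"")   # parenthesized: extent chosen by the author

bare.args     # Any["pre", :abxyz]
parens.args   # Any["pre", :ab, "xyz"]

# `\(` is an invalid escape sequence today, so this source text fails to parse.
parses(raw"\"Hello, \(person).\"")   # false
```

Since every character of `abxyz` is an identifier character today, the bare form interpolates `abxyz` as a whole; the hypothetical scenario in item 2 is the reverse situation, where a character later joins the identifier set.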

As a counterpoint, we use $ consistently for interpolation not just into strings, but also for command objects and expressions – i.e. of values into expressions quoted with :( ... ) and quote ... end. For command interpolation, using $ matches the shell, which is quite nice, and for expression interpolation \( ) wouldn’t work since it’s valid Julia syntax. So if we changed string interpolation then it would be the odd one out, which is a little unfortunate but not the worst thing ever.
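As a small illustration of that consistency, the sketch below uses `$` in all three contexts and then shows why `\(` is not available inside quoted expressions: it already parses as a call to the left-division operator. The variable names are arbitrary.

```julia
x = 2
name = "world"

ex  = :(1 + $x)        # expression interpolation: the value is spliced into the AST
cmd = `echo $name`     # command interpolation: the value becomes a single argument
str = "hello $name"    # string interpolation

ex          # :(1 + 2)
cmd.exec    # ["echo", "world"]

# In ordinary code, `\(a, b)` is the function-call form of `a \ b`.
q = \(2, 4)   # 2.0
```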

If we wanted to transition fully in 1.0, we would have to deprecate $ string interpolation in 0.7 and introduce \( ) interpolation, and finally, disallow $ interpolation in 1.0. If we wanted to provide a longer grace period, we could by default warn once per program execution about the use of $ interpolation syntax, with some way of suppressing the warning, and then disallow the syntax in a later 1.x version. This is a particularly difficult syntax to change by simple search and replace, but a fairly easy syntax to change with a tool like FemtoCleaner, since the change is purely syntactic.

If we didn't want to fully change interpolation syntax, and only wanted to mitigate issue 2 above (the more significant of the two issues, imo), then we would have to deprecate $ interpolation without parentheses. So "pre$(ab)xyz" would still be allowed, but "pre$abxyz" would not. However, this would be nearly as disruptive as changing interpolation syntaxes altogether.

strings

All 25 comments

This is particularly nice because of some of the common uses of $:

And also for quoting text. Docstring authors will have to double \\ (as they would be doing anyways), but won't have to quote $ when writing examples.

Another benefit is that it might be more extensible (and already has sample implementations of such https://github.com/JuliaString/StringLiterals.jl)

We can already extend string literals since \ is invalid before any non escape character. But yes, this syntax does generally require significantly less escaping.

I have commented on https://github.com/JuliaLang/julia/pull/15363#discussion_r157666453 about a good use case for this IMHO.

As you noted on Discourse, using $ consistently for interpolation in expressions and strings sounds like an advantage to me. @Ismael-VC sees it as a drawback, but I don't see the problem.

OTOH requiring parentheses after $ doesn't sound unreasonable. Even if it would break lots of code, the replacement syntax with parentheses is already supported on 0.6.

The minimally breaking approach to fix issue 2 above and allow identifier evolution without breaking code would be to have an identifier character whitelist and blacklist. The whitelist is the allowed identifier characters, while the blacklist is characters that will definitely never be allowed in identifiers. Then, we would allow "pre$ab|yz" to interpolate ab if | is in the blacklist, but not allow "pre$abxyz" as above if x is not in the blacklist since then it might become an allowable identifier character in the future. This would break a much smaller amount of existing code since I suspect that most of the time people use unparenthesized interpolation, the identifier is probably followed by space or punctuation or something else that's very clearly not part of the identifier.
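The rule can be sketched as a small checker. This is only an illustration under stated assumptions: `WHITELIST`, `BLACKLIST`, and `check_bare_interpolation` are hypothetical names, and the character sets are invented to mirror the example above (pretending `x` is not yet an identifier character); they are not Julia's actual identifier rules.

```julia
# Hypothetical character sets, chosen only to mirror the example in the text.
const WHITELIST = Set(c for c in 'a':'z' if c != 'x')   # allowed identifier characters
const BLACKLIST = Set(" \t\n|#.,;:!?+-*/<>=()[]{}")     # never allowed in identifiers

# Classify the first bare `$name` in the contents of a string literal.
function check_bare_interpolation(s::AbstractString)
    i = findfirst('$', s)
    i === nothing && return (:no_interpolation, "")
    start = nextind(s, i)
    j = start
    # Consume characters that are identifier characters today.
    while j <= lastindex(s) && s[j] in WHITELIST
        j = nextind(s, j)
    end
    name = s[start:prevind(s, j)]
    isempty(name) && return (:no_interpolation, "")
    # Safe only if the identifier is followed by the end of the string
    # or by a character that can never become part of an identifier.
    if j > lastindex(s) || s[j] in BLACKLIST
        return (:ok, name)
    end
    return (:needs_parens, name)
end

check_bare_interpolation("pre\$ab|yz")   # (:ok, "ab")
check_bare_interpolation("pre\$abxyz")   # (:needs_parens, "ab")
check_bare_interpolation("a = \$b")      # (:ok, "b")
```

Growing the blacklist later only turns `:needs_parens` results into `:ok`, which is why starting from a conservative set does not break code.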

Or we could just deprecate string interpolation, eliminating all of its problems :)

The one major advantage of string interpolation is conciseness, and requiring parens really hurts there. For example:

print("a = $b")

print("a = $(b)")

print("a = ", b)

The first line is the shortest, but requiring parens makes it longer than the non-interpolation version. That would be unfortunate.
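A quick check of that comparison, as a sketch: the three forms produce the same text, and counting the characters of each source line confirms the ordering described above. The value of `b` is arbitrary.

```julia
b = 42

s1 = "a = $b"
s2 = "a = $(b)"
s3 = sprint(print, "a = ", b)   # what print("a = ", b) would write
s1 == s2 == s3                  # true

# Source text of the three calls, kept literal with raw"...".
sources = [raw"print(\"a = $b\")", raw"print(\"a = $(b)\")", raw"print(\"a = \", b)"]
length.(sources)                # [15, 17, 16]
```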

+1 to disallowing "$abc<weird character>"; that seems sane.

While it's true that parens are longer, escaping '$' in docstrings is awful for LaTeX expressions - especially for readability in source and terminal.

I never understood why '$' was used for interpolation, as it's not at all uncommon to want to print '$' in a string.

I'd opt for swift-style.

At least, if we are going to change, let's choose \(...) over $(...) which would be more typing without solving the latex problem!

Another idea from that discourse thread: require prefixing string literals using $ interpolation with $ as in:

println(io, $"Hello, $name!") # interpolated
println(io,  "Hello, $name!") # literal

There's a slight syntax clash here since in :($"hello") this already has a meaning, which is to interpolate the literal string value "hello" into an AST.

If we're going to change string formatting, it would be nice if there was a convenient (and efficient) way to do formatting so that we could get rid of @sprintf and friends. e.g. if we used \(, then perhaps allow a second argument, e.g.

"foo \(x, FixedDec(2))"

instead of

@sprintf "foo %.2f" x

Since \ before any unofficial escape is now a syntax error, we can introduce formatting into string literals in the future in a non-breaking way with any character following a \ that doesn't already mean something.

I don't buy the LaTeX argument. The main problem with typing LaTeX in string literals is not the $, but the fact that LaTeX extensively uses \, which has to be written as \\. This proposal does nothing for that. (And LaTeX also uses \(, for that matter!)

(This is why I wrote LaTeXStrings.jl. Also, having a special string type for LaTeX equations allows them to display nicely in IJulia. We also have raw"...", of course.)

My preference would be to keep $: for consistency with other kinds of interpolation in Julia and for conciseness in the common case of interpolating a single variable. I also don't think literal dollar signs are so common as to be a big imposition, except in contexts like LaTeX or regex where we need string macros anyway.

The minimal change for 1.0 is to implement an identifier blacklist and only allow bare name interpolation when the identifier is followed by either a blacklisted character or the end of the string. I kind of like the idea of the $ prefix for $-interpolating strings, but I'm not that against just continuing with $ unprefixed.

I really don't see the point of blacklisting "weird" characters following interpolated identifiers. It means you need parens $(...) in more cases, makes the syntax harder to explain, and errors in interpolation syntax are obvious pretty quickly so I'm skeptical that this helps debugging much.

I thought I was pretty clear in my explanation above: without such a rule, adding any character to the list of characters allowed in identifiers is a breaking change. If you're ok with not adding any more identifier characters until 2.0, then we don't have to do that, but if we want the option to add identifier characters in 1.x, then we have to do at least this.

Sorry, I missed that explanation, @StefanKarpinski. A blacklist sounds fine, then, though it's a little work to nail down. Operator characters, #, whitespace, and most punctuation?

Operator characters, #, whitespace, and most punctuation?

That seems like a good start. We can always expand the blacklist since doing so makes more string interpolations valid, not fewer (the terminology is a bit confusing), so starting with a conservative set is ok.

Does LaTeXStrings.jl support interpolation and/or escaping?

@Liso77, only if you use the explicit constructor LaTeXString("....") with an ordinary string literal (i.e. escaping backslashes etcetera). I've thought about implementing interpolation with some other character that isn't allowed in LaTeX equations, e.g. %x for an unescaped %.

(The nice thing about string macros is that you can implement your own escaping/interpolation syntax. e.g. regex has its own escaping, and PyCall has its own interpolation syntax with $ and $$ in py"...".)
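To make that concrete, here is a minimal sketch of a string macro that implements Swift-style `\(...)` interpolation on its own. The macro name `swift_str` is hypothetical and not part of any package mentioned in this thread, and the parenthesis matching is deliberately naive: it counts parentheses and does not handle string or character literals inside the interpolated expression.

```julia
# Hypothetical string macro: swift"..." with \( ... ) interpolation.
macro swift_str(s)
    parts = Any[]
    i = firstindex(s)
    while i <= lastindex(s)
        r = findnext("\\(", s, i)          # next occurrence of the two characters \(
        if r === nothing
            push!(parts, s[i:end])         # trailing literal text
            break
        end
        push!(parts, s[i:prevind(s, first(r))])   # literal text before \(
        # Scan forward to the matching close paren by counting depth.
        depth = 1
        j = nextind(s, last(r))
        while j <= lastindex(s) && depth > 0
            c = s[j]
            depth += (c == '(') - (c == ')')
            depth > 0 && (j = nextind(s, j))
        end
        depth == 0 || error("unterminated \\( in string literal")
        # Parse the text between the parens and evaluate it in the caller's scope.
        push!(parts, esc(Meta.parse(s[nextind(s, last(r)):prevind(s, j)])))
        i = nextind(s, j)
    end
    return :(string($(parts...)))
end

name = "World"
swift"Hello, \(name)."      # "Hello, World."
swift"Price: $10.23"        # "Price: $10.23" ($ needs no escaping here)
```

Because non-standard string literals receive their contents unprocessed, the `$` in the last line is ordinary text.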

  1. I understand the obvious reasons if you'd like to keep LaTeXStrings as simple as possible. I also understand if you'd like to support interpolation in the future. I just think that comparing an idea where interpolation is an integral part with something where it is not (yet) is missing something important.

  2. I think that we could implement \(...) interpolation in LaTeXStrings without needing to escape every \. Am I wrong?

The problem is that \(...\) already has a meaning in LaTeX. Anyway, since LaTeXStrings uses a string macro, it can implement any interpolation syntax it wants. So its choice is not relevant to the syntax used for ordinary Julia string literals.

We should do the blacklist part of this for 1.0 but it's a low priority and can be done after the feature freeze since this won't affect much code in the wild.

Kudos to @simonbyrne's suggestion, as with

"foo \(x, FixedDec(2))"

it would be possible to softcode the format (generate the format at runtime), like:

prec = 2
"foo \(x, FixedDec(prec))"

while @sprintf requires a literal string format spec; therefore, none of the options below currently works:

prec = 2
@sprintf("%.$(prec)g", π)        # fails
fmt = "%.$(prec)g"
@sprintf(fmt, π)                 # fails
@sprintf(@eval "%.$(prec)g", π)  # fails
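For comparison, a runtime format can be built without the macro. The sketch below assumes Julia 1.6 or later, where the Printf standard library exposes `Printf.Format` and `Printf.format`; the `FixedDec` struct and the `render` function are hypothetical stand-ins for the idea above, not an existing API.

```julia
using Printf

# Hypothetical stand-in for the `FixedDec` idea; not an existing type.
struct FixedDec
    digits::Int
end

# Build the format string at run time and apply it (Printf.Format needs Julia >= 1.6).
render(x::Real, f::FixedDec) = Printf.format(Printf.Format("%.$(f.digits)f"), x)

x = 3.14159
prec = 2

a = "foo " * render(x, FixedDec(prec))               # "foo 3.14"
b = Printf.format(Printf.Format("%.$(prec)g"), x)    # "3.1"
c = @sprintf("foo %.2f", x)                          # literal format, for reference
```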

We have packages for this now. We decided not to do the blacklist option for 1.0. But someone could still make a PR to alter the stdlib and think about doing this for 2.0.

The interpolation syntax still seems perfectly viable.
