Julia: Can/should we take repr seriously?

Created on 13 Sep 2019 · 21 Comments · Source: JuliaLang/julia

Currently, the output of repr in Julia may be evaluatable, or merely parseable, or (often) just a string representation consumable only by humans. However, it would be much more useful if it were guaranteed to be evaluatable. For example, there are many uses of repr in base/loading.jl, where it serves as a simple IPC mechanism:

https://github.com/JuliaLang/julia/blob/9a8b2fd72b675bb8a5bf0322943ee9451787b86a/base/loading.jl#L1148-L1151

In the case of base/loading.jl, using repr is OK because the types used there are known to have an evaluatable repr. However, some basic types like PkgId cannot be used in this way:

```julia
julia> repr(Base.PkgId(InteractiveUtils))
"InteractiveUtils [b77e0a4c-d291-57a0-90e8-8db25a27a240]"
```
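To make the distinction concrete, here is a minimal sketch of a round-trip check. The helper name `roundtrips` is hypothetical (it is not part of Base), and the PkgId is built directly from the UUID shown above so that nothing needs to be loaded.

```julia
# Hypothetical helper: does `repr(x)` parse and evaluate back to an equal value?
function roundtrips(x)
    try
        return eval(Meta.parse(repr(x))) == x
    catch
        return false   # parsing or evaluation failed
    end
end

pkg = Base.PkgId(Base.UUID("b77e0a4c-d291-57a0-90e8-8db25a27a240"), "InteractiveUtils")

roundtrips(42)                     # true
roundtrips("a \"quoted\" string")  # true
roundtrips([1, 2, 3])              # true
roundtrips(pkg)                    # false: the printed form is not Julia code for a PkgId
```

The check only tells you about one value at a time; it says nothing about whether a type's repr is evaluatable in general.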

Another concrete example I have in mind where repr can be useful is creating a plot "spec" using DataVoyager.jl and dumping it into a runnable Julia script. Currently, doing so needs a rather convoluted invocation like show(IOContext(stdout, :compact=>false), "text/plain", spec) because repr is not a reliable entry point for evaluatable code.

We have the Serialization module for serializing complex Julia objects. Unfortunately, it cannot be used to communicate between different Julia versions. For complex serialization it is probably a good idea to use something like JSON instead of Julia code. However, I think the above examples are good enough motivation to have a reliable repr for simple cases.

It was suggested to add a flag like parseable (e.g., https://github.com/JuliaLang/julia/pull/33178#issuecomment-530294366, https://github.com/JuliaLang/julia/issues/30683#issuecomment-531223984) via the IOContext mechanism. I have a concern about this direction because it is practically impossible to enforce parsability this way (a brief discussion with @JeffBezanson in https://github.com/JuliaLang/julia/pull/30757#issuecomment-456936172). I think the overloading API should make it clear that the implementer _has_ to make the output evaluatable. This can be done by introducing an overloading API such as show(::IO, ::MIME"application/julia", obj) or repr(::IO, obj).
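As a rough sketch of what the MIME-based entry point could look like for a package author: the `Point` type and the `julia_code` wrapper below are made up for illustration, and the round-trip guarantee is a convention the implementer has to uphold, not something Base enforces.

```julia
struct Point
    x::Int
    y::Int
end

# Opt-in: a type produces code output only if its author writes this method.
function Base.show(io::IO, ::MIME"application/julia", p::Point)
    print(io, "Point(", p.x, ", ", p.y, ")")
end

# Hypothetical convenience wrapper around the 3-arg show.
julia_code(x) = sprint(show, MIME("application/julia"), x)

p = Point(1, 2)
code = julia_code(p)                       # "Point(1, 2)"
eval(Meta.parse(code)) == p                # true
showable(MIME("application/julia"), p)     # true, because the method above exists
```

Calling `julia_code` on a type without such a method raises a MethodError at printing time, which is the failure mode argued for here.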

Questions:

  • Do we need an API to produce Julia code that is (very likely to be) evaluatable? (100% guarantee is impossible when using different versions of libraries and/or Julia)
  • How should it be implemented? Based on IOContext? New entry point (e.g., show(::IO, ::MIME"application/julia", obj))? Something else?
display and printing

All 21 comments

> I have a concern about this direction because it is practically impossible to enforce parsability this way

I don't really see why --- telling people to handle the flag doesn't seem any worse than telling people to add a particular method.

Aside: eval-uatable and parse-able can be totally unrelated things, so it's not necessarily meaningful to say it would be both. There are also many things that don't have an eval representation, but can be serialized or printed.

A big part of the problem seems to be that it's hard to characterize how array/dataframe elements should be printed in the REPL. They don't need to be parseable, and we want to use a "nice"/non-parseable representation when possible, but we also want to print type info like the f0 suffix, and we want to quote strings. It's a mix of requirements that's hard to associate with a single function like show or print that should have a simple description.

> I don't really see why --- telling people to handle the flag doesn't seem any worse than telling people to add a particular method.

I believe relying on people to not forget something is a bad idea. People forgetting to define HasEltype is a good example. I think the overloading API should _help_ people recognize that they must construct (at least) a parseable representation.

> It's a mix of requirements that's hard to associate with a single function like show or print that should have a simple description.

Isn't that an argument in favor of adding an API for implementing repr? It would decouple repr from display and print.

OK, adding repr(::IO, x) would be pretty simple and worth considering.

One thing I want to discuss more is, what are the use cases for non-parseable show? Are there any types where we want print, show, and repr to all be different? I'm skeptical. Again, it seems most of the complexity is in how we want array elements to look, rather than in needing many different kinds of printing per se. For example, given these definitions:

print: pretty, concise, doesn't need to be parseable
show: parseable

I think the desired array element printing is very close to:

default: print
numbers: print if type is implied by eltype, otherwise show
strings: show

Any examples where that would go wrong? And are there any other contexts like this?
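The rule above can be sketched as three small methods. The function name `element_string` and its `elt` argument (the container's element type) are hypothetical, and the sketch takes the proposed definitions at face value: print is the pretty form and show is the parseable one.

```julia
# Default: pretty, concise form.
element_string(x, elt) = sprint(print, x)

# Strings: always the quoted, parseable form.
element_string(x::AbstractString, elt) = sprint(show, x)

# Numbers: drop type decoration only when the eltype already implies it.
element_string(x::Number, elt) =
    typeof(x) === elt ? sprint(print, x) : sprint(show, x)

element_string(1.5f0, Float32)   # "1.5"
element_string(1.5f0, Any)       # "1.5f0"
element_string("a", String)      # "\"a\""
element_string(:a, Symbol)       # "a"
```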

> One thing I want to discuss more is, what are the use cases for non-parseable show?

The only reason I can think of is backward compatibility. It would be great if show(::IO, x) were not used for the array's text/plain show and were used only for repr. I'm proposing a new entry point because it sounds difficult for the entire Julia community to switch to that. It is probably easier to introduce repr(::IO, x) and then deprecate show(::IO, x) while switching to Julia 2.0.

I understand that co-existence of similar APIs is very unsatisfactory. But addition of a new API is the only option I can think of to clean up show infrastructure and make repr work while preserving backward compatibility (which of course can be due to my lack of imagination).

> I think the desired array element printing is very close to:
>
> default: print
> numbers: print if type is implied by eltype, otherwise show
> strings: show

I suggest not including this logic inside array printing. Rather, I think the array printer should just set compact and typeinfo for the text/plain show, and the above logic should then be implemented inside the text/plain show of each type. This is because:

  • It's usable for other printing of collections (especially table-like data structures).
  • It avoids confusion that show can be used to pretty-print element values. (OK, maybe this is weak, given that not many people implement number types)
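A minimal sketch of that division of labor, under loud assumptions: `Meters` and `show_elements` are invented for illustration, and `show_elements` stands in for a container printer that calls the 3-arg show on its elements (which is not what Base's array printing does today). The `:compact` and `:typeinfo` keys are the existing IOContext properties.

```julia
struct Meters
    value::Float64
end

# 2-arg show: the code-like form.
Base.show(io::IO, m::Meters) = print(io, "Meters(", m.value, ")")

# 3-arg show: the element decides how to use the hints set by the container.
function Base.show(io::IO, ::MIME"text/plain", m::Meters)
    if get(io, :typeinfo, Any) === Meters
        print(io, m.value, " m")   # the container already announced the type
    else
        show(io, m)
    end
end

# Hypothetical container printer: it only sets the context.
function show_elements(io::IO, xs::Vector)
    ctx = IOContext(io, :compact => true, :typeinfo => eltype(xs))
    for x in xs
        show(ctx, MIME("text/plain"), x)
        println(io)
    end
end

sprint(show, MIME("text/plain"), Meters(1.5))        # "Meters(1.5)"
sprint(show_elements, [Meters(1.5), Meters(2.0)])    # "1.5 m\n2.0 m\n"
```

The same `show_elements`-style caller could be reused by table-like containers, which is the first bullet's point.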

> I think array printer should just set compact and typeinfo for text/plain show

It should also set displaysize to handle something like a vector of vectors, so that an inner array can switch to printing like [1, 2, 3] when the height is 1.

I think we should be using MIME types to control this?
I thought we already were.

Yes, we use MIME for that. We briefly used an IOContext flag for it (multiline instead of displaysize[2] == 1, but otherwise basically the same). It was a violation of the purpose of IOContext, and thus did badly; search the old issues for this. We could potentially test for both options (MIME(text/plain) + displaysize[2] == 1 is a valid test combo in 3-arg mime-show), but the caller would then need to be prepared to handle multiline output (since IOContext values are only advisory).

I think #34387 is a big step for fixing the situation (thanks to fchorney!). Ref: https://github.com/JuliaLang/julia/issues/30901#issuecomment-573450132

But not everything is resolved, since show(io::IO, ::MIME"text/plain", X::AbstractArray) still does not always set IOContext(io, :compact => true):

https://github.com/JuliaLang/julia/blob/1d918dd8d5bf6316458f4ea480d521ac005713b0/base/arrayshow.jl#L329-L332

That is to say, the 3-arg show for element types still cannot reliably detect whether it is being called by the show of the container.

@JeffBezanson You suggested keeping this in https://github.com/JuliaLang/julia/pull/34387#discussion_r368733261 (just for now?). Is there a plan to improve this?

Also, it would be nice if the fallback to 2-arg show

https://github.com/JuliaLang/julia/blob/1d918dd8d5bf6316458f4ea480d521ac005713b0/base/arrayshow.jl#L108-L111

could be removed in a future version of Julia. If 2-arg show is meant to be used for repr, we should expect it to generate larger output on average. Furthermore, the fallback to 2-arg show motivates users to overload it to customize how a value is printed inside a container. This often conflicts with how repr works.

Instead of requiring repr to return a Meta.parse-able string, I would consider a new uneval to return an eval-able expression. Like eval (or perhaps more like macroexpand), you can provide uneval with a module, since names require different qualification depending on which module you evaluate them in. I think this is the reason for an IOContext :module key.

I can't tell if that punts most of the problem to now serializing expressions into strings, but I do like that it separates it, and I think it addresses https://github.com/JuliaLang/julia/issues/33260#issuecomment-531887799.

Example:
OffsetArrays doesn't "repr it as you build it", so a less verbose version of this (one that filters out the parser-provenance LineNumberNodes and the begin blocks) would be great.

```julia
import OffsetArrays: OffsetArray

# Fallback: splice the value itself into a quoted block.
uneval(x) = quote $x end

# Rebuild an OffsetArray from its parent array and its axes.
function uneval(o::OffsetArray)
  p = uneval(parent(o))
  a = uneval(axes(o))
  oa = uneval(OffsetArray)
  quote
    ($oa)($p, $a)
  end
end

a = OffsetArray(rand(3,3), (3:5, 3:5))
@show repr(a)
println("show:")
show(stdout, "text/plain", a)
println()

@show string(uneval(a))
@show eval(Meta.parse(string(uneval(a)))) == a
```

```
repr(a) = "[0.39017124461364094 0.5295100509285167 0.15327227892449202; 0.2674771551403925 0.9820070800345242 0.30326872122566284; 0.438120593884042 0.01486844883391325 0.3635461803662967]"
show:
3×3 OffsetArray(::Array{Float64,2}, 3:5, 3:5) with eltype Float64 with indices 3:5×3:5:
0.390171 0.52951 0.153272
0.267477 0.982007 0.303269
0.438121 0.0148684 0.363546
string(uneval(a)) = "begin\n #= /private/tmp/uneval.jl:10 =#\n (begin\n #= /private/tmp/uneval.jl:3 =#\n OffsetArray\n end)(begin\n #= /private/tmp/uneval.jl:3 =#\n [0.39017124461364094 0.5295100509285167 0.15327227892449202; 0.2674771551403925 0.9820070800345242 0.30326872122566284; 0.438120593884042 0.01486844883391325 0.3635461803662967]\n end, begin\n #= /private/tmp/uneval.jl:3 =#\n (3:5, 3:5)\n end)\nend"

eval(Meta.parse(string(uneval(a)))) == a = true
```

EDIT:
MacroTools is wonderful.

```julia
import MacroTools: prewalk, rmlines, unblock
@show string(prewalk(rmlines ∘ unblock, uneval(a)))
"(OffsetArray)([0.39017124461364094 0.5295100509285167 0.15327227892449202; 0.2674771551403925 0.9820070800345242 0.30326872122566284; 0.438120593884042 0.01486844883391325 0.3635461803662967], (3:5, 3:5))"
```
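A dependency-free variation on the same idea, as a sketch only: building the call with `Expr(:call, ...)` instead of a `quote` block avoids the begin blocks and LineNumberNodes in the first place. The `Shifted` type is a made-up stand-in for OffsetArray, and there is deliberately no fallback `uneval(x)` method, so unsupported values fail with a MethodError.

```julia
struct Shifted
    parent::Vector{Int}
    offset::Int
end
Base.:(==)(a::Shifted, b::Shifted) = a.parent == b.parent && a.offset == b.offset

# Literals can be embedded in an expression as they are.
uneval(x::Union{Number,AbstractString}) = x
# A vector becomes a `[...]` literal expression.
uneval(v::Vector) = Expr(:vect, map(uneval, v)...)
# A Shifted becomes a constructor call.
uneval(s::Shifted) = Expr(:call, :Shifted, uneval(s.parent), uneval(s.offset))

s = Shifted([1, 2, 3], 10)
code = string(uneval(s))        # "Shifted([1, 2, 3], 10)"
eval(Meta.parse(code)) == s     # true
```

Note that the symbol `:Shifted` is unqualified, so the string only evaluates where that name is in scope; this is the module-qualification problem mentioned earlier in this comment.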

The current repr is simply documented and defined to be not much more than sprint(show, _). I'd advocate for its deprecation in 2.0. I think any fully round-trippable function probably needs to be opt-in (with no default definition) and should have a new name completely separated from show.

Why do we need a textual format that’s round-trippable? I have yet to encounter a situation where this is necessary. Seems like a very hard and complex requirement to try to satisfy.

> Why do we need a textual format that’s round-trippable? I have yet to encounter a situation where this is necessary. Seems like a very hard and complex requirement to try to satisfy.

For me, it makes it more enjoyable to use a REPL. I think it's the same justification for why people spend effort making configuration file formats that are both human- and machine-readable.

It would let a user print out some complicated nested data structure, select a piece of it, copy the selection, and paste it into the REPL to get that value back. If we weren't beholden to text terminals, and we had smarter terminals with better copy and pasting, then it would likely obviate the need for a format like this, since you could just have a presentation format, and in parallel have an underlying representation that lets you reconstruct the value from the clipboard.

I get that it’s _nice_, which is why repr is suggested to have this property. But handling all the tricky cases like circularity in data structures is really absurdly difficult. That’s why I’m in favor of having a convention of round-trippability rather than a hard requirement.

> Why do we need a textual format that’s round-trippable? I have yet to encounter a situation where this is necessary.

As I've explained in the OP, Base uses repr, and there are other situations where this would be useful. For example:

> That’s why I’m in favor of having a convention of round-trippability rather than a hard requirement.

I think an API that may or may not satisfy some property is rather hard to use because you can't rely on it. That's why I think @mbauman's suggestion for making this opt-in makes sense.

  • Vega.jl: doesn't actually need to be parsable, just reasonably readable; if you need the original user input, you should save it, e.g. like regexes do.
  • PkgBenchmark.jl: only needs to be parsable for very limited types, serialization would be the better way to do this.
  • BenchmarkCI.jl: same kind of thing, only needs to work for strings and booleans.
  • Aqua.jl: same deal, only used for vectors of strings, serialization would be the better way to do this.

Again, I'm not saying that it's not nice if the output of repr is parseable, and you can use it that way if you're confident that the values you're printing are of a limited set of types for which repr does work. But insisting that repr be able to print arbitrary objects in such a way that parsing and evaling the resulting string gives you back an equal object is just way too much. Serialization does this without the requirement that the serialized representation be a valid Julia expression, and it's already super complicated and adds a ton of complexity. What's being proposed here would foist that kind of overhead and complexity on every call to the simple repr function.

Here's an example. How would you define repr for this type:

```julia
mutable struct X
    x::X
    function X()
        x = new()
        x.x = x
    end
end
```

Currently we print it like this:

```julia
julia> x = X()
X(X(#= circular reference @-1 =#))
```

How would you print this so that the printed expression evaluates back to an equal value? Or what about this array:

```julia
julia> a = Any[1, 2, 3, 4]
4-element Array{Any,1}:
 1
 2
 3
 4

julia> a[3] = a;

julia> repr(a)
"Any[1, 2, Any[#= circular reference @-1 =#], 4]"
```

Do we disallow calling repr on arrays?
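For what it's worth, the string above does parse, because the circular-reference marker is an ordinary block comment. A small sketch of what comes back, assuming repr prints the cycle as shown above:

```julia
a = Any[1, 2, 3, 4]
a[3] = a                      # `a` now contains itself

code = repr(a)                # the cycle is printed as a comment
b = eval(Meta.parse(code))    # the parser skips the comment

a[3] === a    # true
b[3] === b    # false: the cycle is gone
b == a        # false
```

So the failure here is silent: evaluation succeeds but yields a different value.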

> Vega.jl: doesn't actually need to be parsable, just reasonably readable; if you need the original user input, you should save it, e.g. like regexes do.

I don't think so. It's reasonable (and already done) to support converting an arbitrary vega/vega-lite spec back to Julia syntax. For example, you'd want to print a spec as Julia code and then put it in a script so that it can be re-used for a different dataset.

Saving the original code is not an option, since a vega-lite spec can come from elsewhere, such as a GUI for constructing the spec: https://github.com/queryverse/DataVoyager.jl

> Serialization does this

As I discussed in the OP, using Serialization is not appropriate sometimes. For example, you can't cross Julia versions or sessions with different sysimages.

Also, using Serialization (or something like JSON) is a lot more complicated than constructing code to be executed. I don't think it's appropriate for something like the pre-compilation mechanism in Base or for packages like PkgBenchmark.jl.

> What's being proposed here would foist that kind of overhead and complexity on every call to the simple repr function.

That's why I said opt-in makes sense. Why not do it when it's not super complex?

> Do we disallow calling repr on arrays?

If we had a repr2 that is opt-in, it would throw when you call repr2(a). That is very safe, because you'd learn that it doesn't work at the time you call it rather than at the time you try to parse the result.

Yes, I don't think anybody here wants a major effort to give all objects parseable representations no matter what. There are a couple cases:

  1. Cycles and shared references: require heavy parser support, we probably won't do it.
  2. Impossible cases, e.g. foreign or stateful objects of some kind.
  3. Corner cases where parseable output is possible but awful.

So I think all anybody wants is a mode where 1 & 2 give an error at printing time, and 3 gives you the parseable output even though it's awful. A :parseable flag seems like a reasonable way to get that: initially it will be the same as repr, then just get more accurate and strict over time.
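A minimal sketch of such a strict mode, written here as a standalone function instead of a flag: `parseable_code` is a hypothetical name, it supports only a handful of types, and it treats any array seen twice (a cycle or a shared reference) as an error at printing time.

```julia
# Supported scalar types: their repr is already valid Julia code.
parseable_code(x::Union{Int,Float64,Bool,String}, seen = nothing) = repr(x)

function parseable_code(v::Vector{Any}, seen = IdDict{Any,Nothing}())
    # Strict: a second visit to the same array means a cycle or a shared
    # reference, which this sketch refuses to print.
    haskey(seen, v) &&
        throw(ArgumentError("cycle or shared reference; no parseable form"))
    seen[v] = nothing
    return "Any[" * join((parseable_code(e, seen) for e in v), ", ") * "]"
end

ok = Any[1, "a", Any[2.5, true]]
parseable_code(ok)                          # "Any[1, \"a\", Any[2.5, true]]"
eval(Meta.parse(parseable_code(ok))) == ok  # true

bad = Any[1, 2, 3, 4]
bad[3] = bad
# parseable_code(bad) throws an ArgumentError instead of printing a comment
```

Values of any other type hit a MethodError, which covers case 2 in the list above without extra code.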

I think a flag would not be enough, because an implementer can easily forget to check :parseable. I think it's better to have an API that helps the implementer notice what properties must hold for the result of the function. I think this is essential for making sure that trying to get a parsable representation of an unsupported object throws an error at print time.

Over time, Core.println() (aka jl_static_show) is slowly getting closer to being a parsable textual format. The other difficult problem, however, is handling shared references, since x = []; [x, x] is rather hard to describe clearly in a textual format.
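The shared-reference problem can be seen with plain repr. In this small sketch the round trip yields an equal value, but the two elements are no longer the same object:

```julia
x = []
a = [x, x]                        # both slots refer to the same array
b = eval(Meta.parse(repr(a)))     # two independent empty arrays

b == a           # true: the values compare equal
a[1] === a[2]    # true
b[1] === b[2]    # false: identity was lost in the text form

push!(a[1], 1)
length(a[2])     # 1, since a[1] and a[2] are the same array
push!(b[1], 1)
length(b[2])     # 0
```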
