Currently, the output of `repr` in Julia may be evaluatable, merely parseable, or (often) just a string representation meant for human readers. It would be much more useful if it were guaranteed to be evaluatable. For example, `base/loading.jl` contains many uses of `repr` as a simple IPC mechanism:
In the case of `base/loading.jl`, it is OK to use `repr` because the types used there are known to have an evaluatable `repr`. However, some basic types like `PkgId` cannot be used in this way:
```julia
julia> repr(Base.PkgId(InteractiveUtils))
"InteractiveUtils [b77e0a4c-d291-57a0-90e8-8db25a27a240]"
```
Another concrete example I have in mind where `repr` can be useful is creating a plot "spec" with DataVoyager.jl and dumping it into a runnable Julia script. Currently, this requires a rather convoluted invocation like `show(IOContext(stdout, :compact=>false), "text/plain", spec)` because `repr` is not a reliable entry point for evaluatable code.
We have the `Serialization` module for serializing complex Julia objects. Unfortunately, it cannot be used for communicating between different Julia versions. For complex serialization it is probably a good idea to use something like JSON instead of Julia code. However, I think the above examples are good enough motivation to have an evaluatable `repr` for simple cases.
It was suggested to add a flag like `parseable` (e.g., https://github.com/JuliaLang/julia/pull/33178#issuecomment-530294366, https://github.com/JuliaLang/julia/issues/30683#issuecomment-531223984) via the `IOContext` mechanism. I have a concern about this direction because it is practically impossible to enforce parsability this way (see a brief discussion with @JeffBezanson in https://github.com/JuliaLang/julia/pull/30757#issuecomment-456936172). I think the overloading API should make it clear that the implementer _has_ to make the output evaluatable. This can be done by introducing an overloading API such as `show(::IO, ::MIME"application/julia", obj)` or `repr(::IO, obj)`.
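A minimal sketch of the MIME-based variant, assuming a hypothetical `Point` type. The `application/julia` MIME type is the one proposed above and has no special meaning in Base; the point is only that a method must be written explicitly, so there is no silent fallback.

```julia
# `Point` is a hypothetical example type, not taken from Base or any package.
struct Point
    x::Int
    y::Int
end

# Opt in: this method exists only because the author wrote evaluatable output.
Base.show(io::IO, ::MIME"application/julia", p::Point) =
    print(io, "Point(", p.x, ", ", p.y, ")")

const JULIA_MIME = MIME("application/julia")

has_point = showable(JULIA_MIME, Point(1, 2))   # a method was defined above
has_range = showable(JULIA_MIME, 1:3)           # no method was defined for ranges
code = sprint(show, JULIA_MIME, Point(1, 2))
roundtrip = eval(Meta.parse(code)) == Point(1, 2)
```

A caller can check `showable` first, or simply let the `MethodError` surface when no evaluatable form has been provided.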
Questions:
`IOContext`? New entry point (e.g., `show(::IO, ::MIME"application/julia", obj)`)? Something else?

> I have a concern about this direction because it is practically impossible to enforce parsability this way

I don't really see why; telling people to handle the flag doesn't seem any worse than telling people to add a particular method.
Aside: `eval`-uatable and `parse`-able can be totally unrelated things, so it's not necessarily meaningful to say it would be both. There are also many things that don't have an `eval` representation but can be serialized or printed.
A big part of the problem seems to be that it's hard to characterize how array/dataframe elements should be printed in the REPL. They don't need to be parseable, and we want to use a "nice"/non-parseable representation when possible, but we also want to print type info like the `f0` suffix, and we want to quote strings. It's a mix of requirements that's hard to associate with a single function like `show` or `print` that should have a simple description.
> I don't really see why; telling people to handle the flag doesn't seem any worse than telling people to add a particular method.

I believe relying on people to not forget something is a bad idea. People forgetting to define `HasEltype` is a good example. I think the overloading API should _help_ people recognize that they must construct an (at least) parseable representation.
> It's a mix of requirements that's hard to associate with a single function like `show` or `print`

Isn't that an argument in favor of adding an API for implementing `repr`? It would decouple `repr` from `display` and `print`.
OK, adding `repr(::IO, x)` would be pretty simple and worth considering.
One thing I want to discuss more is, what are the use cases for non-parseable `show`? Are there any types where we want print, show, and repr to all be different? I'm skeptical. Again, it seems most of the complexity is in how we want array elements to look, rather than in needing many different kinds of printing per se. For example, given these definitions:
- `print`: pretty, concise, doesn't need to be parseable
- `show`: parseable

I think the desired array element printing is very close to:
- default: `print`
- numbers: `print` if type is implied by eltype, otherwise `show`
- strings: `show`

Any examples where that would go wrong? And are there any other contexts like this?
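The rules above can be sketched as a small function. `element_string` and its `eltype_hint` argument are hypothetical names used only for illustration; Base expresses a similar idea through the `:typeinfo` property of `IOContext` rather than through a helper like this.

```julia
# Hypothetical helper, not part of Base: picks `print` or `show` for one
# array element, following the three rules listed above.
function element_string(x, eltype_hint::Type)
    if x isa AbstractString
        return sprint(show, x)      # strings: always quoted
    elseif x isa Number
        # numbers: concise form only when the eltype already implies the type
        return typeof(x) === eltype_hint ? sprint(print, x) : sprint(show, x)
    else
        return sprint(print, x)     # default: pretty, concise
    end
end

element_string("a", String)      # "\"a\""
element_string(1.5f0, Float32)   # "1.5"
element_string(1.5f0, Any)       # "1.5f0"
element_string(:sym, Symbol)     # "sym"
```

The `Float32` case shows why the `f0` suffix matters: it is dropped only when the surrounding container already tells the reader the element type.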
> One thing I want to discuss more is, what are the use cases for non-parseable `show`?

The only reason I can think of is backward compatibility. It would be great if `show(::IO, x)` were not used for an array's `text/plain` `show` and were used only for `repr`. I'm proposing a new entry point since it sounds difficult for the entire Julia community to switch to that. It is probably easier to introduce `repr(::IO, x)` and then deprecate `show(::IO, x)` when switching to Julia 2.0.
I understand that the co-existence of similar APIs is very unsatisfactory. But adding a new API is the only option I can think of to clean up the `show` infrastructure and make `repr` work while preserving backward compatibility (which of course may be due to my lack of imagination).
> I think the desired array element printing is very close to:
> - default: `print`
> - numbers: `print` if type is implied by eltype, otherwise `show`
> - strings: `show`

I suggest not including this logic inside array printing. Rather, I think the array printer should just set `compact` and `typeinfo` for `text/plain` `show`, and then the above logic should be implemented inside the `text/plain` `show` of each type. This is because:
- `show` can be used to pretty-print element values. (OK, maybe this is weak, given that not many people implement numbers)

> I think the array printer should just set `compact` and `typeinfo` for `text/plain` `show`

It also should set `displaysize` to handle something like a vector of vectors, so that the inner array can switch to printing like `[1, 2, 3]` when the height is 1.
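A minimal sketch of the per-type side of this idea, assuming a hypothetical `Celsius` type: its `text/plain` `show` consults the advisory `:compact` property, while 2-arg `show` stays constructor-like. The context is passed explicitly through `sprint` here, so the example does not depend on how a container calls `show` for its elements.

```julia
# `Celsius` is a hypothetical example type, not taken from the thread.
struct Celsius
    degrees::Float64
end

# 2-arg show: aims to be parseable (constructor syntax).
Base.show(io::IO, t::Celsius) = print(io, "Celsius(", t.degrees, ")")

# 3-arg show: human-oriented; reads the advisory :compact property.
function Base.show(io::IO, ::MIME"text/plain", t::Celsius)
    if get(io, :compact, false)
        print(io, round(t.degrees; digits=1), "°C")
    else
        print(io, t.degrees, " degrees Celsius")
    end
end

t = Celsius(21.456)
repr(t)                                                          # "Celsius(21.456)"
sprint(show, MIME("text/plain"), t)                              # "21.456 degrees Celsius"
sprint(show, MIME("text/plain"), t; context = :compact => true)  # "21.5°C"
```

Other properties such as `:typeinfo`, or the size reported by `displaysize(io)`, could be read in the same way.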
I think we should be using MIME types to control this?
I thought we already were.
Yes, we use MIME for that. We briefly used an IOContext flag for it (`multiline` instead of `displaysize[2] == 1`, but otherwise basically the same). It was a violation of the purpose of IOContext, and thus did badly; search the old issues for this. We could potentially test for both options (`MIME(text/plain)` + `displaysize[2] == 1` is a valid test combo in 3-arg mime-show), but the caller would then need to be prepared to handle multiline output (since IOContext values are only advisory).
I think #34387 is a big step for fixing the situation (thanks to fchorney!). Ref: https://github.com/JuliaLang/julia/issues/30901#issuecomment-573450132
But not everything is resolved, since `show(io::IO, ::MIME"text/plain", X::AbstractArray)` still does not always set `IOContext(io, :compact => true)`. That is to say, 3-arg `show` for element types still cannot reliably detect whether it is called by the `show` of the container.
@JeffBezanson You suggested to keep this in https://github.com/JuliaLang/julia/pull/34387#discussion_r368733261 (just for now?). Is there a plan to improve this?
Also, it would be nice if the fallback to 2-arg `show` could be removed in a future version of Julia. If 2-arg `show` is meant to be used for `repr`, we should expect it to generate larger output on average. Furthermore, the fallback to 2-arg `show` motivates users to overload it to customize how a value is printed inside a container. This often conflicts with how `repr` works.
Instead of requiring `repr` to return a `Meta.parse`-able string, I would consider a new `uneval` that returns an `eval`-able expression. Like `eval` (or perhaps more like `macroexpand`), you can provide `uneval` with a module, since names require different qualification depending on which module you evaluate them in. I think this is the reason for an `IOContext` `:module` key.
I can't tell if that punts most of the problem to now serializing expressions into strings, but I do like that it separates it, and I think it addresses https://github.com/JuliaLang/julia/issues/33260#issuecomment-531887799.
Example: `OffsetArrays` doesn't "`repr` it as you build it", so a less verbose version of the following would be great (filtering out the parser provenance LineNodes and `begin` blocks):
```julia
import OffsetArrays: OffsetArray
uneval(x) = quote $x end
function uneval(o::OffsetArray)
    p = uneval(parent(o))
    a = uneval(axes(o))
    oa = uneval(OffsetArray)
    quote
        ($oa)($p, $a)
    end
end
a = OffsetArray(rand(3,3), (3:5, 3:5))
@show repr(a)
println("show:")
show(stdout, "text/plain", a)
println()
@show string(uneval(a))
@show eval(Meta.parse(string(uneval(a)))) == a
```
```
repr(a) = "[0.39017124461364094 0.5295100509285167 0.15327227892449202; 0.2674771551403925 0.9820070800345242 0.30326872122566284; 0.438120593884042 0.01486844883391325 0.3635461803662967]"
show:
3×3 OffsetArray(::Array{Float64,2}, 3:5, 3:5) with eltype Float64 with indices 3:5×3:5:
 0.390171  0.52951    0.153272
 0.267477  0.982007   0.303269
 0.438121  0.0148684  0.363546
string(uneval(a)) = "begin\n #= /private/tmp/uneval.jl:10 =#\n (begin\n #= /private/tmp/uneval.jl:3 =#\n OffsetArray\n end)(begin\n #= /private/tmp/uneval.jl:3 =#\n [0.39017124461364094 0.5295100509285167 0.15327227892449202; 0.2674771551403925 0.9820070800345242 0.30326872122566284; 0.438120593884042 0.01486844883391325 0.3635461803662967]\n end, begin\n #= /private/tmp/uneval.jl:3 =#\n (3:5, 3:5)\n end)\nend"
eval(Meta.parse(string(uneval(a)))) == a = true
```
EDIT:
MacroTools is wonderful.
```julia
import MacroTools: prewalk, rmlines, unblock
@show string(prewalk(rmlines ∘ unblock, uneval(a)))
"(OffsetArray)([0.39017124461364094 0.5295100509285167 0.15327227892449202; 0.2674771551403925 0.9820070800345242 0.30326872122566284; 0.438120593884042 0.01486844883391325 0.3635461803662967], (3:5, 3:5))"
```
The current `repr` is simply documented and defined to be not much more than `sprint(show, _)`. I'd advocate for its deprecation in 2.0. I think any fully round-trippable function probably needs to be opt-in (with no default definition) and should have a new name completely separated from `show`.
Why do we need a textual format that’s round trippable? I have yet to encounter a situation where this is necessary. Seems like a very hard and complex requirement to try to satisfy.

> Why do we need a textual format that’s round trippable? I have yet to encounter a situation where this is necessary. Seems like a very hard and complex requirement to try to satisfy.

For me, it makes it more enjoyable to use a REPL. I think it's the same justification for why people spend effort making configuration file formats that are both human- and machine-readable.
It would let a user print out some complicated nested data structure, select a piece of it, copy the selection, and paste it into the REPL to get that value. If we weren't beholden to text terminals, and we had smarter terminals with better copy and pasting, then it would likely obviate the need for a format like this, since you could just have a presentation format, and in parallel have an underlying representation that lets you reconstruct the value from the clipboard.
I get that it’s _nice_, which is why `repr` is suggested to have this property. But handling all the tricky cases like circularity in data structures is really absurdly difficult. That’s why I’m in favor of having a convention of round-trippability rather than a hard requirement.
> Why do we need a textual format that’s round trippable? I have yet to encounter a situation where this is necessary.

As I've explained in the OP, `Base` uses `repr`, and there are other situations where this would be useful. For example:

> That’s why I’m in favor of having a convention of round-trippability rather than a hard requirement.

I think an API that may or may not satisfy some property is rather hard to use because you can't rely on it. That's why I think @mbauman's suggestion for making this opt-in makes sense.
Again, I'm not saying that it's not nice if the output of `repr` is parseable, and you can use it that way if you're confident that the values you're printing are of limited types where `repr` does work. But insisting that `repr` be able to print arbitrary objects in such a way that parsing and evaling the resulting string gives you back an equal object is just way too much. Serialization does this without the requirement that the serialized representation be a valid Julia expression, and it's already super complicated and adds a ton of complexity. What's being proposed here would foist that kind of overhead and complexity on every call to the simple `repr` function.
Here's an example. How would you define `repr` for this type:
```julia
mutable struct X
    x::X
    function X()
        x = new()
        x.x = x
    end
end
```
Currently we print it like this:
```julia
julia> x = X()
X(X(#= circular reference @-1 =#))
```
How would you print this so that the printed expression evaluates back to an equal value? Or what about this array:
```julia
julia> a = Any[1, 2, 3, 4]
4-element Array{Any,1}:
 1
 2
 3
 4
julia> a[3] = a;
julia> repr(a)
"Any[1, 2, Any[#= circular reference @-1 =#], 4]"
```
Do we disallow calling `repr` on arrays?
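The difficulty with the self-referential array can be checked directly. This is a small sketch that assumes the circular-reference marker is printed as a Julia comment, as in the output shown above; the names `s` and `b` are only for illustration.

```julia
a = Any[1, 2, 3, 4]
a[3] = a                    # the array now contains itself

s = repr(a)                 # the cycle is printed as a comment marker
b = eval(Meta.parse(s))     # parses, because the marker is a valid comment

a[3] === a                  # true: the original has the cycle
b[3] === b                  # false: the cycle was not reconstructed
```

So the string parses and evaluates without error, yet the result is silently a different structure from the original.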
> Vega.jl: doesn't actually need to be parsable, just reasonably readable; if you need the original user input, you should save it, e.g. like regexes do.

I don't think so. It's reasonable (and already done) to support converting an arbitrary vega/vega-lite spec back to Julia syntax. For example, you'd want to print a spec as Julia code and then put it in a script so that it can be re-used for a different dataset.
Saving the original code is not an option since a vega-lite spec can come from elsewhere, such as a GUI for constructing the spec: https://github.com/queryverse/DataVoyager.jl
> Serialization does this

As I discussed in the OP, using Serialization is sometimes not appropriate. For example, you can't cross Julia versions or sessions with different sysimages.
Also, using Serialization (or something like JSON) is a lot more complicated than constructing code to be executed. I don't think it's appropriate for something like the pre-compilation mechanism in `Base` and packages like PkgBenchmark.jl.
> What's being proposed here would foist that kind of overhead and complexity on every call to the simple `repr` function.

That's why I said opt-in makes sense. Why not do it when it's not super complex?
> Do we disallow calling repr on arrays?

If we have a `repr2` that is opt-in, it would throw if you call `repr2(a)`. It's very safe because you'd know that it doesn't work at the time you call it rather than at the time you try to parse it.
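A minimal sketch of such an opt-in function, assuming the hypothetical name `repr2` from the comment above. It is declared without any fallback method, so unsupported values fail with a `MethodError` at printing time; the three supported types are chosen only for illustration.

```julia
# `repr2` is hypothetical: a generic function with no default definition.
function repr2 end

# Each supported type opts in explicitly.
repr2(x::Int) = string(x)
repr2(x::String) = sprint(show, x)      # quoted and escaped
repr2(x::Tuple) =
    "(" * join(map(repr2, x), ", ") * (length(x) == 1 ? ",)" : ")")

s = repr2((1, "two", (3,)))
roundtrip = eval(Meta.parse(s)) == (1, "two", (3,))

# A Vector has not opted in, so this fails while printing, not while parsing.
failed_early = try
    repr2((1, [2, 3]))
    false
catch err
    err isa MethodError
end
```

Because the tuple method recurses through `repr2`, a single unsupported element anywhere in a nested value is enough to trigger the error.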
Yes, I don't think anybody here wants a major effort to give all objects parseable representations no matter what. There are a couple cases:
So I think all anybody wants is a mode where 1 & 2 give an error at printing time, and 3 gives you the parseable output even though it's awful. A `:parseable` flag seems like a reasonable way to get that: initially it will be the same as `repr`, then just get more accurate and strict over time.
I think a flag would not be enough because an implementer can easily forget to check `:parseable`. I think it's better to have an API that helps the implementer notice what properties must hold for the result of the function. I think this is essential for making sure that trying to get a parsable representation of an unsupported object throws an error at print time.
We slowly get closer over time with `Core.println()` (aka `jl_static_show`) to being a parsable textual format. The other difficult problem is handling shared references, however, since `x = []; [x, x]` is rather hard to describe clearly in a textual format.
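The shared-reference problem can be seen with plain `repr` today. In this small sketch (variable names are only for illustration), the round-tripped value compares equal, but the two slots no longer point to the same array.

```julia
x = []
v = [x, x]
v[1] === v[2]                   # true: both slots hold the same array

w = eval(Meta.parse(repr(v)))   # each `[]` in the text creates a fresh array
w == v                          # true: the contents compare equal
w[1] === w[2]                   # false: the sharing is lost in the round trip
```

Equality under `==` is therefore not enough to say that a textual format preserved the structure: mutating `v[1]` also changes `v[2]`, while mutating `w[1]` does not change `w[2]`.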