The idiomatic Crystal way of converting a value to a String
is to append it to an IO
. The most notable it Object#to_s(io : IO)
which writes a string representation to the IO
. When an actual String
is required, there is an overload Object#to_s()
which calls the IO variant wrapped in String.build
.
Other examples are Object#inspect
, Base64#encode
, Base64#decode
or Enumerable#join
which all adapt the same technique.
Since they provide the same functionality, both methods usually have the same signature except that one returns String
and the other accepts an additional argument io : IO
. That's the distinctive difference between them. It is obvious that the positioning of the other arguments should be the same to avoid any friction.
If the methods also accept other arguments, there is a question as to where the io
argument should be placed. There are different styles throughout the standard lib:
Time#to_s(format : String, io : IO)
Int#to_s(base : Int, io : IO, upcase : Bool = false)
BigRational#to_s(io : IO, base = 10)
It doesn't make any sense to add io
in between as it screws up argument order. At the end is better but it will cause issues when optional arguments are added to the signature (that's probably what happened with Int#to_s
). I think it should be preferred to put io
at the front of the other arguments. This clearly signals that this is the IO variant of the method. And additional arguments can simply be appended at the end in both variants. IMHO this is the most logical solution.
I propose to adapt "io
in front" style throughout the standard library and encourage others to do the same. This way it would be simple to switch from a String method to the IO method by inserting io
as first argument and you don't need to think about where to put it.
:+1: for IO in front, please send a PR
join
looks odd. The idea is that you apply some operation and you place the result in an IO, that's why it's the last argument. After it, optional arguments can come. I see nothing bad with having the methods be as they are now. This is just bikeshedding.
Put another way, since the compiler and language are stuck, and nobody is willing to work on them (complexity, lack of time, lack of direction, etc.) the only possible thing to do is to continue changing the standard library, refining and refining, with no real value if the language supporting all of that will not blossom.
(@straight-shoota just a small note that I truly appreciate your work on Crystal, but maybe current effort should be focused somewhere else, maybe the language and the compiler... though of course it's hard because there's no clear direction on how to do that, or what language changes are needed to render Crystal a production language)
This is a proposal to use a predictable and consistent API throughout the stdlib. The stdlib is already pretty good in many places, but obviously it can still be improved - and not only by adding new features. The compiler and language need a lot of work, that's true. But it's already actually pretty usable.
I don't think this is bikeshedding. IMHO it is important to address this issue (and lots of 馃憤 on the proposal show others do as well) to avoid ending up with different ways the stdlib API's work regarding String/IO methods. I have stumbled over this a number of times, that's why I think it would be great to agree on unifying the overload styles.
It's totally fine if we agree on putting io
last everywhere and change everything to that style. But I strongly favour to put it in front.
It is my understanding that adding an io
to the signature is not just an optional argument that toggles some of it's behaviour but it changes the very nature how the method works: it writes to that IO instead of returning a String. This is very significant and I think it makes sense to put significant arguments first.
I've already explained the more practical argument that putting io
at the end causes issues with the positioning of optional arguments (including splats).
Regarding the argument "join
looks odd." I could answer that I find join(", ", io)
looks odd. I'd rather have as join(io, ", ")
and interpret as "do a join by taking this IO and write every output to it".
(Thanks for the appreciation. I'd be happy to work on other parts as well, but I believe that would require some initiative from the core team to decide how things should go.)
I guess standarizing the arguments order is good then, specially because everyone seems to agree.
Convention is better than configuration, but join(io, sep)
is a huge NO for me.
@ysbaddaden So what's your suggestion on this? Keep the APIs inconsistent and error prone?
I'm ambivalent towards join(io, sep)
.
@ysbaddaden Why NOt, especially when you agree on the topic of convention-over-configuration? There's 10 馃憤 for the time being, so please give at least some rationale behind your opinion on the matter.
Another alternative is to move the io
as a named argument always, plus allow Object#to_s(io : IO)
. It will be a bit more verbose, but the io
would be able to be at the end and would play nice with other optional named arguments.
Also, given that the idiom is kind of common, either use the given io
or use a StringBuilder, maybe we could encapsulate that as a pattern. But the main issue is the api rather than the implementation.
I am ok with putting io at the beginning though. Not the nicest thing for join, but an acceptable compromise from my side.
When there is a choice between join(io, ", ")
and join(", ", io: io)
I would very much prefer the former. It would add so much more clutter everywhere you're delegating a lot to write to IO. Just imagine #6004 with all this additional named arguments 馃檲
Since I've been asked, I would have (leaving out types for clarity):
Enumerable#join(separator)
Enumerable#join(separator, io)
Time#to_s(format)
Time#to_s(format, io)
Int#to_s(base, *, upcase = false)
Int#to_s(base, io, *, upcase = false)
BigRational#to_s(*, base = 10)
BigRational#to_s(io, *, base = 10)
That is the following rationale:
io
as the last argument;io
feels like an optional argument, correctly placed.Said differently: think about each usage, improve the API, instead of blindly enforcing rules.
@ysbaddaden I agree that base
and upcase
should be named arguments. It's nice additional expressiveness.
This would avoid any issues with the placement of io
as last argument. In my opinion, this is however an important point against putting io
as last argument. If you only look at these few examples and find different solutions, that's fine. But your argument just avoids individual problems and doesn't address the general issue which can't be solved differently in any case.
I couldn't find a proper example in the stdlib, but I'll use a method from Crinja to illustrate:
Template#render
is supposed to be called with or without IO
and optional variables
. It wouldn't make much sense to make variables
a named argument because it is frequently used and calls should be as simple as template.render({ "name" => "John" })
.
If io
is supposed to be added as last argument, you would need three overloads:
Template.render(variables = nil)
Template.render(variables, io : IO)
Template.render(io : IO)
When io
is added as first argument, it's only two:
Template.render(variables = nil)
Template.render(io : IO, variables = nil)
If the API would be extended to have an additional optional, positional argument (say an execution context for example), it would still only be two overloads vs. six (!!). This is probably less common, but still worth considering.
Splat arguments are completely impossible when io
is supposed to be the last argument always.
In my opinion these are strong points against putting io
at the end, while there are no such limitations when put at the front.
I couldn't find a proper example in the stdlib.
That settles it for me. This discussion is about stdlib. Libraries can use another pattern if it fits better. Again: blindly enforcing rules or patterns makes no sense; actual use cases and contexts are more important.
Crystal isn't just the stdlib but an entire ecosystem. Conventions used in the stdlib should be generally applicable if possible. It doesn't help at all if the stdlib goes for "io last everywhere" but that most likely doesn't work for other libraries.
Even if you leave everything outside stdlib out of consideration (which makes no sense to me), you can't exclude that this will become an issue in stdlib at some point.
It feels foolish to disregard any issues just because they might currently not apply here.
I have given actual use cases and it doesn't seem to be far fetched to have methods with optional IO and optional positional argument(s) or splats in stdlib. What's the point in ignoring all this just because the alternative "feels bad"? 馃槦
It looks like this hasn't come up again in two years, so it doesn't seem to be a huge pain point in practice. Given there's quite some disagreement here, I would say we keep status quo of seeing what feels best for each kind of method and close this.
Yeah, this hasn't seen much activity. I'm still convinced it would be a good thing. Even if it's not a huge pain point, it still causes friction. Different styles make it difficult to remember which one goes where.
According to the reactions on the OP and the comments, I actually don't see this as being very contested. As far as I understand, we all agree that standardizing would be benefitial.
It would be nice if IO
would always go last. Then when doing:
"foo#{some_method(1, 2)}bar"
the compiler could check if there's an overload that matches some_method(1, 2, IO)
and use that instead of doing io << some_method(1, 2)
behind the hood.
Probably not very important, though.
The same could work when IO
is always the first argument 馃槒
I don't mind which direction any more, I think "IO
at end" is more common and choosing between front or end is moot. Standardization is good cause it allows us to do interpolation optimizations.
Not sure how this optimizationwould work with the new String.interpolation
API though.
I'm fine with standardizing. I still prefer io as first argument because of the practical benefit. I've already mentioned this in https://github.com/crystal-lang/crystal/issues/5916#issuecomment-384680940. But I was incorrect about having no example in stdlib: Enumerable#join
is actually a perfect example!
Currently, these two non-yielding overloads exists:
def join(separator = "")
end
def join(separator, io)
end
But we're actually missing an overload accepting an IO but using a default separator. With the current argument placement, it would be this:
def join(io : IO)
end
That makes three different overloads in total.
If IO argument comes first, it would only be two overloads in total:
def join(separator = "")
end
def join(io : IO, separator = "")
end
That argument expands to all methods accepting an io and optional positional arguments. That also includes splat arguments. When IO is supposed to always go last, it can't work with splats.
@straight-shoota I think that makes sense. The problem is that it's a big breaking change.
But if we want to do it (I'd like to do it), let's do it before 1.0 and also document it in the style guide.
The change actually isn't that big. IIRC besides join
only a couple of encode
and format
methods would be affected. And we can easily add deprecation warnings.
Pull request is available at #9134
I decided to hold off on methods that take two arguments to transform from one to the other. That's class methods like Base64.encode(data, io : IO)
. The general idea is that the information flows from source (first argument) to sink (second argument). In instance methods like #join
or #to_s
source is the receiver, sink the first argument and all other arguments are just options to the process.
But when the source is specified in the argument, applying the IO-first scheme would be strange because source and sink are in the wrong order. This is very obvious when the source is also an IO. Let's imagine we would consider adding a method to Base64
that reads directly from an IO, encodes the data and writes it to another IO. It should have the signature Base64.encode(from : IO, to : IO)
not the other way around.
In case someone is interested in the diff for class methods: https://github.com/crystal-lang/crystal/compare/master...straight-shoota:refactor/io-methods-transform
That's why I always put io as a last argument in the stdlib, because io is a sink.
Yes I would also much prefer io as last. Probably fewer changes and inconsistencies required for that.
@asterite Yes, but the sink makes most sense as first argument when the source is the receiver of a method call.
@oprypin I didn't do a proper count because the plain numbers don't mean much. But really I don't see how moving all IO arguments to last argument would lead to fewer changes.
IO as first argument is consistent, so I'm not sure what you mean with inconsistencies. The consistency argument would actually speak against IO last because it doesn't work with splats.
Also IO-first needs fewer overloads for optional positional arguments.
Also IO-first needs fewer overloads for optional positional arguments.
I see this as the only supporting argument. A very significant one, though.
Regarding inconsistencies:
I decided to hold off on methods that take two arguments to transform from one to the other.
I'm holding out hope for https://github.com/crystal-lang/crystal/issues/5916#issuecomment-612644776
Look at the consistency at the callsite: source.process(sink, args...)
and process(source, sink, args...)
. Makes total sense to me.
For automatic expansion in string interpolations, I don't think it matters where the io argument is placed, first or last. As long as it is standardized, it should work either way.
Well, yes, it doesn't matter first or last, if it's always the sink. If the first arg can sometimes be the source, we're kinda in trouble.
@straight-shoota What's an example where there are a source other than self, and a sink? Are these always class methods? Can you list them all?
@oprypin That doesn't matter, really. If the last argument is an IO it still doesn't mean it works the same way as an overload that returns String
. Could just have a entirely different purpose. If there's only one argument, it is both first and last. Yet it could still be a source or something completely different.
I don't think this automatic interpolation can work anyway without an explicit indicator to enable it (for example an annotation, or a magic argument name).
@asterite I don't think there are currently many instance methods in stdlib that accept both a source and sink as arguments. There's at least Time::Format#format(time : Time, io : IO)
. It's hard to make a list because grepping only gets you so far. When there's no type restriction and no argument named io
, it won't show up easily.
:thinking: :thinking: :thinking:
It'd be possible to implement a search based on what is being passed to the method as part of normal usage.
I remember
I see no problem with using io as a first argument always. Is there a case where we have two ios other than copy?
There's FileUtils.cmp(stream1 : IO, stream2 : IO)
for example.
But I really don't see how it helps us moving forward to know what methods currently exist. That can serve as a rough baseline for consideration, but new methods are being added all the time.
Just my opinion: I think Base64.encode(io, what)
looks just fine. I think using the IO
overload is more low-level, so it's fine if we want to have a convention where IO always comes first. It shouldn't affect a lot of users.
@asterite We also have the uncontested convention to place source IO first. If both source and sink are passed as arguments, both conventions contradict. So the convention should be that at callsite the sink follows the source.
Where is there source IO other than copy?
Another alternative could be to have an IO required named argument.
Lot's of methods read from an IO (.parse
, .from_io
, etc.) and the source IO is the first argument always.
Another alternative could be to have an IO required named argument.
How would that be any better than #9134? It's a breaking change to a lot of methods and results in unneccesarily verbose call syntax with no apparant benefit. Considere there are the hundreds of #to_s
etc. that only have a single argument.
I don't understand why we're coming back to debating alternatives that as far as I'm concerned have been rejected before and have no apparent advantage over the current proposal. Apart from being a breaking change, there is no downside to that. And a breaking change of that size will be necessary for any standardization, and outweighs the current inconsitencies.
I'm convinced #9134 is the correct way to go forward.
But parse and from_io read from an IO. It's different. I was asking what methods we have that write to an IO where there would be a confusion if IO was the first argument.
And just to be clear, I think that PR is fine. I just don't understand why Base64 encode is an exception. I'd move io to the first position too there
We also have the uncontested convention to place source IO first.
Please, I'm against it, said it out loud, and proposed a better rationale instead of blindly enforcing a rule that will regularly make no sense. For example:
Enumerable#join(io, sep) # urgh
Base64.encode(io, what) # does it read from `io` ?!
Putting the io
argument _last_ leads to a _better_ API.
Moving optional arguments to kwargs would improve the API even more.
@ysbaddaden My comment about undisputed convention was about source IO, not sink.
It seems you missed some of the recent discussion, otherwise I can't understand what you could possibly mean. Please show a single concrete example where the current proposal as of #9134 makes no sense. It follows the convention sink follows the source at callsite.
The Base64.encode
example has been addressed in https://github.com/crystal-lang/crystal/issues/5916#issuecomment-616183833 and doesn't apply.
Putting the io argument last leads to a better API.
I disagree. Your suggestion would lead to the sequential order at callsite source.process(*required_options, sink, **optional_options)
. That makes no sense at all and is in no way better than my proposal.
Moving optional arguments to kwargs would improve the API even more.
Some optional arguments are completely fine as positional arguments. For example:
Enumerable#join(io : IO, separator = "")
String#ljust(io : IO, char = ' ')
Time#to_s(io : IO, format : Format)
They should not need to be named arguments because their purpose is quite clear from the context.
Most helpful comment
@straight-shoota I think that makes sense. The problem is that it's a big breaking change.
But if we want to do it (I'd like to do it), let's do it before 1.0 and also document it in the style guide.