Crystal: Standardize overloads for methods with IO/returning String

Created on 4 Apr 2018 · 43Comments · Source: crystal-lang/crystal

The idiomatic Crystal way of converting a value to a String is to append it to an IO. The most notable it Object#to_s(io : IO) which writes a string representation to the IO. When an actual String is required, there is an overload Object#to_s() which calls the IO variant wrapped in String.build.

Other examples are Object#inspect, Base64#encode, Base64#decode or Enumerable#join which all adapt the same technique.

Since they provide the same functionality, both methods usually have the same signature except that one returns String and the other accepts an additional argument io : IO. That's the distinctive difference between them. It is obvious that the positioning of the other arguments should be the same to avoid any friction.

If the methods also accept other arguments, there is a question as to where the io argument should be placed. There are different styles throughout the standard lib:

Time#to_s(format : String, io : IO)
Int#to_s(base : Int, io : IO, upcase : Bool = false)
BigRational#to_s(io : IO, base = 10)

It doesn't make any sense to add io in between as it screws up argument order. At the end is better but it will cause issues when optional arguments are added to the signature (that's probably what happened with Int#to_s). I think it should be preferred to put io at the front of the other arguments. This clearly signals that this is the IO variant of the method. And additional arguments can simply be appended at the end in both variants. IMHO this is the most logical solution.

I propose to adapt "io in front" style throughout the standard library and encourage others to do the same. This way it would be simple to switch from a String method to the IO method by inserting io as first argument and you don't need to think about where to put it.

breaking-change refactor discussion stdlib

Source

straight-shoota

👍14 👎2

Most helpful comment

@straight-shoota I think that makes sense. The problem is that it's a big breaking change.

But if we want to do it (I'd like to do it), let's do it before 1.0 and also document it in the style guide.

asterite on 13 Apr 2020

👍3

All 43 comments

:+1: for IO in front, please send a PR

RX14 on 5 Apr 2018

join looks odd. The idea is that you apply some operation and you place the result in an IO, that's why it's the last argument. After it, optional arguments can come. I see nothing bad with having the methods be as they are now. This is just bikeshedding.

Put another way, since the compiler and language are stuck, and nobody is willing to work on them (complexity, lack of time, lack of direction, etc.) the only possible thing to do is to continue changing the standard library, refining and refining, with no real value if the language supporting all of that will not blossom.

asterite on 5 Apr 2018

👍2

(@straight-shoota just a small note that I truly appreciate your work on Crystal, but maybe current effort should be focused somewhere else, maybe the language and the compiler... though of course it's hard because there's no clear direction on how to do that, or what language changes are needed to render Crystal a production language)

asterite on 5 Apr 2018

👍3

This is a proposal to use a predictable and consistent API throughout the stdlib. The stdlib is already pretty good in many places, but obviously it can still be improved - and not only by adding new features. The compiler and language need a lot of work, that's true. But it's already actually pretty usable.

I don't think this is bikeshedding. IMHO it is important to address this issue (and lots of 👍 on the proposal show others do as well) to avoid ending up with different ways the stdlib API's work regarding String/IO methods. I have stumbled over this a number of times, that's why I think it would be great to agree on unifying the overload styles.

It's totally fine if we agree on putting io last everywhere and change everything to that style. But I strongly favour to put it in front.

It is my understanding that adding an io to the signature is not just an optional argument that toggles some of it's behaviour but it changes the very nature how the method works: it writes to that IO instead of returning a String. This is very significant and I think it makes sense to put significant arguments first.

I've already explained the more practical argument that putting io at the end causes issues with the positioning of optional arguments (including splats).

Regarding the argument "join looks odd." I could answer that I find join(", ", io) looks odd. I'd rather have as join(io, ", ") and interpret as "do a join by taking this IO and write every output to it".

(Thanks for the appreciation. I'd be happy to work on other parts as well, but I believe that would require some initiative from the core team to decide how things should go.)

straight-shoota on 5 Apr 2018

I guess standarizing the arguments order is good then, specially because everyone seems to agree.

asterite on 6 Apr 2018

Convention is better than configuration, but join(io, sep) is a huge NO for me.

ysbaddaden on 7 Apr 2018

😕1

@ysbaddaden So what's your suggestion on this? Keep the APIs inconsistent and error prone?

straight-shoota on 7 Apr 2018

I'm ambivalent towards join(io, sep).

RX14 on 7 Apr 2018

@ysbaddaden Why NOt, especially when you agree on the topic of convention-over-configuration? There's 10 👍 for the time being, so please give at least some rationale behind your opinion on the matter.

Sija on 7 Apr 2018

Another alternative is to move the io as a named argument always, plus allow Object#to_s(io : IO). It will be a bit more verbose, but the io would be able to be at the end and would play nice with other optional named arguments.

Also, given that the idiom is kind of common, either use the given io or use a StringBuilder, maybe we could encapsulate that as a pattern. But the main issue is the api rather than the implementation.

I am ok with putting io at the beginning though. Not the nicest thing for join, but an acceptable compromise from my side.

bcardiff on 9 Apr 2018

👍1

When there is a choice between join(io, ", ") and join(", ", io: io) I would very much prefer the former. It would add so much more clutter everywhere you're delegating a lot to write to IO. Just imagine #6004 with all this additional named arguments 🙈

straight-shoota on 26 Apr 2018

Since I've been asked, I would have (leaving out types for clarity):

Enumerable#join(separator)
Enumerable#join(separator, io)
Time#to_s(format)
Time#to_s(format, io)
Int#to_s(base, *, upcase = false)
Int#to_s(base, io, *, upcase = false)
BigRational#to_s(*, base = 10)
BigRational#to_s(io, *, base = 10)

That is the following rationale:

always push io as the last argument;
rarely used optional arguments should be kwargs, improving readability;
io feels like an optional argument, correctly placed.

Said differently: think about each usage, improve the API, instead of blindly enforcing rules.

ysbaddaden on 26 Apr 2018

👍2

@ysbaddaden I agree that base and upcase should be named arguments. It's nice additional expressiveness.

This would avoid any issues with the placement of io as last argument. In my opinion, this is however an important point against putting io as last argument. If you only look at these few examples and find different solutions, that's fine. But your argument just avoids individual problems and doesn't address the general issue which can't be solved differently in any case.

I couldn't find a proper example in the stdlib, but I'll use a method from Crinja to illustrate:

Template#render is supposed to be called with or without IO and optional variables. It wouldn't make much sense to make variables a named argument because it is frequently used and calls should be as simple as template.render({ "name" => "John" }).

If io is supposed to be added as last argument, you would need three overloads:

Template.render(variables = nil)
Template.render(variables, io : IO)
Template.render(io : IO)

When io is added as first argument, it's only two:

Template.render(variables = nil)
Template.render(io : IO, variables = nil)

If the API would be extended to have an additional optional, positional argument (say an execution context for example), it would still only be two overloads vs. six (!!). This is probably less common, but still worth considering.

Splat arguments are completely impossible when io is supposed to be the last argument always.

In my opinion these are strong points against putting io at the end, while there are no such limitations when put at the front.

straight-shoota on 26 Apr 2018

I couldn't find a proper example in the stdlib.

That settles it for me. This discussion is about stdlib. Libraries can use another pattern if it fits better. Again: blindly enforcing rules or patterns makes no sense; actual use cases and contexts are more important.

ysbaddaden on 26 Apr 2018

👎1

Crystal isn't just the stdlib but an entire ecosystem. Conventions used in the stdlib should be generally applicable if possible. It doesn't help at all if the stdlib goes for "io last everywhere" but that most likely doesn't work for other libraries.

Even if you leave everything outside stdlib out of consideration (which makes no sense to me), you can't exclude that this will become an issue in stdlib at some point.

It feels foolish to disregard any issues just because they might currently not apply here.

I have given actual use cases and it doesn't seem to be far fetched to have methods with optional IO and optional positional argument(s) or splats in stdlib. What's the point in ignoring all this just because the alternative "feels bad"? 😟

straight-shoota on 26 Apr 2018

It looks like this hasn't come up again in two years, so it doesn't seem to be a huge pain point in practice. Given there's quite some disagreement here, I would say we keep status quo of seeing what feels best for each kind of method and close this.

jhass on 12 Apr 2020

Yeah, this hasn't seen much activity. I'm still convinced it would be a good thing. Even if it's not a huge pain point, it still causes friction. Different styles make it difficult to remember which one goes where.
According to the reactions on the OP and the comments, I actually don't see this as being very contested. As far as I understand, we all agree that standardizing would be benefitial.

straight-shoota on 12 Apr 2020

👍1

It would be nice if IO would always go last. Then when doing:

"foo#{some_method(1, 2)}bar"

the compiler could check if there's an overload that matches some_method(1, 2, IO) and use that instead of doing io << some_method(1, 2) behind the hood.

Probably not very important, though.

asterite on 12 Apr 2020

👀2

The same could work when IO is always the first argument 😏

straight-shoota on 12 Apr 2020

👀1

I don't mind which direction any more, I think "IO at end" is more common and choosing between front or end is moot. Standardization is good cause it allows us to do interpolation optimizations.

Not sure how this optimizationwould work with the new String.interpolation API though.

RX14 on 13 Apr 2020

I'm fine with standardizing. I still prefer io as first argument because of the practical benefit. I've already mentioned this in https://github.com/crystal-lang/crystal/issues/5916#issuecomment-384680940. But I was incorrect about having no example in stdlib: Enumerable#join is actually a perfect example!
Currently, these two non-yielding overloads exists:

def join(separator = "")
end
def join(separator, io)
end

But we're actually missing an overload accepting an IO but using a default separator. With the current argument placement, it would be this:

def join(io : IO)
end

That makes three different overloads in total.

If IO argument comes first, it would only be two overloads in total:

def join(separator = "")
end
def join(io : IO, separator = "")
end

That argument expands to all methods accepting an io and optional positional arguments. That also includes splat arguments. When IO is supposed to always go last, it can't work with splats.

straight-shoota on 13 Apr 2020

👍3 👎1

@straight-shoota I think that makes sense. The problem is that it's a big breaking change.

But if we want to do it (I'd like to do it), let's do it before 1.0 and also document it in the style guide.

asterite on 13 Apr 2020

👍3

The change actually isn't that big. IIRC besides join only a couple of encode and format methods would be affected. And we can easily add deprecation warnings.

straight-shoota on 13 Apr 2020

Pull request is available at #9134

I decided to hold off on methods that take two arguments to transform from one to the other. That's class methods like Base64.encode(data, io : IO). The general idea is that the information flows from source (first argument) to sink (second argument). In instance methods like #join or #to_s source is the receiver, sink the first argument and all other arguments are just options to the process.

But when the source is specified in the argument, applying the IO-first scheme would be strange because source and sink are in the wrong order. This is very obvious when the source is also an IO. Let's imagine we would consider adding a method to Base64 that reads directly from an IO, encodes the data and writes it to another IO. It should have the signature Base64.encode(from : IO, to : IO) not the other way around.

In case someone is interested in the diff for class methods: https://github.com/crystal-lang/crystal/compare/master...straight-shoota:refactor/io-methods-transform

straight-shoota on 19 Apr 2020

👍2

That's why I always put io as a last argument in the stdlib, because io is a sink.

asterite on 19 Apr 2020

👍1

Yes I would also much prefer io as last. Probably fewer changes and inconsistencies required for that.

oprypin on 19 Apr 2020

@asterite Yes, but the sink makes most sense as first argument when the source is the receiver of a method call.

@oprypin I didn't do a proper count because the plain numbers don't mean much. But really I don't see how moving all IO arguments to last argument would lead to fewer changes.
IO as first argument is consistent, so I'm not sure what you mean with inconsistencies. The consistency argument would actually speak against IO last because it doesn't work with splats.
Also IO-first needs fewer overloads for optional positional arguments.

straight-shoota on 19 Apr 2020

Also IO-first needs fewer overloads for optional positional arguments.

I see this as the only supporting argument. A very significant one, though.

Regarding inconsistencies:

I decided to hold off on methods that take two arguments to transform from one to the other.

I'm holding out hope for https://github.com/crystal-lang/crystal/issues/5916#issuecomment-612644776

oprypin on 19 Apr 2020

Look at the consistency at the callsite: source.process(sink, args...) and process(source, sink, args...). Makes total sense to me.

For automatic expansion in string interpolations, I don't think it matters where the io argument is placed, first or last. As long as it is standardized, it should work either way.

straight-shoota on 20 Apr 2020

Well, yes, it doesn't matter first or last, if it's always the sink. If the first arg can sometimes be the source, we're kinda in trouble.

oprypin on 20 Apr 2020

@straight-shoota What's an example where there are a source other than self, and a sink? Are these always class methods? Can you list them all?

asterite on 20 Apr 2020

@oprypin That doesn't matter, really. If the last argument is an IO it still doesn't mean it works the same way as an overload that returns String. Could just have a entirely different purpose. If there's only one argument, it is both first and last. Yet it could still be a source or something completely different.

I don't think this automatic interpolation can work anyway without an explicit indicator to enable it (for example an annotation, or a magic argument name).

@asterite I don't think there are currently many instance methods in stdlib that accept both a source and sink as arguments. There's at least Time::Format#format(time : Time, io : IO). It's hard to make a list because grepping only gets you so far. When there's no type restriction and no argument named io, it won't show up easily.

straight-shoota on 20 Apr 2020

:thinking: :thinking: :thinking:
It'd be possible to implement a search based on what is being passed to the method as part of normal usage.
I remember

oprypin on 20 Apr 2020

I see no problem with using io as a first argument always. Is there a case where we have two ios other than copy?

asterite on 20 Apr 2020

There's FileUtils.cmp(stream1 : IO, stream2 : IO) for example.

But I really don't see how it helps us moving forward to know what methods currently exist. That can serve as a rough baseline for consideration, but new methods are being added all the time.

straight-shoota on 20 Apr 2020

Just my opinion: I think Base64.encode(io, what) looks just fine. I think using the IO overload is more low-level, so it's fine if we want to have a convention where IO always comes first. It shouldn't affect a lot of users.

asterite on 20 Apr 2020

@asterite We also have the uncontested convention to place source IO first. If both source and sink are passed as arguments, both conventions contradict. So the convention should be that at callsite the sink follows the source.

straight-shoota on 20 Apr 2020

Where is there source IO other than copy?

asterite on 20 Apr 2020

Another alternative could be to have an IO required named argument.

asterite on 20 Apr 2020

👍1

Lot's of methods read from an IO (.parse, .from_io, etc.) and the source IO is the first argument always.

Another alternative could be to have an IO required named argument.

How would that be any better than #9134? It's a breaking change to a lot of methods and results in unneccesarily verbose call syntax with no apparant benefit. Considere there are the hundreds of #to_s etc. that only have a single argument.

I don't understand why we're coming back to debating alternatives that as far as I'm concerned have been rejected before and have no apparent advantage over the current proposal. Apart from being a breaking change, there is no downside to that. And a breaking change of that size will be necessary for any standardization, and outweighs the current inconsitencies.

I'm convinced #9134 is the correct way to go forward.

straight-shoota on 20 Apr 2020

But parse and from_io read from an IO. It's different. I was asking what methods we have that write to an IO where there would be a confusion if IO was the first argument.

And just to be clear, I think that PR is fine. I just don't understand why Base64 encode is an exception. I'd move io to the first position too there

asterite on 20 Apr 2020

We also have the uncontested convention to place source IO first.

Please, I'm against it, said it out loud, and proposed a better rationale instead of blindly enforcing a rule that will regularly make no sense. For example:

Enumerable#join(io, sep) # urgh
Base64.encode(io, what) # does it read from `io` ?!

Putting the io argument _last_ leads to a _better_ API.
Moving optional arguments to kwargs would improve the API even more.

ysbaddaden on 20 Apr 2020

@ysbaddaden My comment about undisputed convention was about source IO, not sink.

It seems you missed some of the recent discussion, otherwise I can't understand what you could possibly mean. Please show a single concrete example where the current proposal as of #9134 makes no sense. It follows the convention sink follows the source at callsite.

The Base64.encode example has been addressed in https://github.com/crystal-lang/crystal/issues/5916#issuecomment-616183833 and doesn't apply.

Putting the io argument last leads to a better API.

I disagree. Your suggestion would lead to the sequential order at callsite source.process(*required_options, sink, **optional_options). That makes no sense at all and is in no way better than my proposal.

Moving optional arguments to kwargs would improve the API even more.

Some optional arguments are completely fine as positional arguments. For example:

Enumerable#join(io : IO, separator = "")
String#ljust(io : IO, char = ' ')
Time#to_s(io : IO, format : Format)

They should not need to be named arguments because their purpose is quite clear from the context.

straight-shoota on 20 Apr 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Float JSON representation is wrong

RX14 · 3Comments

Allow accessing named tuples with strings

costajob · 3Comments

Tuple of tuples fails to parse

pbrusco · 3Comments

Rewrite sigfault.c in Crystal

asterite · 3Comments

Can't use Time in NamedTuple with .to_h conversion

relonger · 3Comments