Julia: Proposal: Deprecate then remove function piping

Created on 30 Jan 2017 · 37 Comments · Source: JuliaLang/julia

Proposal

Deprecate the current use of |> as a function pipe. That is, the syntax x |> f would be deprecated in favor of the normal call syntax f(x). After the deprecation period, Base.:(|>) would be undefined.
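For reference, here is a minimal sketch of the equivalence the proposal relies on. The array `x` is only an illustrative value. The Base definition of the operator amounts to `|>(x, f) = f(x)`.

```julia
x = [1, 2, 3]

piped  = x |> sum   # pipe syntax: argument first, function second
called = sum(x)     # normal call syntax: function first

# Pipes chain left to right; the equivalent nested calls read inside out.
chained = x |> sum |> sqrt
nested  = sqrt(sum(x))
```

Both spellings produce the same values; only the reading order differs.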

This change was initially suggested by tkelman in https://github.com/JuliaLang/julia/issues/16985#issuecomment-227015399.

There has been a lot of contentious debate over various syntaxes for function piping (in particular, see #5571), with arguments for mimicking a variety of languages. That discussion has been had ad nauseam and I do not wish to rehash it. That is NOT the purpose of this proposal.

Rationale

A number of well thought out, well maintained packages have implemented macros that provide convenient piping syntax for a variety of use cases, both general and specific. Examples include Lazy.jl, FunctionalData.jl, Pipe.jl, and ChainMap.jl, among others.

StefanKarpinski and andyferris gave us arbitrary function composition in #17155, which can serve a similar purpose in many situations.
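A small sketch of how composition with `∘` can stand in for a pipe. The input array and the name `root_of_sum` are illustrative assumptions.

```julia
x = [1, 4, 4]

# `∘` composes right to left: (f ∘ g)(x) == f(g(x)).
root_of_sum = sqrt ∘ sum

composed = root_of_sum(x)
piped    = x |> sum |> sqrt
```

The composed function can be named and reused, whereas the pipe applies the steps to one value immediately.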

As tkelman similarly argued in #5571, the function pipeline in Base is backwards from the familiar call syntax; having both in the Base language is essentially endorsing the use of 2 disparate syntaxes to achieve the same goal. While there are often multiple ways to write the same thing using solutions in Base, typically the solutions at least adhere to a similar mental model. In this case, the syntaxes employ literally opposite mental models.

Function pipelines violate the principle of least surprise by applying the action after the object. That is, if you read sum(x) you know immediately when you see sum() that you're going to add up the values in the argument. When you see x |> sum, you see x, then all of a sudden you're adding up its values. Few if any other Base solutions put the action at the end, which makes piping the odd one out.

Piping does indeed have precedent in other languages, e.g. Hadley Wickham's %>% in R (which is not part of base R), and sometimes that style/flow makes sense. However, in the interest of consistency within Base Julia, I propose that we defer the responsibility for providing piping syntax to packages, which can redefine |> or provide convenience macros as they see fit.

Action Items

Should this proposal be accepted, the action items would be:

  • [ ] Remove uses of the syntax within Base, if any exist
  • [ ] Provide a formal deprecation for Base.:(|>) in either 0.6 or 1.0
  • [ ] Remove it in a subsequent release
deprecation design julep

All 37 comments

Function piping provides a postfix syntax for function calling, which is convenient at the REPL for interactive data generation and further visualization/summarization.

A use case that I have seen many people type is

julia> somecomplicatedthingproducingarray
...

<ARROW UP>

julia> somecomplicatedthingproducingarray |> summarize

where the summarize function is something like a plot or histogram

@jiahao I'm not arguing that it's not useful, but rather that we should be consistent within Base and let packages provide things like this.

there's also ans for repl usage

In this proposal would |> still be parsed as an infix operator?

@ajkeller34: definitely, packages would be free to do whatever they want with it (though they'd have to play nicely with each other in terms of type piracy and coexistence), without as much of a constraint of being semantically compatible with the old base definition.

"Remove uses of the syntax within Base, if any exist"

Here's a now-very-outdated attempt I made to do this: https://github.com/tkelman/julia/commit/212727cdc4aaa3221763580f15d42cfe198bcc1c
At the time, most of the uses in base were pretty trivial. A few of the tests' uses of "pipe this thing to this anonymous function" are maybe nicer with piping, but since most of those were reusing the same anonymous function multiple times it would probably be worth giving it a name and calling it like a normal function at that point.

In case anyone is curious, I have ChainRecursive.jl out now. I'll put an announcement on discourse about the disintegration of ChainMap.jl and its various children once it's complete.

Let me offer some resistance here since I have some vested interest and a particular liking to what |> makes possible.

I second @jiahao that |> is very useful when you want to quickly try things out in the REPL. Further, I find it also useful when your argument is too big or merits some poise (yes, I said that). In the case of the linked example, it is in fact better to have the argument be more prominent than the function being called. sum(x) is too simple an example, and should indeed be written as sum(x).

In Escher.jl, all functions that add properties to elements have a curried method. This dovetails so well with |> (that was planned; it also works great with map), and it's a joy to be able to try things out at the end of the line and see the UI update immediately. I don't have to find my way to the beginning of the expression and faff around. For use with Escher at least, the suggested alternative is to assign big expressions to variables with made-up names like padded_box_contents_aligned_right_tomato_background (or worse, box34) and then call a function on them, as opposed to the beautifully reading <big UI expression> |> aligncontents(right) |> pad(1em) |> fillcolor("tomato").
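The curried-method pattern described here can be sketched as follows. The functions `pad` and `wrap` are hypothetical stand-ins that operate on strings; they are not Escher's actual API.

```julia
# Hypothetical stand-ins for Escher-style property functions (not Escher's API).
# Each has a two-argument method and a curried one-argument method that
# returns a closure waiting for the element.
pad(n::Int, s::String) = " "^n * s * " "^n
pad(n::Int) = s -> pad(n, s)

wrap(tag::String, s::String) = "<$tag>$s</$tag>"
wrap(tag::String) = s -> wrap(tag, s)

# With curried methods, each pipe stage is an ordinary one-argument function.
piped  = "box" |> pad(1) |> wrap("b")
nested = wrap("b", pad(1, "box"))
```

The piped form lets a new step be appended at the end of the line, while the nested form requires editing both ends of the expression.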

I know that after this I can define |> inside Escher and I probably will, but it will kill my brain to see WARNING: using Escher.|> in module YourPackage conflicts with an existing identifier. Packages will almost definitely give different meanings to this, which to me is very alarming!

"StefanKarpinski and andyferris gave us arbitrary function composition in #17155, which can serve a similar purpose in many situations."

The alternative to box |> fill("orange") |> pad(2em) would be (fill("orange") ∘ pad(2em))(box) as opposed to box |> fill("orange") ∘ pad(2em)? These two seem orthogonal.
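One detail worth making explicit when comparing the two spellings is the order of application. In the sketch below, `step1` and `step2` are hypothetical one-argument functions used only to make the order visible.

```julia
# Hypothetical one-argument steps that record the order in which they ran.
step1 = s -> s * " -> filled"
step2 = s -> s * " -> padded"

piped    = "box" |> step1 |> step2      # left to right
composed = (step2 ∘ step1)("box")       # `∘` lists the steps in reverse
mixed    = "box" |> step2 ∘ step1       # `∘` binds tighter than `|>`
```

A pipe lists the steps in the order they run, while `∘` lists them in the opposite order, so a literal translation between the two has to reverse the sequence.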

Escher's use of closures as objects seems to me like it's defining a DSL just for the sake of using this syntax (which has serious limitations for anything that isn't single-input, single-output), where it would likely be better-served, and more generalizable, if it used one of the multiple available chaining macros.

Removing Base's definition of this would allow people who like this syntax to do more interesting things with it.

@shashi I understand your points, but you would be able to get the same behavior using one of the packages I cited in the issue, would you not? As an example, in your Escher example, you could use FunctionalData to do @p vbox(<really big thing>) | pad(2em) or Lazy to do @> vbox(...) pad(2em).

"Removing Base's definition of this would allow people who like this syntax to do more interesting things with it."

Except it will not be usable, since the only safe way to use it then would be Escher.|>(...) or Lazy.|>(...).

Hypothetically, how would one use |> as an infix operator if you're using two different packages that both define and export it, assuming it's not defined in Base?

@kmsquire It depends on the use case. |> would still be parsed as an infix operator just as it is now, it just wouldn't have a value in Base. If you use it in a macro, it doesn't matter how any particular package defines it, since it simply becomes the first argument in a call expression.

Take for example <|, which is parsed as an infix operator but does not have a value. Even though it's undefined, we still have

julia> dump(:(a <| b))
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol <|
    2: Symbol a
    3: Symbol b
  typ: Any

Packages can define and export methods for Base.:(<|) that mean different things, just as one can do with +.
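A short sketch of the same point in code, assuming a Julia version in which `<|` has no methods in Base. The definition given to `<|` here is an arbitrary illustration, not something Base provides.

```julia
# The expression parses as a call to `<|` whether or not `<|` has a value.
ex = :(a <| b)

# Illustrative definition (an assumption): right-to-left application,
# the mirror image of `|>`.
<|(f, x) = f(x)

applied = sum <| [1, 2, 3]
```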

But the packages that provide nice function piping do so in macros, I assume for precisely this reason.

FWIW, no chaining package would need to make use of |> during evaluation because during chaining everything gets zipped up into one expression. I'd imagine if packages do go defining |> it will be precisely the definition in base. Although they should probably be using a chaining macro instead. See DataFramesMeta for a good example of how to build an interface that works well with chaining.

If we're deprecating this should we also deprecate * for string concatenation? That has similar issues as it's redundant with string(a, b), and violates the principle of least surprise given that a and b aren't numbers.

More generally, we should probably deprecate all infix notation, as it's confusing to have multiple calling conventions like *(a, b) vs a * b – we can trim our current 3 disparate syntaxes down to one and get total consistency. To avoid ugliness we might consider moving the function call inside the parens, and perhaps getting rid of the redundant commas, as well.

"|> would still be parsed as an infix operator just as it is now, it just wouldn't have a value in Base. If you use it in a macro, it doesn't matter how any particular package defines it"

Still not sure why we need to remove it from Base.

@bramtayl makes a good point:

"I'd imagine if packages do go defining |> it will be precisely the definition in base."

And still the only way to use more than one package which define this is to not use it infix.

I don't see why removing the definition in Base is required for |> to be used inside macros.

It isn't. My point is that |> can be used inside of macros regardless of the situation in Base. The same goes for any operator that parses appropriately. The point of the proposal is to make Base self-consistent in terms of function calls, then piping behavior can be achieved through packages. Whether the packages use |> in particular doesn't matter; they could just as well use <| or literally any other infix operator.
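To illustrate how a macro can treat `|>` purely as syntax, here is a minimal sketch. The helper `rewrite_pipes` and the macro `@unpipe` are hypothetical names written for this example; they are not taken from any of the packages mentioned.

```julia
# Hypothetical helper: rewrite every `x |> f` into `f(x)` at the expression
# level, so the runtime definition of `|>` is never consulted.
rewrite_pipes(ex) = ex   # symbols, literals, line numbers: leave unchanged
function rewrite_pipes(ex::Expr)
    if ex.head == :call && length(ex.args) == 3 && ex.args[1] == :(|>)
        return Expr(:call, rewrite_pipes(ex.args[3]), rewrite_pipes(ex.args[2]))
    end
    return Expr(ex.head, map(rewrite_pipes, ex.args)...)
end

# Hypothetical macro wrapping the helper.
macro unpipe(ex)
    esc(rewrite_pipes(ex))
end

expanded = rewrite_pipes(:(x |> f))
result   = @unpipe [1, 2, 3] |> sum |> sqrt
```

Because the rewrite happens before evaluation, the same approach would work with any other operator that parses as infix.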

@ararslan right, that was not what I meant to ask, I updated my comment right after, sorry.

Anyway, I don't quite get the "Base self-consistent in terms of function calls" sentiment. Seems like this will only make it harder to use |> in a non-macro context. I personally believe |> is a worthwhile thing to learn about for a newbie, despite it being surprising. It at least saves effort at the REPL. It's quite fun to realize later that |> is a function just like any other infix function, and it reinforces the lesson that functions are just values.

Maybe just something formal: Please discuss/decide deprecations and/or syntax changes at the beginning of a release cycle, not at the end. Currently all the main developers and package maintainers are spending their time and energy on finishing 0.6, and they just might have no time to think of another (good) idea.

"I'm not arguing that it's not useful, but rather that we should be consistent"

Sometimes usefulness beats consistency? I wasn't aware of the inconsistency, but I have found the |> syntax useful. If it's removed I won't feel I've gained anything tangible.

An explanation for my thumbs-down vote, if I may:

Much of what's currently in Base could happen in packages instead. Should we move dictionaries to a package? Maybe list operations like sort and shuffle? Collections operations, etc.? I'm sure there have been long and detailed discussions concerning what should and shouldn't be included in Base, but I presume there are three reasons some functionality might be included in base:
1) That functionality is necessary to enable other features in base.
2) That functionality is an essential part of the language, many Julia programmers and packages will use it, and therefore it's desirable to have a single implementation/syntax that everyone agrees on, rather than the fragmentation of lots of people rolling their own.
3) Including that functionality in base makes "raw" Julia more pleasant to use or makes it feel more full featured, which helps with language evangelism and adoption.

Something like sum probably hits all 3 points, and I'd argue that function piping hits the second and third point:

In both the initial (well-written) proposal and the discussion in this thread, a common theme is the existence of several packages providing pipe-like functionality through macros: Lazy.jl, Pipe.jl, ChainMap.jl, etc. The existence of multiple packages strongly suggests that many people in the community find piping a useful and desirable feature, and these packages' presence in this discussion thread suggests that many folks here understand and support the use of piping.

Given that piping is a common and popular feature in the Julia community and other languages, even in this discussion people seem to agree that it has many uses, especially at the REPL (where Julia shines), and there's already fragmentation in the Julia ecosystem...my read is not that it should be removed from Base, but rather that the piping syntax available in Base should be enhanced so that there's less need for fragmentation. Different packages offering different ways of e.g. plotting seems okay; different popular packages offering different ways of applying functions seems pretty scary.

I further argue that removing piping from Base but leaving the infix operator around is rather surprising: in Julia you can't define your own infix operators, but there is an unused infix operator |> hanging around that you can define as you please? If that's good functionality, why not give us a solid 10 or 20 infix operators to define as we please?

Lastly, I believe it's natural to keep piping exactly because it is different from other function application. It's a feature, not a bug, that it's different from other conventions of applying functions; this difference is what lets it shine in some use cases. And there are other cases where (hand-waving a bit) the noun comes before the verb, and many of these are exactly syntactic sugar in cases where raw function application is unwieldy. Off the top of my head, assignment x = 5 is putting the noun (symbol x) before the verb (bind to a value). Likewise for accessing fields of types t.a instead of getfield. And most profoundly, array indexing z[5] reads like "from z take the 5th element" and is generally more natural than getindex(z, 5).
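The sugar-versus-call pairs mentioned above can be checked directly. The type `Point` is a hypothetical example; for a plain struct like this one, `t.a` ends up reading the same field as `getfield(t, :a)`.

```julia
struct Point   # hypothetical type for illustration
    a::Int
end

z = [10, 20, 30, 40, 50]
t = Point(7)

indexed_sugar = z[5]              # "from z take the 5th element"
indexed_call  = getindex(z, 5)    # the equivalent function call

field_sugar = t.a
field_call  = getfield(t, :a)
```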

"If that's good functionality, why not give us a solid 10 or 20 infix operators to define as we please?"

There's probably more than that if one includes all unicode ones in addition to the unclaimed ASCII ones like <|, ++, ...

Not reading whole thread -- but just wanted to say that I love being able to pipe. I would vote useful over consistency any day.

I have a very mild preference to keep it, but don't really care so long as it remains an infix operator. I feel like I probably wouldn't use function piping if it entailed importing a package, which tells me that I don't value it very much.

That being said, I don't think this "principle of least surprise" argument is compelling, as it makes some presumptions about a diverse user base. To native speakers of subject-object-verb languages, I suppose most of Julia's syntax violates the principle of least surprise, and function piping is rather comfortable...

"Not reading whole thread"

😕

"I love being able to pipe"

Again, I'm not arguing that one should not be able to pipe, but rather that the functionality could easily be had instead in one of the several existing piping packages. Removing the Base pipe allows for packages to more easily define their own piping semantics without having to adhere to or remain consistent with whatever Base provides.

"in Julia you can't define your own infix operators"

That's not true; anything that parses as an infix operator can be defined or redefined. As martinholters pointed out, <| and ++ are similarly available, among others.
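As a small sketch of this, assuming a Julia version in which `++` has no methods in Base: the operator can be given a meaning by ordinary method definition. Vector concatenation is an arbitrary choice made for the example.

```julia
# `++` parses as an infix operator, so defining a method for it is enough to
# make the infix form usable. The meaning chosen here is only an example.
++(a::Vector, b::Vector) = vcat(a, b)

joined = [1, 2] ++ [3]
```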

I'm kind of neutral on this one, but I will second the sentiment that |> being backwards from normal function call syntax is the whole point of it. Even the biggest fans of piping are not asking (AFAIK) for e.g. sin <| x because that really is redundant with sin(x). |> is for those cases where it's easier on the eyes and/or brain to think of data flowing left to right without lots of parentheses.

I'd like for |> to be more powerful, e.g. allow x |> f(_) + 2g(_) |> h etc. and for it not to just be an operator. Every time anyone defined x |> f to mean something besides f(x) it really trips me up because the whole point of the operator as we've used it is that it's a different-order call syntax. Since we can overload call I can't see a good reason for having x |> f mean something else.
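The `f(_)` placeholder form above is a wish, not Base syntax as of this thread. What it is meant to express can be written today with an explicit anonymous function; `f`, `g`, and `h` below are hypothetical helpers.

```julia
f(v) = v + 1      # hypothetical helper functions
g(v) = 10v
h(v) = v / 2

x = 3

# The intent of `x |> f(_) + 2g(_) |> h`, spelled out with an explicit
# anonymous function in the middle stage.
result = x |> (v -> f(v) + 2g(v)) |> h
direct = h(f(x) + 2g(x))
```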

@StefanKarpinski More powerful pipes can already be obtained using macros. See for example Pipe.jl, which provides exactly the syntax you're describing. As long as |> is an operator (I personally don't see |> as being worth a special case), macros can use any piping delimiter that parses infix, even if it isn't a :call. As an example, one could similarly use @~ to pipe (at least as of this writing). That level of flexibility is one of the advantages of using macros in Julia.

We could add the functionality of Pipe.jl to the language, and then you'd have it without needing to write @pipe.

The main reason to deprecate |> would be if we want to reclaim the syntax for some other purpose that people like much better.

I guess I'm trying to argue that piping doesn't need to be part of the language, it can (and already does) live in a package.

But if there's nothing else we want |> for, I see little harm in leaving its (trivial) definition alone.

I don't believe there are currently any proposals to repurpose |> in Base. My argument for not defining it in Base is that it gives us more consistency without loss of functionality.

Would any "more powerful piping" proposals or package implementations be made simpler by not having this existing definition to worry about or work around?

@ararslan "That's not true; anything that parses as an infix operator can be defined or redefined."

From the manual, the && and || operators are parsed as infix but can't be redefined (which is a good thing). I believe they are the only exceptions.

The so-called "logical operators" && and || are infix, but "operator" (in the sense of a unary or binary relation) is IMHO the incorrect term for them, as they aren't. In this respect they differ from the bitwise & and |, which do allow overloading (something I'm not sure is a good choice).

@PallHaraldsson Those are control flow, not operators in the same sense as &, |, +, etc.
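The distinction shows up in how the two forms parse. A minimal sketch, with `a` and `b` as placeholder symbols:

```julia
short_circuit = :(a && b)
bitwise       = :(a & b)

# `&&` gets its own expression head (control flow), so there is no function
# to add methods to; `&` is an ordinary call to a function named `&`.
heads = (short_circuit.head, bitwise.head)
callee = bitwise.args[1]
```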

Let's try to stay on topic here if possible, please.

@tkelman That's a good point. I suspect we can make future piping syntax backwards-compatible though. For example, if _ is reserved then |> can have a special meaning when its arguments contain _, and otherwise do the same thing it does now.

There's another issue: to make |> work for your object, do you define |> or the "function call operator" (i.e. adding methods to it)? It might be cleaner if |> were built-in syntax for function call, to ensure f(x) and x |> f are always the same.
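A sketch of the second option, making an object callable instead of defining `|>` for it. The type `Scale` is hypothetical.

```julia
struct Scale   # hypothetical callable type
    k::Int
end

# Make instances callable by adding a method to the call operator.
(s::Scale)(x) = s.k * x

called = Scale(2)(21)
piped  = 21 |> Scale(2)   # works without any `|>` method specific to Scale
```

With this approach `f(x)` and `x |> f` stay in agreement automatically, because the generic `|>` simply calls its right-hand argument.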

The consensus here is very clearly against, so I'll go ahead and close the issue. I appreciate the discussion, everyone.

I know this issue is closed. Just wanted to say "thank you" for keeping the operator.
