Julia: naming: setdiff, diff, gradient

Created on 8 Mar 2018 · 26Comments · Source: JuliaLang/julia

I think there are a couple issues with these names:

They have a common suffix but are unrelated.
Every other function that begins with set involves assigning something, e.g. setenv, setfield, setrounding

Other possible names for setdiff: difference, complement, without, delete, remove

Other possible names for diff: finitediff, differences, decumulate

arrays collections deprecation

Source

JeffBezanson

Most helpful comment

The problem with \ is that if I see A \ B in Julia code I assume it is a linear solve.

Honestly, I think setdiff is fine. I'm doubtful that this operation is so common in Julia code that it needs a binary operator.

stevengj on 9 Mar 2018

👍10

All 26 comments

Ref. https://github.com/JuliaLang/julia/issues/20041#issuecomment-355351429 and downstream discussion regarding diff. Best!

Sacha0 on 8 Mar 2018

The standard notation for this is A \ B, and some authors use A - B. Would one of those be too confusing?

stevengj on 8 Mar 2018

👍1

diff: I prefer differences from the proposals. decumulate sounds more general, while I feel, it should be a kind of inverse of cumsum. Then also differences(cumsum(x)) == x or cumsum(differences(x)) == x should hold.

setdiff: I would tend to difference and make \ an alias for that. At the same time deprecate all set operations when applied to Tuple{AbstractVector,AbstractVectorVector}. See also https://en.wikipedia.org/wiki/Set_(mathematics)#Complements

KlausC on 8 Mar 2018

I’m not sure there’s another reasonable short name for the set difference operation; calling this setdiff is quite standard. That said, I’d like an operator for this. I’d be happy to ditch diff and introduce decumulate which is the proper inverse of accumulate.

StefanKarpinski on 8 Mar 2018

I vote strongly against decumulate .. for a non-english speaker this sounds like something Sheldon Cooper from big bang theory would suggest.
diff in Matlab and probably in R and probably in NumPy means finite differencing .. please don't change that name.

If you must change then I vote for finitediff but thats is wrong too because there isn't an infinite diff,
also there isn't any diff by factor 2 or 3, the later cases are covered by gradient so another option is to deprecate diff entirely and rely on gradient instead

TsurHerman on 8 Mar 2018

👍3

I would support a general unaccumulate (which I think is a slightly better word than decumulate, since decumulate still sounds like it's cumulative but just doing the inverse operation), but I'd then also want to change the definition such that:

# pseudocode for one-dimension
unaccumulate(op, A) = unaccumulate(op, A[1], A)
unaccumulate(op, v0, A) = [v0, (op(A[i], A[i-1]) for i in 2:length(A))...]

With this definition, A == accumulate(op, unaccumulate(op, A)) == unaccumulate(op, accumulate(op, A)).

I think this is a bit of a tangent, though, since, like cumsum, we could still provide a helper function with a nicer name. In that vein, maybe uncumdiff :trollface:?

mbauman on 8 Mar 2018

If the names are getting a bit long it seems fine to me to use accum instead of accumulate.

JeffBezanson on 8 Mar 2018

👍 for Matt's general unaccumulate/decumulate definition. Nice to mitigate the common off-by-one-errors that occur with diff. Best!

Sacha0 on 8 Mar 2018

I'm skeptical that anyone will comprehend a neologism like unaccumulate, and I'm skeptical of the need for something generic here when people only really seem to use this with -…

+1 for merging diff into gradient, with keyword arguments to specify the different possible boundary conditions and output lengths.

Note also that @mbauman's definition of unaccumulate is not equivalent to diff or gradient if you look at the first element — you'd always end up discarding that with popfirst! or something. Seems weird to compute something via a definition that probably no one would actually want (if they are coming from diff)…

stevengj on 8 Mar 2018

👍1

(Note that gradient is deprecated.)

Sacha0 on 8 Mar 2018

😄2

I've used diff _a ton_ in Matlab where you need to do all sorts of contortions to vectorize code. I find I don't reach for it nearly as often in Julia. Looking back through my crusty old code (which includes 100k+ LOC written by myself and other engineers), I find that we used it in predominantly three different ways:

Because I actually want to look at the sequential differences: mean(diff(A)) or median(diff(A)).
Because I want to relate it back to the original array. In these cases we almost always used explicit padding like find(diff([0 waveform]) ~= 0) or [1 diff(A)] or add .5 to their "axis" so they plot in-between elements.
As a shorthand for A[2] - A[1] or subtracting two columns with diff(A(:,1:2),[],2) for A[:,2] .- A[:,1]

The third one is by far the most common (I'd estimate roughly 80%)… but it's largely irrelevant in Julia since diff([x,y]) isn't y-x but rather [y-x].

Uses 1 and 2 are actually quite distinct from each other, and the conflation between the two has been a large source of bugs in my experience. Here's one of my favorite examples, which includes both use-cases and two different ways of dealing with off-by-one errors:

A([1; find(diff(A)>2*median(diff(A)))+1])

So, yes, my desire to change the behavior of diff was very intentional… but upon reflection I'm not sure it's as big of a win as I had thought.

mbauman on 8 Mar 2018

❤1

I think I've seen python overload arithmetic for sets. Maybe + for union, * for intersect, - for setdiff, etc?

bramtayl on 9 Mar 2018

Python uses | for set union, & for set intersection, - for set difference, and ^ for the symmetric difference. These are aliases for union, intersection, difference, and symmetric_difference.

Julia has synonyms for ∪ (\cup) for union and ∩ (\cap) for intersect. The set-difference operator in Unicode is ∖ (U+2216), but this is so similar in appearance to \ (backslash) that I doubt we would want ∖ as a distinct operator. (Currently it is unsupported by the parser, I think for this reason.)

I think we don't currently have a symmetric difference function for sets?

stevengj on 9 Mar 2018

I propose to use backslash (like reverse division) and deprecate all set operations on AbstractArray (currently set operations with vectors are allowed).

KlausC on 9 Mar 2018

The problem with \ is that if I see A \ B in Julia code I assume it is a linear solve.

Honestly, I think setdiff is fine. I'm doubtful that this operation is so common in Julia code that it needs a binary operator.

stevengj on 9 Mar 2018

👍10

Personally, I use things like mean(diff(A)) a lot, especially to take a quick look at data properties, etc. It would be nice to keep this (even under a different name).

oschulz on 20 Mar 2018

Tangential, but I'm a little puzzled by the interest in mean(diff(A)) as it can be optimized to (A[end] - A[1]) / (length(A) - 1) and thus only depends on the endpoints and length of A (or slightly more complicated if A is more than one-dimensional).

GunnarFarneback on 20 Mar 2018

👍1

True, but that was just an example (since @mbauman mentioned it, together with median). For median(diff(A)), var(diff(A)), etc., there isn't such a simple shortcut. :-)

oschulz on 21 Mar 2018

Triage decided not to do anything here.

StefanKarpinski on 22 Mar 2018

Came across this issue today while writing some set differences in some of my own packages. No chance to reconsider the - (minus) or \ (backslash) notation for Set types only? I understand the possible confusion when there are both vectors (as in vector spaces) and sets in the same line of code, but does that confusion eliminate the usefulness of the notation for sets? As mentioned above, we have the nice union and intersection notation. Adding the set difference to these symbols would be great.

(A ∪ B) - (A ∩ B)

reads so much nicer than

setdiff(A ∪ B, A ∩ B)

juliohm on 4 Jan 2020

I will frequently, if I'm doing a lot of setdiff operations just locally define const \ = setdiff and then use backslash. This work pretty well since it's uncommon to do a lot of linear solves in the same codebase as one does lots of set diffs.

While it's somewhat tempting for the syntax, I'm pretty uncomfortable trying to actually stuff setdiff into either - or \: it's clearly not related to the \ operation on matrices and if you have - for setdiff then it's so very tempting to have + for set unions, which is a slippery slope I really don't think we want to go down since sets form a lattice rather than an abelian group.

StefanKarpinski on 5 Jan 2020

It's rather unfortunate that the set minus symbol ∖ looks identical to backslash symbol \ in so many fonts. If it were reliably double wide, then I think it would be fine to give it this meaning.

StefanKarpinski on 5 Jan 2020

I like the setdiff unicode. If we accept the argument that the Unicode can
be confused with backslash and therefore confuse readers of code that
contains both vectors and sets, we also have to accept the argument that
any two types defining + will be confusing in the same context. This is
argument is not very strong in my opinion. People can always type the
explicit setdiff if they think that in some contexts it could be confused
with backslash.

On Sun, Jan 5, 2020, 12:16 Stefan Karpinski notifications@github.com
wrote:

It's rather unfortunate that the set minus symbol ∖ looks identical to
backslash symbol in so many fonts. If it were reliably double wide,
then I think it would be fine to give it this meaning.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/julia/issues/26361?email_source=notifications&email_token=AAZQW3PTM4LWZZXWQ3QN54DQ4H2WJA5CNFSM4EUHOA22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIDZBHQ#issuecomment-570921118,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAZQW3NJNUS5WAKSLYGUHNLQ4H2WJANCNFSM4EUHOA2Q
.

juliohm on 5 Jan 2020

It's quite important for generic programming to work in Julia that we are strict about generic functions having a single meaning. So far that is working remarkably well. Let's not throw it out just so that we can have a little syntactic convenience.

StefanKarpinski on 5 Jan 2020

👍1

You mean introducing the unicode syntax would restrict the meaning of setdiff? How does it restrict meaning, I don't get it, appreciate if you can clarify.

juliohm on 5 Jan 2020

There's no problem with introducing a Unicode symbol for setdiff. However, the obvious Unicode character,∖, looks almost identical to \, which could potentially lead to confusion.

StefanKarpinski on 6 Jan 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Taking string concatenation seriously, or, a proposal to deprecate *, ^ for string concatenation and repetition

quinnj · 179Comments

crazy idea: change the `type` keyword

JeffBezanson · 167Comments

Arraypocalypse Now and Then

mbauman · 276Comments

syntax: separate array concatenation from array construction

StefanKarpinski · 145Comments

multiline comments

StefanKarpinski · 113Comments