I think there are a couple issues with these names:
set
involves assigning something, e.g. setenv, setfield, setroundingOther possible names for setdiff
: difference, complement, without, delete, remove
Other possible names for diff
: finitediff, differences, decumulate
Ref. https://github.com/JuliaLang/julia/issues/20041#issuecomment-355351429 and downstream discussion regarding diff
. Best!
The standard notation for this is A \ B
, and some authors use A - B
. Would one of those be too confusing?
diff
: I prefer differences
from the proposals. decumulate
sounds more general, while I feel, it should be a kind of inverse of cumsum
. Then also differences(cumsum(x)) == x
or cumsum(differences(x)) == x
should hold.
setdiff
: I would tend to difference
and make \
an alias for that. At the same time deprecate all set operations when applied to Tuple{AbstractVector,AbstractVectorVector}
. See also https://en.wikipedia.org/wiki/Set_(mathematics)#Complements
I’m not sure there’s another reasonable short name for the set difference operation; calling this setdiff
is quite standard. That said, I’d like an operator for this. I’d be happy to ditch diff
and introduce decumulate
which is the proper inverse of accumulate
.
I vote strongly against decumulate
.. for a non-english speaker this sounds like something Sheldon Cooper from big bang theory would suggest.
diff in Matlab and probably in R and probably in NumPy means finite differencing .. please don't change that name.
If you must change then I vote for finitediff
but thats is wrong too because there isn't an infinite diff,
also there isn't any diff by factor 2 or 3, the later cases are covered by gradient
so another option is to deprecate diff
entirely and rely on gradient
instead
I would support a general unaccumulate
(which I think is a slightly better word than decumulate
, since decumulate
still sounds like it's cumulative but just doing the inverse operation), but I'd then also want to change the definition such that:
# pseudocode for one-dimension
unaccumulate(op, A) = unaccumulate(op, A[1], A)
unaccumulate(op, v0, A) = [v0, (op(A[i], A[i-1]) for i in 2:length(A))...]
With this definition, A == accumulate(op, unaccumulate(op, A)) == unaccumulate(op, accumulate(op, A))
.
I think this is a bit of a tangent, though, since, like cumsum
, we could still provide a helper function with a nicer name. In that vein, maybe uncumdiff
:trollface:?
If the names are getting a bit long it seems fine to me to use accum
instead of accumulate
.
👍 for Matt's general unaccumulate
/decumulate
definition. Nice to mitigate the common off-by-one-errors that occur with diff
. Best!
I'm skeptical that anyone will comprehend a neologism like unaccumulate
, and I'm skeptical of the need for something generic here when people only really seem to use this with -
…
+1 for merging diff
into gradient
, with keyword arguments to specify the different possible boundary conditions and output lengths.
Note also that @mbauman's definition of unaccumulate
is not equivalent to diff
or gradient
if you look at the first element — you'd always end up discarding that with popfirst!
or something. Seems weird to compute something via a definition that probably no one would actually want (if they are coming from diff
)…
(Note that gradient
is deprecated.)
I've used diff
_a ton_ in Matlab where you need to do all sorts of contortions to vectorize code. I find I don't reach for it nearly as often in Julia. Looking back through my crusty old code (which includes 100k+ LOC written by myself and other engineers), I find that we used it in predominantly three different ways:
mean(diff(A))
or median(diff(A))
.find(diff([0 waveform]) ~= 0)
or [1 diff(A)]
or add .5 to their "axis" so they plot in-between elements.A[2] - A[1]
or subtracting two columns with diff(A(:,1:2),[],2)
for A[:,2] .- A[:,1]
The third one is by far the most common (I'd estimate roughly 80%)… but it's largely irrelevant in Julia since diff([x,y])
isn't y-x
but rather [y-x]
.
Uses 1 and 2 are actually quite distinct from each other, and the conflation between the two has been a large source of bugs in my experience. Here's one of my favorite examples, which includes both use-cases and two different ways of dealing with off-by-one errors:
A([1; find(diff(A)>2*median(diff(A)))+1])
So, yes, my desire to change the behavior of diff
was very intentional… but upon reflection I'm not sure it's as big of a win as I had thought.
I think I've seen python overload arithmetic for sets. Maybe + for union, * for intersect, - for setdiff, etc?
Python uses |
for set union, &
for set intersection, -
for set difference, and ^
for the symmetric difference. These are aliases for union
, intersection
, difference
, and symmetric_difference
.
Julia has synonyms for ∪
(\cup
) for union
and ∩
(\cap
) for intersect
. The set-difference operator in Unicode is ∖
(U+2216), but this is so similar in appearance to \
(backslash) that I doubt we would want ∖
as a distinct operator. (Currently it is unsupported by the parser, I think for this reason.)
I think we don't currently have a symmetric difference function for sets?
I propose to use backslash (like reverse division) and deprecate all set operations on AbstractArray
(currently set operations with vectors are allowed).
The problem with \
is that if I see A \ B
in Julia code I assume it is a linear solve.
Honestly, I think setdiff
is fine. I'm doubtful that this operation is so common in Julia code that it needs a binary operator.
Personally, I use things like mean(diff(A))
a lot, especially to take a quick look at data properties, etc. It would be nice to keep this (even under a different name).
Tangential, but I'm a little puzzled by the interest in mean(diff(A))
as it can be optimized to (A[end] - A[1]) / (length(A) - 1)
and thus only depends on the endpoints and length of A
(or slightly more complicated if A
is more than one-dimensional).
True, but that was just an example (since @mbauman mentioned it, together with median
). For median(diff(A))
, var(diff(A))
, etc., there isn't such a simple shortcut. :-)
Triage decided not to do anything here.
Came across this issue today while writing some set differences in some of my own packages. No chance to reconsider the -
(minus) or \
(backslash) notation for Set types only? I understand the possible confusion when there are both vectors (as in vector spaces) and sets in the same line of code, but does that confusion eliminate the usefulness of the notation for sets? As mentioned above, we have the nice union and intersection notation. Adding the set difference to these symbols would be great.
(A ∪ B) - (A ∩ B)
reads so much nicer than
setdiff(A ∪ B, A ∩ B)
I will frequently, if I'm doing a lot of setdiff operations just locally define const \ = setdiff
and then use backslash. This work pretty well since it's uncommon to do a lot of linear solves in the same codebase as one does lots of set diffs.
While it's somewhat tempting for the syntax, I'm pretty uncomfortable trying to actually stuff setdiff into either -
or \
: it's clearly not related to the \
operation on matrices and if you have -
for setdiff then it's so very tempting to have +
for set unions, which is a slippery slope I really don't think we want to go down since sets form a lattice rather than an abelian group.
It's rather unfortunate that the set minus symbol ∖
looks identical to backslash symbol \
in so many fonts. If it were reliably double wide, then I think it would be fine to give it this meaning.
I like the setdiff unicode. If we accept the argument that the Unicode can
be confused with backslash and therefore confuse readers of code that
contains both vectors and sets, we also have to accept the argument that
any two types defining + will be confusing in the same context. This is
argument is not very strong in my opinion. People can always type the
explicit setdiff if they think that in some contexts it could be confused
with backslash.
On Sun, Jan 5, 2020, 12:16 Stefan Karpinski notifications@github.com
wrote:
It's rather unfortunate that the set minus symbol ∖ looks identical to
backslash symbol in so many fonts. If it were reliably double wide,
then I think it would be fine to give it this meaning.—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/julia/issues/26361?email_source=notifications&email_token=AAZQW3PTM4LWZZXWQ3QN54DQ4H2WJA5CNFSM4EUHOA22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIDZBHQ#issuecomment-570921118,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAZQW3NJNUS5WAKSLYGUHNLQ4H2WJANCNFSM4EUHOA2Q
.
It's quite important for generic programming to work in Julia that we are strict about generic functions having a single meaning. So far that is working remarkably well. Let's not throw it out just so that we can have a little syntactic convenience.
You mean introducing the unicode syntax would restrict the meaning of setdiff? How does it restrict meaning, I don't get it, appreciate if you can clarify.
There's no problem with introducing a Unicode symbol for setdiff. However, the obvious Unicode character,∖
, looks almost identical to \
, which could potentially lead to confusion.
Most helpful comment
The problem with
\
is that if I seeA \ B
in Julia code I assume it is a linear solve.Honestly, I think
setdiff
is fine. I'm doubtful that this operation is so common in Julia code that it needs a binary operator.