Data.table: Row order of joins changed from 1.10.4 to 1.10.5

Created on 7 Jan 2018  Â·  6Comments  Â·  Source: Rdatatable/data.table

In the news to 1.10.5, it says that row-ordering of non-equi joins has been fixed. However, I can't find any mention that also the row order of equi joins is changed if an index with more than the join columns is present (see also example below):

  • 1.10.4 Row order of additional columns is potentially changed.
  • 1.10.5 Row order of all columns is maintained.

I like the new behaviour, in fact, I was astonished, when I first saw the old behaviour, but I think this should be mentioned in the NEWS since it is a potentially breaking change.

library(data.table)
dt <- data.table(x = c(1,1), y = c(2,1))
setindex(dt, x, y)
dt[J(x=1), on = "x==x"]
## in 1.10.4
#    x y
# 1: 1 1
# 2: 1 2

## in 1.10.5
#    x y
# 1: 1 2
# 2: 1 1
duplicate

Most helpful comment

@arunsrinivasan how embarassing that I changed it myself wiithout noticing . . . Thanks so much for pointing out.
I will add the example as a test and modify the news entry as well to make clear that row order can be changed.

All 6 comments

agree old behavior seems like a bug (?)

maybe worth adding a dedicated issue and closing it with the relevant
commit & mentioning in NEWS as such

On Jan 6, 2018 11:45 PM, "MarkusBonsch" notifications@github.com wrote:

In the news to 1.10.5, it says that row-ordering of non-equi joins has
been fixed. However, I can't find any mention that also the row order of
equi joins is changed if an index with more than the join columns is
present. See here:

library(data.table)
dt <- data.table(x = c(1,1), y = c(2,1))
setindex(dt, x, y)
dt[J(x=1), on = "x==x"]

in 1.10.4

x y

1: 1 1

2: 1 2

in 1.10.5

x y

1: 1 2

2: 1 1

I like the new behaviour, in fact, I was astonished, when I first saw the
old behaviour, but I think this should be mentioned in the NEWS since it is
a potentially breaking change.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/Rdatatable/data.table/issues/2559, or mute the thread
https://github.com/notifications/unsubscribe-auth/AHQQdSZf4qrZh1r97Ot1GkJPdZmrI0Gtks5tIEvfgaJpZM4RViWy
.

This seems to be an issue due to exact=TRUE not set to attr() while extracting indices, which seems to have been fixed in https://github.com/Rdatatable/data.table/commit/e334ed1b7c4b9d4fda605869e7e578e3bfa3e64a#diff-650e4a11ed4384f8e560a6ebfff4ff53 (which @MarkusBonsch seems to have committed, and there's a NEWS entry as well)...

Can be closed. Useful to add this example as a test to catch any future regressions.

@arunsrinivasan how embarassing that I changed it myself wiithout noticing . . . Thanks so much for pointing out.
I will add the example as a test and modify the news entry as well to make clear that row order can be changed.

@MarkusBonsch no worries :-). Glad you caught it and reported it!

@arunsrinivasan Great. Not sure you saw that Markus added you as reviewer on that PR for this one. I looked too and looks good, and since you've commented here too, I'll merge it then.

@mattdowle no, I didn't. Thanks for letting me know. Do we get notified if added as reviewer? Will keep an eye hereafter.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

alex46015 picture alex46015  Â·  3Comments

st-pasha picture st-pasha  Â·  3Comments

mattdowle picture mattdowle  Â·  3Comments

andschar picture andschar  Â·  3Comments

rafapereirabr picture rafapereirabr  Â·  3Comments