Hi,
In R, there is a way to perform a rolling join as shown below.
library("data.table")
DT <-
data.table(
x = rep(c("a", "b", "c"), each = 3),
y = c(1, 3, 6),
v = 1:9)
setkey(DT, x, y) # Necessary for following self join.
DT[J("a", 4:6)]
DT[J("a", 4:6), roll = TRUE]
The relevant output looks like this (J is shorthand for self join):
DT
x y v
1: a 1 1
2: a 3 2
3: a 6 3
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
DT[J("a", 4:6)] # v columns does not have "2".
x y v
1: a 4 NA
2: a 5 NA
3: a 6 3
DT[J("a", 4:6), roll = TRUE] # v column rolls "2" forward.
x y v
1: a 4 2
2: a 5 2
3: a 6 3
In Pandas, the closest that I could think of is to use Join then DataFrame,fillna(method='pad') . But this will not yield the above R rolling join result since the first 2 rows would be NA in the joined column (it starts with (a,3) and not (a,2))
I am just wondering whether there is an equivalent operation in Pandas for this?
Kind regards,
Kris
see https://github.com/pydata/pandas/issues/2996 and #5401
Hi,
I have taken a look at both of the above mentioned issues, but it is not really what this question is about.
Below is an equivalent code for Python using Pandas, as my first post was for R
import pandas as pd, numpy as np
df = pd.DataFrame( np.hstack(([['a']]_3 + [['b']]_3 +[['c']]*3 , [[1],[3],[6]] * 3 , [ [i] for i in range(1,10)])))
df = df.set_index([0,1])df2 = pd.DataFrame( np.hstack(([['a']]*3 , [ [i] for i in range(4,7)] )))
df2 = df2.set_index([0,1])



So my question is whether there is an operator that is equivalent to R join argument roll = True above?
eg, something like df2.join(df, roll=True), and this gives:

the 2 in a,4 and a,5 comes from the fact that the original df.ix[a,3] is 2, which is the index right before df.ix[a,6] . In other words, this is a rolling join - rolling upward.
Looking at this another way, rolling join is similar to using join, then fillna(method='pad') but the slight difference is that the first few rows would still end up being NA, like the above case.
Thanks again,
Kris
closing as stale. pls reopen if still an issue.
join + fillnaA rolling join is _not_ the same as a join and a fillna! That would only be the case if the key of the table that is joined against (in terms of data.table that would be the left table and a right-join) has equivalents in the main table. A data.table rolling join does not require this.
There is a solution though in pandas. Let's assume your right data.table is table A and your left data.table is table B.
tag to A which are all 0 and a column tag to B that are all 1.tagfrom B (can be omitted, but it is clearer this way) and call the table B'. Keep B as an original - we are going to need it later.C = C.assign(groupNr = np.cumsum(C.tag))query) on tag get rid of all B'-rows.groupNr to the original B (integers from 0 to N-1 or from 1 to N, depending on whether you want forward or backward rolling join).groupNr.R newbie here. @Make42 Could you please provide short python code snippet with all steps or at least clarify step 7 because I am not sure I get it.
@PaluchowskiMatthew: I did an example for you. Maybe you can give me some points on stackoverflow: http://stackoverflow.com/a/43539437/4533188
@Make42 Awesome thanks! +1 on stackoverflow