Several places in the available vignettes refer to this mysterious vignette about join and rolling join. When will it be up? Thanks
See #944. There are many ideas for vignettes there.
If you want some examples, you could see stackoverflow.com or maybe my notes.
Yes, there's no time frame for this currently. See the #944 issue for progress.
Hi,
So, does this "joins & rolling joins" vignette exist? It's referred to in many of the existing (very helpful!) vignettes, but it doesn't turn up when searched for. Need it! Help!

@batcheneden please read above and follow #944 for updates.
@MichaelChirico thank you. I just wanted to make sure that there indeed isn't an existing one, despite the references in the existing vignettes.
thanks!
As #944 is a pretty epic issue, I would like to re-open this one, asking for the joins vignette only. I've seen people asking for it on Twitter as well, so let's try to have it for 1.13.0.
Noting for reference: joins vignette would be a good place to have an example of replacing nested ifelse with a join.
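A minimal sketch of that idea, using hypothetical data (the table `DT`, the `lookup` table and the labels are made up for illustration): the nested `ifelse` and the update-on-join should produce the same column, with unmatched codes left as `NA`.

```r
library(data.table)

# Hypothetical data: map a short code to a longer label
DT = data.table(id = 1:5, code = c("a", "b", "c", "a", "d"))

# Nested ifelse version: one branch per code, hard to extend
DT[, label_ifelse := ifelse(code == "a", "alpha",
                     ifelse(code == "b", "beta",
                     ifelse(code == "c", "gamma", NA_character_)))]

# Join version: the mapping lives in a small lookup table
lookup = data.table(code  = c("a", "b", "c"),
                    label = c("alpha", "beta", "gamma"))

# Update on join: rows of DT without a match in lookup keep NA
DT[lookup, label_join := i.label, on = "code"]

identical(DT$label_ifelse, DT$label_join)
```

Adding a new code then only means adding a row to `lookup`, rather than another nesting level.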
summarizing the scope
- on fields)
- [f]ifelse and fcase, and laziness of the latter one)
- on or with key, or pre-computing index - explain it is good to setkey/setindex if you are repeatedly joining

Copied from #944.
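For the last point, a rough sketch of the three ways to join on the same column, on hypothetical tables `X` and `Y`. Whether an existing index is actually picked up by an `on=` join depends on the data.table version, so treat that part as an assumption; the results should be the same either way.

```r
library(data.table)

# Hypothetical tables
X = data.table(id = c(3L, 1L, 2L), x = c("c", "a", "b"))
Y = data.table(id = c(2L, 3L), y = c(20, 30))

# 1. Ad hoc join: the match on "id" is computed at join time
res_on = X[Y, on = "id"]

# 2. Pre-computed index: X keeps its row order; the stored ordering
#    can be reused by later on= joins on the same column
setindex(X, id)
indices(X)
res_idx = X[Y, on = "id"]

# 3. Key: X is physically sorted by id, and on= can be omitted;
#    Y is matched on its first column
setkey(X, id)
res_key = X[Y]
```

All three results hold the same rows; the difference is only in where the sorting cost is paid, which matters when the same table is joined repeatedly.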
I have a question on joining by reference, while preparing the vignettes. The X[Y, new_col := old_col] performs something similar to a traditional left join on X. However, if there are multiple matches to Y's keys in X, only the last (or first?) matching value of the key is retained. ~Is this explicitly documented somewhere? I had tried searching for this back when I encountered it, but had to resort to my understanding of updating by reference for the reason~ Documented in set. For a reproducible example,
> X = data.table(a = c(1, 2, 3), m = c("a", "b", "c"))
> Y = data.table(b = c(1, 1, 4), n = c("x", "y", "z"))
> X[Y, new_col := i.n, on = "a == b"]
> X
   a m new_col
1: 1 a       y
2: 2 b    <NA>
3: 3 c    <NA>
# an ideal left join - expected behaviour per a new user, given below
# not possible with := because adding rows by reference isn't implemented
   a m new_col
1: 1 a       x
2: 1 a       y
3: 2 b    <NA>
4: 3 c    <NA>
This is expected behaviour, but isn't exactly straightforward for a new user. mult does not impact the output either. Any suggestions on how I mention this in the vignette? Add merge as a workaround for a proper left join?
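A sketch of the `merge` workaround mentioned above, reusing the same `X` and `Y`. It returns a new table instead of updating `X` by reference, and keeps every match. The second form, with `Y` as the outer table, is an alternative I am assuming is acceptable when the column order does not matter.

```r
library(data.table)

X = data.table(a = c(1, 2, 3), m = c("a", "b", "c"))
Y = data.table(b = c(1, 1, 4), n = c("x", "y", "z"))

# Left join that keeps all matches: a = 1 appears twice (n = "x" and "y")
left = merge(X, Y, by.x = "a", by.y = "b", all.x = TRUE)

# Same rows via the [ syntax: look up every row of X in Y.
# In on = c(b = "a"), the name is the column of Y and the value the column of X.
left2 = Y[X, on = c(b = "a")]
```

Both results have four rows, unlike the update by reference, which can only keep one value per row of `X`.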
@jangorecki, given that #3453 is being prepared, where a detailed overview of rolling joins is being covered by @Henrik-P, would it make sense to add separate vignettes for equi-joins and non-equi joins? I believe the latter are far more relevant for time series analysis. The content of both vignettes will already be significant, given your scope above.
For Joins vignette:
https://github.com/Rdatatable/data.table/issues/2396
_Originally posted by @MichaelChirico in https://github.com/Rdatatable/data.table/issues/944#issuecomment-521710383_
@zeomal better to have 2 bigger vignettes, than 3 smaller IMO. We already have many vignettes.
@jangorecki, I've created a draft pull request for this vignette. It's a first version, bound to have many changes, but covers the basics of equi-joins. This is my first pull request ever, so if I've done something wrong, please correct me.
mult usage can nicely present _temporal joins_, like:
setkey(customer_addresses, customer_id, address_date)
customer_addresses[customer, mult="last", on="customer_id"]
customer_addresses[address_date<as.IDate("2020-01-01")
][customer, mult="last", on="customer_id"]
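To make the snippet above runnable, here is a small sketch with made-up data. The `customer` and `customer_addresses` tables and the `city` column are hypothetical; only the two join expressions come from the comment above.

```r
library(data.table)

# Hypothetical data: customer 3 has no address on record
customer = data.table(customer_id = 1:3)
customer_addresses = data.table(
  customer_id  = c(1L, 1L, 2L, 2L),
  address_date = as.IDate(c("2019-03-01", "2020-06-01",
                            "2018-01-15", "2019-11-30")),
  city         = c("Oslo", "Bergen", "Lyon", "Paris")
)

# Sort by customer and date so that "last" means "most recent"
setkey(customer_addresses, customer_id, address_date)

# Most recent address per customer
latest = customer_addresses[customer, mult = "last", on = "customer_id"]

# Most recent address as of the start of 2020
as_of = customer_addresses[address_date < as.IDate("2020-01-01")
                           ][customer, mult = "last", on = "customer_id"]
```

Customer 1 resolves to a different address in `latest` and `as_of`, while customer 3 comes back as `NA` in both.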
Probably better to use some known dataset, like airlines, whenever possible.
@jangorecki it's been a while since mergelist and the changes to nomatch have been on the milestones, so I thought I might go ahead and complete the vignettes, at least from an equi-join standpoint. Currently, I'm using the nycflights13 package, but it looks like that isn't present on travis-ci and thus the build is failing. You also mentioned using a known dataset - nycflights13 fits that criterion, yeah? Is there some alternate way I can link to the package or data that doesn't involve loading it (I'm not sure linking directly to the CSV files on GitHub is appropriate)?
@zeomal we have flights dataset in vignettes directory, see https://github.com/Rdatatable/data.table/blob/7aa22ee6b245b9308352acd66384373a99376c13/vignettes/datatable-intro.Rmd#L50-L55 for examples of usage.
@jangorecki correct me if I'm wrong - I think that this includes only the flights dataset of the series. That is, airlines, planes, airports, weather are missing - which would be needed to demonstrate joins. I don't see any of the other files here either, so let me know if I'm missing something. The only other dataset I might be able to use is the Lahman one, but present as RData files in the same location (used in the .SD vignette).
@zeomal you are absolutely correct.
So... my question is, what should I use as a data source for the vignette that will be feasible? @jangorecki
No worries, I am not replying because I don't have any valuable answer to questions asked, at the moment.
Choosing datasets for presenting join cases is a very complex task. I don't want to make a recommendation without first properly investigating the problem. There are many factors to consider and it is not easy to construct dummy datasets that cover them well, so choosing real datasets is even trickier.
Joining should present (of course) matches, some matches, zero matches, duplicated matches (many-to-many join), rolling matches, mult arg, overlapping, non-equi, multi col, matches on NA....
Complete list of features to cover could be really long.
Personally I think that using multiple different datasets (each 10-20 rows to be visually easier to grasp) in this single vignette could be more appropriate but do not take that as a suggestion to write this vignette now like this. I would like to hear from @mattdowle and @arunsrinivasan as they are authors of (AFAIR) all join scenarios in data.table.
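To illustrate how small such tables can be, here is a rough sketch covering a few of the cases listed above (zero matches, duplicated matches, an anti-join and a rolling match). All tables are hypothetical, and `nomatch = NULL` assumes a data.table version that supports it.

```r
library(data.table)

# Hypothetical tables: id 2 is duplicated on both sides, id 4 has no match in X
X = data.table(id = c(1L, 2L, 2L, 3L), x = c("a", "b", "c", "d"))
Y = data.table(id = c(2L, 2L, 4L), y = c(10, 20, 30))

# Every row of Y; duplicated ids multiply (many-to-many), id 4 gets NA
all_y = X[Y, on = "id", allow.cartesian = TRUE]

# Inner join: drop the rows of Y that have no match in X
inner = X[Y, on = "id", nomatch = NULL, allow.cartesian = TRUE]

# Anti-join: rows of X whose id does not occur in Y
anti = X[!Y, on = "id"]

# Rolling join: for each trade, take the last quote at or before its time
quotes = data.table(time = c(1L, 4L, 8L), price = c(100, 101, 102))
trades = data.table(time = c(2L, 5L, 9L))
rolled = quotes[trades, on = "time", roll = TRUE]
```

Each case fits in a handful of rows, so the full result can be printed and checked by eye.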