Data.table: Add a API definining header hiding the function pointer as commonly done

Created on 26 Jul 2020  路  4Comments  路  Source: Rdatatable/data.table

Congratulations on getting 1.3.0 out. It is impressive to watch your well-oiled release process which I should do more often :)

It is lovely that the beginnings of a C API are now out. Big thanks to @lsilvest for getting it going and all of you for making it happen with #3751 and building on the older #1694.

This is immediatly useable in something like the following (and I apologise for bring C++ and Rcpp in but it makes one-liners, here broken for readability, easier to test)

R> Rcpp::cppFunction(' SEXP mysub(SEXP x, SEXP rows, SEXP cols) { 
  SEXP(*fun)(SEXP, SEXP, SEXP) = (SEXP(*)(SEXP,SEXP,SEXP)) R_GetCCallable("data.table", "CsubsetDT"); 
  return fun(x,rows,cols);
}')
R>

As your documentation states, if one is familiar with what WRE says or willing to read, this is easy to use as we proxy here from R giving us the desired subsetter from C(++).

R> library(data.table)
R> dt <- data.table(iris)
R> mysub(dt, 1:4, 1:4)
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1:          5.1         3.5          1.4         0.2
2:          4.9         3.0          1.4         0.2
3:          4.7         3.2          1.3         0.2
4:          4.6         3.1          1.5         0.2
R> 

Now, it is not uncommon to export APIs such as this---and fully understood it is subject to change and will change---by an API defining header. I have done that twice in packages exporting C API portions from R itself (RApiDatetime, RApiSerialize) and added it to other packages (xts. In each case a header was added which client packages include and which makes the _C function pointer assignment and SEXP-counting_ internal to a simpler helper function. With it (and I have a version here which I could PR) the C(++) snippet becomes

R> Rcpp::cppFunction("SEXP mysub2(SEXP x, SEXP rows, SEXP cols) { return subsetDT(x,rows,cols); }", 
     include="#include <datatableAPI.h>", 
     depends="data.table")
R> mysub2(dt, 1:4, 1:4)
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1:          5.1         3.5          1.4         0.2
2:          4.9         3.0          1.4         0.2
3:          4.7         3.2          1.3         0.2
4:          4.6         3.1          1.5         0.2
R> 

I.e. we now only include a single header which can be added to other packages via LinkingTo:. And all (programmer) users need than is to call the function declared in the header which I named here subsetDT().

The code in the header is essentially a three-liner and easy to maintain and update if and when the API changes -- and can shield the users from API updates as it picks up your API name and re-exports it. It simply has to be updated in concordance with the change in src/init.c. (It is entirely up to you to declare this as subsetDT() as internally, or as CsubsetDT as exported called and imported. You even make it CsubsetDTapi() to be extra explicit. I like subsetDT() myself :).

Let me know what you think. If it is deemed unimportant feel free to close, if you want to send a PR I can do so as I have the file in a branch here.

Most helpful comment

Ok, I can and will set-up a simple PR then.

One thing I just realized as you wrote that is that we can do what the R API does and make that a #define. By default you get that (butt-ugly but safe) DT_ prefix, but if you set the right #define up it becomes the shorter name that echose the name in your C sources, here, subsetDT. Good idea?

(Also, DT_subsetDT() is a monster. Just saying. I really wish C had namespaces. Maybe I just add a properfly if-def'ed for C++ namespace so that those of us including into C++ would get DT::subset() which I like best here.)

All 4 comments

I agree, the plan was to have a dedicated file for exported C functions. Your idea is perfect. As for the naming I would prefer DT_ prefix instead, so in other packages code it is easy to spot calls to data.table C functions.

Ok, I can and will set-up a simple PR then.

One thing I just realized as you wrote that is that we can do what the R API does and make that a #define. By default you get that (butt-ugly but safe) DT_ prefix, but if you set the right #define up it becomes the shorter name that echose the name in your C sources, here, subsetDT. Good idea?

(Also, DT_subsetDT() is a monster. Just saying. I really wish C had namespaces. Maybe I just add a properfly if-def'ed for C++ namespace so that those of us including into C++ would get DT::subset() which I like best here.)

Great. Implemented and merged by your #4645. I just noticed the PR was missing the magic "closes 4643" (now edited in) so it didn't close automatically.

If you'd like to add a news item, please feel free in a new PR. I usually add a news item where I can, but in this case, you and others here are better equipped than me to write this one. Or it could be left out of NEWS too as a strategy to more easily make breaking changes to it in future, if you prefer.

I just noticed the PR was missing the magic "closes 4643"

True, and my bad. But when I wrote it I meant it more as an opening in a dialogue of "what interface do we want?" (i.e. wrt to naming the functions, with or without namespace) rather than a "final" opus magnus.

Anyway, thanks for merging -- we can refine as needed.

Was this page helpful?
0 / 5 - 0 ratings