Data.table: [Request] setkeyv can be accelerated if key already exists

Created on 6 Sep 2017 · 2 Comments · Source: Rdatatable/data.table

setkeyv() can be accelerated significantly when the requested key already exists.
This is useful, for example, if you write a function that takes a data.table as an argument and needs to set a key, but you don't know whether the user has already set that key on the input.
With the new implementation, you can just call setkey without worrying about a speed penalty.
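
To illustrate the use case, here is a minimal sketch of such a function. The helper name `lookup_rows`, its arguments, and the example columns `id` and `value` are made up for this illustration; only `data.table()`, `setkeyv()` and `key()` are data.table functions.

```r
library(data.table)

# Hypothetical helper: it needs `dt` keyed by `cols` for the lookup,
# but it cannot know whether the caller has already set that key.
lookup_rows <- function(dt, cols, wanted) {
  setkeyv(dt, cols)   # sets the key by reference, whether or not it exists
  dt[.(wanted)]       # keyed lookup on the first key column
}

dt <- data.table(id = c(3L, 1L, 2L, 1L), value = c(10, 20, 30, 40))
res <- lookup_rows(dt, "id", 1L)   # first call: key gets set
res <- lookup_rows(dt, "id", 1L)   # second call: key already exists
```

The second call is the case this request is about: the key is already `id`, so the `setkeyv()` call inside the helper should ideally cost almost nothing.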

setkeyv() does two things:

  1. Determine the correct sort order by calling forderv
  2. Rearrange the data.table by reference by calling Creorder

Currently, if the key already exists, the call to forderv is still executed and only step 2 is skipped.
The only reason for this is a sanity check that the data.table really is sorted by the key.
I believe it is not necessary to perform this sanity check every time, especially since it has been in place for quite a while, so any bugs it guards against should have surfaced by now.
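
The idea can be sketched at the user level as follows. This is only an illustration of the proposed shortcut, not data.table's internal implementation; the wrapper name `setkeyv_if_needed` is hypothetical, and it assumes that the key attribute can be trusted without re-checking the sort order.

```r
library(data.table)

# Hypothetical wrapper: skip the work entirely when the existing key
# already matches the requested columns (assumes the key is trustworthy).
setkeyv_if_needed <- function(dt, cols) {
  if (identical(key(dt), cols)) {
    return(invisible(dt))   # key already set: no sorting, no reordering
  }
  setkeyv(dt, cols)         # otherwise fall back to the regular call
}

dt <- data.table(id = c(3L, 1L, 2L), value = c(10, 20, 30))
setkeyv_if_needed(dt, "id")   # sets the key
setkeyv_if_needed(dt, "id")   # returns immediately
```

Wrappers like this are the kind of workaround the request aims to make unnecessary by moving the check into setkeyv() itself.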

All 2 comments

Great! Your PR is merged. See the comment there: https://github.com/Rdatatable/data.table/pull/2332#issuecomment-327575497

@MarkusBonsch thanks for doing this. I had hacky work-arounds that did this myself, and I didn't get to actually submit a PR. And of course thanks to the data.table team for all they do.
