Is your feature request related to a problem? Please describe.
Formatting of the C++ code base in libcudf is inconsistent, making it difficult to quickly and easily read source files.
Describe the solution you'd like
We should use clang-format to automatically and consistently format all of the C++ source files. It should be made a part of our PR merging process to automatically format the new code to fit the agreed upon style.
Describe alternatives you've considered
Asking developers to manually enforce a consistent code format is untenable.
Additional context
We will need to decide on a style to use. Note that format styling is a highly personal issue bordering on religious. We must acknowledge upfront that there is no one style that is going to make everyone happy, and likely whatever style we do choose will make some individuals unhappy. This is just a fact of life. We need to pick one and move on.
https://clang.llvm.org/docs/ClangFormatStyleOptions.html
I suggest we start with one of the available pre-configured options. These are:
We can either use one of these styles as-is, or modify one to our liking.
FWIW, in my experience this is a risky mechanism. Is it better to have perfectly matching code styles, or to have a consistent and informative git change history? Automatic reformatting tends to destroy your git history: either on the whole repository at once, orphaning and increasing the difficulty of every existing bug, or on one file at a time (better, but still painful).
I would expect it to destroy history once at time of the initial rewrite.
After that I suspect that we could put this in as a git pre-commit hook so
that formatting changes happen as commits are authored. I don't think it
would affect history negatively after that.
On Wed, Jan 2, 2019 at 1:47 PM H. Thomson Comer notifications@github.com
wrote:
FWIW, in my experience this is a risky mechanism. Is it better to have
perfectly matching code styles, or to have a consistent and informative git
change history? Automatic reformatting tends to destroy your git history:
either on the whole repository at once, orphaning and increasing the
difficulty of every existing bug, or on one file at a time (better, but
still painful).—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/rapidsai/cudf/issues/540#issuecomment-450995317, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AASszDOoxuU2lSnRBVnjogbjjMuCik53ks5u_SjXgaJpZM4ZWuBC
.
I recommend that we try this out on a smaller RAPIDS C++ repository first; namely, RMM. If it goes well there, we can implement it here as well.
See https://engineering.mongodb.com/post/succeeding-with-clangformat-part-1-pitfalls-and-planning for a very thorough article series on adding clang-format to a large, pre-existing code base.
cuML is now taking steps to do what I proposed: https://github.com/rapidsai/cuml/pull/591
Now that cuML has merged their clang-format solution, we should be able to trivially copy what they've done into libcudf: https://github.com/rapidsai/cuml/pull/616#event-2389426036
Let's aim for 0.9.
@jrhemstad @harrism Just for the record, I am completely, 100% in support of adding clang-format to cuDF. Considering that we are in the midst of porting all of the algorithms, this is probably a great time (as reformatting does indeed make some noise in git history) to do it.
I have tested out cuML's .clang-format file and it is pretty good (it is based on Google's style guide). I would recommend a couple changes:
And although I personally don't prefer it (but only b/c i don't use std::enable_if_t in my projects), if you have a lot of std::enable_if_t template constraints, this predirective is good to turn on:
AlwaysBreakAfterDefinitionReturnType: All
The first two (1 & 2) changes I feel really strongly about. It change the following code:
int tid = threadIdx.x;
int blkid = blockIdx.x;
int blksz = blockDim.x;
int gridsz = gridDim.x;
to
int tid = threadIdx.x;
int blkid = blockIdx.x;
int blksz = blockDim.x;
int gridsz = gridDim.x;
The third change (3) enables "tabular" case statements. If we don't turn it on, we would destroy the following:
case GDF_INT8: return f.template operator()<int8_t>(std::forward<Ts>(args)...);
case GDF_INT16: return f.template operator()<int16_t>(std::forward<Ts>(args)...);
case GDF_INT32: return f.template operator()<int32_t>(std::forward<Ts>(args)...);
case GDF_INT64: return f.template operator()<int64_t>(std::forward<Ts>(args)...);
case GDF_FLOAT32: return f.template operator()<float>(std::forward<Ts>(args)...);
case GDF_FLOAT64: return f.template operator()<double>(std::forward<Ts>(args)...);
case GDF_BOOL8: return f.template operator()<bool8>(std::forward<Ts>(args)...);
case GDF_DATE32: return f.template operator()<date32>(std::forward<Ts>(args)...);
case GDF_DATE64: return f.template operator()<date64>(std::forward<Ts>(args)...);
case GDF_TIMESTAMP: return f.template operator()<timestamp>(std::forward<Ts>(args)...);
case GDF_CATEGORY: return f.template operator()<category>(std::forward<Ts>(args)...);
And on a side note, one of the very sad things about clang-format is that is doesn't support more alignment of "internal" line components, such as the return on each case line, but it is the price we pay for consistent formatting.
I am happy to send an email out with the tweaked .clang-format file and instructions on how to set it up locally for devs to try it out and weigh in on the predirectives. Or we can just decide among a few us and go from there.
Let me know.
PS - You can't automate it, but I would also like to have conformance on east const through out the codebase.
Although I will say, this:
template <typename TypeFrom>
typename std::enable_if_t<std::is_arithmetic<TypeFrom>::value ||
std::is_same<TypeFrom, cudf::bool8>::value ||
std::is_same<TypeFrom, cudf::category>::value ||
std::is_same<TypeFrom, cudf::nvstring_category>::value,
void>
going to this:
template <typename TypeFrom>
typename std::enable_if_t<std::is_arithmetic<TypeFrom>::value || std::is_same<TypeFrom, cudf::bool8>::value ||
std::is_same<TypeFrom, cudf::category>::value || std::is_same<TypeFrom, cudf::nvstring_category>::value,
void>
is rather upsetting.
And actually clang thinks __device__ is a declaration, and if you turn on align declarations, it messes up:
template <typename TypeFrom, typename TypeTo>
struct DeviceCast {
__device__ TypeTo
apply(TypeFrom data) {
return static_cast<TypeTo>(cudf::detail::unwrap(data));
}
};
@codereport it's good to have a volunteer to take this issue on. Assigning you. :) I don't think we can enforce east const conformance because I believe Cython doesn't like it...
Surprising that Clang thinks __device__ is a declaration since it has a CUDA mode.
I'm fine with all your other suggestions except line width of 100/120. I think this screws up code reviewing in Github.
@harrism @codereport once the clang-format configuration is squared away and the code is reformatted I'd like to hook this up as a pre-commit hook and into the CI checker to keep things nice moving forward.
Most helpful comment
@harrism @codereport once the clang-format configuration is squared away and the code is reformatted I'd like to hook this up as a pre-commit hook and into the CI checker to keep things nice moving forward.