The OMR codebase contains several components each of which has its own coding style. The inconsistencies between (and within) components, coupled with undocumented styles has the potential to consume a lot of human time during reviews to get PRs into shape for merging. To improve code consistency and avoid stylistic noise between reviewers/contributors during semantic PR reviews we propose to add automated tooling to the project to enforce a predefined code format style which will be checked on each contribution.
In summary automatically formatted code is:
git commitTo be discussed during the OMR Architecture (#4560) call. Here are some topics to get us started:
To be discussed during the OMR Architecture (#4560) call and updated subsequently.
Note: You should be able to use git-hyper-blame from the Chromium repo, which to quote the linked man page:
git hyper-blame is like git blame but it can ignore or "look through" a given set of commits, to find the real culprit.
This is useful if you have a commit that makes sweeping changes that are unlikely to be what you are looking for in a blame, such as mass reformatting or renaming. By adding these commits to the hyper-blame ignore list, git hyper-blame will look past these commits to find the previous commit that touched a given line.
(TIL: This has been implemented in upstream git: https://github.com/git/git/commit/ae3f36dea16e51041c56ba9ed6b38380c8421816)
I have summarized the the concerns raised during our OMR Architecture call below.
// clang-format off and // clang-format on to disable formatting on hand formatted areas we really care about](https://youtu.be/21yPv8GsvY4?t=2630)Before forging ahead with an official vote and proposal I think it is important to settle on what exactly the format we will be proposing. To aid in the search for a reasonable format which we can get accustomed to I have ran clang-format with all major built-in formats on the OMR codebase and created branches so we can look at the output.
Some styles are preferred to others. Please state your preferences and examine the branches below and hyperlink any areas in which the code formatting makes code hard to read in a particular format, or propose alterations to existing styles which we can prototype. I hope that after some iterations of configuration we can land at a style which we can put to a vote. I will tentatively propose a 1 week cut-off for discussions and circle back at that point to evaluate whether we are making forward progress and steps to take from there.
> clang-format-9 -version
clang-format version 9.0.0-2 (tags/RELEASE_900/final)
Given that Mozilla has adopted Google style, I would expect that style to start to rot slowly over time.
I would recommend avoiding it.
Regardless of the style chosen, in the compiler the forward declarations for classes in the TR namespace consume a lot of real estate when reformatted. Prior to reformatting, I'll have someone sweep through the code and turn them into:
namespace TR {
class A;
class B;
class C;
}
It will be less tedious to do prior to reformatting. It should be a safe and easy thing to do.
what column width were these run at - there is a heck of a lot of ugly line wrapping going on in at lest the LLVM sample...
The webkit format is nearly unreadable because of how the Boolean conditions are wrapped https://github.com/fjeremic/omr/blob/51db3486144d561fb44a94f5fb899d5c540d3c29/compiler/optimizer/VirtualGuardHeadMerger.cpp#L539
Looks like nearly everything is on one line and you have to scroll forever...
Do any of the styles allow the Boolean operators to be at the start of a line rather than the end? It is a style that was adopted in the optimizer a while ago and it makes reading a lot easier since you can more quickly see how things combine... None of these styles seem to have that?
Regardless of the style chosen, in the compiler the forward declarations for classes in the
TRnamespace consume a lot of real estate when reformatted.
Sure, this can easily be done on any style we end up proposing.
what column width were these run at
All the styles were ran with default column width values. I believe they all use column width 80 or unlimited. Style 6. based on WebKit from our previous try at clang-format which most closely matches some of our current patterns uses a column width of 120 which matches that of the GitHub review interface. You can find the same line in that style here:
which has boolean operators broken at the start of a line if they overflow the column limit.
I'm going to reference #350 as well since there is some useful information there from the previous attempt, most notably some experimentation and a mailing list thread with a ton of discussion about formatting that we had before [1].
There was also more discussions on some of the choices in a Slack thread which can be found in [2].
[1] https://www.eclipse.org/lists/omr-dev/msg00023.html
[2] https://eclipse-omr.slack.com/archives/CCQ8B4B39/p1545061929006600
Hi,
Couple of comments:
a) You should not try to choose a new format. The approach should be to get clang to generate existing format as far as possible. This will minimise impact on everyone. I strongly recommend against bike shedding conversations on format choice.
b) The only real issue is the impact on existing pull requests - this needs to be addressed first.
Regards
a) You should not try to choose a new format. The approach should be to get clang to generate existing format as far as possible. This will minimise impact on everyone. I strongly recommend against bike shedding conversations on format choice.
Thanks for the feedback, I would tend to agree. Based on previous discussions in [1] and [2] that format would be the one we landed on in #3390 which was 6.
[1] https://www.eclipse.org/lists/omr-dev/msg00023.html
[2] https://eclipse-omr.slack.com/archives/CCQ8B4B39/p1545061929006600
b) The only real issue is the impact on existing pull requests - this needs to be addressed first.
This problem can be solved following the technique outlined by the MongoDB folks described in detail in [3] (under _Rescuing stranded changes_ section). We will adopt the same approach for all in-flight PRs.
[3] https://engineering.mongodb.com/post/succeeding-with-clangformat-part-3-persisting-the-change
Do any of the styles allow the Boolean operators to be at the start of a line rather than the end?
Webkit seems to do that https://github.com/fjeremic/omr/blob/clang-format-3390/compiler/control/OMROptions.cpp#L4277-L4280
The webkit format is nearly unreadable because of how the Boolean conditions are wrapped
Wonder how in the link above, it did wrap the lines.
EDIT: actually @andrewcraik , in webkit, VirtualGuardHeadMerger is formatted with boolean operators at the beginning of the line, and with line breaks: https://github.com/fjeremic/omr/blob/clang-format-3390/compiler/optimizer/VirtualGuardHeadMerger.cpp#L577-L579
Comments on nested elements are unexpectedly indented, see
https://github.com/fjeremic/omr/blob/clang-format-webkit/port/common/omrcuda.cpp#L1223-L1227
Looks ok in style 6: https://github.com/fjeremic/omr/blob/clang-format-3390/port/common/omrcuda.cpp#L1197-L1201
Perhaps that's just a bug in 5 (I have a hard time believing it was intentional)?
That raises another question, though: what do people think we should do when the chosen format changes (or is 'fixed')? Do we re-apply it to the entire repository? Or if the change is minor, allow the code to be corrected as it is modified?
EDIT: actually @andrewcraik , in webkit, VirtualGuardHeadMerger is formatted with boolean operators at the beginning of the line, and with line breaks:
I have no idea how I saw it as broken as it was when I viewed it originally.
It is interesting that the original format for that block was:
else if (
!tailSplitEnd->getOpCode().isReturn() &&
!tailSplitEnd->getOpCode().isJumpWithMultipleTargets() &&
tailSplitEnd->getOpCodeValue() != TR::athrow &&
!(tailSplitEnd->getNumChildren() >= 1 && tailSplitEnd->getFirstChild()->getOpCodeValue() == TR::athrow)
)
The leading operators are an improvement, but it is interesting that the separation onto new lines was thrown out. I'm guessing the style tries to compact code even if it is explicitly line broken by the dev?
Perhaps that's just a bug in 5 (I have a hard time believing it was intentional)?
It is a not a bug, just the way WebKit formats things:
> clang-format-9 -style=WebKit -dump-config | grep ColumnLimit
ColumnLimit: 0
According to the documentation for ColumnLimit when it is set to 0:
A column limit of 0 means that there is no column limit. In this case, clang-format will respect the input鈥檚 line breaking decisions within statements unless they contradict other rules.
So by default WebKit style will not re-flow comments, meaning if there is a tab within a comment (as is such in the case you pointed out) it will not change the tabs. The tab before the start of the comment is changed however, hence why we get the mix that you observe. This has been addressed in #3390 already as Irwin pointed out.
[1] https://clang.llvm.org/docs/ClangFormatStyleOptions.html
The leading operators are an improvement, but it is interesting that the separation onto new lines was thrown out. I'm guessing the style tries to compact code even if it is explicitly line broken by the dev?
That's correct. Developer line brakes are not followed. Line breaking rules will apply only when the line exceeds the set column limit (this is true for all styles).
That raises another question, though: what do people think we should do when the chosen format changes (or is 'fixed')? Do we re-apply it to the entire repository? Or if the change is minor, allow the code to be corrected as it is modified?
Great question. I don't know the answer but I would imagine since clang-format at that point will be part of the workflow it won't nearly be as intrusive since we'll be accustomed to tool formatted code already. I do however believe applying such changes piece-wise may be difficult as automation will check formatting on a file basis, rather than diff basis (this is not easy maintain from what I've read). So I would think if there are future changes to the style they will be applied in bulk at once.
It is a not a bug, just the way WebKit formats things:
According to the documentation for ColumnLimit when it is set to 0:A column limit of 0 means that there is no column limit. In this case, clang-format will respect the input鈥檚 line breaking decisions within statements unless they contradict other rules.
So by default WebKit style will not re-flow comments, meaning if there is a tab within a comment (as is such in the case you pointed out) it will not change the tabs. The tab before the start of the comment is changed however, hence why we get the mix that you observe. This has been addressed in #3390 already as Irwin pointed out.
If it had left the tabs in comments alone, I could understand, but it expanded the tabs differently (to 8 spaces) in the comment compared with outside the comment (4 spaces). Either tabs are 4 or 8 spaces, but not both.
That raises another question, though: what do people think we should do when the chosen format changes (or is 'fixed')? Do we re-apply it to the entire repository? Or if the change is minor, allow the code to be corrected as it is modified?
Great question. I don't know the answer ...
I wasn't asking anyone to predict the future; rather I was asking for opinions on what should be done in that situation.
I wasn't asking anyone to predict the future; rather I was asking for opinions on what should be done in that situation.
This is because the IndentWidth and ContinuationIndentWidth in vanilla WebKit is set to 4 but TabWidth is set to 8:
> clang-format-9 -style=WebKit -dump-config | grep -E "^(ContinuationIndentWidth|IndentWidth|TabWidth|UseTab)"
ContinuationIndentWidth: 4
IndentWidth: 4
TabWidth: 8
UseTab: Never
And UseTab is set to Never, but because ColumnWidth is set to 0 it does weird things to tabs within and around comments. It is unfortunate, but we cannot get away without fiddling with the base styles since our codebase was not formatted from the beginning.
What does the formatter do with tabs elsewhere, like within string literals?
IMO, such things should be expunged from the source, but we don't live in an ideal world.
I do think, however, that we should be using TabWidth: 4.
What does the formatter do with tabs elsewhere, like within string literals?
It does not touch string literals.
I do think, however, that we should be using TabWidth: 4.
We can make this amendment to #3390.
I would like to use c++11-style template angle brackets. In #3390 the belief was that xlc didn't support the new syntax, is that actually true?
I do think, however, that we should be using TabWidth: 4.
We can make this amendment to #3390.
Since style 6 (#3390) doesn't use tabs, I'm not sure the tab-width is going to affect anything.
IMHO what really matters is that, if we use tabs:
If we can do this, then tab-width will be inconsequential. Style 6 sidesteps the problem by banning tabs altogether.
The problem with tabs was the conversion of (leading) tabs in a multi-line comment to 8 spaces instead of 4 to be consistent with the indent increment of 4.
In the cuda comment example, the leading tabs were left unconverted.
Hi everyone,
I've built up the automation on my personal fork as an example of what can be done. The automation runs an extra job as part of the Travis CI build which clang-format checks that each commit within a PR is in conformance with the project defined style. The automation then uses the GitHub Status API to tag the HEAD commit in the PR with a separate status badge indicating pending/success/failure of the clang-format job.
You can find the sample automation here (note the two PR checks that happen as part of a single Travis build):
https://github.com/fjeremic/omr-clang-format-automation/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc
As part of the automation I've implemented MongoDBs algorithm for rebasing PRs which were opened before the formatting took place onto commits representing the same changes after the formatting. This is all done via automation so hopefully not much action for the end-user other than running the script. There are several PR examples on my fork which demonstrate the tooling in place. You can check them all out in the above link.
Lastly I will be putting together an official committer vote with a full proposal to implement automated code formatting early next week to keep the momentum going.
_Edit:_
Forgot to mention. In case the automation fails, i.e. a commit is not formatted correctly the automation provides a git patch file which the user can grab and apply to their commit to fix up the formatting without even having clang-format installed.
You can see and example of this in the following PR:
https://github.com/fjeremic/omr-clang-format-automation/pull/3
By following the "Details" link on the failed _continuous-integration/travis-ci/clang-format_ job and looking at the build output for patch file which is displayed. Applying the patch file correctly formats the commit in a way that the formatting check will pass.
One thing I have noticed with style 6 is that some of the optimizer and control files can get some pretty high levels of indent:
https://github.com/fjeremic/omr/blob/78db3731f245ede0829df93d0e0b9400dd716c2c/compiler/compile/OSRData.cpp#L1314
https://github.com/fjeremic/omr/blob/78db3731f245ede0829df93d0e0b9400dd716c2c/compiler/optimizer/Inliner.cpp#L895
https://github.com/fjeremic/omr/blob/78db3731f245ede0829df93d0e0b9400dd716c2c/compiler/optimizer/VPHandlers.cpp#L541
https://github.com/fjeremic/omr/blob/78db3731f245ede0829df93d0e0b9400dd716c2c/compiler/optimizer/VPHandlers.cpp#L2246
Use of nested loops and conditions happens in a lot of places. Could we consider an indent of 2 rather than 4 to help reduce line wrapping and reduce the overall 'width' of the code? It seems like it might make things a bit easier to grok... It will make life on narrow screens a lot easier too...
There are also a few weirdly indented hanging parens (see https://github.com/fjeremic/omr/blob/78db3731f245ede0829df93d0e0b9400dd716c2c/compiler/optimizer/Inliner.cpp#L1157 as an example
This ternary also looks quite odd - almost like a format bug https://github.com/fjeremic/omr/blob/78db3731f245ede0829df93d0e0b9400dd716c2c/compiler/optimizer/Inliner.cpp#L1762
It is also sad that the bracket fixups don't seem to work with end of line comments leading to some inconsistency in the braces like https://github.com/fjeremic/omr/blob/78db3731f245ede0829df93d0e0b9400dd716c2c/compiler/optimizer/Inliner.cpp#L3117. That is a limitation of the tool and something I can live with, it just seems suboptimal and desirable to fix...
Overall, the style is, in general, readable on the files I have checked. I will check some more tomorrow to see if there are any fun format issues lurking.
Use of nested loops and conditions happens in a lot of places. Could we consider an indent of 2 rather than 4 to help reduce line wrapping and reduce the overall 'width' of the code? It seems like it might make things a bit easier to grok... It will make life on narrow screens a lot easier too...
Here is a branch with indentation level of 2 spaces and the same lines of code noted:
https://github.com/fjeremic/omr/tree/clang-format-3390-indent2
https://github.com/fjeremic/omr/blob/clang-format-3390-indent2/compiler/compile/OSRData.cpp#L1318
https://github.com/fjeremic/omr/blob/clang-format-3390-indent2/compiler/optimizer/Inliner.cpp#L910
https://github.com/fjeremic/omr/blob/clang-format-3390-indent2/compiler/optimizer/VPHandlers.cpp#L532
https://github.com/fjeremic/omr/blob/clang-format-3390-indent2/compiler/optimizer/VPHandlers.cpp#L2201
There are also a few weirdly indented hanging parens (see fjeremic/omr:compiler/optimizer/Inliner.cpp@
78db373#L1157 as an example
This is an outlier due to an #ifdef inside of an if expression. The parenthesis can be removed here. As we encounter such code moving forward we can definitely improve it as one will be able to see what the formatter will do to the code as we're writing it.
This ternary also looks quite odd - almost like a format bug fjeremic/omr:compiler/optimizer/Inliner.cpp@
78db373#L1762
The formatter will try to respect developer line breaks as long as they don't break any of the formatting rules applied. In this case it doesn't break any rule. You can see the same line break exists in today's master branch so the formatted code is definitely not a regression:
https://github.com/eclipse/omr/blob/master/compiler/optimizer/Inliner.cpp#L1914-L1917
It is also sad that the bracket fixups don't seem to work with end of line comments leading to some inconsistency in the braces like fjeremic/omr:compiler/optimizer/Inliner.cpp@
78db373#L3117. That is a limitation of the tool and something I can live with, it just seems suboptimal and desirable to fix...
This is something which we will unfortunately not be able to easily fix. The tool does not play nice with comments following statements because it does not reflow code from before to after comments or vice versa. This is something we would fix as developers are working on the code in the area. Over time I would imagine these will not be an issue.
Just for the record I vote against such reformatting and if I have right of veto I would use it to block, at least for GC part of the code.
There number of problems are on the surface:
Also I would not underestimate subjective personal factor. The 90% of developer's work is a code reading. Even we move out of brackets discussion what is "correct" or "wrong" look of the code (coding standards document prescribes basics, is not it?) look of reformatted code would be "unusual" for code owner and other developers use to work with it. It might reduce effectiveness of development significantly.
I believe such reformatting initiative is risky and there is no direct demand to do it here and now.
This initiative could potentially compromise development and serviceability.
Code after formatting can not be reviewed properly due the large volume
Probably the best way to deal with this is to sample the codebase, and especially files that you think/know are formatted a certain way, to see how it is affected. If there's a strong reason to keep the formatting, there are comments that can be put in the code to indicate sections of code that the formatting tool shouldn't touch.
Formatting would invalidate all outstanding branches/forks because of changes in white spaces. Every single outstanding change line would have a conflict required to be resolved. It would be huge impact for people doing research work (for example most of GC Team members)
This, I believe, is pretty easy to deal with; devs would just have to run clang format on their branches and then rebase.
We would not have ability compare code "before" and "after" used a lot for debugging
https://github.com/eclipse/omr/issues/4577#issuecomment-556507639 should address that issue.
Also I would not underestimate subjective personal factor. The 90% of developer's work is a code reading. Even we move out of brackets discussion what is "correct" or "wrong" look of the code (coding standards document prescribes basics, is not it?) look of reformatted code would be "unusual" for code owner and other developers use to work with it.
This is definitely true, and is often downplayed. However, at the same time, as devs, we're pretty good at adapting to things. Plus, just as the intangible human subjective/personal preferences play a role, I believe the intangible psychological benefit of one unified project must also play a role. It may seem silly, but a unified codebase right from the coding standard up to the quality bar has to, at some level, strengthen the project (be it the code or the community); we're all experts in different components, but we contribute the same quality code to the same project with the same expectation as anyone else.
but a unified codebase right from the coding standard up to the quality bar has to, at some level, strengthen the project
Let's be clear though - this discussion is NOT about the coding standard. It's about whitespace placement which is the least interesting thing to focus on.
Let's be clear though - this discussion is NOT about the coding standard.
I was under the impression that coding standard was interchangeable with coding convention; seeing as it's not, I'll change what I said above to:
It may seem silly, but a unified codebase right from the coding convention up to the quality bar has to, at some level, strengthen the project (be it the code or the community); we're all experts in different components, but we contribute the same quality code to the same project with the same expectation as anyone else.
clang-format handles whitespace only and doesn't address the bigger picture about how code is written - things like the rules in the omr coding standard:
Never combine an assignment and a test, except in a while condition. In a while loop, always use an explicit test.
When possible, try to avoid negative conditions. Positive conditions are easier to understand.
Macros should be "contained" and "context-free".
Formatting doesn't address any of them or many other conventions in the OMR coding standard that help to write readable and debuggable code.
I think it's important to be clear that this issue is addressing formatting only as there are a lot of mental connections to "coding standard" that won't be addressed by this issue
I think it's important to be clear that this issue is addressing formatting only as there are a lot of mental connections to "coding standard" that won't be addressed by this issue
Yeah that makes a lot of sense. The title of this PR does say "format", but I suppose it is still worth being very explicit about exactly what that means.
I think it's important to be clear that this issue is addressing formatting only as there are a lot of mental connections to "coding standard" that won't be addressed by this issue
Absolutely agree. Formatter using "code as tape" concept and manipulate with white spaces. However the only fractional part of coding standards addresses issues like "where I should put bracket". Most of coding standards applying rational to improve correctness of code but not it's readability only
As a major contributor to GC code I officially asking do not apply automatic formatting to OMR/GC code @rwy0717
How automated formatting tool supposed to be used from Eclipse? Do you have recommendations?
How automated formatting tool supposed to be used from Eclipse? Do you have recommendations?
Did some digging around [1] [2]. Given that one of the goals above is
Provide optional git hooks for users to be able to automatically format their code on git commit
After installing cygwin, clang, and the git hook that will be provided (added to <root>/.git/hooks/pre-commit), JGit/EGit should just work.
[1] https://zauner.nllk.net/post/0001-git-hooks-currently-supported-by-jgit/
[2] https://www.eclipse.org/forums/index.php/t/1067404/
How automated formatting tool supposed to be used from Eclipse? Do you have recommendations?
Outside of configuring a git pre-commit hook the easier route for Eclipse is just to install the CppStyle plugin which runs clang-format on the repository using the repository defined style:
https://marketplace.eclipse.org/content/cppstyle
They also have a website and GitHub repo with tons of info:
http://www.cppstyle.com/
Does pre-commit script actually change the content in 'master' files or create a temporary variant that is to be committed?
Does it give feedback what kind of formatting was applied so that a developer learn them over time?
Does per-commit script actually change the content in 'master' files or create a temporary variant that is to be committed?
There are different ways to implement this. [1] and [2] are two such examples. There are others of course. These are very simple hooks to implement. [1] formats all the files which were touched when you execute git commit and it then re-adds the files changed into the commit. With [1] you don't see any of the changes. [2] just formats all the files changed and leaves the changes as unstaged. The user than manually amends the commit with the changes.
Does it give feedback what kind of formatting was applied so that a developer learn them over time?
It depends on what kind of hook you install, or if you even use a hook. If you want to see the changes you can just install a hook which will leave the formatting fixes unstaged so you can see the diff.
Personally however I don't use a git hook and rather just configure my editor to clang-format on save. So I write code trying to match the style of the repository and as soon as I save my editor runs clang-format and I'm done with it. This is typical of most editor plugins such as the Eclipse one linked previously, VS Code [3], vim [4], etc.
Most editors also support selecting (highlighting code) and running clang-format on it if you would like to see what the formatter does to a piece of code.
[1] https://github.com/andrewseidl/githook-clang-format
[2] https://gist.github.com/alexeagle/c8ed91b14a407342d9a8e112b5ac7dab
[3] https://marketplace.visualstudio.com/items?itemName=xaver.clang-format&ssr=false#overview
[4] https://github.com/rhysd/vim-clang-format
Tried cppstyle. Easy to install in Eclipse (good). This one re-formats on explicit ctrl-shift-f, not on a save (ah, well). But it did not quite format it all that way I expected. Removal of extra spaces and lines kind of worked, but repositioning of curly braces did not. Wonder if there is more to it, that i'm missing (EDIT: probably clang-format itself)...
Overall, it comes down to how much investment (effort in inital rebase, effort per code line/block/commit, and updating/configuring the tool) vs how much return (just space formatting).
I don't have strong opinion (as usual, for process related things), but for a typical GC developer (where code is already reasonable well formatted and standardized) it feels like it's gonna be a bit of annoyance for little benefit.
As a major contributor to GC code I officially asking do not apply automatic formatting to OMR/GC code
+1. Probably OMR/GC + OMR/port.
+1. Probably OMR/GC + OMR/port.
Any particular reasons for why you think we should not proceed? If you can help us understand some of your concerns perhaps there is a technical solution to the problems which may have already been discussed. We can learn a lot from the many projects which have gone down this road before us.
Overall, it comes down to how much investment (effort in inital rebase, effort per code line/block/commit, and updating/configuring the tool) vs how much return (just space formatting).
The return on this investment can be summarized with the four bullet points in the original post:
In summary automatically formatted code is:
- Easier to write
- You don't have to worry about formatting as a tool will take care of it for you
- Easier to read
- When all code is consistent you know exactly what to expect and you don't have to switch mentally of thinking a different way based on which part of the codebase you are in
- Easier to maintain
- Your commits will not contain changes unrelated to your diff because your editor chose to format a bunch of whitespace in surrounding areas based on your editor configuration
- Uncontroversial
- We won't have to propose stylistic changes in PRs anymore thus saving reviewer and contributor resources and getting code merged faster
In my opinion the biggest one is the reviewer/contributor time savings. Countless hours of churn are spent nitpicking spacing and formatting issues during reviews which consumes both reviewer and contributor time. Every time formatting issues have to be fixed testing needs to be re-spun and there is a time delay between reviewer asking the contributor to actually make the changes. Not to mention the reviewer missing some formatting nitpicks and asking for fixes multiple times.
This is especially a problem with new contributors to the project as we don't currently have a defined style for the various components in OMR, so a contributor has to guess by looking at nearby code what it should be, or even worse just write code in their own personal style which will typically block the merge of a PR due to reviewers flagging that the code formatting is inconsistent with the code surrounding it.
Automation eliminates this problem and removes the noise from the PR reviews which lets the reviewers/contributors focus on the technical contribution, rather then the stylistic formatting thereof. It arguably lowers the bar to entry for new contributors to the project by making use of an industry-standard tool (used by LLVM, Google, WebKit, Mozilla, et.).
Other large benefit is the consistency. No matter which part of the codebase you look at, tool formatted code will look consistent, so you know exactly what to expect, and exactly how certain patterns of code look like, because each instance will be identical. There are no surprises and no mental "switch" needs to be made when writing/reading different parts of the repository.
Any particular reasons for why you think we should not proceed? If you can help us understand ...
As long as the code is correct and readable, the rest is not that important to me. I am not interested in how to format/put white spaces.
As long as the code is correct and readable, the rest is not that important to me. I am not interested in how to format/put white spaces.
@hangshao0 I ask this question with all due respect, as I'm just trying to understand your point of view: If the code after formatting will also be readable and you have no interest in how to format/put white spaces, if the proposed formatting will change only where the white spaces are placed (the end result being the four points listed by @fjeremic above), I don't see why you're against it. Again, I'm not trying to be disrespectful, I'm just trying to get a better understanding.
if the proposed formatting will change only where the white spaces are placed (the end result being the four points ...
I am good with that. I understand there are people who care about the white spaces. I am just wondering if it is worth the effort (certainly "yes" for some people).
I am just wondering if it is worth the effort
That's a reasonable concern. Thanks for clarifying @hangshao0 :)
This is really well put together! I've been hoping to do something similar on my team's codebases. Glad to see such a detailed and informative discussion about the pros and cons - I'll be sharing it at work.
Overall, it comes down to how much investment (effort in inital rebase, effort per code line/block/commit, and updating/configuring the tool) vs how much return (just space formatting).
I don't have strong opinion (as usual, for process related things), but for a typical GC developer (where code is already reasonable well formatted and standardized) it feels like it's gonna be a bit of annoyance for little benefit.
I agree with Aleks and "formatting the GC code" is less favorable for me, for most existing GC code. there are no outstanding formatting issue, I'd rather keep the old code no change until I fully understand the code, also without formatting will be easy for us to trace back history for debugging.
There is some strong feedback in this issue but much of it seems to be from the point of view of a single component or a set of components that currently use the same code format (references like "formatting the GC code" or "OMR/GC + OMR/port").
I'm wondering whether anyone has any feedback to offer on how easy it is to work in OMR on both the compiler and some other component, requiring a developer to directly accommodate the two very different coding formats? Please try to keep any feedback to the trouble working with two formats rather than on perceived good/bad aspects of each format.
I'm wondering whether anyone has any feedback to offer on how easy it is to work in OMR on both the compiler and some other component, requiring a developer to directly accommodate the two very different coding formats? Please try to keep any feedback to the trouble working with two formats rather than on perceived good/bad aspects of each format.
Most recent example in OMR can be found in #4503. Note that each of them happens after a force push, so some were addressed, new ones introduced, etc. etc. I'll list the first 10 formatting related reviews, but there are more:
I get bitten by this often. Although not in OMR here is an example from my latest commit to OpenJ9:
This pretty much happens every time as I can never seem to get the formatting perfect.
Hi everyone,
The Eclipse OMR committers had a vote in #4631 to adopt style 6. (from the original post) on the entire OMR codebase. The vote did _not_ pass, however I'm still hopeful we can find a proposition which will. I am suggesting we leave this issue open for discussion for a while longer so we can discuss possible options as a community if we still see value in pursuing this further.
Thanks to everyone who contributed thus far. Looking forward to hearing from more of you!
Most helpful comment
Probably the best way to deal with this is to sample the codebase, and especially files that you think/know are formatted a certain way, to see how it is affected. If there's a strong reason to keep the formatting, there are comments that can be put in the code to indicate sections of code that the formatting tool shouldn't touch.
This, I believe, is pretty easy to deal with; devs would just have to run clang format on their branches and then rebase.
https://github.com/eclipse/omr/issues/4577#issuecomment-556507639 should address that issue.
This is definitely true, and is often downplayed. However, at the same time, as devs, we're pretty good at adapting to things. Plus, just as the intangible human subjective/personal preferences play a role, I believe the intangible psychological benefit of one unified project must also play a role. It may seem silly, but a unified codebase right from the coding standard up to the quality bar has to, at some level, strengthen the project (be it the code or the community); we're all experts in different components, but we contribute the same quality code to the same project with the same expectation as anyone else.