I noticed through GitHub's file view that the Linguist report on files generates an incorrect line count. With the exception of files containing incomplete lines (no newline marker at the last line) the displayed number is always +1.
This can be easily verified locally with any file or directly at GitHub. Linguist license, e.g., has 22 lines on the content view yet is reported as having 23 lines at the top.
Please confirm you have...
This problem happens with any file that has POSIX lines, which per definition are _"A sequence of zero or more non-\BlobHelper:
I imagine that -1 was given as a second argument to split to prevent truncation of empty lines at the end of the file. However this is also what keeps the last null field after the last newline character stored in the lines array — then silently considered as a full line in loc:
The solution seems to be simple enough if we added a chomp to remove the last newline (without side effects on incomplete files or files with empty lines at their end):
data.chomp.split(encoded_newlines_re, -1)
I'm not sure if this will play well with the different encodings though, so I decided to avoid a direct PR before hearing specialists. If not, a solution using regex (similar to encoded_newlines_re) and the \z anchor could be used, maybe? Are there other important facts which I'm missing here?
@lildude Does the line count actually comes from Linguist?
I was wondering that as well. Nevertheless, it wouldn't remove the fact that Linguist itself is reporting the incorrect count.
@lildude Does the line count actually comes from Linguist?
Sorry about the delay. Yes, the line count comes from Linguist.
This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.
I'm not sure if this will play well with the different encodings though, so I decided to avoid a direct PR before hearing specialists. If not, a solution using regex (similar to
encoded_newlines_re) and the\zanchor could be used, maybe? Are there other important facts which I'm missing here?
@tessarin I think you should go ahead and open a pull request with the first proposed solution.
This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.
Reopening as Linguist isn't responsible for this on the GitHub.com side of things.
Reopening as Linguist isn't responsible for this on the GitHub.com side of things.
I've finally sorted out the GitHub side of things and you can see it in all it's glory now - the line count at the top now matches the content 🎉
Thank you very much @lildude!
Most helpful comment
I've finally sorted out the GitHub side of things and you can see it in all it's glory now - the line count at the top now matches the content 🎉