This is happening with my repository, github.com/88Alex/lithium-os, as well.
Happening in some of my repos too, aliceos-kernel and nasu2 specifically.
Ditto for a few headers in mCtrl.
As the detection is based on a Bayesian analysis of the file contents, there will always be a (hopefully small) portion of bad detections, especially for languages that share a filename extension (.h) and have similar contents (e.g. C being more or less a subset of C++).
Perhaps the detection would improve if the Bayesian classifier also assigned some weight to rules based on the broader context of the project. For example:
- If the project contains a file foo.c, then foo.h is more likely C rather than C++.
- If the project contains a file bar.cc, then bar.h is more likely C++ rather than C.
- When there are many *.c files in a project and no *.C, *.cc, *.cpp, or *.cxx files, then all *.h files are more likely plain C, not C++.
+1.
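The three project-context rules proposed above could be sketched roughly as follows. This is only an illustration in Python, not how Linguist works: the function name `header_language_hint` and the idea of returning a plain hint (rather than a weight fed into the Bayesian classifier) are assumptions made for brevity.

```python
from pathlib import PurePosixPath

# Extensions taken from the comment above; the names are hypothetical.
C_EXTS = {".c"}
CPP_EXTS = {".C", ".cc", ".cpp", ".cxx"}


def header_language_hint(header, project_files):
    """Return "C", "C++", or None when the project context gives no hint."""
    header = PurePosixPath(header)
    paths = [PurePosixPath(p) for p in project_files]

    # Rules 1 and 2: look for a source file with the same directory and stem
    # (foo.c next to foo.h, or bar.cc next to bar.h).
    siblings = {
        p.suffix
        for p in paths
        if p.parent == header.parent and p.stem == header.stem
    }
    has_c = bool(siblings & C_EXTS)
    has_cpp = bool(siblings & CPP_EXTS)
    if has_c != has_cpp:  # exactly one kind of sibling exists
        return "C" if has_c else "C++"

    # Rule 3: the project has *.c files and no C++ sources at all.
    suffixes = {p.suffix for p in paths}
    if suffixes & C_EXTS and not suffixes & CPP_EXTS:
        return "C"
    return None


if __name__ == "__main__":
    files = ["src/foo.c", "src/foo.h", "src/bar.cc", "src/bar.h"]
    print(header_language_hint("src/foo.h", files))
    print(header_language_hint("src/bar.h", files))
```

In a real classifier the result would more plausibly adjust a prior than override the content-based score, since the comment only claims these cases are "more likely".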
On 8 August 2013 08:36, Martin Mitáš wrote:
As the detection is based on a Bayesian analysis of the file contents,
there will always be a (hopefully small) portion of bad detections,
especially for languages sharing a filename extension (.h) and similar
contents (e.g. in case of C being more or less a subset of C++).
Perhaps the detection would improve if the Bayesian classifier would also
assign some weight to rules based on a broader context in the project.
For example:
- If the project contains file foo.c, then more likely foo.h is C
rather than C++.
- If the project contains file bar.cc, then more likely bar.h is C++
rather than C.
- When there are many *.c files in a project and none *.C, *.cc, *.cpp,
*.cxx, then all *.h are more likely plain C, not C++.
Reply to this email directly or view it on GitHub: https://github.com/github/linguist/issues/554#issuecomment-22320310
Regards,
Alexander Kitaev
This is happening with my repository too: https://github.com/gagarin79/charon. It contains only C++ code (*.hh and *.cc files), but the statistics show C 85.8% and C++ 14.2%.
Now I'm getting headers detected as Objective-C! D:
This issue is very annoying: even though my project is entirely C, it says 15% C++ and 5% Objective-C.
I was having a similar issue with my pure C project. Linguist was marking some headers as C++ instead of C. It appeared to be happening to headers without any function declarations or C-style code. Here's a code snippet of a header detected as C++:
#ifndef WORLDBOX_CLIENT_H
#define WORLDBOX_CLIENT_H
#include <worldbox/worldbox.h>
#include <worldbox/core.h>
#include <worldbox/graphics.h>
#endif /* WORLDBOX_CLIENT_H */
I'm unfamiliar with how Linguist works or whether this is already implemented, but one idea that could resolve these inaccuracies: in addition to the existing hack that checks for similarly named C files, scan the file contents for namespace declarations, templates, classes, etc., and assume that a header containing none of them is C.
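A minimal sketch of that content scan, in Python purely for illustration (Linguist's own implementation may look nothing like this). The pattern list and the names `CPP_ONLY`, `strip_comments`, and `looks_like_cpp` are assumptions; a real heuristic would need a broader set of constructs.

```python
import re

# A few constructs that only appear in C++ headers (hypothetical, incomplete list).
CPP_ONLY = re.compile(
    r"^\s*(?:namespace\b|using\s+namespace\b|template\s*<|class\s+\w+"
    r"|(?:public|private|protected)\s*:)",
    re.MULTILINE,
)


def strip_comments(text):
    """Drop /* ... */ and // comments so keywords inside them are ignored.

    This is a rough approximation: it does not understand string literals.
    """
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", text)


def looks_like_cpp(header_text):
    """True if the header contains a C++-only construct; otherwise assume C."""
    return CPP_ONLY.search(strip_comments(header_text)) is not None


if __name__ == "__main__":
    worldbox_client_h = """\
#ifndef WORLDBOX_CLIENT_H
#define WORLDBOX_CLIENT_H
#include <worldbox/worldbox.h>
#include <worldbox/core.h>
#include <worldbox/graphics.h>
#endif /* WORLDBOX_CLIENT_H */
"""
    # The include-only header from the comment above has no C++ constructs.
    print(looks_like_cpp(worldbox_client_h))
```

Note that "no C++ construct found" only makes C the default guess; a C++ header that merely includes other headers would be misclassified the other way, which is the case the next paragraph tries to cover.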
And to preemptively address issues where C++ headers are being marked as C, perhaps headers included from C++ files can be assumed to be C++ headers.
This gets to a whole other issue with include paths and such, but I think it would be reasonable to assume headers are in the root directory of the repository, or in a folder named include or something similar.
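The include-based idea could look something like the sketch below, again in Python and only as an illustration. The search order (the including file's directory, then the repository root, then a folder named `include`) follows the assumption stated above; the function name `headers_included_from_cpp` and the in-memory `files` mapping are hypothetical, and includes are not followed transitively.

```python
import posixpath
import re

INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
CPP_SOURCE_EXTS = (".C", ".cc", ".cpp", ".cxx")
SEARCH_DIRS = ("", "include")  # repository root, then an "include" folder


def headers_included_from_cpp(files):
    """Return repo-relative paths of headers directly included by C++ sources.

    `files` maps repo-relative POSIX paths to file contents.
    """
    found = set()
    for path, text in files.items():
        if not path.endswith(CPP_SOURCE_EXTS):
            continue
        here = posixpath.dirname(path)
        for name in INCLUDE.findall(text):
            # Try the including file's own directory first, then the guesses.
            for base in (here, *SEARCH_DIRS):
                candidate = posixpath.normpath(posixpath.join(base, name))
                if candidate in files:
                    found.add(candidate)
                    break
    return found


if __name__ == "__main__":
    repo = {
        "src/main.cc": '#include "util.h"\n#include <worldbox/core.h>\n',
        "src/util.h": "",
        "include/worldbox/core.h": "",
    }
    print(sorted(headers_included_from_cpp(repo)))
```

System headers such as `<vector>` simply fail to resolve and are skipped, so they need no special handling here.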
Yep, thanks. We're currently working on, and testing, various heuristics that go beyond the ordinary statistical classifier, specifically to address C++ and C issues.
Happened to me as well:
https://github.com/iskernel/c-mantras/search?l=c%2B%2B
Thanks for the report. This is a common issue. We're tracking progress for a fix in #1626.