Linguist: language stats don't match file counts (C should be C++)

Created on 17 Sep 2017 · 4 Comments · Source: github/linguist

This project, https://github.com/elucideye/drishti, is incorrectly classified as C in GitHub search. The language stats bar reports a ratio of C=87% to C++=10%, despite the more accurate per-language file counts of C++=408 and C=5. What is the recommended fix in cases where the file counts are correct but are not reflected in the statistics?

[Image: drishti_language_bar]

[Image: drishti_language_summary]

All 4 comments

The language stats are correct. Linguist considers the "amount of code" when determining the statistics, not just the number of files.

In your case, you've got 13 MB of C files (well, it's mostly just src/lib/drishti/hci/gpu/VeraFont_16_2048.h) and 1.51 MB of C++ files, which works out at approximately 87.6% and 10.1% of the entire repository size, respectively.

Removing this file, or marking it as vendored or generated using a manual override, will exclude it from the stats and give you more realistic-looking results.
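
To illustrate why the two measures diverge, here is a minimal sketch that compares size-weighted shares with file-count shares. It is not Linguist's implementation: the sizes are the approximate figures quoted above, the file counts are the ones from the original question, and all other languages in the repository are left out, so the percentages it prints differ slightly from 87.6% and 10.1%.

```python
def shares(amounts):
    """Return each language's share of the total, as a percentage."""
    total = sum(amounts.values())
    return {lang: 100.0 * value / total for lang, value in amounts.items()}


# Approximate sizes in MB quoted in this thread (other languages omitted).
size_by_language = {"C": 13.0, "C++": 1.51}
# File counts reported by the code search in the original question.
files_by_language = {"C": 5, "C++": 408}

by_size = shares(size_by_language)
by_count = shares(files_by_language)

for lang in size_by_language:
    print(f"{lang}: {by_size[lang]:.1f}% by size, {by_count[lang]:.1f}% by file count")
```

A single very large header is enough to dominate the size-weighted figure even when almost every file in the repository is C++.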

I didn't see that file included when I looked at the 5 reported C files from the language search above.

The C file index at https://github.com/elucideye/drishti/search?l=c lists only the following 5 files:

C src/app/qt/facefilter/Device.h
C src/app/qt/facefilter/QVideoFrameScopeMap.h
C src/lib/drishti/core/padding.h
C src/lib/drishti/graphics/drishti_graphics.h
C src/app/qt/facefilter/GLVersion.h

I'm guessing that file index uses a different process. I'm currently adding Emacs mode hints (-*-c++-*-) to all C++ files based on an older issue I saw, and will try the .gitattributes overrides if that doesn't work. Thanks for the help.
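
For readers unfamiliar with these hints, the sketch below shows roughly how an Emacs mode line such as -*-c++-*- can be read from the first line of a file. The function name `emacs_mode` and the pattern are hypothetical and deliberately simplified; Linguist's own modeline handling is more thorough and is not reproduced here.

```python
import re

# Anything between a pair of "-*-" markers (simplified, illustrative pattern).
MODELINE = re.compile(r"-\*-(.*?)-\*-")


def emacs_mode(first_line):
    """Return the lowercased mode named in an Emacs mode line, or None."""
    match = MODELINE.search(first_line)
    if match is None:
        return None
    body = match.group(1).strip()
    if ":" not in body:
        # Short form: "-*- c++ -*-" or "-*-c++-*-".
        return body.lower() or None
    # Long form: "mode: c++; tab-width: 4".
    for part in body.split(";"):
        key, _, value = part.partition(":")
        if key.strip().lower() == "mode":
            return value.strip().lower() or None
    return None
```

For example, `emacs_mode("// -*-c++-*-")` and `emacs_mode("/* -*- mode: C++; tab-width: 4 -*- */")` both yield `"c++"`, while a line without a mode line yields `None`.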

Your large file won't be included in the search results because it is too large: only files smaller than 384 KB are indexed, which is why you don't see it.

https://help.github.com/articles/searching-code/ has more details on the various search criteria GitHub uses.
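
To find which files in a local checkout fall outside that limit, something like the following sketch could be used. It assumes that 384 KB means 384 × 1024 bytes, and the names `SEARCH_INDEX_LIMIT` and `files_over_limit` are hypothetical helpers introduced for illustration; they are not part of any GitHub tool.

```python
import os

# Assumption: "384 KB" is taken to mean 384 * 1024 bytes.
SEARCH_INDEX_LIMIT = 384 * 1024


def files_over_limit(root, limit=SEARCH_INDEX_LIMIT):
    """Return (path, size) pairs under root with size >= limit, largest first."""
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's internal directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # ignore symlinks; only regular files are measured
            size = os.path.getsize(path)
            if size >= limit:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Calling `files_over_limit(".")` from the repository root lists the files that would be skipped under this assumption, which is a quick way to spot candidates for a `linguist-generated` or `linguist-vendored` override.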

To finish the thread... I left that generated font table as is (C) and added the linguist-generated tag in .gitattributes per your suggestion, and added -*-c++-*- to various files to force C++ labels in cases where it was appropriate. The .gitattributes file looks like this (per docs):

src/lib/drishti/hci/gpu/VeraFont_16_2048.h linguist-generated=true

The language stats updated immediately and are much closer to what I would expect:

[Image: drishti_linguist_new]

Thanks again.
