This repository:
https://github.com/mattusifer/imdb-sentiment-classifier
The repo was very large before the last commit, containing over 100,000 text files, so I had assumed that Linguist wasn't able to scan the entire repo to determine the language.
I removed those text files yesterday afternoon, but the language still hasn't been picked up from the now relatively small repo.
I think the stats just haven't been updated yet, because when I run Linguist against the repository locally, I get:
97.16% Java
2.84% Shell
I see, thanks for the input. How long does it normally take to update?
It is usually updated when you push to the repository. Maybe @arfon could help?
I recall this happening too. Perhaps there's a filesize limit capping it?
Yes, there's a file size limit.

E.g. https://github.com/larsbrinkhoff/test has two .f files.
But https://github.com/larsbrinkhoff/test/search?l=forth only finds one.
I don't remember what the limit is. I vaguely recall 100000 bytes somewhere. Linguist mentions 128K and 1M. I made foo.f 99999 bytes, and bar.f more than 1 meg.
The language is still detected in the https://github.com/larsbrinkhoff/test repository despite the file size being capped. Would this type of issue really prevent my repo from being fully scanned by Linguist?
Besides, I don't think we have any huge files in that repo.
@mattusifer Right, but supposedly it wouldn't be detected if the smaller foo.f file weren't there.
I'm not sure if this is the exact same failure mode as for your repository. Just adding another data point.
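For anyone who wants to rule out the file size theory on their own checkout, here is a minimal sketch that lists files above a given size. The function name `files_over` and the candidate limits are my own choices for illustration; the limits are only the figures floated in this thread (100000 bytes, 128K, 1M), and I haven't confirmed which of them, if any, Linguist or GitHub search actually applies.

```python
import os

# Candidate limits in bytes, taken from the figures mentioned in this thread.
# Assumption: none of these is confirmed as the real cutoff.
CANDIDATE_LIMITS = (100_000, 128 * 1024, 1024 * 1024)

def files_over(root, limit_bytes):
    """Yield (path, size) for regular files under root larger than limit_bytes."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other non-regular entries
            size = os.path.getsize(path)
            if size > limit_bytes:
                yield path, size
```

Calling `list(files_over(".", limit))` for each value in `CANDIDATE_LIMITS` from the repository root shows which files, if any, would be affected under each guess.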
@arfon Could you trigger an update of statistics for this repository: https://github.com/mattusifer/imdb-sentiment-classifier?
This is actually a caching bug on GitHub. I've pinged the appropriate team internally for a fix.
Thanks @arfon !
This repository looks good - all set to close this out on my end. Thanks for everyone's help!
The same issue.
https://github.com/georgy7/simpler-cache-api
It's a tiny repository with only two Java files: a few lines of code and many comments.
@georgy7 your issue is caused by the fact that the path to your Java files includes "cache", so Linguist treats them as vendored code (see https://github.com/github/linguist/blob/master/lib/linguist/vendor.yml#L12-L13) and ignores them when determining the language stats of the repo.
You'll need to implement a manual override as detailed at https://github.com/github/linguist#vendored-code
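To illustrate why a path containing cache gets caught, here is a rough sketch of regex-based vendored-path matching. The pattern `(^|/)cache/` is my recollection of the rule at the linked lines and should be treated as an assumption; check vendor.yml itself for the exact current entry. The helper `is_vendored` and the example paths are hypothetical, and this is not Linguist's actual implementation.

```python
import re

# Assumption: this mirrors the "cache" rule in Linguist's vendor.yml.
ASSUMED_VENDOR_PATTERNS = [r"(^|/)cache/"]

def is_vendored(path, patterns=ASSUMED_VENDOR_PATTERNS):
    """Return True if any pattern matches somewhere in the repo-relative path."""
    return any(re.search(p, path) for p in patterns)

# Hypothetical repo-relative paths, for illustration only.
print(is_vendored("src/main/java/com/example/cache/Cache.java"))  # True
print(is_vendored("src/main/java/com/example/store/Store.java"))  # False
```

The override described in the README goes in a `.gitattributes` file at the repository root, with a line along the lines of `path/to/cache/* -linguist-vendored` (the path here is a placeholder, not the real layout of that repo).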
@lildude Thanks, this helped.