Linguist: .gitattributes seems to be ignored for 'sql' language

Created on 5 Jun 2017 · 7 Comments · Source: github/linguist

My repository (http://hepabolu.github.com/mytap) is misidentified as SQLPL, while it's actually (MySQL) SQL. So I followed the instructions and added a .gitattributes file with the following content:

*.sql linguist-language=sql
*.my linguist-language=sql

This results in the repo being identified as Shell.

Even when I rename all *.sql files to *.mysql, the repo is still identified as Shell.
The combined size of the *.sql files is bigger than the single shell script, so SQL should take precedence.

I've also tried 'Sql' and 'SQL' as language in case the name is case-sensitive, but that makes no difference.

What needs to be done to make the repository be identified as sql? Or is this a bug in the linguist recognition?

Most helpful comment

You can now override language definitions and file paths. You just need to make sql detectable for linguist library.
Adding *.sql linguist-detectable=true in .gitattributes solved this issue for me!

All 7 comments

So here's the thing... SQL is considered a data language and not a programming language by Linguist. Data languages do not count towards the language statistics. Accordingly, when you implemented your override, Linguist did the right thing by showing your repo as predominantly shell as it knew what to do with all those misidentified SQL files and thus only considered your programming language files.
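The reasoning above can be sketched with a toy model. This is not Linguist's actual code (Linguist is written in Ruby); the file names, byte sizes, and the `LANGUAGE_TYPES` table are made-up assumptions, used only to show why leaving data languages out of the totals makes Shell the top language.

```python
# Toy model of language statistics; NOT Linguist's real implementation.
# The type table, file names, and byte sizes are illustrative assumptions.
from collections import Counter

LANGUAGE_TYPES = {"SQL": "data", "Shell": "programming"}
COUNTED_TYPES = {"programming", "markup"}


def language_stats(files):
    """Sum bytes per language, skipping languages whose type is not counted."""
    totals = Counter()
    for _path, language, size in files:
        if LANGUAGE_TYPES.get(language) in COUNTED_TYPES:
            totals[language] += size
    return totals


# Hypothetical repository: a lot of SQL and one small shell script.
repo = [
    ("mytap.sql", "SQL", 90_000),
    ("tests.sql", "SQL", 40_000),
    ("runtests.sh", "Shell", 2_000),
]

stats = language_stats(repo)
top_language = stats.most_common(1)[0][0]
print(top_language)  # Shell: the SQL bytes never enter the totals
```

In this model the SQL files are identified correctly but simply contribute nothing, so the only remaining counted language wins regardless of size.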

We attempted to classify SQL as a programming language in the past, but this led to a lot of wildly inaccurate classifications of things that weren't even remotely close to SQL, so it was reverted.

With this in mind there is no way to force GitHub to report the language of your repo as SQL or MySQL (or any other data language). The closest you can get is by picking another SQL-like programming language and forcing an override... but that wouldn't technically be correct either 😉

Thanks for the explanation. Too bad this doesn't work. Maybe it's worth adding this information to the documentation? I've searched the documentation for anything related to SQL but couldn't find anything.

Just for the record: what makes 'SQLPL' a programming language, since it shows up now?

On this topic: where can I find the rules for identifying files as SQLPL? I couldn't find them and it turns out that some files are identified as 'SQLPL' and others as 'PLpgSQL'. I'd like them to be at least identified the same language across the board.

Maybe it's worth adding this information to the documentation?

I seem to recall we had something documented somewhere about data languages but I can't find it right now. I agree we should probably document something, though it would need to be generic enough to cover all data languages which don't count towards the statistics.

Just for the record: what makes 'SQLPL' a programming language, since it shows up now?

I don't know if SQLPL or PLpgSQL (or any of the other SQLs TBH) really are or should be classified as programming languages, but they are initially being associated with your files due to the .sql extension (SQL, which doesn't count towards the stats; SQLPL, PLSQL and PLpgSQL, which do) and then the heuristics and the classifier are used to whittle the list of languages down even further. A more generic explanation is in CONTRIBUTING.md.

I think we can close this issue now. Docs have been updated to make it clearer as to how Linguist works and the search gotcha. Feel free to open a new issue if docs need further clarification.

You can now override language definitions and file paths. You just need to make sql detectable for linguist library.
Adding *.sql linguist-detectable=true in .gitattributes solved this issue for me!

After providing
*.sql linguist-detectable=true
*.sql linguist-language=sql

the language stats bar recognizes the SQL files correctly, and most of my .sql files are recognized as SQL. One minor, mostly cosmetic issue remains: in code search, some of the .sql files are still labeled as PLSQL, SQLPL or PLpgSQL.
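The effect of the detectable override on the stats bar can be sketched with a small toy model. This is only an illustration under stated assumptions, not Linguist's real code: the tuple layout, the byte sizes, and the rule that an explicit `detectable` flag takes priority over the language type are all simplifications chosen for the example.

```python
# Toy model of the linguist-detectable override; NOT Linguist's real code.
# Byte sizes and the tuple layout are illustrative assumptions.
from collections import Counter

COUNTED_TYPES = {"programming", "markup"}


def language_stats(files):
    """Each file is (language, type, size, detectable).

    detectable is None when no override is set; otherwise the explicit
    True/False value decides whether the file counts towards the totals.
    """
    totals = Counter()
    for language, lang_type, size, detectable in files:
        counted = lang_type in COUNTED_TYPES if detectable is None else detectable
        if counted:
            totals[language] += size
    return totals


without_override = [
    ("SQL", "data", 130_000, None),
    ("Shell", "programming", 2_000, None),
]
with_override = [
    ("SQL", "data", 130_000, True),  # models: *.sql linguist-detectable=true
    ("Shell", "programming", 2_000, None),
]

top_before = language_stats(without_override).most_common(1)[0][0]
top_after = language_stats(with_override).most_common(1)[0][0]
print(top_before, top_after)  # Shell SQL
```

The model covers only the stats bar; it says nothing about how code search labels individual files.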

You can now override language definitions and file paths. You just need to make sql detectable for linguist library.
Adding *.sql linguist-detectable=true in .gitattributes solved this issue for me!

Thanks a lot.
