There are thousands of people on GitHub making their own languages, most of which will never see Linguist support. Still, why not allow these languages to be highlighted on their own, without explicitly adding them to Linguist?
It's already possible to configure Linguist using .gitattributes. I'm requesting a feature that lets a repository map specific file types to a specific grammar, with everything configured locally in that repository. This would let the hypothetical Mary, who is making her Gollum language, show proper highlighting to anyone interested without having to modify Linguist.
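To make the request concrete, here is a minimal sketch of how repository-local mappings might be resolved. It reuses the `linguist-language` attribute that Linguist already reads from .gitattributes; as it stands, that attribute only selects languages Linguist already ships, so the `Gollum` entry is hypothetical, as are the helper names `parse_overrides` and `language_for`. Matching on the file's basename only is a simplification, not how git resolves attributes in full.

```python
from fnmatch import fnmatch

def parse_overrides(text):
    """Collect (pattern, language) pairs from .gitattributes-style lines."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        for attr in attrs:
            key, _, value = attr.partition("=")
            if key == "linguist-language" and value:
                rules.append((pattern, value))
    return rules

def language_for(path, rules):
    """Return the language for a path; a later matching rule overrides an earlier one."""
    name = path.rsplit("/", 1)[-1]  # simplification: match on the basename only
    match = None
    for pattern, language in rules:
        if fnmatch(name, pattern):
            match = language
    return match

# Hypothetical repository-local configuration.
rules = parse_overrides("""
# map Mary's files to her own grammar
*.gollum linguist-language=Gollum
*.h      linguist-language=C++
""")
print(language_for("src/ring.gollum", rules))
```

The missing piece, and the actual subject of this request, is letting the language name on the right-hand side refer to a grammar stored in the repository itself.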
Yes this would be nice
This would also be handy for people using less well-known languages yet to be added to Linguist.
It would also be useful for distinguishing variants of languages. For example, one could differentiate between JavaScript, web worker JavaScript, and Node.js JavaScript.
The same goes for header-only C++ files.
That's an interesting idea but how would you check that the grammar Mary is using to highlight her Gollum files is open source? And who is liable if it's not?
On Tue, Sep 29, 2015 at 10:38:43AM -0700, Paul Chaignon wrote:
> That's an interesting idea but how would you check that the grammar Mary
> is using to highlight her Gollum files is open source? And who is liable
> if it's not?
Mary would put the grammar into her repository (maybe a .gitlinguist file) as plain-text configuration. If she's actually using some copyrighted grammar, that's no different than if she had uploaded copyrighted source code.
> If she's actually using some copyrighted grammar, that's no different than if she had uploaded copyrighted source code.
I'm not a lawyer but I think that might actually be different. In this case, GitHub would be running the piece of copyrighted source code (to highlight the file on the server-side). They might be liable for that ;)
Another issue is performance. What if Mary uses (on purpose or accidentally) a very badly written grammar that overloads GitHub's servers?
I'm not saying that these issues are insurmountable but just that this feature is much more complex to implement than it might appear. Given the time required and the risks involved, I'm not sure that it would be worth it (it's only my opinion here, I'm not part of GitHub staff).
On Wed, Sep 30, 2015 at 02:29:04AM -0700, Paul Chaignon wrote:
> > If she's actually using some copyrighted grammar, that's no different than if she had uploaded copyrighted source code.
>
> I'm not a lawyer but I think that might actually be different. In this
> case, GitHub would be running the piece of copyrighted source code (to
> highlight the file on the server-side). They might be liable for that ;)
I'm also not a lawyer and won't speculate further.
> Another issue is performance. What if Mary uses (on purpose or
> accidentally) a very badly written grammar that overloads GitHub's servers?
This is the biggest issue, in my opinion, as it's a potential avenue for DoS. A "poorly written" grammar could intentionally try to stall the servers. I don't know the best way to handle this. My main goal here is to convey that there is a desire for this functionality and to rally more qualified folks to see if it can be accomplished.
I certainly appreciate your critical view, since this is, as you said, not a trivial feature.
This is a _really_ interesting idea. @pchaigno is right that there are several technical and legal hurdles that we would have to work through, but it's fun to entertain the idea.
/cc @aroben
I feel like this came up sometime in the past, but I can't find where.
I agree that this would be useful, but also agree with the concerns above about interpreting arbitrary grammars on GitHub's servers.
As a developer of a language that currently cannot get syntax highlighting, even for our own repo, until it takes over the world, let me tell you that it feels really frustrating and disempowering to be left out this way. This seems the opposite of what GitHub stands for, and the reason we are given only talks about the adoption of our language, which seems like GitHub-imposed tyranny, not a security problem.
On the technical side, I understand that you have concerns about DOS attacks, but is the problem really insurmountable? How about a special mode for running these untrusted grammars with some sort of timeout and temporarily stopping the feature for repos that cause too frequent timeouts?
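As a rough illustration of the timeout idea, here is a minimal sketch under stated assumptions. The class name `UntrustedHighlighter`, its parameters, and the default thresholds are all hypothetical and not anything GitHub or Linguist provides. The time budget is checked cooperatively after each line, so it would not stop a single pathological line; a real deployment would presumably need process-level isolation with a hard kill.

```python
import time

class UntrustedHighlighter:
    """Runs repository-supplied tokenizers under a time budget (hypothetical design)."""

    def __init__(self, budget=0.05, max_timeouts=3, cooldown=3600.0,
                 clock=time.monotonic):
        self.budget = budget              # seconds allowed per file
        self.max_timeouts = max_timeouts  # timeouts tolerated before disabling
        self.cooldown = cooldown          # seconds a repo stays disabled
        self.clock = clock                # injectable for testing
        self.timeouts = {}                # repo -> timeouts since last reset
        self.disabled_until = {}          # repo -> time highlighting resumes

    def enabled(self, repo):
        return self.clock() >= self.disabled_until.get(repo, float("-inf"))

    def highlight(self, repo, lines, tokenize_line):
        """Return tokenized lines, or None to signal a plain-text fallback."""
        if not self.enabled(repo):
            return None
        deadline = self.clock() + self.budget
        result = []
        for line in lines:
            result.append(tokenize_line(line))
            if self.clock() > deadline:   # over budget: give up on this file
                self._record_timeout(repo)
                return None
        return result

    def _record_timeout(self, repo):
        count = self.timeouts.get(repo, 0) + 1
        if count >= self.max_timeouts:    # too many timeouts: pause the feature
            self.disabled_until[repo] = self.clock() + self.cooldown
            count = 0
        self.timeouts[repo] = count

highlighter = UntrustedHighlighter()
print(highlighter.highlight("mary/gollum", ["my precious"], str.split))
```

Returning `None` rather than raising keeps the failure mode boring: the file is simply shown without colours, which is what happens today anyway.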
I'd also like to add another vector here: GitHub Enterprise. In that setting, repository-local syntaxes make even more sense (the hosting/DDoS part is also not GitHub's problem). We build a game engine that uses its own flavor of JSON (like many others) for a large number of different data and configuration files. We also have some domain-specific languages that would be very good to have highlighting for. We're a big team, and a big part of our code reviews is for these kinds of files. We will very likely have our code on public GitHub at some point.
There's also the issue of conflicting file extensions, which this feature partly addresses. The .sjson extension seems to be used in environmental science and by some SRT subtitle tools as well.
Comment spam? How does one report this?
We really need this!
If we're getting all bent out of shape about the grammar license, why not run the markup client side? That addresses the potential DoS issue for GitHub's servers too.
> why not run the markup client side?
Without giving you a normative answer (since I'm not site staff), I can tell you there'd be obvious XSS risks to executing arbitrary code within a browser... and even with sufficient testing/sandboxing, it's still posing a potential vulnerability to achieve something very cosmetic.
It's a syntax highlighter... TextMate can't do anything other than parse and tag a text stream.
If you're worried about exploit, then that argument is applicable to literally any client-side code running on any website ever.
Uhm, no, because in this case, the code in question is authored by users, and not moderated by GitHub staff. Even if its use is straightforward (lex, parse, render), malicious users will try all sorts of tricks to run arbitrary code.
To be honest, I think this is an issue best resolved using a third-party browser extension.
I maintain that this argument about exploiting code applies to literally any code anywhere. I don't see how there's an elevated risk in this particular case from a TextMate renderer.
I also think this is more important than you give it credit for.
> I maintain that this argument about exploiting code applies to literally any code anywhere. I don't see how there's an elevated risk in this particular case from a TextMate renderer.
You're missing the point. Try to approach this from a business perspective: adding any feature, especially one running client-side, requires reviewing and testing. These things cost funding and time, and there ideally needs to be an incentive on a commercial level. GitHub might be benevolent and transparent, but they're still a corporate entity. Ipso facto, they're bound by the constraints that affect any other corporation.
> I also think this is more important than you give it credit for.
GitHub have yet to implement a repository-level setting for adjusting the width of a tab-character. I've submitted requests imploring them to add such a facility (as have other users), and there's still nothing. They haven't seen any incentive in adding this, and I'd argue this is ten times more important than adding repository-level syntax highlighting.
I measure "importance" in the sense of how many users would benefit from the feature's inclusion. You can't possibly believe there are more language authors on GitHub than users whose code looks like crap because they follow good sense and use tabs.
Look, I'm sympathetic. I can't stand reading unhighlighted code, and can only imagine how much of a pain it is for language authors who have to deal with this on a regular basis. I just acknowledge there are factors beyond implementation to weigh in here, and the diversity of GitHub's userbase means they can't possibly please everybody.
If there's a solution to be had, it'd probably be beyond the scope of syntax highlighting and more in the domain of enabling a whitelist of CSS classes to be added to HTML tags in markdown comments (which include those used for highlighting source code).
You'll hear no argument from me that tab width is a million times more important... and they should do that. But that doesn't mean they shouldn't do this ;)
Anyway, this is certainly part of a larger issue: lack of control over code presentation.
Realistic example: Highlighting specific parts of code isn't possible unless you fudge it with an inline diff:
```diff
-@import "packages/file-icons/styles/icons";
-@import "packages/file-icons/styles/items";
-@{pane-tab-selector},
 .icon-file-directory {
     &[data-name=".git"]:before {
-        .git-icon;
+        font-family: Devicons;
+        content: "E602";
     }
 }
```
There are pretty obvious limitations with this approach:
You have to prefix lines with [+ -] to "control" highlighting, and that isn't what diff highlighting should be used for.
Now, there's a solution to be found by whitelisting certain CSS classes, which would enable users to improve the clarity of code snippets if they desired. Obviously, this isn't as elegant as simply highlighting code using a grammar, but this could be handled using a third-party tool (something that takes a grammar URL and generates the HTML for you to paste into a comment).
Such a solution would be easy to implement... hell, the relevant repository is public. Wondering if I shouldn't be the one to prepare a PR...
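A minimal sketch of what such a third-party tool might emit, assuming the grammar step has already produced `(text, css_class)` tokens. The function name `render` and the contents of `ALLOWED_CLASSES` are assumptions made for illustration; the actual set of classes GitHub would whitelist is not specified anywhere in this thread.

```python
from html import escape

# Hypothetical whitelist; the real set of permitted classes is an assumption.
ALLOWED_CLASSES = {"pl-k", "pl-s", "pl-c"}

def render(tokens):
    """Turn (text, css_class) tokens into HTML, dropping unlisted classes."""
    parts = []
    for text, css_class in tokens:
        safe = escape(text)  # never emit user text unescaped
        if css_class in ALLOWED_CLASSES:
            parts.append(f'<span class="{css_class}">{safe}</span>')
        else:
            parts.append(safe)  # unknown or missing class: plain text
    return "<pre>" + "".join(parts) + "</pre>"

print(render([("let", "pl-k"), (" ring = ", None), ('"one"', "pl-s")]))
```

Because only the class attribute is ever generated and all token text is escaped, a token stream cannot inject tags or attributes of its own.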
While we're off-topic, @Alhadis and @TurkeyMan, you can get the repository to display a custom tab width by placing an .editorconfig file in the root of the project, which can look something like this:
# editorconfig.org
root = true
[*]
indent_style = tab
indent_size = 2
end_of_line = lf
charset = UTF-8
trim_trailing_whitespace = false
insert_final_newline = true
You can get plugins for popular editors that will read this file, so that each repository on your disk can have different settings; see more at http://editorconfig.org/. That's no excuse for GitHub's default tab width of 8 characters, of course, so we still have to deal with that. I see it only as a space-mongering ploy to subvert the true usage of tabs, and I'm sure it has influenced the decision for billions.
Amen, and that's actually the sole reason for including .editorconfig files in all my projects - even those which will never see collaboration. Heck, I even state the reason in the commit's subject-line.
Despite being a step in the right direction, it doesn't solve the real issue. Authors shouldn't be expected to make changes to their codebase in order to control their code's presentation. It also does nothing for historic repositories with arcane tab-stop widths, where adding new files would be akin to scribbling footnotes on the Declaration of Independence.
@Alhadis
> If there's a solution to be had, it'd probably be beyond the scope of syntax highlighting and more in the domain of enabling a whitelist of CSS classes to be added to HTML tags in markdown comments
This isn't really a viable solution for files stored in the repository; it'll only work for code snippets in comments or markdown documents.
Whoops. My bad for overlooking that.
If it makes anybody feel better, I'm affected by this too. 😀 I'm less miffed than others, because the code can still be followed without colouring. Only when writing or editing it does syntax highlighting go from a "nice to have" to a definite "must have".
It's not really about pasting into comments, though. The main moment when syntax highlighting is critical is when reviewing PRs: the diff must be highlighted. General code browsing is obviously similarly important, as is populating the wiki/documentation.
This stuff is high frequency and has high turbulence; if the solution requires manually performing external markup and pasting the markup into the wiki, that's practically impossible to keep in sync, and it does nothing for the PR/diff/review case, which is the main thing I care about.
Git is a collaboration tool. Reviewing PRs is the main event; literally, git's killer app.
Extremely hard to review code in 2017 when you're colour blind.
Checkmate. Really can't argue against that logic: and efficient code review IS of importance to GitHub, as recent site developments have proven. I suppose it's a real grey area, since legibility varies depending on the syntax of the language in question, as well as more subjective factors ("looks fine to me", etc).
If a language's grammar is also used for flagging invalid code constructs, a reviewer needs to see those bits of code coloured with lurid red highlights. Starting to see how this can impose real-world issues...
(For the record, I'm also colour-blind...)
In short, this is not functionality Linguist can implement. It would need to be implemented within the main GitHub code base.
An internal feedback issue has been opened referencing this issue for consideration by our developers.
Please see my comment at https://github.com/github/linguist/issues/2598#issuecomment-366277627 for more details.
As this is not something Linguist has any influence over, I'm closing this issue.