Linguist: .x 3D models reported as RPC language

Created on 19 Oct 2019  ·  15 Comments  ·  Source: github/linguist

Problem Description

A file format that is confusing Linguist is described at https://en.wikipedia.org/wiki/.x.
The format is designed for exchanging 3D model data; it is not a programming language.
Linguist currently appears to detect such files as the RPC language.
At least, that is what I think is happening. Either way, the repositories are misclassified.

URL of the affected repository:

https://github.com/ucpu/degrid

A quick search turned up many more incorrectly classified repositories, which is why I would prefer a systematic fix over per-repository overrides.

https://github.com/urmuelle/SpecialEffectsExamples
https://github.com/btmitchell1/towerdeffps
https://github.com/nirme/rpg_game_engine
https://github.com/ASzot/xna_game_engine

Last modified on:

2019-10-19

Expected language:


C++

Detected language:

RPC

All 15 comments

I haven't seen that extension used before in all my years of either 3D work or programming. The Wikipedia page you linked to mentions it's been deprecated for a long time:

As of 2014, the file format has been deprecated for a long time [2] and the interchange role is better served by a more modern format like Autodesk FBX.

In other words, it's unlikely this format has enough in-the-wild distribution across Linguist to qualify for admission. Your best bet is to simply exclude the .x file(s) using an override:

```gitattributes
*.x linguist-vendored
```

When searching for language:rpc on GitHub, all repositories on the first 5 pages are incorrectly classified.
There are 96 repositories classified as RPC, but 0 files classified as RPC code (apparently all the .x models are too large to be considered code).

.x has been deprecated by Microsoft, but it is still used. Some people prefer it over FBX because its implementations are less buggy.

Furthermore, linguist is classifying "Apollo Guidance Computer", which has been discontinued 44 years ago, so deprecation obviously is not a reason to exclude a language?

Furthermore, linguist is classifying "Apollo Guidance Computer", which has been discontinued 44 years ago, so deprecation obviously is not a reason to exclude a language?

Going by the +6,200 people who have forked Apollo-11's source code after its publication on GitHub went viral mid-2016, I'd say there's almost no relation. 😉

Now, if you're convinced there's enough lingering usage of .x to warrant recognition as a (data) language by Linguist, we'd welcome a pull-request (this can help you).

/cc @dylex for input, as he was the one who added .x as an XDR/RPCGEN extension in #3472.

Thinking about this further, another approach would be to make the RPC classification more accurate, so that model files with the .x extension are not classified as RPC. Would this be any simpler?

It sounds like these are in some other language that Linguist doesn't know about (i.e., not "Logos" either)? If someone wants to add a definition for it, I'd be happy to help with or review the disambiguation.

There doesn't appear to be anything in these files that would be triggering the RPC heuristic, so I assume it's just defaulting given that there are no other potential options.

Would this be any simpler?

@malytomas Erh, yes and no. Improving a heuristic is always a blessing, but in this particular instance, it won't affect the X/RPC (mis)classification because Linguist only disambiguates between filetypes it recognises (i.e., those that are supported on GitHub.com).

I know diddly-squat about any of the formats being discussed here, so... that's really why I'm unburdening the task of disambiguation upon you.
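The limitation described above (a result can only come from file types that are already registered) can be pictured with a toy model. This is a sketch for illustration only, not Linguist's actual code: `pick_language`, the registry dictionaries, the fallback choice, and the entry name "DirectX 3D File" are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Rule = Callable[[str], bool]


def pick_language(ext: str, text: str,
                  registry: Dict[str, List[str]],
                  rules: Dict[str, Rule]) -> Optional[str]:
    """Return one of the languages registered for `ext`, or None."""
    candidates = registry.get(ext, [])
    for language in candidates:
        rule = rules.get(language)
        if rule is not None and rule(text):
            return language
    # No rule matched: a registered candidate is still returned.
    # Assumption: this stands in for whatever default Linguist really applies.
    return candidates[0] if candidates else None


# Hypothetical registries before and after the 3D format gets its own entry.
before = {".x": ["RPC", "Logos"]}
after = {".x": ["RPC", "Logos", "DirectX 3D File"]}
after_rules = {"DirectX 3D File": lambda text: text.startswith("xof ")}

model = "xof 0303txt 0032\nMesh { }"
guess_before = pick_language(".x", model, before, {})
guess_after = pick_language(".x", model, after, after_rules)
```

In this model, no improvement to the RPC rule alone can make `guess_before` name the 3D format, because that format is not among the candidates; only registering it changes the answer.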

@dylex Yes, that's (quite literally) why I just finished typing. 😉 A language has to be supported before it can be disambiguated... which is a glaring limitation of Linguist, because in numerous scenarios the "identify no language at all" analysis has been sorely missed — just look at any PR mentioning Solidity's .sol extension... 😀

We try only to add languages once they have some usage on GitHub. In most cases we prefer that each new file extension be in use in hundreds of repositories before supporting them in Linguist.

https://github.com/search?l=&q=extension%3Ax+size%3A%3E10000+%22xof+0303txt+0032%22&type=Code
About 30k+ files, just with this particular version header.
Searching for just the word xof gives 46k+ files.
I consider this convincing. :D I hope you agree?

Here is some specification: https://docs.microsoft.com/en-us/windows/win32/direct3d9/reserved-words--header--and-comments
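As a rough illustration of the version header mentioned above, here is a minimal Python sketch that splits a header such as "xof 0303txt 0032" into its parts. The field layout (4-byte magic, 2-digit major and minor version, 4-byte format type, 4-digit float size) and the compressed format tags "tzip" and "bzip" are assumptions based on my reading of the Microsoft page; `XHeader` and `parse_x_header` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "xof " + major(2) + minor(2) + format(4) + float size(4).
_HEADER = re.compile(rb"xof (\d{2})(\d{2})(txt |bin |tzip|bzip)(0032|0064)")


class XHeader(NamedTuple):
    major: int
    minor: int
    fmt: str         # "txt", "bin", "tzip" or "bzip"
    float_bits: int  # 32 or 64


def parse_x_header(data: bytes) -> Optional[XHeader]:
    """Parse the first 16 bytes of a .x file; return None if they do not match."""
    match = _HEADER.match(data[:16])
    if match is None:
        return None
    major, minor, fmt, bits = (g.decode("ascii") for g in match.groups())
    return XHeader(int(major), int(minor), fmt.strip(), int(bits))
```

Under these assumptions, the header from the search query would parse as version 3.3, text format, 32-bit floats, while an RPC source file would yield None.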

I will try to create a PR when I get some more time.

I consider this convincing. :D I hope you agree?

Jesus. Yes, by all means go for it. :|

I wonder how many of those .x files are actually RPC? @dylex, any idea how we can narrow these search results down to only RPC or Logos files?

Hard to say... The keywords program, xdr, hyper, opaque and union are all somewhat unique to RPC, and anything named *svc* or *_prot likely is RPC. Most of those .x files classified as RPC or C look like RPC to me, though a few are Haskell Alex (lexer) files (which mostly start with "{"), plus scattered random makefiles, XML, or who knows what. I don't really know what Logos is. Many of the Logos-classified ones actually seem to be some sort of Ruby thing or linker scripts. I actually can't find any files I'm sure are Logos.

Some of these Microsoft DirectX files seem to start with xof 0302txt 0064 instead. Since I would never expect an RPC, Logos, Alex or linker script file to start with xof, adding that as the heuristic seems simple enough once there's a definition.
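The content clues mentioned in this comment can be combined into a small Python sketch. It is only an illustration of the idea, not the heuristic that Linguist ended up with: `guess_x_kind` and its return labels are hypothetical, and the Alex check relies on the observation that those files "mostly" start with "{", so it will miss some.

```python
import re

# Keywords named in the thread as "somewhat unique" to RPC/XDR sources.
RPC_WORDS = re.compile(r"\b(program|xdr|hyper|opaque|union)\b")


def guess_x_kind(text: str) -> str:
    """Guess what a .x file contains from its text alone."""
    if text.startswith("xof "):
        # DirectX headers seen in the thread: "xof 0303txt 0032", "xof 0302txt 0064".
        return "DirectX 3D File"
    if text.lstrip().startswith("{"):
        # Haskell Alex lexer files mostly open with a "{ ... }" code block.
        return "Alex"
    if RPC_WORDS.search(text):
        return "RPC"
    return "unknown"
```

The xof check comes first because it is the least ambiguous signal: a model file may well contain words like "union" somewhere in its data, but an RPC source is not expected to start with xof.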

I'd be happy to add a haskell alex language def, too, while we're at it, if there's interest, though there aren't all that many of them and many are already identified as Haskell (which is almost right).

@malytomas Here's something to help you get your PR together quicker. I've harvested a subset of 2,779 files from the extension:x size:>10000 "xof 0303txt 0032" search you linked to earlier, with proof of at least 618 unique repos between 564 users.

(Obviously the real totals are a lot higher than these, but Harvester happens to have a hard limit of < 3,000 results. That's not a problem here, because we only need proof that the repositories number in the hundreds.)

Feel free to quote this research in your PR body. I recommend using an entry name like DirectX 3D File or something that distinguishes it from anything X11-related.

(Between “Logos” and “X”, I'm not sure which format has the worst language name...)

I'd be happy to add a haskell alex language def, too, while we're at it, if there's interest, though there aren't all that many of them and many are already identified as Haskell (which is almost right).

@dylex Sorry, I missed your reply because we both responded at once. If there isn't sufficient distribution of "Alex" files, then he's obviously not popular and can't sit with the cool kids. 😉 In other words, it isn't about what we think, it's about how many people have given the filetype sufficient real-world distribution to be eligible for support on GitHub. More on that over there.

@malytomas I've smashed together a grammar for highlighting DirectX .x files (it'll look better once it's active on GitHub). Just ping me when you get your PR ready and I'll tell you what to update. 👍

PR merged. I consider this done. Thanks :D
