Gitea: Proposal to add missing support for short-hash src URLs

Created on 27 Mar 2019 · 10 Comments · Source: go-gitea/gitea

Similar to #211, it would be convenient to support abbreviated commit IDs in file content/browsing URLs. This pattern is already supported in other Gitea URLs (commit and archive, at least), and it also works with similar interfaces such as gitweb, cgit, Bitbucket and GitHub. Shorter IDs are handy for reducing URL length when linking from media where column count is at a premium and wrapping is best avoided (mailing list threads, IRC discussions, ...).

Labels: kind/enhancement, kind/proposal, kind/ui


Are there any objections to this? I haven't looked closely at the code paths involved, but if it's as simple as it was to add for archive downloads then I might be able to take a stab at submitting a PR for it myself. Many thanks in advance for considering!

I think accessing a URL with a shortened commit ID should redirect to the full-ID URL rather than display the content, to avoid duplicate content.

@mateusza I agree, and there is a request for this on the PR discussion.

That sounds entirely reasonable as well. Either way solves the challenge I have.

I think we could support a redirection, but if we did so, we might make it harder to create shortened links in the first place. If I wanted to create a shortened link, I would first navigate to the page in the browser, then edit the URL bar to shorten the link, then hit Enter to verify that I did so correctly; if the correct page reloaded, I would copy that URL into my email/IRC/whatever. With a redirect, I would end up with the same long URL again, so I would have to edit the URL externally and then paste it back into the browser to verify that it worked.

Granted, it's a minor annoyance, but unless redirecting to the full SHA has other benefits, this makes me lean toward simply supporting it without a redirect.

This is not a strongly-held belief, I'm happy with either way. :)

I'd prefer the redirect. That way we don't have multiple pages that show the same thing (one issue I see, among others, is that the duplicates might confuse web crawlers indexing the site); instead, the user would always end up on the canonical page.

Edit: Also see @zeripath's response below.

Short SHAs are not permanent links: any additional commit could suddenly make your link stop working. (Admittedly this can happen with a full SHA too, but in that case we're in a whole different world of trouble.)

In the early days of the Linux kernel you could get away with a 7-character SHA; now you almost always need 10 to 12 characters.

With 7 hex chars (3.5 bytes, or 28 bits), the odds that a given ID will collide with another in the repository (assuming a completely even distribution and spherical cows with no wind resistance) are a bit north of 1 in 250 million.

Edit for minor clarification: you would need 250 million commits for a 1:1 chance of a collision of a given ID, though if your project has a mere million commits then the chance is something like 1 in 25. So while I agree that the odds aren't great when you have a repository in the neighborhood of hundreds of thousands of commits (or perhaps even tens of thousands), the fact stands that 7 hex digits is what many familiar tools (including Git itself) use as a standard abbreviation length.
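The per-ID figure can be computed directly. This is a minimal sketch assuming commit-ID prefixes are independent and uniformly distributed over the 16^7 = 2^28 = 268,435,456 possible 7-hex-digit values; the chance that one particular abbreviation is shared by at least one of n other commits is then 1 - (1 - 1/2^28)^n. The names `idSpace` and `perIDCollision` are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// idSpace is the number of distinct 7-hex-digit prefixes: 16^7 = 2^28.
const idSpace = 1 << 28

// perIDCollision returns the probability that one particular abbreviated
// ID shares its prefix with at least one of n other commits, assuming
// uniformly distributed, independent prefixes.
func perIDCollision(n float64) float64 {
	// 1 - (1 - 1/N)^n, computed via Log1p/Expm1 to avoid rounding loss.
	return -math.Expm1(n * math.Log1p(-1.0/idSpace))
}

func main() {
	for _, n := range []float64{1e5, 1e6, 1e7, idSpace} {
		p := perIDCollision(n)
		fmt.Printf("n = %.0f: p = %.4f (about 1 in %.0f)\n", n, p, 1/p)
	}
}
```

Under these assumptions the sketch gives roughly 1 in 269 for a million other commits, roughly 1 in 27 for ten million, and about 63% once n reaches 2^28.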

> With 7 hex chars (3.5 bytes, or 28 bits), the odds that a given ID will collide with another in the repository (assuming a completely even distribution and spherical cows with no wind resistance) are a bit north of 1 in 250 million.

Yes. But what we should look at is the probability of ANY ID colliding with at least one other ID. The numbers look very different here.

https://preshing.com/20110504/hash-collision-probabilities/
https://en.wikipedia.org/wiki/Birthday_problem
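The birthday-problem framing from those links can be sketched the same way, again assuming uniformly distributed 7-hex-digit prefixes. The probability that at least two of n commits share a prefix is approximately 1 - exp(-n(n-1)/(2N)) with N = 2^28, and that approximation reaches 50% near n ≈ sqrt(2N ln 2). The function names `anyCollision` and `commitsForHalf` are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

const idSpace = 1 << 28 // N = 16^7 possible 7-hex-digit prefixes

// anyCollision approximates the probability that at least two of n commits
// share a 7-hex-digit prefix: 1 - exp(-n(n-1)/(2N)).
func anyCollision(n float64) float64 {
	return -math.Expm1(-n * (n - 1) / (2 * idSpace))
}

// commitsForHalf returns the commit count at which the approximation
// reaches 50%: n ≈ sqrt(2N ln 2).
func commitsForHalf() float64 {
	return math.Sqrt(2 * idSpace * math.Ln2)
}

func main() {
	for _, n := range []float64{1e3, 1e4, 1e5} {
		fmt.Printf("n = %.0f: P(any collision) = %.4f\n", n, anyCollision(n))
	}
	fmt.Printf("50%% chance at roughly %.0f commits\n", commitsForHalf())
}
```

Under these assumptions the 50% point lands at roughly 19,300 commits, far below the per-ID figures, which is the contrast between the two readings of "collision probability" in this thread.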

> Edit for minor clarification: you would need 250 million commits for a 1:1 chance of a collision of a given ID,

No.

With 16^7 + 1 = 268435457 commits, there is a 100% chance that at least one 7-digit collision exists (the pigeonhole principle). But no particular short ID ever has exactly a 100% chance of colliding. Theoretically, there could be millions of objects and only one of them starting with the digit "0". Why not? Unlikely, but not impossible.

> Yes. But what we should look at is the probability of ANY ID colliding with at least one other ID.

I think it depends on what you're concerned about. If these are done as redirects to the equivalent URL with the full-length commit ID, then the main risk seems to be that someone includes a link with the shortened version in, say, an archived mailing list post that can't easily be corrected later, and then, at some point, that shortened ID begins to collide with another in the same repository. Not every commit is going to be linked to in that way, so for me it comes down to the odds that a particular abbreviation collides, rather than the odds that at least one abbreviation collides somewhere in the repository.


Related issues

kolargol · 3 Comments

flozz · 3 Comments

kifirkin · 3 Comments

jonasfranz · 3 Comments

haytona · 3 Comments