Gitea: Filenames for wiki pages with special characters

Created on 7 Oct 2019 · 7 Comments · Source: go-gitea/gitea

Gitea currently saves wiki pages whose titles contain special characters as files with percent-escaped names. (Example: Title with comma, ampersand & [brackets] = Title-with-comma%2C-ampersand-%26-%5Bbrackets%5D.md)
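A minimal sketch of a mapping that reproduces the example above, assuming it works by replacing spaces with dashes and then applying Go's standard url.QueryEscape. The function name escapedWikiFilename is hypothetical and this is not necessarily Gitea's actual code; it only shows that this combination yields exactly the escaped filename quoted in the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// escapedWikiFilename is a hypothetical helper (not Gitea's real function).
// Spaces become dashes; url.QueryEscape then percent-encodes every byte
// outside A-Z, a-z, 0-9, '-', '_', '.' and '~'.
func escapedWikiFilename(title string) string {
	dashed := strings.ReplaceAll(title, " ", "-")
	return url.QueryEscape(dashed) + ".md"
}

func main() {
	name := escapedWikiFilename("Title with comma, ampersand & [brackets]")
	fmt.Println(name) // Title-with-comma%2C-ampersand-%26-%5Bbrackets%5D.md
}
```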

I propose that Gitea save these pages under filenames with _unescaped_ characters. (Example: Title with comma, ampersand & [brackets] = Title-with-comma,-ampersand-&-[brackets].md)
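A sketch of the proposed scheme under the assumption that only spaces are replaced with dashes and every other character is kept as-is. Both function names are hypothetical. The reverse mapping is included to show that the title can be recovered for display without any unescaping step.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapedWikiFilename is a hypothetical sketch of the proposal: only
// spaces are replaced with dashes; commas, ampersands, brackets, etc. stay.
func unescapedWikiFilename(title string) string {
	return strings.ReplaceAll(title, " ", "-") + ".md"
}

// titleFromFilename reverses the mapping for display. Like the escaped
// scheme, it cannot tell a dash in the title apart from a former space.
func titleFromFilename(filename string) string {
	return strings.ReplaceAll(strings.TrimSuffix(filename, ".md"), "-", " ")
}

func main() {
	name := unescapedWikiFilename("Title with comma, ampersand & [brackets]")
	fmt.Println(name)                    // Title-with-comma,-ampersand-&-[brackets].md
	fmt.Println(titleFromFilename(name)) // Title with comma, ampersand & [brackets]
}
```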

This would be better because wikis are sometimes cloned and edited locally, where unescaped filenames are easier to work with. Also, other services such as GitHub use unescaped characters, so any wikis mirrored from those services already come with unescaped filenames. (See #8284 and #8408)

This would be a significant change, because all the unit tests currently assume that filenames are saved with escaped characters, and backwards compatibility with existing wiki pages would be a concern. However, the unit tests could be adjusted, and the solution presented in #8408 would solve the compatibility problem (with its logic reversed to satisfy the new unit tests).
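One way to picture the "reversed" compatibility logic: look up the new unescaped filename first and fall back to the legacy escaped one, so pages created before the change are still found. This is a hypothetical sketch, not the code from #8408; findWikiFile is an invented name, and the wiki's Git tree is modeled as a plain set of filenames.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// findWikiFile is hypothetical. It prefers the proposed unescaped name and
// falls back to the legacy escaped name for pages saved before the change.
func findWikiFile(tree map[string]bool, title string) (string, bool) {
	dashed := strings.ReplaceAll(title, " ", "-")
	candidates := []string{
		dashed + ".md",                  // new, unescaped filename
		url.QueryEscape(dashed) + ".md", // legacy, escaped filename
	}
	for _, name := range candidates {
		if tree[name] {
			return name, true
		}
	}
	return "", false
}

func main() {
	legacy := map[string]bool{"Title-with-comma%2C-ampersand-%26-%5Bbrackets%5D.md": true}
	name, ok := findWikiFile(legacy, "Title with comma, ampersand & [brackets]")
	fmt.Println(name, ok)
}
```

Checking the unescaped name first means that once a legacy page is re-saved under its new name, the old escaped file is simply never consulted again.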

For now, I'm hoping for feedback and discussion about this proposal. I'm also willing to prepare a pull request to make this change.

Labels: kind/bug, stale

All 7 comments

At least NTFS (on Windows) allows filenames containing commas (,), ampersands (&), and square brackets ([ ]); I'm not sure about the FAT file system.

However, consider that NTFS forbids <, >, :, ", /, \, |, ?, and *. If any of those characters are present in a wiki page's filename, Gitea should raise an "Illegal characters in file name" error (only when Gitea is installed on Windows).

That's an interesting point. I wonder how Gitea handles files in regular repositories that have these illegal characters in their filenames. Such files could be pushed from Linux machines or created in the web browser. Only filenames in wiki repositories seem to be escaped currently.

Also, I wonder what would happen if you tried to clone, on Windows, a GitHub wiki repo that has pages with these characters in their titles, since GitHub uses unescaped characters in wiki filenames. That's a question about Git's behavior rather than Gitea's, though.

Consider repositories where you have:

MyFile.txt
myfile.txt
myFile.TXT
etc.

Those don't contain special characters, though, so escaping is irrelevant. Since those filenames are allowed on Linux, I wonder what would happen if you created them on Linux and pushed them to Gitea running on Windows.
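The concern with names like these is case-insensitive collisions: they can coexist in a Git tree and on a case-sensitive Linux file system, but only one of them can exist in a checkout on a file system that compares names case-insensitively, as Windows does by default. The hypothetical helper below groups such paths; strings.ToLower is a simplification of a file system's real case-folding rules.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// caseCollisions (hypothetical) returns groups of paths that differ only by
// letter case, keyed by their lowercased form. Singletons are dropped.
func caseCollisions(paths []string) map[string][]string {
	groups := make(map[string][]string)
	for _, p := range paths {
		key := strings.ToLower(p) // simplified case folding
		groups[key] = append(groups[key], p)
	}
	for key, members := range groups {
		if len(members) < 2 {
			delete(groups, key) // deleting during range is safe in Go
		} else {
			sort.Strings(members)
		}
	}
	return groups
}

func main() {
	fmt.Println(caseCollisions([]string{"MyFile.txt", "myfile.txt", "myFile.TXT", "README.md"}))
}
```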

Gitea doesn't actually store the files, just Git trees. So those files in a repo wouldn't necessarily exist in the Windows file system even if Gitea is installed on Windows. I think the same should be true of filenames containing special characters that NTFS doesn't allow.

@Tekaoh And Git stores all repository data (including the wiki repo) as objects, with file contents stored as blobs named by their SHA-1 hashes. If you're interested, see the explanation in the Pro Git book.

Since the hash contains only the characters 0-9 and a-f, the object can be stored on any file system, even NTFS on Windows.
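A small sketch of the comment above: a blob's id is the SHA-1 of "blob <size>\x00" followed by the content, so the filename plays no part in it, and a loose (unpacked) object is stored under objects/<first two hex digits>/<remaining 38>. The helper names blobID and looseObjectPath are hypothetical; packed objects live in pack files instead, which this sketch does not cover.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path"
)

// blobID computes the SHA-1 id Git assigns to file content. The file's name
// is not an input, only its bytes.
func blobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

// looseObjectPath returns the on-disk location of a loose object; every
// path component consists only of [0-9a-f].
func looseObjectPath(id string) string {
	return path.Join("objects", id[:2], id[2:])
}

func main() {
	id := blobID(nil)                // the empty blob
	fmt.Println(id)                  // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
	fmt.Println(looseObjectPath(id)) // objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
}
```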

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

This issue has been automatically closed because of inactivity. You can re-open it if needed.
