Gitea: 500 when adding review comments on specific lines for PR:s

Created on 20 Jun 2019  路  26Comments  路  Source: go-gitea/gitea

  • Gitea version (or commit ref): 1.8.2
  • Git version: 1.8.3.1
  • Operating system: CentOS 7
  • Database (use [x]):

    • [X] PostgreSQL

    • [ ] MySQL

    • [ ] MSSQL

    • [ ] SQLite

  • Can you reproduce the bug at https://try.gitea.io:

    • [ ] Yes (provide example URL)

    • [X] No

    • [ ] Not relevant

  • Log gist:

[Macaron] 2019-06-20 16:08:16: Started POST /MyTeam/MyRepo/pulls/19/files/reviews/comments for 172.16.2.102

2019/06/20 16:08:16 http: superfluous response.WriteHeader call from code.gitea.io/gitea/vendor/gopkg.in/macaron%2ev1.(*responseWriter).WriteHeader (response_writer.go:59)

[Macaron] 2019-06-20 16:08:16: Completed POST /MyTeam/MyRepo/pulls/19/files/reviews/comments 302 Found in 57.718947ms

Description

I get a 500 internal server error sometimes when adding review comments to specific lines for a PR. I haven't figured the pattern yet because it only happens on specific lines of the files. But once I find a line that causes the 500 internal server error then it happens every time.

It could be related to #6236 as the files we work in are not in UTF-8 but instead in iso-8859-15 and we use the swedish characters 脜, 脛 and 脰 but I'm not getting the same error message in the log so therefore I'm creating a new issue.

If it's relevant I started running the server on version on 1.7.0 and has since upgraded to 1.8.1 and then 1.8.2

The repo and the files are related to my work so I can't share those publicly.

Screenshots

kinbug revieweconfirmed

Most helpful comment

Changing the database will not solve this.

The problem is we assume that code lines are in utf8 so just store the line of code as a string in the database.

Git makes no such guarantees and code lines can be in any encoding.

We either:

  • Stop storing code lines in the db - and reference the line (we still need to reencode at some point.)
  • Have to reencode the codeline to ensure that it is in utf8 before putting it in the db.
  • Store the codeline as bytes and reencode it when it comes back out.

There are plenty of other places where we make the same wrong assumption: at the plumbing level git just stores bytes not strings.

  • If your commit message was converted from SVN or CVS it can be non utf8. Git can report if it was told that this message was utf8 or not and I think there is a way to convert.
  • If you filesystem is weird you can have non utf8 characters in the path. Hell if you're perverse you can do this on Unix extremely easily.
  • Git makes almost zero assumptions about file encodings - write your code in koi8 for all it cares, write the comments in a totally different encoding to the rest of the file...
  • If you have weird settings git will put messages into weird encodings.

All of these are potential failures with encoding - we don't see them very often because most people just use the default but it's a significant issue for anyone with old code. Unfortunately our code is full of assumptions regarding the string nature of all of these things.

So changing your MySQL table format ain't gonna fix this. Your codeline ain't value utf8 and although go hasn't complained the db certainly will.

All 26 comments

Update: It doesn't seem to depend on 脜, 脛, or 脰. I can get a 500 on other lines as well. Have tried recreating the problem on try.gitea.io but to no avail.

It also only happens when doing a review, adding a single comment works fine.

When 1.9.0 is released I'm going to update to it and see if the issue still remains then.

i can confirm this issue still remains on 1.9.2

@kstruessmann could you confirm that on https://try.gitea.io ?

I can't reproduce the error on https://try.gitea.io.
The issue acts a bit weird. It only happens to some users and if so it persists for them the whole PR. Other users can comment with no problems.
I also can't reproduce the error in our own testing server. The only difference there is an LDAP authentication in the production server.

````

Session ID, CSRF Token, Organisation Name and Repo name obfuscated

gitea.log

2019/09/06 13:11:27 ...s/context/context.go:137:HTML() [D] Template: status/500
2019/09/06 13:11:27 ...c/net/http/server.go:3012:logf() [I] http: superfluous response.WriteHeader call from gopkg.in/macaron%2ev1.(*responseWriter).WriteHeader (response_writer.go:59)
2019/09/06 13:11:27 ...s/context/context.go:315:func1() [D] Session ID: daa5464564
2019/09/06 13:11:27 ...s/context/context.go:316:func1() [D] CSRF Token: m_kdfkjdlkdjs645354354
2019/09/06 13:11:27 ...s/context/context.go:137:HTML() [D] Template: pwa/manifest_json

Console

server_1_ea9cafbbe31f | [Macaron] 2019-09-06 13:11:27: Started POST /Org/Project/pulls/5/files/reviews/comments for 192.168.235.144
server_1_ea9cafbbe31f | [Macaron] 2019-09-06 13:11:27: Completed POST /Org/Project/pulls/5/files/reviews/comments 302 Found in 38.186227ms
server_1_ea9cafbbe31f | [Macaron] 2019-09-06 13:11:27: Started GET /manifest.json for 192.168.235.144
server_1_ea9cafbbe31f | [Macaron] 2019-09-06 13:11:27: Completed GET /manifest.json 200 OK in 1.4949ms

````

We noticed no difference if the users connect directly to get or use the reverse-proxy.

The problem also appears when your first comment some line on the pull request(95 line for example), then push some changes whice makes file a bit shorter( for example 90 lines at all). After that, if you try to answer to previous comment you will see 500 code("some file has only 90 lines"). It means that you can comment specific lines, but if you chang files the system won't remember it, it can only see the current stage of file.

Just try to answer to this comment
https://try.gitea.io/pavel/test/pulls/1#issuecomment-12940
You will see 500 error.

Adding a comment didn't trigger the problem. Replying to the review did.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

I hit this bug from time to time. I also have repos with non utf-8 content, and I think that's the reason why it happens.

Yes, @guillep2k it is.

I am having this problem as well with Gitea 1.9.5.

Engineer 'A' started a pull request;
I commented on a line of code with 'Start review';
The next comment I was trying to add, triggered a 500 error.

Our code comments and pull request comments contains Chinese characters, the encoding is GBK.

@tanrui8765 which database are you using? If it's mysql, which character set did you use?

@tanrui8765 which database are you using? If it's mysql, which character set did you use?

In my case it's postgresql 9.2.24.

hello @lunny @guillep2k , I am using MySQL , character set is utf8.

MySQL version is 5.7.26, gitea database character set is utf8_general_ci.

Could you try use utf8mb4? You could change app.ini [database] charset=utf8mb4. And you should also convert your database to utf8mb4.

Sure, I'll give it a try, but I am concerning that, will it cause some new problems like data miss or disordered or something?

Changing the database will not solve this.

The problem is we assume that code lines are in utf8 so just store the line of code as a string in the database.

Git makes no such guarantees and code lines can be in any encoding.

We either:

  • Stop storing code lines in the db - and reference the line (we still need to reencode at some point.)
  • Have to reencode the codeline to ensure that it is in utf8 before putting it in the db.
  • Store the codeline as bytes and reencode it when it comes back out.

There are plenty of other places where we make the same wrong assumption: at the plumbing level git just stores bytes not strings.

  • If your commit message was converted from SVN or CVS it can be non utf8. Git can report if it was told that this message was utf8 or not and I think there is a way to convert.
  • If you filesystem is weird you can have non utf8 characters in the path. Hell if you're perverse you can do this on Unix extremely easily.
  • Git makes almost zero assumptions about file encodings - write your code in koi8 for all it cares, write the comments in a totally different encoding to the rest of the file...
  • If you have weird settings git will put messages into weird encodings.

All of these are potential failures with encoding - we don't see them very often because most people just use the default but it's a significant issue for anyone with old code. Unfortunately our code is full of assumptions regarding the string nature of all of these things.

So changing your MySQL table format ain't gonna fix this. Your codeline ain't value utf8 and although go hasn't complained the db certainly will.

We could convert the code lines to utf8 before we save it to database.

We could convert the code lines to utf8 before we save it to database.

If we do this, we'd need to use the whole file for encoding detection, not just one line, as it's not enough context. The detection library requires at least 1024 bytes; the more, the better.

BTW, I did a test today with just 'Add single comment' on lines of code. I can add many comments (added 6 comments) with no 500 error.

image

It seems like the problem only follows the 'Start review' ?

It seems like the problem only follows the 'Start review' ?

That's correct. The multi-step review is the one failing.

File encoding is something we should be very careful about if we ever implement a "suggest changes" feature like GitHub's. We could easily corrupt the file contents.

The problem also appears when your first comment some line on the pull request(95 line for example), then push some changes whice makes file a bit shorter( for example 90 lines at all). After that, if you try to answer to previous comment you will see 500 code("some file has only 90 lines"). It means that you can comment specific lines, but if you chang files the system won't remember it, it can only see the current stage of file.

Just try to answer to this comment
https://try.gitea.io/pavel/test/pulls/1#issuecomment-12940
You will see 500 error.

I also get this fault (gitea 1.11.6). But I think that is a different fault than the original issue? That doesn't seem to have anything to do with encoding.

I've fixed the encoding problem. I've also fixed the invalidation problem - I thought that made it back to 1.11 but certainly in 1.12. I've just fixed another issue that was causing NPEs in a template file in 1.12 and master.

I think whatever was the original bug that caused this issue should now be fixed and therefore I'm gonna close this issue.

@davidsvantesson are you able to reproduce your problem on try or on the release v1.12 branch? If you can could you open another issue?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

thehowl picture thehowl  路  3Comments

thehowl picture thehowl  路  3Comments

kifirkin picture kifirkin  路  3Comments

BNolet picture BNolet  路  3Comments

jorise7 picture jorise7  路  3Comments