Gitea: Gitea interface unavailable after deleting member of Org "Owners" team

Created on 4 Jul 2018  路  8Comments  路  Source: go-gitea/gitea

  • Gitea version (or commit ref): 1.4.2
  • Git version: not relevant
  • Operating system: Windows Server 2012 R2
  • Database (use [x]):

    • [ ] PostgreSQL

    • [ ] MySQL

    • [x] MSSQL

    • [ ] SQLite

  • Can you reproduce the bug at https://try.gitea.io:

    • [ ] Yes (provide example URL)

    • [x] No

    • [ ] Not relevant

  • Log gist:
2018/07/04 15:35:31 [...routers/org/teams.go:107 TeamsAction()] [E] Action(remove): read tcp GITEA_SERVER_IP:49445->MSSQL_SERVER_IP:1433: i/o timeout
2018/07/04 15:35:31 [...routers/org/teams.go:219 TeamMembers()] [E] GetMembers: get user '0': user does not exist [uid: 0, name: , keyid: 0]
2018/07/04 15:35:31 [D] Template: status/500
...
2018/07/04 15:36:47 [...routers/user/home.go:49 getDashboardContextUser()] [E] GetOrganizations: user does not exist [uid: 0, name: , keyid: 0]

Description

I am using Gitea on Windows with MSSQL as DB backend. By chance I found out that removing the first owner of an organization brings down the whole gitea web interface for us.

Steps to reproduce in our environment:

  1. Create a new Org using UserA
  2. Go into the Owners Team and add UserB as owner
  3. Remove UserA from the owners team

What happes is that the page does not refresh for about 30 seconds and then shows an 500 error.
All subsequent requests to the gitea webinterface then fail with this 500 error, no matter if using an existing session or accessing it without being logged in.

I was only able to get it to work again by restoring a DB backup previous to the deletion.

When deleting UserA the following entries are written to the gitea.log:

2018/07/04 15:35:31 [...routers/org/teams.go:107 TeamsAction()] [E] Action(remove): read tcp GITEA_SERVER_IP:49445->MSSQL_SERVER_IP:1433: i/o timeout
2018/07/04 15:35:31 [...routers/org/teams.go:219 TeamMembers()] [E] GetMembers: get user '0': user does not exist [uid: 0, name: , keyid: 0]
2018/07/04 15:35:31 [D] Template: status/500

each subsequent request to the web interface generates an entry similar to this in the log file:

2018/07/04 15:36:47 [...routers/user/home.go:49 getDashboardContextUser()] [E] GetOrganizations: user does not exist [uid: 0, name: , keyid: 0]
2018/07/04 15:36:47 [D] Template: status/500

I've used SQL profiler to see which queries are executed against the DB when the user is deleted, and there are only two queries that modify the database (all others are SELECT statements):

exec sp_executesql N'DELETE FROM "team_user" WHERE "org_id"=@p1 AND "team_id"=@p2 AND "uid"=@p3',N'@p1 bigint,@p2 bigint,@p3 bigint',@p1=6,@p2=1,@p3=2

exec sp_executesql N'UPDATE "team" SET "num_members" = @p1 WHERE "id"=@p2',N'@p1 bigint,@p2 bigint',@p1=1,@p2=1

The SQL profiler does not pick up any queries with errors or warnings, timeouts or high load when the deletion is performed. As all other queries are exeuted in in less than 10 milliseconds and given that I can reproduce this reliably on each try I do not think this is in any way load or network related, the i/o timeout message might be more of a red herring.

Let me know if you need anything else.

kinbug

Most helpful comment

And another update: I talked to someone more familiar with MSSQL transaction handling than myself and it seems that the reason for this is the following:

Default transaction isolation level for MSSQL is SERIALIZABLE (https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-2016).

Statements cannot read data that has been modified but not yet committed by other transactions.

But Gita tries to do that:
2018/07/05 14:50:06 [I] [SQL] BEGIN TRANSACTION ... 2018/07/05 14:50:06 [I] [SQL] UPDATE "team" SET "num_members" = ? WHERE "id"=? []interface {}{1, 10005} ... 2018/07/05 14:50:07 [I] [SQL] SELECT TOP 1 "id", "org_id", "lower_name", "name", "description", "authorize", "num_repos", "num_members", "unit_types" FROM "team" WHERE "org_id"=? AND "name"=? []interface {}{10092, "Owners"} 2018/07/05 14:50:37 [I] [SQL] ROLL BACK

The row for "Owners" in the team table is updated and then read within the same transaction. It cannot be read due to the lock on the record which leads the whole transaction to time out and being rolled back.

So it seems there are two issues:

  1. Gitea is unable to handle a transaction rollback gracefully in that situation
  2. The code needs to be changed so that after the delete and update query a commit takes place OR the transaction isolation level needs set accordingly for those queries

All 8 comments

Could you please post xorm.log if existing. If not set log LEVEL in your app.ini to Info as described here: https://docs.gitea.io/en-us/config-cheat-sheet/#log-log and restart your gitea instance. Now the SQL queries should be shown in console or log.

I already was running in LEVEL = Trace as loglevel but no xorm.log is present.
I've just tried again with Info Level but no avail.
In both cases the gitea.log shows the XORM log level is set correctly:

2018/07/05 11:37:55 [I] Log Mode: File(Info)
2018/07/05 11:37:55 [I] XORM Log Mode: File(Info)

Change log MODE to console and submit the console output. Maybe this will give us more information.

Maybe #4282 can help us finding the problem.

Some new information: Changing to MODE = file,console indeed leads to the console printing more details which are not present in the log files written to disk (maybe this is an issue in itself?)

The moment the error ocurrs:

[Macaron] 2018-07-05 14:50:06: Started GET /org/testorg23/teams/owners/action/remove?uid=11 for 10.10.10.10 2018/07/05 14:50:06 [I] [SQL] SELECT TOP 1 "id", "lower_name", "name", "full_name", "email", "keep_email_private", "passwd", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "description", "num_teams", "num_members", "diff_view_style" FROM "user" WHERE "id"=? []interface {}{11} 2018/07/05 14:50:06 [I] [SQL] SELECT count(*) FROM "notification" WHERE (user_id = ?) AND (status = ?) []interface {}{11, 0x1} 2018/07/05 14:50:06 [I] [SQL] SELECT TOP 1 "id", "lower_name", "name", "full_name", "email", "keep_email_private", "passwd", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "description", "num_teams", "num_members", "diff_view_style" FROM "user" WHERE "lower_name"=? []interface {}{"testorg23"} 2018/07/05 14:50:06 [I] [SQL] SELECT "id", "org_id", "lower_name", "name", "description", "authorize", "num_repos", "num_members", "unit_types" FROM "team" WHERE (org_id=?) ORDER BY CASE WHEN name LIKE 'Owners' THEN '' ELSE name END []interface {}{10092} 2018/07/05 14:50:06 [I] [SQL] BEGIN TRANSACTION 2018/07/05 14:50:06 [I] [SQL] SELECT top 1 * FROM team_user WHERE (org_id=?) AND (team_id=?) AND (uid=?) []interface {}{10092, 10005, 11} 2018/07/05 14:50:06 [I] [SQL] SELECT * FROM "repository" INNER JOIN "team_repo" ON repository.id = team_repo.repo_id WHERE (team_repo.team_id=?) []interface {}{10005} 2018/07/05 14:50:06 [I] [SQL] DELETE FROM "team_user" WHERE "org_id"=? AND "team_id"=? AND "uid"=? []interface {}{10092, 10005, 11} 2018/07/05 14:50:06 [I] [SQL] UPDATE "team" SET "num_members" = ? WHERE "id"=? []interface {}{1, 10005} 2018/07/05 14:50:07 [I] [SQL] SELECT count(*) FROM "team_user" WHERE "org_id"=? AND "uid"=? []interface {}{10092, 11} 2018/07/05 14:50:07 [I] [SQL] SELECT TOP 1 "id", "uid", "org_id", "is_public" FROM "org_user" WHERE (uid=?) AND (org_id=?) []interface {}{11, 10092} 2018/07/05 14:50:07 [I] [SQL] SELECT TOP 1 "id", "lower_name", "name", "full_name", "email", "keep_email_private", "passwd", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "description", "num_teams", "num_members", "diff_view_style" FROM "user" WHERE "id"=? []interface {}{10092} 2018/07/05 14:50:07 [I] [SQL] SELECT TOP 1 "id", "org_id", "lower_name", "name", "description", "authorize", "num_repos", "num_members", "unit_types" FROM "team" WHERE "org_id"=? AND "name"=? []interface {}{10092, "Owners"} 2018/07/05 14:50:37 [I] [SQL] ROLL BACK [Macaron] 2018-07-05 14:50:37: Completed GET /org/testorg23/teams/owners/action/remove?uid=11 200 OK in 30.1075173s 2018/07/05 14:50:37 [...routers/org/teams.go:107 TeamsAction()] [E] Action(remove): read tcp GITEA_SERVER_IP:50522->MSSQL_SERVER_IP:1433: i/o timeout

It seems to rollback a transaction after 30 seconds which leaves the database in a non-functional state. After that, pretty much every request to the web interface will lead to the HTTP 500.
E.g. below is the output when trying to browse to explore:

[Macaron] 2018-07-05 14:54:02: Started GET /explore/organizations for 10.10.10.10 2018/07/05 14:54:02 [I] [SQL] SELECT TOP 1 "id", "lower_name", "name", "full_name", "email", "keep_email_private", "passwd", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "description", "num_teams", "num_members", "diff_view_style" FROM "user" WHERE "id"=? []interface {}{2} 2018/07/05 14:54:02 [I] [SQL] SELECT count(*) FROM "notification" WHERE (user_id = ?) AND (status = ?) []interface {}{42, 0x1} 2018/07/05 14:54:02 [I] [SQL] SELECT count(*) FROM "user" WHERE type=? []interface {}{1} 2018/07/05 14:54:02 [...itea/routers/home.go:188 RenderUserSearch()] [E] SearchUsers: Count: sql: expected 37 destination arguments in Scan, not 1 2018/07/05 14:54:02 [D] Template: status/500

I was puzzled how a transaction that is cleanly rolled back both transactions can cause this issue and did the obvious test I missed before: Restarting the service after the error ocurred.

And sure enough the server seems to come up without any errors. So the issue is "runtime only". It seems that some internal component does not handle the rolled back transaction gracefully, maybe some kind of cache corruption.

The question remains of course why the Transaction timeout/rollback is ocurring in the first place.

And another update: I talked to someone more familiar with MSSQL transaction handling than myself and it seems that the reason for this is the following:

Default transaction isolation level for MSSQL is SERIALIZABLE (https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-2016).

Statements cannot read data that has been modified but not yet committed by other transactions.

But Gita tries to do that:
2018/07/05 14:50:06 [I] [SQL] BEGIN TRANSACTION ... 2018/07/05 14:50:06 [I] [SQL] UPDATE "team" SET "num_members" = ? WHERE "id"=? []interface {}{1, 10005} ... 2018/07/05 14:50:07 [I] [SQL] SELECT TOP 1 "id", "org_id", "lower_name", "name", "description", "authorize", "num_repos", "num_members", "unit_types" FROM "team" WHERE "org_id"=? AND "name"=? []interface {}{10092, "Owners"} 2018/07/05 14:50:37 [I] [SQL] ROLL BACK

The row for "Owners" in the team table is updated and then read within the same transaction. It cannot be read due to the lock on the record which leads the whole transaction to time out and being rolled back.

So it seems there are two issues:

  1. Gitea is unable to handle a transaction rollback gracefully in that situation
  2. The code needs to be changed so that after the delete and update query a commit takes place OR the transaction isolation level needs set accordingly for those queries

This is duplicated with #4972 and should be closed by #5214. Please feel free to reopen it.

Was this page helpful?
0 / 5 - 0 ratings