Gitea: Database should use foreign keys

Created on 31 Mar 2017  路  27Comments  路  Source: go-gitea/gitea

  • Gitea version (or commit ref): 1.1.0
  • Git version: Not relevant
  • Operating system: Not relevant
  • Database:

    • [x] PostgreSQL

    • [x] MySQL

    • [x] MSSQL

    • [x] SQLite

  • Can you reproduce the bug at https://try.gitea.io:

    • [ ] Yes (provide example URL)

    • [ ] No

    • [x] Not relevant

  • Log gist:

Description

Gitea's database should use foreign keys. This is much cleaner and makes it easier to do maintenance on the database (e.g. removing spam) without forgetting to delete certain records or deleting too many records.

kinenhancement revieweconfirmed

All 27 comments

Foreign keys are important, and it was discussed before.

The problem is that a migration that adds foreign keys will break on installations that already have orphan DB registries.

Maybe we should try to create references but ignore any error.

I don't like foreign key. It means your database data could be changed by your Go code and database events.

I like foreign keys.
It ensures your data integrity. Also you will get an error if trying do to same stupid things (bugs).

http://dbadiaries.com/foreign-key-constraints

I agree about using foreign key, but before being able to do that
we'll need a way to ensure they can be added and a way to deal with
the case in which they cannot. So maybe an admin panel to find
orphaned/broken records and repair them ?

i don't like foreign key, data must be Immutable.

Foreign keys are important to enforce a sane database

@sailfish009 No one enforces you to set foreign keys as CASCADE. There is also RESTRICT option.

While I like the idea of foreign keys for some things (comments on issues, repos on org/users), you'd still need to be careful with that relations you setup. If a user deletes their account, do you want their comments to go away as well? For any kind of corporate development, that would be considered a bug as potentially vital information might go lost if a user is deleted when an employee quits.

Aside from consistency, there's also performance gains to be had by having the RMDBS taking care of clean-up instead of having Go do that (Go really shouldn't be doing that...). But at the same time, SQLite and FK aren't really friends. While I don't suggest hampering performance for the sake of being friendly with SQLite, we must still consider these things.

Just my 2 cents.

I admit that I haven't reviewed the code, but why would you ever delete a user row from the database? Shouldn't you just set a flag to disallow login? If the concern is allowing the username to be recycled, wouldn't that cause the same issue with regard to comments, etc as FKs and deleting the user?

@jgdye but currently everywhere they are deleted so it is not possible to set on existing installs

@jgdye Removing spam is my main use case.

@lafricks Any FK solution is going to require large database adjustments on existing installs. I'm not a Go programmer, but I would suggest two stages to implementation. Start with adding the active flag and deactivating instead of removing rows, and when a FK solution that works is finished, have a maintenance option to unorphan comments by creating deactivated users. Perhaps also a config flag that tells Gitea whether to try to create the FKs or not.
@lesderid For the spam case I would think offering a choice of disabling and keeping comments or a cascading delete that removes all comments if a spammer would work well. If it's a spammer that has also made real contributions you don't want to lose, disabling instead of deleting would also prevent them coming registering again with the same name. FKs would help in the nuke the entire account case.

@lafriks apologies for the misspelling. Editing on my phone and couldn't see your username while typing.

FKs in SQLite is a no-op if not enabled so you could just setup the FKs and then continue to delete children as well, for Postgres etc the Delete-op would essentially be a no-op, and for SQLite we ensure that we don't leave orphans around.

As for already orphaned data, just have the migration delete orphans _before_ running Sync on that table.

Also, as always I'd recommend doing this incrementally, 1 or 2 tables per PR. Less bugs, easier reviews, easier to revert if it breaks 馃檪

PS. I'd start off with something like issues and issue-comments, if a repo is deleted all issues and related comments could and should be removed, since they would already be inaccessable anyhow 馃檪

@jgdye In my experience, the spammer(s) I have encountered create an account to spam and they use it to create hundreds/thousands of spam orgs and repos, so nuking the account is exactly what would need to be done.

Spammer or whatever reason. Foreign keys are best practice to keep the database in a consistent state.

related to delete user operation, some solutions are possibles:

  • Set to null the FK to comment / ticket (on UI, just replace "John Smith" by "Unknow user")
  • Like YouTrack or Google Business Apps, migrate all issues / comments / repos ? / ... to another user selected during the delete operation
  • Do nothing, just delete everything.

The second option seems good: _do you want migrate user data ?_ yes / no . If "yes", _what do you want migrate ?_ (multiple-choice) Comments / Issues / Repos. _Migrate to which user ?_

Usualy in corporate environment, an user is just disabled, and, many weeks (or months / years) later, account is deleted once and for all.

@0xbaadf00d depending on need there are multiple scenarios that need to be taken into account.

  • Deleting records when doing clean-up (from spamers etc)
  • When record is deleted by user (record must be kept in database with flag deleted)

Setting to null is not an option as in some tables that could cause bugs so I think only options is creating migrations that will create empty (flaged as deleted) records

Sure, is not a trivial question. But take care about soft delete (entry is still in database but flag as "deleted"). In Europe, we have some rules about personnal data (user first/last name, email, ...) and an user can ask to be hard deleted from database.

Here the multi-language EU document (get effective in 2018/01/01) :
http://eur-lex.europa.eu/legal-content/FR/TXT/?uri=CELEX%3A32016R0679

[...] In particular, a data subject should have the right to have his or her personal data erased and no longer processed where the personal data are no longer necessary in relation to the purposes for which they are collected or otherwise processed.

In case of spammer, just delete account without enabling "migration options" and all data related to the account data will be hard deleted.

If spammer is the main problem, why not use am email domain blacklist to deny all domains like @facebook.com, @yopmail.com, ... ?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

Activity!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

More activity!

More seriously, are there any updates on this?

No one is working on this. You're more than welcome to try. I expect that this would not be trivial to switch on.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

Activity!

As people have said before, sweeping stuff under the rug is not a solution to issue triage.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

flozz picture flozz  路  3Comments

jonasfranz picture jonasfranz  路  3Comments

jorise7 picture jorise7  路  3Comments

thehowl picture thehowl  路  3Comments

BNolet picture BNolet  路  3Comments