While DC-local clusters are relatively well understood and easy to implement (just share sessions/codebase/DB), multi-homed installs aren't as well understood.
Does Gitea have any plans for a tool to allow syncing across multiple regions in near-real time? GitLab is already working on one, and I'd like to open this ticket to track the progress of a similar effort here, if it's a direction the maintainers would like to go.
Related or duplicate of #2959
I'd say this is related but distinct. #2959 clearly discusses a single-homed, multi-server cluster - a single region HA setup - when they discuss a single load balancer.
it would take to be able to run gitea on a multi-host
setup, likely behind a single nginx load balancer.
I'm proposing a system to allow multiple autonomous Gitea installs (say Oregon, Virginia, and London), which all share data in near-real time to allow a single frontend experience, but are completely self-contained as well. In other words: multi-region async replication.
@benyanke sorry for my terse message above, I hope it wasn't too unfriendly. As you can probably tell, with the latest news we are getting a lot more traffic than usual, so triage is important.
No worries! I really value the gitea project, and I'll be seriously looking at moving my main projects to a selfhosted instance of it given the news. I'm getting more comfortable with Golang, and I'd even be willing to work on this multi-region replication if I'm able.
Keep up the good work.
A part of multihome is using a DB that supports it so I've opened: https://github.com/cockroachdb/cockroach/issues/24846 with the CockroachDB project to allow it to support Gitea. This of course doesn't fully achieve what you are looking for, but it is a starting place.
Thanks for the kind message.
I was initially thinking distinct local DBs with data sync on the app layer, but using DB layer replication would indeed also handle it!
Looking forward to seeing this.
I know that's also not exactly what you are talking about, but #1612 could achieve the same goal and offer a better experience across DCs.
For information, https://github.com/pingcap/tidb should also provide a distributed database compatible with MySQL that Gitea could use.
hmm not sure if I understand the terminology correctly.
But what about MariaDB/MySQL galera cluster? https://mariadb.com/kb/en/library/what-is-mariadb-galera-cluster/
It's a simple and scalable synchronous active-active multi-master topology.
Since it's already built in (batteries included) as of MariaDB 10.2, it's ready to go.
Has someone tried this? @benyanke @markuman ?
Just following up: a multi-master DB doesn't really solve the problem of async/multi-region, as MySQL/Galera clusters assume low latency between nodes. That is, they're intended for use within the same datacenter/region, not across regions.
This issue is for something like two distinct instances of Gitea, each with its own DB, doing its own application-layer replication.
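To make "application-layer replication" a bit more concrete, here is a minimal sketch, assuming each instance stamps its ref updates with a per-origin sequence number and ships them to its peers. `Event`, `Node`, and `Apply` are hypothetical names for illustration, not anything that exists in Gitea, and the sketch ignores gaps, ordering across origins, and the actual Git data transfer.

```go
package main

import "fmt"

// Event is a hypothetical replication message: one ref update on one repo.
type Event struct {
	Origin string // instance that accepted the original push
	Seq    uint64 // per-origin counter, starting at 1
	Repo   string
	Ref    string
	SHA    string
}

// Node is a hypothetical regional instance with its own local state.
type Node struct {
	Name    string
	applied map[string]uint64 // highest Seq applied per origin
	refs    map[string]string // "repo:ref" -> SHA
}

// NewNode returns a node with empty local state.
func NewNode(name string) *Node {
	return &Node{Name: name, applied: map[string]uint64{}, refs: map[string]string{}}
}

// Apply records a remote event once; duplicates and stale events are skipped,
// so a peer can safely redeliver after a network failure.
func (n *Node) Apply(e Event) bool {
	if e.Seq <= n.applied[e.Origin] {
		return false
	}
	n.refs[e.Repo+":"+e.Ref] = e.SHA
	n.applied[e.Origin] = e.Seq
	return true
}

func main() {
	london := NewNode("london")
	e := Event{Origin: "oregon", Seq: 1, Repo: "demo", Ref: "refs/heads/main", SHA: "abc123"}
	fmt.Println(london.Apply(e)) // first delivery is applied
	fmt.Println(london.Apply(e)) // redelivery is ignored
}
```

Because delivery is asynchronous and idempotent, a slow or flaky inter-region link only delays convergence instead of blocking writes, which is the main difference from a synchronous cluster.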
OK, point taken. A Galera and GlusterFS setup would give synchronous replication (_single region, but multi AZ (DC)_) and fault tolerance.
I think for multi-region, you must do it like GitLab[1].
Set up a read replica of Gitea: a 2nd instance which mirrors all repositories from your master.
In case of failover, all repositories in your 2nd instance become the primary repositories, and your original primary Gitea instance must mirror all repositories from the 2nd instance.
[1] https://docs.gitlab.com/ee/administration/geo/replication/
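The role swap described above can be sketched roughly as follows. This is only an illustration under assumptions: `Instance`, `Role`, and `Failover` are hypothetical names, health detection is reduced to a boolean, and nothing here reflects how Gitea or GitLab Geo actually implement it.

```go
package main

import (
	"errors"
	"fmt"
)

// Role says whether an instance accepts writes or only mirrors.
type Role int

const (
	Primary Role = iota
	Replica
)

// Instance is a hypothetical stand-in for one regional Gitea install.
type Instance struct {
	Name        string
	Role        Role
	Healthy     bool
	MirrorsFrom string // name of the instance this one pulls from; empty for a primary
}

// Failover promotes the replica and turns the old primary into a mirror of it.
// It refuses to act while the primary is healthy or the roles are unexpected.
func Failover(primary, replica *Instance) error {
	if primary.Role != Primary || replica.Role != Replica {
		return errors.New("failover: unexpected roles")
	}
	if primary.Healthy {
		return errors.New("failover: primary is still healthy")
	}
	replica.Role = Primary
	replica.MirrorsFrom = ""
	primary.Role = Replica
	primary.MirrorsFrom = replica.Name
	return nil
}

func main() {
	oregon := &Instance{Name: "oregon", Role: Primary, Healthy: false}
	london := &Instance{Name: "london", Role: Replica, Healthy: true, MirrorsFrom: "oregon"}
	if err := Failover(oregon, london); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s now mirrors from %s\n", oregon.Name, oregon.MirrorsFrom)
}
```

The guard against failing over a healthy primary matters in practice: promoting a replica while the old primary still accepts pushes gives you two writable copies that then have to be reconciled.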
@markuman I'm currently a day-job gitlab-ee administrator, and the gitlab philosophy is excellent for this, I think. This is exactly how I'd suggest doing it in gitea:
Fully agreed on Galera and Glusterfs, or similar tools. These are all excellent tools for providing intra-region HA, just not inter-region HA. In fact, a really resilient setup probably uses all of these: intra-region tools to provide each region's install redundancy, then inter-region tools to provide the entire stack redundancy.
@benyanke I like the suggestion you made, but would suggest modifying it so that there's no hard master/slave relationship between nodes. Instead there's a master, but when it goes down and there are two or more slaves, they sync writes between one another, and their combined "vote" on conflicts forces the master to merge those transactions in once it comes back online.
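One way to read the "vote" idea is a strict-majority rule per conflicting ref. A minimal sketch, assuming each surviving replica reports the commit SHA it holds for that ref; `ResolveConflict` and the replica names are hypothetical, and a real design would also need to say what happens when no majority exists.

```go
package main

import "fmt"

// ResolveConflict takes each replica's SHA for one ref (keyed by replica name)
// and returns the SHA held by a strict majority, if there is one.
func ResolveConflict(votes map[string]string) (string, bool) {
	counts := make(map[string]int)
	for _, sha := range votes {
		counts[sha]++
	}
	// At most one SHA can exceed half the votes, so map order does not matter.
	for sha, n := range counts {
		if n*2 > len(votes) {
			return sha, true
		}
	}
	return "", false
}

func main() {
	votes := map[string]string{
		"virginia": "abc123",
		"london":   "abc123",
		"tokyo":    "def456",
	}
	sha, ok := ResolveConflict(votes)
	fmt.Println(sha, ok)
}
```

Note that with exactly two replicas that disagree there is no majority, so the two-slave case still needs a tie-breaker or manual resolution.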
Highly agree with this whole thread though, Gitea could seriously use replication functionality.
As an open-source alternative to a lot of the other git repo sharing/auth platforms out there, this kind of failover might even be best implemented at an organization level for master/slave relations, for example:
Say you have 3 nerds who want to have their source code backed up and who generally trust one another, but might still want some fine-grained control over access to individual repos. They each create an org which is treated as a master that they have absolute admin control over, with a global first-come, first-served registry of org names.
A bit better would be to encrypt org data on the replicas so it's there to pull if there's data loss on the master node, but otherwise secure (this would allow for things like contractors setting up shared repos between one another or for small client corporations.)
Or, if you wanted to go another step with some less-than-pleasant keychains, you could have each such org act as described in the previous paragraphs and have it fully encrypted on the replicas, so that each org gets offsite backup and its data remains secure, but it can still permit access if desired. (I might actually help implement it if you wanted to go this route, as I've done a bunch of this sort of thing in the past and have some code and schemas I could port in.) This gets hairier because you then have replication of the keychain to deal with, which isn't always as straightforward as files. You can generally get around that by making collisions of things like usernames fail over to machine\user, or by having a check against a known value to determine which user their password matches.