Hello,
I am working with the ForgeFed working group on federating software development platforms (already mentioned in #1612 and #184).
While the spec is still in the works, I think some basics, independent of the actual content and requirements of the spec, can already be implemented to prepare for federation.
Building on these preparations, it might be easier to build a Gitea-ForgeFed PoC to better understand what the spec needs to cover.
Or you could even implement a Gitea-specific federation, although I guess that might be unfair to ForgeFed :P
To me, one of these basics is addressing the difference between the assumptions centralized and decentralized systems make about users.
Federation is a decentralized system made up of centralized systems, so these differences need to be addressed if you want to support federation.
One (to me, central) difference in assumptions can be expressed through the question
of whether the "user" concept of centralized systems and that of decentralized systems are interchangeable.
For example, in both centralized and decentralized systems a PR object will be attributed to something that can be conceptualized as a user.
In a centralized system, that user can be uniquely identified and addressed by the username alone.
While usernames still exist in federated systems, there they are neither unique nor sufficient to identify the user; you need additional information for that. In federated systems, this information is normally the instance. Together they form the federated ID.
(Please excuse the cryptic nature of the following paragraph, but for completeness I want to mention it here anyway:) While username and instance would be enough on the UI level, on the technical level they are not enough for ForgeFed. Because ForgeFed builds on ActivityPub, which in turn leverages Linked Data concepts, which in turn build on the Web, internet-level username+instance is not enough and you won't get around URIs on the Web level. Because of this, and because of its limitations already mentioned in the linked issues, I leave WebFinger out here.
Anyway, this issue is concerned with changing the central identifier of users from their username to a federated ID (in whatever form).
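A minimal Go sketch of the point above, assuming a federated ID is modelled simply as username plus instance host. `FederatedID` and its methods are hypothetical names for illustration only, not Gitea or ForgeFed API; as noted in the previous paragraph, ForgeFed itself would ultimately identify users by URI rather than by this pair.

```go
package main

import (
	"fmt"
	"strings"
)

// FederatedID is a hypothetical identifier: a username alone is only
// unique within one instance, so the instance host is part of the key.
type FederatedID struct {
	Username string
	Instance string // host name of the user's home instance
}

// Equal compares both parts case-insensitively, as usernames and host
// names are. Comparing usernames alone would conflate different people
// on different instances.
func (id FederatedID) Equal(other FederatedID) bool {
	return strings.EqualFold(id.Username, other.Username) &&
		strings.EqualFold(id.Instance, other.Instance)
}

// String renders the familiar user@instance handle used in UIs.
func (id FederatedID) String() string {
	return id.Username + "@" + id.Instance
}

func main() {
	a := FederatedID{Username: "alice", Instance: "gitea.example.org"}
	b := FederatedID{Username: "alice", Instance: "codeberg.org"}
	// Same username, but two different users.
	fmt.Println(a, b, a.Equal(b))
}
```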
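These are the strategies I am aware of for adding a federated ID to a data model (they are neither completely nor genuinely mine):
1. Add the federated ID to the existing user entity.
2. Introduce federated users as a new concept with their own entity.
3. Split the user entity into an authentication entity and an identity entity.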
I personally am in favour of the third option, because to me it is the clearer way to address the changes that are required to the object model.
If you just add the federated ID to the user entity, you also have to break a central assumption about the user entity: that usernames are unique. I feel uncomfortable with the implications, mainly the mix-up of the local-authentication and general-identity domains.
You could also introduce federated users as a new concept with their own entity.
This also means you will have to touch all the areas that are concerned with displaying or attributing users. I think you won't get around that if you want to do it well. But additionally you will have to duplicate things that you already have for users, like profiles, maybe display logic, etc. (Go not having generics makes this even harder to implement and maintain without redundancy.)
This brings me to the third approach, separating the user entity.
It builds on the assumption that the user entity is actually made up of two entities: the authentication (login) entity and the identity entity.
The two entities are not independent in centralized systems, so they are combined into one. In federated systems, on the other hand, they are independent: not every identity has login information associated with it, so it does not make sense to combine them. This leads to the proposal of splitting up the user entity.
You then have one entity with the (login) username, password and the corresponding identity reference, and one entity with all the other identity stuff, like the display name, (non-unique) username, federated ID, avatar URL, ...
While you still need to touch a lot of code, in this case you can then use the identity entity for both local and federated identities, sharing the logic surrounding them (e.g. profiles) :)
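A rough Go sketch of the third option, assuming the field names listed above. `Identity`, `Account` and `ProfileHeader` are hypothetical, and the maps stand in for database tables. The point is that profile/display logic takes an `Identity` only, so it works unchanged for local users (who also have an `Account`) and federated users (who don't).

```go
package main

import "fmt"

// Identity holds everything needed to display or attribute a user.
// Local and federated users share this type. (Hypothetical sketch.)
type Identity struct {
	ID          int64
	UserName    string // not unique on its own
	DisplayName string
	FederatedID string // assumed to be an actor URI; unique
	AvatarURL   string
}

// Account holds local login data and exists only for users of this instance.
type Account struct {
	LoginName    string // unique on this instance
	PasswordHash string
	IdentityID   int64 // reference to the Identity
}

// ProfileHeader is shared display logic: it works for any Identity,
// whether or not an Account points at it.
func ProfileHeader(id Identity) string {
	return fmt.Sprintf("%s (%s)", id.DisplayName, id.FederatedID)
}

func main() {
	identities := map[int64]Identity{
		1: {ID: 1, UserName: "alice", DisplayName: "Alice", FederatedID: "https://gitea.example.org/alice"},
		2: {ID: 2, UserName: "alice", DisplayName: "Alice B.", FederatedID: "https://forge.example.net/alice"},
	}
	// Only the local user has login data.
	accounts := []Account{{LoginName: "alice", PasswordHash: "<hash>", IdentityID: 1}}

	for _, acc := range accounts {
		fmt.Println("local login", acc.LoginName, "->", ProfileHeader(identities[acc.IdentityID]))
	}
	fmt.Println("remote:", ProfileHeader(identities[2]))
}
```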
So much for my stance. While I am in favour of my approach, feel free to advocate for the other approaches; in the end it's mainly a matter of point of view, and I am open to input. :)
Thanks for reading.
It really depends on the use cases, i.e. on what federated/external users will be able to do. If they cannot host repositories, then it is OK to have a federated user table and move the display name, username etc. to it. Though it would require a total rewrite of every part of the code where the local user table is used, to use the federated user table instead; that could be a hell of a lot of work :)
I totally agree @lafriks. Before we begin to do that, we need answers to some questions:
If they cannot host repositories, then it is OK to have a federated user table and move the display name, username etc. to it.
Federated users can have repositories, but hosted on the instance of the user.
In other words, you don't only have federated users but also federated projects (incl. repos, issues, PRs, etc). But that's out of scope of this issue.
- Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
People like me who don't like the idea of development of Free and Open-Source software being centralized on, well, centralized, non-free services. (GitHub)
- Why they need this feature?
While there exist free alternatives like Gitea and GitLab, their instances are isolated from each other and therefore have usability disadvantages. (And are subject to network effects.)
Even if I am motivated to make an account on, let's say, Debian's GitLab, it's limited to that instance.
In the worst case each project has its own instance and I have so many accounts to check. (Yes, there are email notifications, but we want something webby; otherwise Git+ML would be enough, too.)
- How they want to use this feature?
For federated collaboration, breaking GitHub's network effect.
I have 1 account on my home instance and can collaborate with any project that supports federation (i.e. ForgeFed).
https://github.com/go-gitea/gitea/blob/7217b703e95a3ab01b69f91879fb4d6532f0b2c5/models/user.go#L94-L97
Is it correct that `Name` is the user name (`criztovyl`) and `FullName` is the display name ("Christoph Schulz")?
But what is the reason for `LowerName`? (`strings.ToLower(u.Name)`)
In order not to use `lower(Name)` in SQL queries.
In order not to use `lower(Name)` in SQL queries.
But why at all? I don't want to change it, just understand it. :)
I started toying around a little and naively moved some attributes from `user.go` (`type User`) to a new `user_identity.go` (`type Identity`) and added some code to `AfterLoad` that fills the old attributes with values from the identity.
Afterwards I ran into problems writing a migration that moves the attributes from `User` to `Identity`. I will look further into this by studying existing migrations and the xorm docs, but would be really happy about some further hints. :)
I will share some commits next time.
But why at all? I don't want to change it, just understand it. :)
Because the username is case-insensitive: when a user sends requests like https://try.gitea.io/AxIFiVe or https://try.gitea.io/Axifive, we need to quickly find the unique user, so we just call `strings.ToLower(username)` and send a simple SQL query.
Converting fields to lowercase for selection is a very expensive DB operation; it is much easier to add a second field.
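To illustrate the trade-off with a sketch (hypothetical types, a Go map standing in for the `lower_name` index): the lowercase form is computed once when the user is written, and each request only lowercases its own input before an exact lookup. In SQL terms, `WHERE lower_name = ?` can typically use a plain index on that column, whereas `WHERE lower(name) = ?` has to evaluate the function per row unless the database supports an expression index.

```go
package main

import (
	"fmt"
	"strings"
)

// User mirrors the two columns discussed above: Name keeps the spelling the
// user chose, LowerName is precomputed so lookups can use an exact match.
type User struct {
	Name      string
	LowerName string
}

// userStore stands in for the user table; the map key plays the role of
// an index on lower_name. (Hypothetical, not Gitea code.)
type userStore struct {
	byLowerName map[string]*User
}

func newUserStore() *userStore {
	return &userStore{byLowerName: make(map[string]*User)}
}

// Create fills LowerName once, at write time, and rejects names that
// differ from an existing one only in case.
func (s *userStore) Create(name string) (*User, error) {
	lower := strings.ToLower(name)
	if _, exists := s.byLowerName[lower]; exists {
		return nil, fmt.Errorf("user %q already exists", name)
	}
	u := &User{Name: name, LowerName: lower}
	s.byLowerName[lower] = u
	return u, nil
}

// GetByName lowercases only the request, then does an exact lookup,
// like WHERE lower_name = ? instead of WHERE lower(name) = ?.
func (s *userStore) GetByName(name string) (*User, bool) {
	u, ok := s.byLowerName[strings.ToLower(name)]
	return u, ok
}

func main() {
	s := newUserStore()
	if _, err := s.Create("AxIFiVe"); err != nil {
		panic(err)
	}
	for _, req := range []string{"AxIFiVe", "Axifive"} {
		if u, ok := s.GetByName(req); ok {
			fmt.Println(req, "->", u.Name)
		}
	}
}
```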
I promised code.
I just pushed the code I am struggling with; you can find it here: https://github.com/criztovyl/gitea/blob/master/models/migrations/v200.go (numbered 200 to make merging/rebasing easier)
Line 30, the sync of the Identity model, works as expected: the table is created.
But Line 35, the sync of the User model, does not seem to work, the table still has the old columns afterwards.
`Sync2` will only add columns.
If you want to delete columns, you need to use:
Please note that migrations should not refer to things in models or elsewhere - those could be changed in future migrations. They have to be completely self contained - no references to other Gitea code.
So:
Here you need to actually have a copy of identity. Similarly here:
This was probably intended to be `xorm:"-"`, in which case it's probably not needed in the migration.
Looking at your proposed identity table, how come you can't use `login_source` for this?
Thanks for the hints :)
The issue with login_source, as far as I analysed it, is that it still requires usernames to be unique on their own.
For federation, usernames are not unique; only the Identity is (where the unique identifier for such identities is typically a URI/IRI). Usernames are more like a further-limited display name.
This was probably intended to be `xorm:"-"`, in which case it's probably not needed in the migration.
I see, but how is the `Identity` referenced by the `IdentityId` filled? Is it filled automagically, or do I need to add it somewhere?
And if I create the table initially, then I still need the `Identity` model, so I cannot leave it out, right?
And another quick question: is it possible to run a Gitea instance that has all the test data from the fixtures in its db?
I am sometimes the visual type. For example, I would like to verify that the profile is still displayed correctly, e.g. that it has the right organization assignments.
You can take a look at https://github.com/go-gitea/gitea/blob/master/contrib/pr/checkout.go as it loads fixtures to help us check PRs.
You could make a modified version perhaps.
`go run -tags "sqlite sqlite_unlock_notify" contrib/pr/checkout.go -run`
works like a charm :)
The only thing missing is hot code reload, but I guess that's slightly more complicated :D
I totally agree @lafriks. Before we begin to do that, we need answers to some questions:
1. Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
2. Why they need this feature?
3. How they want to use this feature?
Personal
For personal usage, it would definitely be a very good feature. I have an instance where I store my repositories, public and private (not all, but a lot). If someone wants to collaborate on them, they have to get a user account on my instance, and it works the same way the other way round: if I see a project somewhere on a Gitea instance and want to collaborate, like fixing something or resolving a feature request, I have to get a user account there or send a pull request email with diff/ref links.
If I could invite users from other instances to a private repository, or they could simply join with their own users (if their instance is not blocked/blacklisted), that would be much easier.
Corporate
I don't know if companies will use it, but I can see potential opportunities. Right now, most companies are using GitHub because everyone has a GitHub account and they can manage permissions/teams. As a (hypothetical) company, if I could host my own Gitea where I can invite users with Gitea accounts on different instances, put them in teams and give them permissions, that would potentially kick off a new era where a company can safely say "Yes, we can use Gitea because everyone can get an account on their own". This is especially true when larger groups / federated warriors provide Gitea instances for users who do not or cannot host their own, like Mastodon, which has a lot of instances: you can choose one, use it and reach everything you want.
On the other hand, I see it would be a lot of work, I'm not sure it would work, and it could divide users into smaller fragments based on how they want to use it.
Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
As a personal user…
Why they need this feature?
I don't want to create an account on umpteen-hundred instances,
How they want to use this feature?
just to report an issue. That's why I often don't report it. If I could do that from an instance I already have an account on, regardless of where the repository in question is hosted, things would look different.
Btw: the same applies to "minimal PRs/MRs" containing one-time fixes. So in short: when I don't plan to "really join" a project, but still want to contribute my 2 cents.
I totally agree @lafriks. Before we begin to do that, we need answers to some questions:
- Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
I'm a personal Gitea user who collaborates with other people such as @a1batross. We both have our own Gitea instances but I'd like to easily be able to contribute code to one of his repositories, such as with a cross-instance pull request.
- Why they need this feature?
Much easier than making an account on the other instance - and that way you don't need to have open registrations to accept code from others. I don't want a million Gitea accounts.
- How they want to use this feature?
Cross-instance pull requests and issues mainly. I'd also like to make organizations include remote users (so they could directly push).
- Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
Personal users, because I think it will mostly help them reduce the number of active accounts.
Git hosting websites, because it will reduce the network effect and they wouldn't have to store account details for every single person who came by to report an issue once.
- Why they need this feature?
In order to make contributing to foreign projects easier: you wouldn't need to create an account to report an issue or do a quick fix.
- How they want to use this feature?
I would want to use it to only have one git identity and not a ton of accounts across Git repositories associated with different places.
I'd like to be able to do all that I currently do on Gitea, GitHub and GitLab in a single place. That is Issues, Pull/Merge requests, Organizations.
1. Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
Private. Running my own Git server makes my projects seem less available due to the smaller infrastructure. Decentralization and interoperability with GitHub/GitLab would make my projects as accessible and available as if they were on these platforms. Not requiring everybody to have an account on my server makes it more attractive for contributors.
2. Why they need this feature?
Seamless integration. Use the interface you want to collaborate on the projects you like.
3. How they want to use this feature?
Central authority on my servers, with CI or other cloud services interacting with whatever hosting software gives me the features I want. No need for everybody to be compatible with all the different APIs of GitLab/GitHub/Gitea and all the smaller ones.
Copying fediverse feedback from fr33domlover (ForgeFed team):
The current situation is that to interact with a repo hosted on a forge like Gitea or GitLab or githu8 etc. you need to have an account there. Each forge website is a separate island.
If you host your own forge, you get to make the rules etc. but you're isolated. It's harder to find you. Many people worry they'd get less contributions if they left githu8. It would also require contributors to create an account on each forge.
And githu8 is proprietary and centralized and isn't a community project by the people for the people.
The idea is that your project's visibility and access to it won't depend on whether the user's account is on your forge or some other forge. You can create one account and participate everywhere. Project search, discovery, recommendations, etc. find stuff from the whole network. No single forge has more power over users than any other.
Examples:
- Orgs/teams/groups can include users from different forges
- You can push commits to repos on different forges
- You can open issues, open merge requests, comment, send code review, etc. across forges
- Repos, CI servers, wikis, issue trackers etc. can be on different servers and still seamlessly work together
- How they want to use this feature?
In addition to what others have already mentioned: while surfing the web I encounter tons of interesting projects that exist on their own Gitea/GitLab instance. I often just want to star these repos, but the friction (signing up) is just too high. Instead I bookmark them in my browser, and mostly forget to check them later (too many bookmarks).
On GitHub I use stars both as a means of bookmarking and as a token of appreciation for the project. Projects of particular interest I watch, so my notifications dashboard gives me an easy way of tracking activity.
Stars can be abused, but if a GitHub repo has, say, 14k stars, then it says something about its popularity and I can be reasonably sure that the project is useful and of good quality. On Gitea/GitLab the stars don't tell me much currently.
Ideally in the future I have one - my own - fediverse-connected place where I have an overview of all my coding activities regardless where I performed them.
1. Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
I'm a personal-ish Gitea user
I have two Gitea instances, one for my art collective (where we keep the code for our projects), and another for me (where I do my personal & contract work, and also some projects with my students)
2. Why they need this feature?
Mostly because I don't want to use GitHub or its competitors.
This thread of questions is a good example: we all need GitHub accounts to be part of the conversation, whereas with federation we could use our federated accounts.
3. How they want to use this feature?
If Gitea and other services were all federated, using a common protocol (ActivityPub), it would not only open up the possibility of interaction between federated VCS services, but to the fediverse at large, you'd be able to comment and subscribe to project updates from Mastodon, for instance.
Echoing what a lot of other users are saying, I'd like to be able to allow people to easily contribute to my projects, even just by creating issues, without having to open up registrations on my server, and I'd like to be able to create issues or provide tiny contributions without the hassle of creating more accounts.
In an ideal world, I would no longer have a GitHub account
1. Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
I'm all 3.
2. Why they need this feature?
As a private user:
The choice between hosting your data on github, gitlab and "somewhere else" is a grave one.
If I choose github/gitlab, I am at the mercy of their decisions as a platform, and have no control over my data.
If I choose something else (e.g. my own gitea instance), it is difficult to discover the project[1], and contributions require either having one's own account (so now I'm a website host? that's no good) or going the email route (which is restrictive; not everyone likes that workflow either).
With federation (even if not necessarily forgefed, but ideally interoperable between various services), I get the best of both worlds - one can contribute from any instance using their own accounts, and discover any of my locally hosted projects, all while I get to keep control over my data and platform.
As a corporate host:
As I mentioned, we have multiple gitea instances, for a variety of reasons.
We would be interested in whitelist federation (i.e. where federation is only enabled with specific other servers) to facilitate exchanges between our internal nodes and those of arbitrary software partners, without necessarily needing to give them internal access in any case.
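A minimal sketch of such whitelist federation in Go, assuming remote users are identified by an actor URI (as ActivityPub does) and that the check is done on its host name. `Allowlist` and `Permits` are hypothetical names; rejecting non-HTTPS URIs is an additional assumption, not something stated above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Allowlist is a hypothetical federation policy: only activities whose
// actor lives on one of the listed hosts are accepted.
type Allowlist map[string]bool

// NewAllowlist normalizes host names to lower case.
func NewAllowlist(hosts ...string) Allowlist {
	a := make(Allowlist, len(hosts))
	for _, h := range hosts {
		a[strings.ToLower(h)] = true
	}
	return a
}

// Permits reports whether the actor URI points at an allowed host.
// Unparsable or non-HTTPS URIs are rejected (an assumption of this sketch).
func (a Allowlist) Permits(actorURI string) bool {
	u, err := url.Parse(actorURI)
	if err != nil || u.Scheme != "https" {
		return false
	}
	return a[strings.ToLower(u.Hostname())]
}

func main() {
	partners := NewAllowlist("git.partner.example", "forge.example.org")
	for _, actor := range []string{
		"https://git.partner.example/alice",
		"https://random.example.net/mallory",
	} {
		fmt.Println(actor, partners.Permits(actor))
	}
}
```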
As a website host:
Take everything from the private user part - this is now the sales pitch to prospective users.
You don't need to purely rely on "well, X is evil, and I'm not" as your only argument.
Further, the costs of people migrating to your service become much smaller (if not negligible).
3. How they want to use this feature?
As a summary / to reiterate:
As a private user (and website host by way of having users that are effectively just that):
As a corporate user:
1. Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
I do, both for my personal gitea and for the one we use at work.
2. Why they need this feature?
To collaborate on projects on other ForgeFed instances. There shouldn't be a need to create a user account on every system. In the case of my private instance, I wouldn't like to give random people access. And for the company instance, non-employees obviously cannot get access.
3. How they want to use this feature?
I would use it to collaborate on projects, both open source and proprietary.
- Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
Personal user definitely.
- Why they need this feature?
Open source is all about collaboration. GitHub made open source mainstream by making working on code a social thing and making it trivial to report issues and contribute to projects. While decentralizing the space GitHub is in is a good thing on one hand, we're going way backwards in terms of it being easy to submit issues or patches.
- How they want to use this feature?
Federation solves the above mentioned issue, because it doesn't matter which instance or platform the user is on, they can just pull a remote project and contribute to it from their own instance, using their own local account.
Somehow GitHub did not send me any notification mails about the new comments here; I completely missed them.
Anyway, I continue to play around in the background; right now I am struggling with xorm and eager loading.
I have:
```go
type Identity struct {
	ID          int64 `xorm:"pk autoincr"`
	UserName    string
	DisplayName string
}

type User struct {
	ID         int64    `xorm:"pk autoincr"`
	Slug       string   // LowerName
	IdentityId int64    `xorm:"NOT NULL DEFAULT 0"`
	Identity   Identity `xorm:"-"`
}
```
How do I make `e.Get(&User{Slug: slug})` auto-load(?) the `Identity` given by `IdentityId`?
I don't think that will work. I think xorm can only do it the other way around, i.e. if `Identity` had a `UserID` column, xorm could fill up an `Identities` slice in `User`.
If xorm is sufficiently similar to gorm, you should just be able to `Preload` the `Identities` column to make that happen. In plain SQL that would be a JOIN, so http://gobook.io/read/gitea.com/xorm/manual-en-US/chapter-05/5.join.html should help out.
Something like the sample
```go
engine.Table("user").Alias("u").
	Join("INNER", []string{"group", "g"}, "g.id = u.group_id").
	Join("INNER", "type", "type.id = u.type_id").
	Where("u.name like ?", "%"+name+"%").Find(&users, &User{Name: name})
```
should do the trick, according to the docs.
Hmm, for my problem having a list of identities does not help.
And unfortunately the Join approach does not work either.
I looked around a little more in the codebase and found that `Repository` has an `Owner *User` that seems to be loaded manually using a `GetOwner` function; I have applied the same pattern to my code now. :)
So I cleaned up my local repository, committed and pushed a draft version of the user/identity split that works for displaying a local user's profile, e.g. localhost:8080/user1. :) See https://github.com/criztovyl/gitea/commit/d0f24f7919d15ee481f5eae7ec6131ff369eaa66
For now it only works for users, organizations do not work yet.
It's not much, but it's something. :)
After stubbing out all the meat that organizations have, the organization profile page works now, too. See https://github.com/criztovyl/gitea/compare/d0f24f7...3901206bc.
My code is rather exploratory; it's nothing final. :)
Why not add two columns to the users table, `is_local` and `remote_url` (perhaps `domain` too, although not all Gitea instances are hosted in the root folder of a specific domain, hence why I went with `remote_url`)? That way an extra table and joins aren't needed.
It's not the first time someone has suggested this; I thought about it, too.
But I fear(ed) that this could easily lead to accidentally comparing users by username only, which is invalid because you must (in this specific case) compare all three of `is_local`, `remote_url` and `username` (this can be optimized down to just comparing `remote_url` and `username` when `remote_url = ""` means `is_local`).
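The pitfall can be made concrete with a small Go sketch of the single-table proposal, assuming `remote_url = ""` marks local users so `is_local` can be dropped. `Row`, `Key`, `sameUser` and `sameUserWrong` are hypothetical names. Comparing the full key is correct; comparing usernames alone silently treats a local and a remote user as the same person.

```go
package main

import (
	"fmt"
	"strings"
)

// Row sketches the single-table proposal: remote users live in the same
// table, with RemoteURL == "" meaning "local" (so is_local is redundant).
type Row struct {
	UserName  string
	RemoteURL string
}

// Key is what must be compared to decide whether two rows are the same
// user: the remote URL and the case-insensitive username together.
type Key struct {
	RemoteURL string
	LowerName string
}

func (r Row) Key() Key {
	return Key{RemoteURL: r.RemoteURL, LowerName: strings.ToLower(r.UserName)}
}

// sameUserWrong is the easy mistake: comparing usernames only.
func sameUserWrong(a, b Row) bool {
	return strings.EqualFold(a.UserName, b.UserName)
}

// sameUser compares the full key (Go structs compare field by field).
func sameUser(a, b Row) bool {
	return a.Key() == b.Key()
}

func main() {
	local := Row{UserName: "criztovyl"}
	remote := Row{UserName: "criztovyl", RemoteURL: "https://forge.example.org/"}
	fmt.Println("username only:", sameUserWrong(local, remote))
	fmt.Println("full key:", sameUser(local, remote))
}
```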
Another concern would be growing the users table and, with it, for example the index for `lower_name`, which now needs to be a combined index of `remote_url` and `lower_name`. (On the other hand, one could also set `lower_name` just for local users... hmm.)
I prefer the authn/identity split because I think it's cleaner to implement. (Because Identity is a new concept, code cannot just happen to compare usernames only.)
But I guess I should make up my mind again, now that I can see how complex introducing the split is. :)
I'm slightly naive sometimes. ^^
@criztovyl there are cases where `remote_url` or the domain (subdomain) changes; `username` & `email` may change too.
As you know, naming things is hard.
1. Who wants this feature? Personal gitea user / Companies with private gitea / Git hosting website via gitea or others?
Everybody opposed to racism, for example: since 2019, GitHub, Bitbucket and GitLab have been blocking users based on perceived geographical location - https://archive.today/U1R2L . With the huge wave of BLM demonstrations, it's not a good moment in history to support racism in the software community. I'm only speaking in my own name, as a user of gitea at https://codeberg.org . This is posted as https://codeberg.org/Codeberg/Community/issues/142 , though I think it should become an independent issue at Codeberg. The discussion there suggests _some_ interest at Codeberg.
2. Why they need this feature?
Because of the Tyranny of convenience (Keye 2009) (Wu 2018). The online social network functions of github/bitbucket/gitlab are extremely powerful. Type '@' and a few characters, and chances are you get prompted for the user ID of the person you want among a huge fraction of the free-software community. It's efficient, public (by default), and the pinged person can respond or ignore the issue without having to clean up his/her email box.
But https://codeberg.org and other non-racist git repository servers do not yet have ActivityPub type federation. This is one reason why users of github are dissuaded from moving to community-based servers: the _tyranny of convenience_ is that the easy social connections in the world-wide community don't - _yet_ - exist on small-community servers.
In other words, the feature is needed to store scholarly ephemera across servers that are independent of the centralised, secretive, racist (as of 2019/2020, see above) corporate servers.
3. How they want to use this feature?
For social networking focussed specifically on software development among the open community of (mostly free-software) software developers.
Locking this thread as we have now received a large amount of feedback to the questions above and now whoever proceeds to work on this will have this information they can parse.