Gitea: Application Personal Access Tokens are stored as plaintext in the database. Easy fix maybe.

Created on 13 Apr 2018  路  36Comments  路  Source: go-gitea/gitea

  • Gitea version (or commit ref): 1.3.0
  • Git version: 2.7.4
  • Operating system: osx
  • Database (use [x]):

    • [ ] PostgreSQL

    • [ x] MySQL

    • [ ] MSSQL

    • [ ] SQLite

  • Can you reproduce the bug at https://try.gitea.io:

    • [ ] Yes (provide example URL)

    • [ ] No

    • [ x] Not relevant

  • Log gist:

Description

After creating an application token, I checked the database, and the token is stored in plaintext in the database! This is bad because anyone that may gain access to the database would have access to all tokens.

Suggested fix is just to hash the token before storing it locally, for example, the token you give the user is 30ab72898b83c8549e510ee36cde7c7d7be01d97 becomes A6DCC734FFB0E5A1E871E10C1B2A48CA60E9104F8F61FD41BD1DC01789062D81 when ran through a sha256 hash.

This way you can take what the user provides turing an authorized token use, and run the value they give you through sha256, and validate that the result matches what's in your database.

...

Screenshots

screen shot 2018-04-13 at 5 09 00 am
screen shot 2018-04-13 at 5 08 51 am

kinbreaking kinenhancement kinsecurity

Most helpful comment

Ok, this is silly. We're talking about future solutions whilst in the meantime we've got keys stored in plaintext in the database and we've sat on it for 9 months.

Although it's nice to be able to look up our old keys by the API - in order to do so you actually need the raw password, so at that point you don't even need the key! I don't think there can be any sensible use of this retrieval API.

Creating keys isn't particularly difficult so if you lose one you can always create a new one - and in fact you should just delete the old one at that point.

In terms of how to store these things we should just hash em like passwords.

When some better solution comes along we can deprecate this method, but until then we should do something.

All 36 comments

@lunny can I work on this ? What key should be used for creating the hash ?

This would be breaking since you can get the token via the API

@adelowo Yes, please do it and send a pull request. The filed is https://github.com/go-gitea/gitea/blob/master/models/token.go#L21, as @BranndonWork 's suggest we could use a hash256, I don't think we need a key to do that.
@JonasFranzDEV this will not break anything if @adelowo add the migration of the field Sha1 on https://github.com/go-gitea/gitea/blob/master/models/migrations/migrations.go

@lunny I think that you're wrong since

curl -X GET "https://try.gitea.io/api/v1/users/JonasFranzDEV/tokens" -H  "accept: application/json" -H  "authorization: Basic CENSORED"

Will return my access token:

[{"name":"api","sha1":"31df5baaf69024d08a884261e180de00b3983bfa"}]

I've created the token here:
screenshot-2018-4-14 gitea git with a cup of tea

So If you only save the SHA256 hash of the access token you will have no way of providing the access token via the api. This would be breaking.

So we need some form of hashing key so as to keep default behavior ?

@adelowo I think it is not possible the hash the application keys securely if we want to expose those via the mentioned API.

Correct, while submitting this issue, I wasn't aware that we could pull the key via API.

What's the need for providing the key again after the fact? Aren't most tokens/secrets provided by vendors only given out once, and not available for retrieval if lost?

In my opinion, this grants full account access, and should be treated no differently than a username/password combination.

If you DO need to retrieve the actual token later, while securely storing it in the database, you can encrypt it using the users salt (which already exists in the DB) and a custom system level salt, which never changes, and is created in the filesystem during initial install. Then you can decrypt the token with the standard basic authentication method of user/password. Would that work?

I agreed with @BranndonWork we should remove the API even it will break the compatible but it's more secure. The token only be returned to user once when created via UI or API.

@BranndonWork Decrypting a user token with the username password combination would not work since you could also retrieve the token by an application token. I think that there's no way to not remove this possiblity of retrieving the token via API.

@JonasFranzDEV Sure, you could __request__ a token via the api, but it would then be visible only once. It would not be possible to retrive the same token later on.

If this change were to be made we need to be aware of integrations that use this, as we don't want to break drone integration with Gitea again (as an example of something that happened in the past with a breaking change)

@techknowlogick @lunny Guess when we implement this we have to raise the version to 2.0?

@daviian this is not a very big change, but need very careful!

We could go for GitHub Authorization API and depricate the /users/{name}/token API :thinking:

https://developer.github.com/v3/oauth_authorizations/#list-your-authorizations

We are working on a strategy of using signed JWT as access token to address the concerns raised here. The already issued tokens(sha1 hash), new tokens and sha1 hash of new tokens can all be used to access the API during transition. Admin can decide when to enforce access by JWT only after transition period. This will involve some changes in gitea api client, and in integrated apps such as Drone to use the token i/o hash, but these integration can continue to use hash until updated. it is agraceful transition, and will not break existing code or integration.

The token claims are stored in DB, and can work together with a server secret (env var) to reconstruct the token. However claims alone can not access API. Signed JWT can access api, and will only live in memory on server.

Here are the proposed structs. Can someone work on db migration and make it happen? it is quite tricky to get XORM work with all db drivers. Our code depends on this new structure in database. Thanks.

proposed access token struct

I'm against using JWT for this use case, just because this project's intent is to align with other forges, Gitlab/Github/Bitbucket all do app tokens (same with even gmail does it if you have 2fa on your email), and the way that Github does it (is that they don't store the full text unencrypted in the DB, but rather just some meta info around it) is my preferred approach, as that allows easy revocation of specifc app tokens, etc.

the sha1 hash of the jwt token is stored in db as it is now, and indexed. when a token comes in, we can easily calculate the sha1 hash, look for it by hash in the db, and reject if it is not there. removing the hash from db will revoke it too. in this sense, hash and plain text claims are all meta data about the token. this is similar to other providers' approach. the difference is our token carries some info rather than a random meaningless string.

it is possible to encrypt the token and store the encrypted text too. it is just extra step not really necessary.

Some big players such as AWS use JWT. If we stick to one simple signing algorithm (eg hs256), we can avoid the design flaws in JWT. HS256 is fast, and produces 32 bytes signature.

Gitea does not need to be exactly the same as other git hosting solutions. otherwise it should have been written in Ruby as both Github and Gitlab do that.

Another maintainer asked the same question to my pull request to gitea-sdk, and I have explained the rationale for jwt as access token.

Here are the key points again, for easier discussion and reference in the same thread.

That is a good question, and we have debated over this internally when redesigning the whole process. The primary purpose is to enhance security and to avoid storing token in the db.

JWT can contain a bit extra info such as user name, expiry time, intended use etc. it can be used to grant partial access to some resources. Another benefit is that we can verify the token signature at api gateway on separate machines without db query. This decoupled design helps the application to scale.

This is designed to guard against escalated privileges as a result of unauthorized access to db. Not so long ago a major cloud provider had an incident of not removing data when the volume was provided to another user. This has happened before and may happen again in the future, as cleaning scripts fail sometimes. The signing key only lives in memory and thus reduces the risks. of course this strategy won't cover everything. it is a small step towards a business wise reasonable endeavour.

Whatever it is, a decision needs to be made, hopefully soon.

A downside of JWT token is that it is tied to claims, or app permissions. Once it is issued, the permission can not be changed, as changing the permission will result in a new token. Our experience is that users don't change permissions often. If they do, they can delete the old token and generate a new one to replace. other developers' use case can be quite different.

If changing app permission is frequent and expected, then encrypt and store a random string as token is a better approach. In our internal debate we have discussed the pros and cons of possible approaches and we are aware of this downside.

the current Drone integration depends on basic auth on api routes to list and create app token when users log in Drone with Gitea password. So the tokens should be retrievable. Hashing the app token will break Drone integration, and will leave this issue unresolved for a long time.

If the community prefers to use a short random string as application token, I think the following approach will work and keep the existing integration during transition.

application token structure

  • The application token can be of a length different from sha1 hash. For example it can be 15 chars long.
  • the application token is not stored in db.

storage

  • We can store a sha1 hash of the new application token as we are doing now, and verify the token by calculating sha1 hash and look up by hash in db. the sha1 hash can be stored in the existing db field.
  • we will need to encrypt the token, store the encrypted text in db, and return the token to Drone (or other integrated app) upon request. When users log into Drone with Gitea username and password, Drone will request for a token via api using basic auth header. this needs to be a new field in db.
  • it is optional to create a separate field and store the application permissions as JSON.

during transition

  • during transition, if the incoming token is already sha1 hash (40 chars long), we will look up in db directly and verify it is the token issued before. this way tokens issued before will continue to work.
  • admin can notify users to generate and use a new token
  • Gitea api client needs to be modified to include both sha1 hash and the token in response.
  • old version of integrated app can continue to use sha1 in the response
  • new version of integrated app should use the application token to access api

after transition

  • when the transition is over, admin can enforce access by new application token only.

questions

  • what encryption method is desirable?

Please comment or share your concerns. The issue has been around for a long time now and it will remain insecure until we do something about it.

If no one is working on this already, I can work on it and prepare a PR. I already have code implemented in JWT. If the community really dislikes JWT, the code can be modified to use a short random string, and keep everyone happy.

any thoughts or insights?

I'd consider app tokens as passwords and treat them as such - and you wouldnt store passwords only hashed wtih sha1, would you?

I'd propose to store these the same way we store user passwords, which means not only hashed with sha1.

Also rather than complicate things, by changing existing behavior, we could wait until oauth is enabled, and then stick to what @kolaente said, and hash the token the same way passwords would be done. In fact I've already done this process for 2fa scratch tokens.

Future state of the token api endpoint could be:

{
  "id": 1,
  "url": "https://try.gitea.io/api/authorizations/1",
  "scopes": [
    "public_repo"
  ],
  "token": "abcdefgh12345678",
  "token_last_eight": "12345678",
  "hashed_token": "25f94a2a5c7fbaf499c665bc73d67c1c87e496da8985131633ee0a95819db2e8",
  "app": {
    "url": "http://example.com",
    "name": "example app",
    "client_id": "abcde12345fghij67890"
  },
  "note": "optional note",
  "note_url": "http://optional/note/url",
  "updated_at": "2011-09-06T20:39:23Z",
  "created_at": "2011-09-06T17:26:27Z",
  "fingerprint": ""
}

Also when I mention wanting to align with different forges that has no impact on the choice of programming languages (both bitbucket and pagure use python for example) we just want to provide a consistent experience for developers.

Edit: the above is how github has their token api (note it likely has the token crypted in the DB)

As a reminder, in order to keep integration with applications using older version of gitea sdk, the property "name" and "sha1" should be there, even if sha1 may not be used in the future.

Encryption is required to retrieve the token and return to older apps. This is mainly to be backward compatible.

Hashing is required to index, look up and validate a token. Hashing needs to be done no matter how the token will be presented in the token api.

The hashing can be improved, for example using one from the hs family to get a longer signature, or hashing the signature with sha1 to reduce the signature length. The data structure in DB mandates hash to be unique. this will prevent two tokens with the same hash.

Ok, this is silly. We're talking about future solutions whilst in the meantime we've got keys stored in plaintext in the database and we've sat on it for 9 months.

Although it's nice to be able to look up our old keys by the API - in order to do so you actually need the raw password, so at that point you don't even need the key! I don't think there can be any sensible use of this retrieval API.

Creating keys isn't particularly difficult so if you lose one you can always create a new one - and in fact you should just delete the old one at that point.

In terms of how to store these things we should just hash em like passwords.

When some better solution comes along we can deprecate this method, but until then we should do something.

@zeripath So all in for hashing keys?

Yes. In whatever secure way we can.

@zeripath I agree with that.

I'm for hashing of the keys, however as this is breaking for Drone, I'd prefer that #27 is completed first. To align with GitHub, they do hash, but copy the last 4 characters into a new column as plaintext so if you have a personal access token, you can compare the last 4 characters.

How is Drone requiring access to the tokens? For authentication? Then why is authentication not done the normal way comparing the provided token (after hashing) to the stored hash?

Also worthy of consideration: The 27th issue in this project (Integrate an OAuth2 provider) was placed on the 3rd of November, 2016, and has since not yet been done.
Meanwhile, all-bypassing authentication tokens are stored as plain-text in the database, and directly accessible through the API (and there are implementation problems there, to make things worse).

@0x5c Oauth is being worked on: #5378

Nice to see! But will it be finished soon?

@0x5c it's ready to get tested please feel free to spend a lock on it or test it.

Now OAuth2 has been merged, and support has been added to Drone for Oauth2, this can now be worked on. I've self-assigned this ticket.

@techknowlogick, we are trying to run Gitea on Google cloud. In our case, we would like to encrypt any secret data using cloud providers Key Management System (KMS). All major cloud providers AWS/Azure/GCE has their own version of KMS.

The approach we are trying is this, use XORM's FromDB/ToDB methods to dynamically encrypt and decrypt []byte before writing to database. Here is our type: https://github.com/gomodules/ksm-xorm-demo/blob/master/types/secret.go

We are using go-cloud SDK to do this in a cloud provider agnostic way. https://github.com/google/go-cloud/tree/master/secrets

Our goal is to ensure that in case our db is hacked, any user secrets are not exposed. We have been using this our private fork. I wonder if this can be done in this upstream project?

I think a KMS (such as one of the ones you listed above, or hashicorp's vault) would be out of scope of this specific change, however I think it could possibly be done in a future PR although I think a discussion on how it should be implemented is warranted.

Was this page helpful?
0 / 5 - 0 ratings