Kong: Oauth2 plugin (with postgres) not scaling well

Created on 16 Mar 2017  路  8Comments  路  Source: Kong/kong

Summary

Our API is protected by the oauth2 plugin. Clients must use the token endpoint to generate an access token using the "client_credentials" grant.

Under a load of about 100req/s, the CPU of our postgres db progressively increase to reach 100%. Kong is very slowed down.

We checked the queries being run and realized that this one is under-optimized:

SELECT "expires_in","authenticated_userid","id","credential_id", "Refresh_token","access_token","token_type","scope", extract(epoch from created_at)::bigint*1000 as created_at 
FROM oauth2_tokens LEFT OUTER JOIN ttls ON (oauth2_tokens.id = ttls.primary_uuid_value) 
WHERE (ttls.primary_key_value IS NULL OR (ttls.table_name = 'oauth2_tokens' AND expire_at > CURRENT_TIMESTAMP(0) at time zone 'utc')) AND "access_token" = 'xxx';

It looks like it's the query Kong runs when trying to check if the access token in the request header is valid. This query is generated by this piece of code: https://github.com/Mashape/kong/blob/2b71933aefe7852e4bdb4c7ba1e134a62a5b8161/kong/dao/db/postgres.lua#L222

Our ttls table has about 2M records.

Steps To Reproduce

  1. Setup Kong with postgres, protect an API with the oauth2 plugin
  2. Generates several millions of access tokens
  3. Use them to query any API endpoint.
  4. Observe postgres being very slow when Kong is checking if the token is valid.

Additional Details & Logs

  • Kong version: 0.9.3
  • Datasource: postgres (db.m4.2xlarge RDS instance on aws)

Side question: why is there a need of a join query, only to fetch a valid token in the oauth2_tokens table?

tasneeds-investigation

Most helpful comment

Hi guys,

today we have investigated on this. We resolved adding an index on primary_uuid_value column.
Here you can see the CPU usage graph of RDS. We have about 80 req/s.

screen shot 2017-06-16 at 16 11 51

This resolve the issue however we don't understand why you need to store a ttl for tokens. Isn't it enough to have the expires_in field in the access_token table?
In this way we could save disk, cpu and memory of the database.

Thanks

@PGBI @thefosk @thibaultcha

All 8 comments

The same for us here with 0.8.3. ~ 2M of tokens and relative ttls. We first tried to truncate the entire table oauth2_tokens then we deleted all rows from table ttls then in a few seconds the system has restored the normal status.

It is a critical bug for us because in that moment the gateway could not proxy any requests.

@p0pr0ck5 Do you expect the same behavior in 0.10.x versions?

No changes were applied to this that would improve performance in any way, so yes, the behavior will be the same in the 0.10.x series.

@thibaultcha Could generate some troubles remove the ttls record when expired?

Hi guys,

today we have investigated on this. We resolved adding an index on primary_uuid_value column.
Here you can see the CPU usage graph of RDS. We have about 80 req/s.

screen shot 2017-06-16 at 16 11 51

This resolve the issue however we don't understand why you need to store a ttl for tokens. Isn't it enough to have the expires_in field in the access_token table?
In this way we could save disk, cpu and memory of the database.

Thanks

@PGBI @thefosk @thibaultcha

@codebien nice!

The TTL interface is an abstraction that allows us to apply TTLs to data store rows for Postgres (this is a native Cassandra concept we leverage). TTLs here don't apply just to access tokens, but all data store entities. Eventually we want to refactor this approach to use a less heavy method with the joins currently in play.

Adding the index on primary_uuid_value is interesting. I would be curious to see how this interacts with other parts of Kong, and if a large number of writes would cause performance problems to constantly update this index (I suspect the answer is no, but I'd be curious to hear more about your experiences).

@codebien Your intuition seems correct. I have opened a PR to create the missing indexes.

Closing as this has been merged.

Was this page helpful?
0 / 5 - 0 ratings