Postgraphile: Users with multiple role membership -- security system architecture

Created on 2 Nov 2017  路  8Comments  路  Source: graphile/postgraphile

We are considering Postgraphile usage in our project and there are at least two very different ways of organizing security in Postgres: "all users as database users" or "check everything via RLS". Which one is better?

System description

On the application level we have:
tables: t1, t2, ..., tN
roles: user, manager, admin, ...
users: user1, user2, ..., userN (up to 10k users)

Roles are granted rights to access tables.
Almost all tables will be filtered on row level for most of the roles (based on column values: user_id and dep_id).

All these roles, users and permissions are application-specific. They are not Postgres entities yet.

Every app user can be a member of one or more roles.

Let's say, user role can read t1 with RLS and manager role have rw rights to t2. We might need to create user1, who is a member of both user and manager roles.

Thus, we can't map only our roles to Postgres. A user potentially has a unique combination of app roles and in case of implementing security on the db level, we most probably will need all our app users as db users, as we cannot do set role user, manager in Postgres.

I see two variants how restricted access to the database can be organized:

First variant

Create corresponding users and roles in Postgres via CREATE USER/ROLE. Update Postgres roles every time app users and roles are updated. In our example we will grant user and manager to user1. Graphile gets only role, user_id and dep_id in a JWT token, sets session parameters in every transaction and they are checked by RLS. Table level permissions work via set role user_role performed by Graphile.

Pros:

  • Standard Postgres backend-independent permission system (as Graphile docs say it should be).

Cons:

  • There is not much information about Postgres ability to handle 10k db users efficiently. Most probably it is not optimized for this.
  • Relatively difficult to implement and properly maintain synchronization between app users and Postgres users.

Second variant

Use only one Postgres role for Graphile connections, setting via JWT and Graphile a relatively complicated session parameter (let it be perms_obj), like the following:
```
{
"user_id": 42,
"perms":
{
"department": "DEP1",
"roles": [ "admin", "user" ]
}
}
````

and parse this object in a universal boolean check_rls(perms_obj, other_params) function while checking RLS policies for every row.

Pros:

  • No need to implement and maintain synchronization between app and Postgres users.
  • No need to create thousands of db users in Postgres.

Cons:

  • Performance may suffer as Postgres planner doesn't know the selectivity of this general purpose check_rls(perms_obj, other_params) function.
  • Permission system heavily depends on backend (Graphile), which goes against Graphile philosophy.
  • User roles and departments are available to anyone being transmitted in a JWT token.

Which of these two variants is better for our use case?

Are there other reasonable options?

Most helpful comment

@benjie @dvdrozdov

Very interesting links, thanks!
However, I think now that we shouldn't rely on RLS, as inlining is more magic than science, and postgres security barriers (whether in views, functions or via RLS) are optimization barriers as well. Moreover, complicated RLS subqueries make query plans much harder to read.

Postgres 10 does not use "complicated RLS subqueries" anymore. Instead quals will be ordered and prioritized now. More information can be found in the commit message of the change here: https://github.com/postgres/postgres/commit/215b43cdc8d6b4a1700886a39df1ee735cb0274d That should speed things up.

Maybe it's worth to open a bug report/feature request regarding the inlining problem you face? Looking at the "Inlining conditions for scalar functions" in the wiki link you provided I think the problem is the subquery. However maybe an exception can be made for SELECT EXISTS(...) (sub)queries?

BTW: Did you try to to mark the function LEAKPROOF (...$$ LANGUAGE sql STABLE LEAKPROOF;)?
Maybe this helps to improve performance?

All 8 comments

This issue was discussed on Gitter and the conclusion was to test if Postgres can inline RLS queries into the main query:

If it matches certain criteria (stable, SQL, one statement) then the PostgreSQL planner can inline it; e.g. with user_can above, a query select * from foo where user_can(foo.department, 'admin') would become select * from foo where exists(select 1 from roles where user_id = current_user_id() and department=dept and role = requested_role)

I tested this scenario and it seems that even Postgres 10.0 isn't smart enough to inline such queries:

-- The aim of this script is to check if Postgres can inline correlated RLS
-- queries with an EXISTS clause and a subquery inside

CREATE TABLE t1 (
    id SERIAL PRIMARY KEY,
    dep text,
    data text
);

INSERT INTO t1(dep, data) VALUES
('it', 'it data'),
('fin', 'fin data');

INSERT INTO t1(dep, data) 
  SELECT 'dummy','dummy' FROM generate_series(1,1000);

CREATE TABLE rights (
    username text,
    dep text
);

INSERT INTO rights(username, dep) VALUES
('rlsuser', 'it');

ALTER TABLE t1 ENABLE ROW LEVEL SECURITY;

CREATE OR REPLACE FUNCTION public.rls_check_dep(
    cur_dep text, 
    cur_user text)
RETURNS boolean AS $$ 
SELECT EXISTS (
    SELECT 1 FROM rights r 
    WHERE r.dep = cur_dep 
        AND r.username = cur_user);
$$ LANGUAGE sql STABLE;

GRANT EXECUTE ON FUNCTION public.rls_check_dep(text, text) TO public;

CREATE ROLE rlsuser;

GRANT SELECT ON t1 TO rlsuser;
GRANT SELECT ON rights TO rlsuser;

DROP POLICY user_read ON t1;
CREATE POLICY user_read ON t1 FOR SELECT 
   USING (rls_check_dep(dep, current_user));

VACUUM;

SET ROLE rlsuser;

-- rlsuser=> 
SELECT * FROM t1;
-------------------------------------------------------------------------------------------------
-- Seq Scan on t1  (cost=0.00..271.53 rows=334 width=68) (actual time=0.155..6.287 rows=1 loops=1)
--   Filter: rls_check_dep(dep, (CURRENT_USER)::text)
--   Rows Removed by Filter: 1001
-- Planning time: 0.217 ms
-- Execution time: 6.311 ms
--(5 rows)

RESET ROLE;

-- Query, as we expected it to be inlined into RLS check (
-- postgres=# 
SELECT * FROM t1
WHERE EXISTS (
    SELECT 1 FROM rights r 
    WHERE r.dep = t1.dep 
        AND r.username = 'rlsuser');

---------------------------------------------------------------------------------------------------------------
-- Hash Semi Join  (cost=1.02..23.39 rows=334 width=15) (actual time=0.017..0.143 rows=1 loops=1)
--   Hash Cond: (t1.dep = r.dep)
--   ->  Seq Scan on t1  (cost=0.00..16.02 rows=1002 width=15) (actual time=0.006..0.055 rows=1002 loops=1)
--   ->  Hash  (cost=1.01..1.01 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1)
--         Buckets: 1024  Batches: 1  Memory Usage: 9kB
--         ->  Seq Scan on rights r  (cost=0.00..1.01 rows=1 width=32) (actual time=0.003..0.004 rows=1 loops=1)
--               Filter: (username = 'rlsuser'::text)
-- Planning time: 0.199 ms
-- Execution time: 0.163 ms

SELECT version();
----------------------------------------------------------------------------------------------------------------
-- PostgreSQL 10.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit

Notes:

  • I couldn't make Postgres inline an RLS checking function with even EXISTS(SELECT 1) inside;
  • Such permission checking functions are better to be declared as SECURITY DEFINER, as ordinary users shouldn't be able to access tables with users, roles, etc. However, SECURITY DEFINER seems to be an optimization fence where even a simplest function like SELECT true won't be inlined.

Interesting article on inlining constraints in general:

https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions#Inlining_conditions_for_scalar_functions

Note however that these may have changed since 9.5, and they don't apply specifically to RLS policies.

The following are also a good read if you want to understand what sort of decisions are going on under the hood, and why things are the way they are:

http://rhaas.blogspot.co.uk/2012/03/security-barrier-views.html
https://blog.2ndquadrant.com/how-do-postgresql-security_barrier-views-work/

Very interesting links, thanks!

However, I think now that we shouldn't rely on RLS, as inlining is more magic than science, and postgres security barriers (whether in views, functions or via RLS) are optimization barriers as well. Moreover, complicated RLS subqueries make query plans much harder to read.

BTW, problems with inlining can be bypassed by putting a security checking query into the policy directly:

DROP POLICY user_read ON t1;
CREATE POLICY user_read ON t1 FOR SELECT
   USING ((
    EXISTS (
        SELECT 1 FROM rights r
        WHERE r.dep = t1.dep 
            AND r.username = current_user)
));

-- rlsuser=> 
SELECT * FROM t1;
-------------------------------------------------------------------------------------------------------------
-- Seq Scan on t1  (cost=0.00..1038.06 rows=501 width=15) (actual time=0.038..0.325 rows=1 loops=1)
--   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
--   Rows Removed by Filter: 1001
--   SubPlan 1
--     ->  Seq Scan on rights r  (cost=0.00..1.02 rows=1 width=0) (never executed)
--           Filter: ((t1.dep = dep) AND (username = (CURRENT_USER)::text))
--   SubPlan 2
--     ->  Seq Scan on rights r_1  (cost=0.00..1.02 rows=1 width=32) (actual time=0.005..0.007 rows=1 loops=1)
--           Filter: (username = (CURRENT_USER)::text)
-- Planning time: 0.189 ms
-- Execution time: 0.374 ms
--(11 rows)

which makes this RLS query almost 17 times faster (6.311 ms compared 0.374 ms).

Hi ,
Thank you for the detailed article . I am trying to understand how roles and security works , and I have a question .

If I set roles , manager , buyer ,..etc , but I do this only at application level , and this are not also Postgres entities (in the sense that I do not mach this roles to roles in the database ) , is it true that in this case I do not have security ?And if I do not use RLS .

Because it looks to me that if a user is checked for being a manager , via his credentials , and if he has this role , then , if no database role exists in the database matching this role-- and with appropiate permissions - , when connecting to postgraphql , he has access to the whole database or schema, where he has been authenticated .For manager is this ok maybe , but if user has other role , it might not be ok , I think .

To me it means that we have to use either a matching manager role in the database or use rls .

Am I understanding it correctly ?

Because , somewhere , somehow , in the database , the rights of the user should be specified ...

Thank you for your help ..

If you implement neither role-based access control (via grant) nor RLS (via create policy) then your database will have no security and PostGraphile will allow anyone to modify anything. To protect your data if you expose PostGraphile to untrusted third parties you must use one or both of these options.

Thank you !

@benjie @dvdrozdov

Very interesting links, thanks!
However, I think now that we shouldn't rely on RLS, as inlining is more magic than science, and postgres security barriers (whether in views, functions or via RLS) are optimization barriers as well. Moreover, complicated RLS subqueries make query plans much harder to read.

Postgres 10 does not use "complicated RLS subqueries" anymore. Instead quals will be ordered and prioritized now. More information can be found in the commit message of the change here: https://github.com/postgres/postgres/commit/215b43cdc8d6b4a1700886a39df1ee735cb0274d That should speed things up.

Maybe it's worth to open a bug report/feature request regarding the inlining problem you face? Looking at the "Inlining conditions for scalar functions" in the wiki link you provided I think the problem is the subquery. However maybe an exception can be made for SELECT EXISTS(...) (sub)queries?

BTW: Did you try to to mark the function LEAKPROOF (...$$ LANGUAGE sql STABLE LEAKPROOF;)?
Maybe this helps to improve performance?

[semi-automated message] Hi, there has been no activity in this issue for over 8 weeks so I'm closing it to keep the issues/pull request manageable. If this is still an issue, please re-open with a comment explaining why.

Was this page helpful?
0 / 5 - 0 ratings