Postgraphile: Best practice for secure Data Isolation in a Multi-Tenant System

Created on 25 Jul 2018  路  10Comments  路  Source: graphile/postgraphile

I'm submitting a ...

  • [ ] bug report
  • [ ] feature request
  • [X ] question

I am wondering if any has or knows a "cookbook" on how to integrate Graphql (and of course here I mean Postgraphile!) in a "best practices" method when the database is multi-tenant and we need to protect data of one customer from another.

A little background and use case:

We have a current SaaS SSR type Web application written 15 years (!!) ago in, the now defunct, Apple WebObjects. We have 300+ customers and the DB isn't too large, around 300 GB Postgres 9.5, but soon to be upgraded to 10.X. Since the system was Server based all security, etc. and queries were isolated to the back-end where we had things to make sure the data isolation between customers was kept. To that end the data model is already setup to have the customer_id across all the tables, etc. (and has been since day one) to help isolate the data, and that's what we do in the current system.

Leap to today and we are re-writting the application in Vue (SPA) and want to use Graphql instead of writing and going through REST hell with the UI side. We are using Apollo Client and so far everything is great .. except .. we have a gaping security hole the size of a lorry! :) (That's for Benjie's benefit..LOL) Anyone can currently query ANY data .. We do have basic JWT security in place, but it really isn't hard for any semi-hacker to grab the JWT and then hit the Graphql service and mine all the data they want. Obviously this can't actually be deployed in production! :(

So I've been reading a LOT of security mechanisms, etc. within Postgres and some articles here in the docs on security but to be honest, I'm pretty much confused as to what it means in the real world. :) I was wondering if anyone knows a "cookbook" or article or anything on best practices on how to stitch the proper security to make sure no queries (CRUD) cross "organizational" boundaries. With the JWT we did implement the me() thingy -- to get the current user, etc. But what to do with it exactly?

I am hoping to keep this "light weight" without (maybe) the use of RLS and complex roles, etc. And have the isolation logic on the server side. This MAYBE possible with the Hasura Postgres solution they just released to open source and I'm exploring that, but am hoping there is an equally good way with Postgraphile.

I am pretty positive we're not the first by a long stretch wanting to use Graphql / Postgraphile in a SaaS multi-tenant modern Web app, so hoping maybe the audience can point me to some good docs.

Thanks in advance!

/Steve

馃挰 discussion 馃搫 add-to-docs

Most helpful comment

I typically do something like this:

create table users (
  id serial primary key,
  username citext not null unique
);

create table organizations (
  id serial primary key,
  name citext not null unique
);

create table organization_users (
  user_id int not null references users  on delete cascade,
  organization_id int not null references organizations  on delete cascade,
  admin boolean not null default false,
  primary key (user_id, organization_id)
);

create table protected_data (
  id serial primary key,
  organization_id int not null references organizations on delete cascade,
  secret text
);
create index on protected_data (organization_id);

create function current_user_id()
returns int as $$
  select current_setting('jwt.claims.user_id', true)::int;
$$ language sql stable;

create function current_user_organization_ids(require_admin bool default false)
returns int[] as $$
  select array_agg(organization_id)
    from organization_users
    where user_id = current_user_id()
    and (require_admin is false or admin is true);
$$ language sql stable;

alter table protected_data enable row level security;

create policy select_organization_admins_only on protected_data
  for select
  using (organization_id = any(current_user_organization_ids(true)));

Performance wise:

  • the current_user_organization_ids function is a simple SQL statement that uses an index (the first column of the primary key) so can execute extremely quickly.
  • the current_user_organization_ids(true) function call has no non-constant arguments so PostgreSQL can call it once up front and use the result for the rest of the statement - i.e. it does not need to be called for every row
  • using organization_id = any(...) means that the organization_id index can be used

All in all, this works out pretty fast and fairly expressive.

鈿狅笍 This SQL was written directly into GitHub, it has not been tested.

鈿狅笍 I've omitted a lot of stuff like roles, grants, other policies, etc - you'll need to add those in for a full example.

All 10 comments

There's a Plugin For That 鈩笍

https://www.npmjs.com/package/postgraphile-plugin-connection-multi-tenant

I'm not sure if it's quite far enough for your goals, but it's a good start. Might be worth talking to the plugin author.

The main part is this (simplified/made explicit for your use case):

/* boilerplate */
addArgDataGenerator(() => ({
  pgQuery: queryBuilder => {
/* end of boilerplate */

    queryBuilder.where(
      sql.query`${queryBuilder.getTableAlias()}.customer_id = current_setting('jwt.claims.customer_id')::int`
    );

  },
}));

Basically this adds another check to the WHERE clause in the SQL query generated by the field. The next part would be to actually figure out which fields that should apply to; the plugin above only applies them to the root fields, but assuming you don't allow your tenants to "break out" of their jails by adding relations to records with different customer_id values that might be sufficient. Personally I'd hook every single field that returns a table connection. You also have to think about then applying this to mutations also. Making this whole mechanic easier is on my (ever-growing) todo list.

Personally I'd do this with RLS because it seems a perfect use-case for RLS, and by using it I can be certain that no matter what server logic I have no-one can break out of their "jails".

I'd query pg_class and pg_attribute to get the relevant tables and generate the relevant policies (AS RESTRICTIVE, PG10+) using meta-programming and then run them in a migration and pat myself on the back for a job well done 馃槈 I'd also add default current_setting('jwt.claims.customer_id') to the customer_id field so that newly created records would automatically be constrained to their jails and add COMMENT ON COLUMN <table>.customer_id IS E'@omit' to prevent the customer_id even being visible in the GraphQL schema.

Some excellent ideas! More to think about! Thanks!

So what makes this more difficult in our use case is that we have some pretty complex "roles" and -- worse -- there isn't a single 1:1 relationship between a user who signs in .. and the customer .. they can actually belong to multiple customers .. sort of like github and the different orgs you can belong to..

Also -- some users are "super user admins" -- who can do anything to any customer .. some are "customer admins" (who can admin their organization) .. but not every org they belong to .. so a user could be an customer admin for one customer, but just a regular user in a different one .. etc.

In fact, the nested tree of RBAC stuff in the current system is quite complex .. and figuring out how to protect all this in a Graphql context is making my head spin..

I typically do something like this:

create table users (
  id serial primary key,
  username citext not null unique
);

create table organizations (
  id serial primary key,
  name citext not null unique
);

create table organization_users (
  user_id int not null references users  on delete cascade,
  organization_id int not null references organizations  on delete cascade,
  admin boolean not null default false,
  primary key (user_id, organization_id)
);

create table protected_data (
  id serial primary key,
  organization_id int not null references organizations on delete cascade,
  secret text
);
create index on protected_data (organization_id);

create function current_user_id()
returns int as $$
  select current_setting('jwt.claims.user_id', true)::int;
$$ language sql stable;

create function current_user_organization_ids(require_admin bool default false)
returns int[] as $$
  select array_agg(organization_id)
    from organization_users
    where user_id = current_user_id()
    and (require_admin is false or admin is true);
$$ language sql stable;

alter table protected_data enable row level security;

create policy select_organization_admins_only on protected_data
  for select
  using (organization_id = any(current_user_organization_ids(true)));

Performance wise:

  • the current_user_organization_ids function is a simple SQL statement that uses an index (the first column of the primary key) so can execute extremely quickly.
  • the current_user_organization_ids(true) function call has no non-constant arguments so PostgreSQL can call it once up front and use the result for the rest of the statement - i.e. it does not need to be called for every row
  • using organization_id = any(...) means that the organization_id index can be used

All in all, this works out pretty fast and fairly expressive.

鈿狅笍 This SQL was written directly into GitHub, it has not been tested.

鈿狅笍 I've omitted a lot of stuff like roles, grants, other policies, etc - you'll need to add those in for a full example.

[semi-automated message] We try and keep the open issues to actual issues (bugs, etc); this seems like more of a discussion right now, so I'm closing it but please feel free to keep discussing it below 馃憤

Thank you @benjie

So far (I am currently testing this out in my own implementation) this solution looks to be a great solution for multi-tenant databases or databases where permissions need to be more granular.

I did find a typo (just missing a closing bracket), which I thought I would flag before this goes into the docs.

create policy select_organization_admins_only on protected_data
  for select
  using (organization_id = any(current_user_organization_ids(true));

Should be

create policy select_organization_admins_only on protected_data
  for select
  using (organization_id = any(current_user_organization_ids(true)));

Good catch; fixed 馃憤

hello, I'm sorry if I bring this topic back but , I know this is a pseudo code, but which is the reason to use a table in order to link organisation and user? should be simpler to move the org_id in the user table? in this way it is possible to delete the user on cascade if we delete the organisation...

@francescovenica In some applications/platform etc. a user can belong to multiple organizations - or to none. Take Github for instance as an example. If your use case is different - you might as well create a tenant/org column in the users table.

ops sorry, I miss that point, it make sense! Out of scope, to remove the users do you think is better to remove them through the app or with a trigger on the db that check if there are other organizations linked to a particular user?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Venryx picture Venryx  路  4Comments

james-ff picture james-ff  路  4Comments

jayp picture jayp  路  3Comments

giacomorebonato picture giacomorebonato  路  3Comments

kilianc picture kilianc  路  4Comments