Postgraphile: Improving RLS support...

Created on 8 May 2018  路  11Comments  路  Source: graphile/postgraphile

First of all, Postgraphile IS SUPER AWESOME. You are doing an amazing job, I cannot thank you enough!

I'm submitting an improvement request regarding RLS support.

  • [x] improvement request

PostGraphile version: 4.0.0-beta.5

-- NOTE:
-- Table `public.flow_step` is behind a RLS policy controlled by the user rights.

-- NOTE:
-- Table `public.flow` is also behind a RLS policy. It is controled depending on the `public.flow_step` access.
-- Meaning that if the user has access to the specific `public.flow_step`, he has access to referenced `public.flow` too;
-- Respectively, no access to `public.flow_step` = no access to `public.flow`.

-- NOTE:
-- The `private.set_next_flow_step(uuid)` function allows the user to set the next `public.flow_step` without him
-- knowing what the next step actually is. The function itself is defined as `SECURITY DEFINER` meaning that the
-- function will have access to ALL steps which will allow it to calculate the next one. (My Postgraphile config
-- does not allow introspection of the `private` schema so the function is safe).
CREATE FUNCTION private.set_next_flow_step(
    flow_id uuid
) RETURNS public.flow AS
$$
    WITH next_flow_step AS (
        SELECT fs.id FROM public.flow_step AS fs
        WHERE (
            -- Logic for calculating the next step goes here... --
        )
        LIMIT 1
    )
    UPDATE public.flow
        SET step_id=(SELECT id FROM next_workflow_step)
    WHERE (id = set_next_flow_step.flow_id)
    RETURNING *
$$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

-- NOTE:
-- The `public.set_next_flow_step(uuid)` function is the introspected/public version of the above mentioned function.
-- It PERFORMs the `private.set_next_flow_step(uuid)`, re-SELECTs the `public.flow` in focus and RETURNs it. (The re-SELECT
-- is very important for finding out if the user has access to the next step, because the `public.set_next_flow_step(uuid)`
-- function is run with `current_user` rights)
-- IMPORTANT: Since the user might not have access to the next flow step, the result can be either `NULL` or `public.flow`.
CREATE FUNCTION public.set_next_flow_step(
    flow_id uuid
) RETURNS public.flow AS
$$
DECLARE
    next_flow RECORD;
BEGIN
    PERFORM private.set_next_flow_step(set_next_flow_step.flow_id);

    SELECT * INTO next_flow FROM public.flow WHERE (id = set_next_flow_step.flow_id)
    RETURN next_flow;
END;
$$
LANGUAGE plpgsql VOLATILE;

----
-- The CLIENT runs a GraphQL mutation joining the `public.flow` in the response...
----

-- NOTE:
-- Postgraphile will run the above functions in 2 steps.

-- 1. step
-- PARAMETER: $1 = <some-unique-uuid>
with __local_0__ as (
    select __local_1__.* from "public"."set_next_flow_step"($1) __local_1__
)
select (__local_0__)::text from __local_0__

-- 2. step
-- PARAMETER: $1 = <result-of-step-1>
with __local_0__ as (
    select (str::"public"."flow").*
    from unnest(($1)::text[]) str
)
-- ... rest of the query ... --

-- FIXME: 
-- Now the fun part, if the 1. step produced a NULL value (because the user does not have access to the next step),
-- the parmeter will behave like so: $1 = '{"(,,,,,,,,,,)"}', instead of $1 = NULL!

-- This poses HUGE problems like: if `public.flow` has domain columns which are defined with `NOT NULL`,
-- the function will ALWAYS fail throwing the following exception: `domain X does not allow null values`

Hope I provided sufficient information! If not, please ask.

Kind regards,
Denis

All 11 comments

I... Have no idea what you're asking. Please explain in words what you're after and then I can read the code with a little context.

I realised now, that this does not apply only forthe above RLS implementation... I'll elaborate.

Lets say we have the following domains:

CREATE DOMAIN created_time timestamp WITHOUT TIME ZONE NOT NULL DEFAULT (now() AT TIME ZONE 'UTC');
COMMENT ON DOMAIN created_time IS 'Timestamp of the creation of data entries';

CREATE DOMAIN updated_time timestamp WITHOUT TIME ZONE NOT NULL DEFAULT (now() AT TIME ZONE 'UTC');
COMMENT ON DOMAIN updated_time IS 'Timestamp of the performed update on data entries';

and a table:

CREATE TABLE public.document (
    id    uuid PRIMARY KEY,
    title text,

    created_at created_time,
    updated_at updated_time
);

Obviously the public.document cannot exist without having created_at and updated_at columns set!

Now lets say we have a PL/pgSQL function defined like so:

CREATE FUNCTION public.update_or_delete_document(
    document_id uuid
) RETURNS public.document AS
$$
DECLARE
    next_document RECORD;
BEGIN
    --                                                                                 --
    -- Here goes the logic which decides if the document should be updated or deleted. --
    --                                                                                 --

    SELECT * INTO next_document FROM public.document WHERE id = update_or_delete_document.document_id;
    RETURN next_document; -- next_document is either `public.document` or NULL
END;
$$
LANGUAGE plpgsql VOLATILE;

_I know that an update_or_delete function does not make sense (the RLS implementation above does), but bear with me..._

If a user runs a GraphQL query against Postgraphile like so:

mutation UpdateOrDeleteDocument {
    updateOrDeleteDocument(input:{
        clientMutationId: ""
        documentId: "1c8d6384-993a-4af4-9904-dcb14c5880d4"
    }) {
        clientMutationId
        document {
            id
            title
        }
    }
}

If the function decides to delete the document and return NULL, It will actually raise an exception saying: domain updated_time does not allow null values!

Why?

Simple because Postraphile first runs the following query:

with __local_0__ as (
    select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (__local_0__)::text from __local_0__;

which will have (__local_0__)::text set to: (,,,,) if the public.update_or_delete_document(uuid) function returns NULL!

Then runs this one:

with __local_0__ as (
    select (str::"public"."document").*
    from unnest(($1)::text[]) str
)
-- ...rest

_Where $1 is the result of first Postgraphile query_

This means that the last query might raise the above mentioned exception as you cannot unnest (,,,,) to public.document becuase of the defined DOMAINs with NOT NULL constraints!

Ah; so you're saying

with __local_0__ as (
    select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (__local_0__)::text from __local_0__;

Should actually be:

with __local_0__ as (
    select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (__local_0__)::text from __local_0__ where not (__local_0__ is null);

(Note: foo is not null and not (foo is null) are different.)

I have this check in many places, I can't think of any reason off-hand that we can't have this here for single mutations; I'll have to evaluate this and it could well be that we add it. Setof mutations, on the other hand, are a little more complex.

Actually it should probably be in the select:

with __local_0__ as (
    select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (
  case
  when __local_0__ is null then null::text
  else __local_0__::text
  end
) from __local_0__;

Yes, you got it! 馃槃

If you take a look at the first comment (usage with RLS), it sure does make sense having this check inside a mutation query!
TLDR: If the RLS restricts seeing the row after the update, the function will return NULL!

Looking at the proposed fixed query:

with __local_0__ as (
    select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (
  case
  when __local_0__ is null then null::text
  else __local_0__::text
  end
) from __local_0__

it will work for SETOF out-of-the-box too! Since every __local_0__ is a new row in the result. Even if the result of the CTE is:
(,,,,)
(05bca130-0150-465c-b304-35de5526daf5,,"2018-05-08 15:21:36.512504+02","2018-05-08 15:21:36.512504+02")
(,,,,)

I would really appreciate it if you could put fix this in the next version, as it will solve a lot of issues I am currently facing (and surely for others too).

Yeah; that's why I rewrote it to do that so it works for setof 馃憤

So turns out I hit an issue similar to this a few days ago: https://github.com/graphile/graphile-build/issues/217

What's more, it turns out that PostgreSQL 9.4 vs PostgreSQL 10 handle these nulls differently; PG10 behaves a lot better, PG9.4 throws some tantrums in certain edge cases.

Not sure if that affects this issue explicitly yet, but just mentioning it for context.

Minimal SQL to reproduce issue:

create schema c;
create domain c.not_null_timestamp timestamptz not null default now();
create table c.issue756 (
  id serial primary key,
  ts c.not_null_timestamp
);
create function c.issue756_mutation() returns c.issue756 as $$
begin
  return null;
end;
$$ language plpgsql volatile;

Minimal GraphQL to demonstrate issue:

mutation {
  issue756Mutation(input:{}) { issue756 { id ts } }
}

Expected result: success with null payload

Actual result:

[GraphQLError: domain c.not_null_timestamp does not allow null values]

Just one of these days I'd like to receive an issue that I can fix in like 2 minutes...

https://github.com/graphile/graphile-build/pull/220

No I feel I have to extend https://github.com/graphile/graphile-build/issues/217 to cover more situations for consistency.

I know the feeling... Does it go really deep? Anything I can help with?

We really need more tests; coming up with what tests to write (given we already have a load) is hard. Help there would be welcome.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jwdotjs picture jwdotjs  路  5Comments

tazsingh picture tazsingh  路  3Comments

tonyhschu picture tonyhschu  路  3Comments

angelosarto picture angelosarto  路  3Comments

srghma picture srghma  路  3Comments