First of all, Postgraphile IS SUPER AWESOME. You are doing an amazing job, I cannot thank you enough!
I'm submitting an improvement request regarding RLS support.
PostGraphile version: 4.0.0-beta.5
-- NOTE:
-- Table `public.flow_step` is behind a RLS policy controlled by the user rights.
-- NOTE:
-- Table `public.flow` is also behind a RLS policy. It is controled depending on the `public.flow_step` access.
-- Meaning that if the user has access to the specific `public.flow_step`, he has access to referenced `public.flow` too;
-- Respectively, no access to `public.flow_step` = no access to `public.flow`.
-- NOTE:
-- The `private.set_next_flow_step(uuid)` function allows the user to set the next `public.flow_step` without him
-- knowing what the next step actually is. The function itself is defined as `SECURITY DEFINER` meaning that the
-- function will have access to ALL steps which will allow it to calculate the next one. (My Postgraphile config
-- does not allow introspection of the `private` schema so the function is safe).
CREATE FUNCTION private.set_next_flow_step(
flow_id uuid
) RETURNS public.flow AS
$$
WITH next_flow_step AS (
SELECT fs.id FROM public.flow_step AS fs
WHERE (
-- Logic for calculating the next step goes here... --
)
LIMIT 1
)
UPDATE public.flow
SET step_id=(SELECT id FROM next_workflow_step)
WHERE (id = set_next_flow_step.flow_id)
RETURNING *
$$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
-- NOTE:
-- The `public.set_next_flow_step(uuid)` function is the introspected/public version of the above mentioned function.
-- It PERFORMs the `private.set_next_flow_step(uuid)`, re-SELECTs the `public.flow` in focus and RETURNs it. (The re-SELECT
-- is very important for finding out if the user has access to the next step, because the `public.set_next_flow_step(uuid)`
-- function is run with `current_user` rights)
-- IMPORTANT: Since the user might not have access to the next flow step, the result can be either `NULL` or `public.flow`.
CREATE FUNCTION public.set_next_flow_step(
flow_id uuid
) RETURNS public.flow AS
$$
DECLARE
next_flow RECORD;
BEGIN
PERFORM private.set_next_flow_step(set_next_flow_step.flow_id);
SELECT * INTO next_flow FROM public.flow WHERE (id = set_next_flow_step.flow_id)
RETURN next_flow;
END;
$$
LANGUAGE plpgsql VOLATILE;
----
-- The CLIENT runs a GraphQL mutation joining the `public.flow` in the response...
----
-- NOTE:
-- Postgraphile will run the above functions in 2 steps.
-- 1. step
-- PARAMETER: $1 = <some-unique-uuid>
with __local_0__ as (
select __local_1__.* from "public"."set_next_flow_step"($1) __local_1__
)
select (__local_0__)::text from __local_0__
-- 2. step
-- PARAMETER: $1 = <result-of-step-1>
with __local_0__ as (
select (str::"public"."flow").*
from unnest(($1)::text[]) str
)
-- ... rest of the query ... --
-- FIXME:
-- Now the fun part, if the 1. step produced a NULL value (because the user does not have access to the next step),
-- the parmeter will behave like so: $1 = '{"(,,,,,,,,,,)"}', instead of $1 = NULL!
-- This poses HUGE problems like: if `public.flow` has domain columns which are defined with `NOT NULL`,
-- the function will ALWAYS fail throwing the following exception: `domain X does not allow null values`
Hope I provided sufficient information! If not, please ask.
Kind regards,
Denis
I... Have no idea what you're asking. Please explain in words what you're after and then I can read the code with a little context.
I realised now, that this does not apply only forthe above RLS implementation... I'll elaborate.
Lets say we have the following domains:
CREATE DOMAIN created_time timestamp WITHOUT TIME ZONE NOT NULL DEFAULT (now() AT TIME ZONE 'UTC');
COMMENT ON DOMAIN created_time IS 'Timestamp of the creation of data entries';
CREATE DOMAIN updated_time timestamp WITHOUT TIME ZONE NOT NULL DEFAULT (now() AT TIME ZONE 'UTC');
COMMENT ON DOMAIN updated_time IS 'Timestamp of the performed update on data entries';
and a table:
CREATE TABLE public.document (
id uuid PRIMARY KEY,
title text,
created_at created_time,
updated_at updated_time
);
Obviously the public.document cannot exist without having created_at and updated_at columns set!
Now lets say we have a PL/pgSQL function defined like so:
CREATE FUNCTION public.update_or_delete_document(
document_id uuid
) RETURNS public.document AS
$$
DECLARE
next_document RECORD;
BEGIN
-- --
-- Here goes the logic which decides if the document should be updated or deleted. --
-- --
SELECT * INTO next_document FROM public.document WHERE id = update_or_delete_document.document_id;
RETURN next_document; -- next_document is either `public.document` or NULL
END;
$$
LANGUAGE plpgsql VOLATILE;
_I know that an update_or_delete function does not make sense (the RLS implementation above does), but bear with me..._
If a user runs a GraphQL query against Postgraphile like so:
mutation UpdateOrDeleteDocument {
updateOrDeleteDocument(input:{
clientMutationId: ""
documentId: "1c8d6384-993a-4af4-9904-dcb14c5880d4"
}) {
clientMutationId
document {
id
title
}
}
}
If the function decides to delete the document and return NULL, It will actually raise an exception saying: domain updated_time does not allow null values!
Simple because Postraphile first runs the following query:
with __local_0__ as (
select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (__local_0__)::text from __local_0__;
which will have (__local_0__)::text set to: (,,,,) if the public.update_or_delete_document(uuid) function returns NULL!
Then runs this one:
with __local_0__ as (
select (str::"public"."document").*
from unnest(($1)::text[]) str
)
-- ...rest
_Where $1 is the result of first Postgraphile query_
This means that the last query might raise the above mentioned exception as you cannot unnest (,,,,) to public.document becuase of the defined DOMAINs with NOT NULL constraints!
Ah; so you're saying
with __local_0__ as (
select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (__local_0__)::text from __local_0__;
Should actually be:
with __local_0__ as (
select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (__local_0__)::text from __local_0__ where not (__local_0__ is null);
(Note: foo is not null and not (foo is null) are different.)
I have this check in many places, I can't think of any reason off-hand that we can't have this here for single mutations; I'll have to evaluate this and it could well be that we add it. Setof mutations, on the other hand, are a little more complex.
Actually it should probably be in the select:
with __local_0__ as (
select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (
case
when __local_0__ is null then null::text
else __local_0__::text
end
) from __local_0__;
Yes, you got it! 馃槃
If you take a look at the first comment (usage with RLS), it sure does make sense having this check inside a mutation query!
TLDR: If the RLS restricts seeing the row after the update, the function will return NULL!
Looking at the proposed fixed query:
with __local_0__ as (
select __local_1__.* from "public"."update_or_delete_document"($1) __local_1__
)
select (
case
when __local_0__ is null then null::text
else __local_0__::text
end
) from __local_0__
it will work for SETOF out-of-the-box too! Since every __local_0__ is a new row in the result. Even if the result of the CTE is:
(,,,,)
(05bca130-0150-465c-b304-35de5526daf5,,"2018-05-08 15:21:36.512504+02","2018-05-08 15:21:36.512504+02")
(,,,,)
I would really appreciate it if you could put fix this in the next version, as it will solve a lot of issues I am currently facing (and surely for others too).
Yeah; that's why I rewrote it to do that so it works for setof 馃憤
So turns out I hit an issue similar to this a few days ago: https://github.com/graphile/graphile-build/issues/217
What's more, it turns out that PostgreSQL 9.4 vs PostgreSQL 10 handle these nulls differently; PG10 behaves a lot better, PG9.4 throws some tantrums in certain edge cases.
Not sure if that affects this issue explicitly yet, but just mentioning it for context.
Minimal SQL to reproduce issue:
create schema c;
create domain c.not_null_timestamp timestamptz not null default now();
create table c.issue756 (
id serial primary key,
ts c.not_null_timestamp
);
create function c.issue756_mutation() returns c.issue756 as $$
begin
return null;
end;
$$ language plpgsql volatile;
Minimal GraphQL to demonstrate issue:
mutation {
issue756Mutation(input:{}) { issue756 { id ts } }
}
Expected result: success with null payload
Actual result:
[GraphQLError: domain c.not_null_timestamp does not allow null values]
Just one of these days I'd like to receive an issue that I can fix in like 2 minutes...
No I feel I have to extend https://github.com/graphile/graphile-build/issues/217 to cover more situations for consistency.
I know the feeling... Does it go really deep? Anything I can help with?
We really need more tests; coming up with what tests to write (given we already have a load) is hard. Help there would be welcome.