Postgraphile: Accessing HTTP request data from PostgreSQL

Created on 5 Jan 2017  路  27Comments  路  Source: graphile/postgraphile

More of a question than anything else.

I'm writing an application which needs an audit log of changes,
I would like to implement that using triggers but I need to record
info from the HTTP request like the IP address of the client.

Is there any built-in way of doing that? Can I write some kind of plugin to pass additional data to postgresql (i.e. in addition to the JWT data)?

Most helpful comment

This is now possible via pgSettings

export postgraphql(process.env.DATABASE_URL, schemaName, {
  pgSettings: req => ({
    'user.id': `${req.session.passport.user}`,
    'http.headers.x-something': `${req.headers['x-something']}`,
    'http.method': `${req.method}`,
    'http.url': `${req.url}`,
    //...
  }),
})

All 27 comments

@jinty You can put ip into JWT claim, use it for validation and Postgraphql expose it in session via SET.

@kenota鈥檚 JWT idea sounds good, is there any reason you couldn鈥檛 use the Node.js API and add a middleware?

I鈥檝e also thought about refactoring JWT claims to live on request.jwt.claims... instead of jwt.claims... so that we could also have request.headers..., request.url..., etc.

@kenota Almost, but not quite, that gives me the IP which requested the JWT token, not the IP which is currently querying postgraphql.

@calebmer: Putting all that data into current settings sounds like a reasonable solution. I wouldn't mix the raw data (i.e. HTTP headers) with processed data (i.e. jwt.claims). I also wonder about encoding though when you get some real nasty junk coming in with the HTTP request: null bytes, non-UTF-8 encodings. Some people will want the raw data, others will want some level of pre-processing.

I was thinking more of a hook point in Javascript where I could ask graphql to run a function like this:

function before_request_hook(postgresql_connection, request) {
...
}

I've used pyramid's "tweens" quite a bit and find them very flexible/useful: http://docs.pylonsproject.org/projects/pyramid/en/latest/narr/hooks.html#registering-tweens

Do you mean the simple hook or the full-blown "tween" API?

Some of the nice things of "tweens" is that you:

* have access to both the request and response
* the ability to run code before and after the request is processed
* keep some state while the response is generated via the function variables
* can stack them

It's a very powerful api.

Looking at the code, it seems easy to add a simple hook, but adding some
kind of "tween" API would need more careful thinking.

On Mon, Jan 09, 2017 at 06:15:17AM -0800, Caleb Meredith wrote:

That hook is actually a very good idea. It could open a door to all kinds of interesting things. If you wanted to make a PR for this, I鈥檇 be down for merging it :blush:

Here are some files you might want to look at:

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/calebmer/postgraphql/issues/299#issuecomment-271294265

--
Brian Sutherland

I made a rough sketch (totally untested) of what the hook could look like:

https://github.com/jinty/postgraphql/commit/f0e7ce75ecd0c6d4c608abe065875509ee684654

I'll note that you could go further and move some of the existing bits into middleware:

  • postgres connection and transaction handling (wanna send read-only queries to a group of replicas?)
  • setupRequestPgClientTransaction
  • logging

Thinking out loud, otherwise i forget this stuff:

I had another run at this and have seen that recent work has added withPostGraphQLContext (https://github.com/calebmer/postgraphql/commit/1145689054f7f7a6a5542ec34aa3827af3bc572a) which is great :) a way to implement this would be a hook to override withPostGraphQLContext.

It also makes things a little complex as there seems to be no way (by design) of getting the request object from within withPostGraphQLContext.

A better list of advanced features enabled by this:
* send GET HTTP requests to read-only replicas to scale
* mark GET HTTP requests as "BEGIN READ ONLY" to improve performance under serialization isolation level
* an audit log (my original usecase)

I was looking at the isolation levels myself; I think our queries ought to have repeatable read level at least so that the different queries that run together to produce the result don't suffer from race conditions.

I'm using withPostGraphQLContext for server-side execution of GraphQL queries (for isomorphic Relay); but extending it so you could pass in pre/post hooks (e.g. via the options object) would be good.

Care to supply a pseudo-code example?

https://github.com/calebmer/postgraphql/blob/39cff1d123c282667e339064a9a5be9538e330a6/src/postgraphql/withPostGraphQLContext.ts#L54-L73

On Tue, Feb 07, 2017 at 04:58:32AM -0800, Benjie Gillam wrote:

I was looking at the isolation levels myself; I think our queries ought to have repeatable read level at least so that the different queries that run together to produce the result don't suffer from race conditions.

I'm pretty sure that the right answer is "it depends on the situation"
and probably the best is to defer to the server default in most cases.

However it is polite to communicate to the database if you know have a
READ ONLY transaction. That allows the DB to optimize in ways it
otherwise wouldn't be able to. However, postgres does raise an error if
something does try to write.

I'm using withPostGraphQLContext for server-side execution of GraphQL queries (for isomorphic Relay); but extending it so you could pass in pre/post hooks (e.g. via the options object) would be good.

Not sure a pre/post hook is the best way, for the read-only transaction
usecase you ideally want to change this line:

    await pgClient.query('begin')

to something like:

    if (query_type === 'mutation') {
        await pgClient.query('begin')
    } else {
        await pgClient.query('begin READ ONLY')
    }

and have one less round-trip to the database.

Also, for the usecase of choosing to execute reads from replicas, you
probably want to change:

const pgClient = await pgPool.connect()

to use a different pool depending on whether you are executing a
mutation or not.

Care to supply a pseudo-code example?

What I had thought about was encapsulating authentication and query
execution like this:

https://github.com/jinty/postgraphql/commit/1c41ff71c5efa41411c292c6af5251004cda864c

then replacing makeExecutor or makeAuthenticator allows for a lot of
usecases.

https://github.com/calebmer/postgraphql/blob/39cff1d123c282667e339064a9a5be9538e330a6/src/postgraphql/withPostGraphQLContext.ts#L54-L73

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/calebmer/postgraphql/issues/299#issuecomment-277991774

--
Brian Sutherland

Interesting. I don't have time to fully evaluate right now; hopefully Caleb can give some feedback otherwise ping again in a weeks time.

However it is polite to communicate to the database if you know have a
READ ONLY transaction. That allows the DB to optimize in ways it
otherwise wouldn't be able to. However, postgres does raise an error if
something does try to write.

I didn鈥檛 know we could have read-only transactions, that鈥檚 awesome!

That鈥檚 something I鈥檇 be willing to try incorporating into PostGraphQL itself, as you wrote in your pseudo-code example we would just make sure that the operation is not a mutation and switch begin for begin read only on that. Switching the Postgres pool is probably something we wouldn鈥檛 support internally, but we should definitely provide a hook to allow you to control that.

If we could add that then are there any other ways you would want to setup the transaction? I think connecting a client and changing begin/commit should probably be two separate hooks. Yes it is less general, but if we allow something general we miss out on this: https://github.com/calebmer/postgraphql/blob/1145689054f7f7a6a5542ec34aa3827af3bc572a/src/postgraphql/withPostGraphQLContext.ts#L50-L51

What if instead of taking a Postgres pool, we take a connectPgClient function which defaults to connecting with a pgPool, but could be implemented with anything.

For an audit log, I鈥檓 not sure if there was an answer to why an express middleware would not be sufficient. That insight could help inform what we need to implement at the PostGraphQL level.

Would be good to start the transactions with both:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ [ONLY/WRITE];

On Tue, Feb 07, 2017 at 04:37:38PM -0800, Caleb Meredith wrote:

However it is polite to communicate to the database if you know have a
READ ONLY transaction. That allows the DB to optimize in ways it
otherwise wouldn't be able to. However, postgres does raise an error if
something does try to write.

I didn鈥檛 know we could have read-only transactions, that鈥檚 awesome!

That鈥檚 something I鈥檇 be willing to try incorporating into PostGraphQL itself, as you wrote in your pseudo-code example we would just make sure that the operation is not a mutation and switch begin for begin read only on that.

Great, I'll make a separate issue just for this.

Switching the Postgres pool is probably something we wouldn鈥檛 support internally, but we should definitely provide a hook to allow you to control that.

This is something I don't need right now. I just mentioned it because I
have needed to implement it in the past on another web frameworks for
scalability.

If we could add that then are there any other ways you would want to setup the transaction? I think connecting a client and changing begin/commit should probably be two separate hooks. Yes it is less general, but if we allow something general we miss out on this: https://github.com/calebmer/postgraphql/blob/1145689054f7f7a6a5542ec34aa3827af3bc572a/src/postgraphql/withPostGraphQLContext.ts#L50-L51

In the past I've had to catch database serialization errors and retry up
to N times.

What if instead of taking a Postgres pool, we take a connectPgClient function which defaults to connecting with a pgPool, but could be implemented with anything.

For an audit log, I鈥檓 not sure if there was an answer to why an express middleware would not be sufficient. That insight could help inform what we need to implement at the PostGraphQL level.

Audit logs are commonly implemented with triggers like this:

https://wiki.postgresql.org/wiki/Audit_trigger#The_trigger

Ideally you want some reference between the audit log and the HTTP
request log. Express middleware looks like it can create the HTTP
request log, but since you don't have access to pgClient, you can't make
the connection between the audit log and the HTTP request log.

But, boiling it down, I think this can be resolved by a postgraphql
optionally setting the value of an http header via set_config().

In this way, a front end component (e.g. express middleware, nginx, HTTP
loadbalancer) would log the request and add an HTTP header with the
"Request ID". postgraphql would make the value of that header available
to PostgreSQL via set_config() and the audit trigger would pick it up
from there.

So the only thing postgraphql would need is an option like
"--save-header X-RequestId" which would call:
set_config('x-requestid', value, true)

--
Brian Sutherland

On Thu, Feb 09, 2017 at 12:39:05AM -0800, Benjie Gillam wrote:

Would be good to start the transactions with both:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ [ONLY/WRITE];

I'd be afraid of enforcing the isolation level. What is needed can vary
quite a bit depending on the situation.

However, setting READ ONLY is always a good idea if you know that no
data should be modified.

--
Brian Sutherland

The reason I'm suggesting repeatable read is because postgraphql can run quite a few SQL queries (certainly on master anyway!) which opens the risk of race conditions with two simultaneous mutations. But since we do the entire GraphQL execution within a transaction if it were to throw an error whilst GraphQL draws the result it could rollback the transaction instead of setting the isolation level. Currently it doesn't though: https://github.com/calebmer/postgraphql/blob/2f25ce8a9567e956f6dac55c237a1bf2e9e38003/src/postgraphql/withPostGraphQLContext.ts#L73

On Thu, Feb 09, 2017 at 01:24:13AM -0800, Benjie Gillam wrote:

The reason I'm suggesting repeatable read is because postgraphql can run quite a few SQL queries (certainly on master anyway!) which opens the risk of race conditions with two simultaneous mutations. But since we do the entire GraphQL execution within a transaction if it were to throw an error whilst GraphQL draws the result it could rollback the transaction instead of setting the isolation level. Currently it doesn't though: https://github.com/calebmer/postgraphql/blob/2f25ce8a9567e956f6dac55c237a1bf2e9e38003/src/postgraphql/withPostGraphQLContext.ts#L73

What do you mean by error? If there is an error on the postgres level
within a transaction then "commit" actually does a rollback. I would
hope that with any JS level error the transaction were explicitly rolled
back.

Repeatable Read probably imposes a performance penalty and the postgres
docs say:
"Applications using this level must be prepared to retry transactions
due to serialization failures."

Even with repeatable read you can still get anomalies, if you want to be
"as safe as can be" from race conditions with two simultaneous mutations
you really want SERIALIZABLE. However then you need to be far more
careful with a bunch of stuff (see the section on "optimal performance
when relying on Serializable transactions"
https://www.postgresql.org/docs/9.5/static/transaction-iso.html)

In short, this is a horribly complex swamp and really does depend on the
exact situation. There is no right answer and the more I learn about it,
the less I understand;)

--
Brian Sutherland

What do you mean by error?

For example if there's a nested relation but the underlying data is removed before the second query executes causing GraphQL to throw on a NotNull constraint

I would hope that with any JS level error the transaction were explicitly rolled back.

I don't think this is currently the case:

https://github.com/calebmer/postgraphql/blob/2f25ce8a9567e956f6dac55c237a1bf2e9e38003/src/postgraphql/withPostGraphQLContext.ts#L73

On Thu, Feb 09, 2017 at 07:27:59AM -0800, Benjie Gillam wrote:

What do you mean by error?

For example if there's a nested relation but the underlying data is removed before the second query executes causing GraphQL to throw on a NotNull constraint

Ah, I see. Then it becomes a question of what level of errors justifies
the pain of moving to a higher isolation level if you have a situation
where data is inserted and deleted frequently.

I would hope that with any JS level error the transaction were explicitly rolled back.

I don't think this is currently the case:

https://github.com/calebmer/postgraphql/blob/2f25ce8a9567e956f6dac55c237a1bf2e9e38003/src/postgraphql/withPostGraphQLContext.ts#L73

That's scary

/me hopes that mutations are always executed in one SQL statement.

--
Brian Sutherland

All mutations are one SQL statement currently, yes.

(But the queries on their results are not.)

On Thu, Feb 09, 2017 at 08:17:27AM -0800, Benjie Gillam wrote:

(But the queries on their results are not.)

The bigger fear is having half a mutation enter the database because
some JS code errors and the transaction is committed regardless. You're
probably safe from that.

Though I would argue that the safest way is to always rollback and send
a HTTP 500 on any unhanded error.

--
Brian Sutherland

+1 for being able to access request data

For what it's worth, I'd be happy with this living in the user session (per @calebmer's suggestion to add request.headers). For my use case, I'd like to just be able to call request info from within a stored procedure, e.g.

INSERT INTO logs.register (person_id, ip_address, timestamp)
VALUES ('12345', current_setting('request.ip_address'), CURRENT_TIMESTAMP);

A hack to accomplish that would be modifying the JWT that's passed through, then reading it off of the jwt.claim namespace. But yes, there should be a better way (you can acheive it with withPostGraphQLContext by intercepting the pgClient instance and running the select set_config(..., true) yourself, but that involves taking over the entire HTTP handling yourself).

OK, that could work, although I'd still like to initiate the mutations client-side. Is there a way to combine the two approaches? I'd imagine something like the following:

  • extract the GraphQL query from the request
  • create a pg_pool object, run the set_config command directly on the DB
  • pass the same pool object to withPostGraphQLContext and then pass the request query to the callback

Does that seem basically right?

withPostGraphQLContext does have access to the pgClient:

https://github.com/calebmer/postgraphql/blob/a88a9836722c5d42c7cae93fe901c095d7c704f6/src/postgraphql/withPostGraphQLContext.ts#L65-L68

You effectively want to replace what the HTTP node is doing; namely:

https://github.com/calebmer/postgraphql/blob/a88a9836722c5d42c7cae93fe901c095d7c704f6/src/postgraphql/http/createPostGraphQLHttpRequestHandler.js#L388-L403

i.e. you can just add a line above here:

https://github.com/calebmer/postgraphql/blob/a88a9836722c5d42c7cae93fe901c095d7c704f6/src/postgraphql/http/createPostGraphQLHttpRequestHandler.js#L394

that does whatever you want using the $$pgClient that's stored in context. Of course you'll have to do some hackery to get $$pgClient: import { $$pgClient } from 'postgraphql/postgres/inventory/pgClientFromContext' maybe?

https://github.com/calebmer/postgraphql/blob/a88a9836722c5d42c7cae93fe901c095d7c704f6/src/postgres/inventory/pgClientFromContext.ts#L5

You don't want to effect the pgPool before calling withPostGraphQLContext because withPostGraphQLContext does begin; ... commit; for you; and you want your set_config to have true as it's third argument to be transaction-based (for security).

OK, will give that a go, or failing that submit a pull request! Many thanks for the pointers, much appreciated.

This is now possible via pgSettings

export postgraphql(process.env.DATABASE_URL, schemaName, {
  pgSettings: req => ({
    'user.id': `${req.session.passport.user}`,
    'http.headers.x-something': `${req.headers['x-something']}`,
    'http.method': `${req.method}`,
    'http.url': `${req.url}`,
    //...
  }),
})
Was this page helpful?
0 / 5 - 0 ratings

Related issues

tonyhschu picture tonyhschu  路  3Comments

james-ff picture james-ff  路  4Comments

k-ogawa-1988 picture k-ogawa-1988  路  3Comments

giacomorebonato picture giacomorebonato  路  3Comments

ssomnoremac picture ssomnoremac  路  5Comments