Postgrest: Multiple requests in a transaction

Created on 13 Sep 2015 · 20Comments · Source: PostgREST/postgrest

I have a schema capturing events and participants:

CREATE TABLE demo.participant(
  id BIGINT DEFAULT,
  name VARCHAR(255) NOT NULL
  );

CREATE TABLE demo.event(
  id BIGINT DEFAULT,
  title VARCHAR(255) NOT NULL
  );

CREATE TABLE demo.event_participant(
  event_id BIGINT NOT NULL REFERENCES demo.event(id),
  participant_id BIGINT NOT NULL REFERENCES demo.participant(id),
  PRIMARY KEY(event_id,participant_id)
  );

I can expose these tables directly with 3 postgrest views. However, I required that inserting/updating an event and the participants in that event should be atomic.

I think I need a single view that somehow data from event and event_participant, and is insertable/updateable.

Is there any way to do this with postgrest?

enhancement

Source

timbod7

👍1

Most helpful comment

What about - until a good way of utilizing HTTP2 connections can be established - just supporting a simple /transaction endpoint which handles RFC6902 JSON PATCH-esque payloads with a very specific content-type? This way we'd have a good stable baseline for what I feel is an absolutely essential part of a REST API today, and perhaps especially in this project, considering the transactional nature of PostgreSQL.

I'm thinking the format could look like this:

POST /transaction
Content-Type: application/vnd.pgrst.patch+json
[
  {
    "verb": "DELETE",
    "endpoint": "/dogs?alive=is.false"
  },
  {
    "verb": "POST",
    "endpoint": "/dogs",
    "data": "{\"name\": \"Foo\"}"
  }
]

jmagnusson on 29 Jun 2017

👍3

All 20 comments

I'm curious that there hasn't been a response to this. How are people handling many-many relationships with postgrest?

timbod7 on 21 Sep 2015

Hey sorry for the slow response. I've been away on vacation this week and not on the computer much.

There is currently no general way to batch operations in a single transaction. As a workaround you can create a stored procedure to do complicated updates and call it with the rpc interface.

It would be good to invent a nice interface for batch operations. We could take advantage of HTTP/2 multiplexing and share a single transaction per request. Then the client could send several requests in the same connection to execute atomically.

begriffs on 21 Sep 2015

Thanks for the reply. An interface for batch operations would be awesome.

But apart from the "how can I do this atomically" question, I guess I was wondering if there was a better way to model the data relationship given the capabilities of postgrest now? Given my naive model above, If I want to update an event along with the list of participants associated, I need to:

  - PUT the new event value
  - DELETE the existing values from event_participant
  - POST a csv to bulk insert the new values in event_participant

Perhaps, as you say the RPC approach is the best one.

timbod7 on 21 Sep 2015

The RPC is your best bet at the moment, but we really need a batch interface. I can see that being high priority after the upcoming v0.3 release.

begriffs on 26 Oct 2015

Is this still a priority? When this is implemented, I will be able to use PostgREST in many of my projects.

dcominottim on 7 Oct 2016

Not in the immediate future but there is a possibility of manually implementing this by having a trigger on insert/update in the parent table take the values that are meant for the child table and send them there.

ruslantalpa on 7 Oct 2016

Our web server supports HTTP/2, so we can do this bulk operation stuff at some point but it'll take a significant amount of work and there are a number of more pressing issues to fix before attempting it.

begriffs on 7 Oct 2016

I'm not sure about the relative stupidity of this suggestion, but couldn't you use custom headers (or reuse semantically appropriate standard ones) to track requests that should be wrapped in a single transaction?

X-TRANSACTION-START: f058ebd6-02f7-4d3f-942e-904344e8cde5
...
X-TRANSACTION-ID: f058ebd6-02f7-4d3f-942e-904344e8cde5
...
X-TRANSACTION-COMMIT: f058ebd6-02f7-4d3f-942e-904344e8cde5

ephetic on 9 May 2017

👍1

Your suggestion makes sense. I've seen a related scheme recommended on various places on the web: expose an HTTP resource that represents transactions. Let me outline how it could work, then talk about some problems.

How it could work

The server can expose /transactions which accepts only POST. When you post to it you can include a json object with options like the desired isolation level. It then responds with a transaction id (txid). Using this id you can access another endpoint, /transactions/:txid. DELETE would roll it back, POST (or maybe PUT) would commit it, GET would give statistics about it.

With the txid you could then send regular postgrest requests and either include it in a query param like &txid=foo or as an HTTP header if we find one that is appropriate. The web server thread would then find the existing db connection and resume sending commands on it. After doing regular postgrest requests the client would commit or rollback the transaction with HTTP calls on its resource.

Challenges

OK that's the general idea, and here are challenges we would face:

Making sure that users couldn't spoof the txid to get their requests executed under another user's transaction with escalated permissions. PostgreSQL already has the notion of txid, accessible with txid_current(), but it is a simple number counting upward, easy to guess. We would have to obfuscate it with a big guid alias at the very least.
We would want an aggressive inactivity timeout on the open transactions so that a user couldn't easily lock up all our db pool connections.
The URLs I proposed don't match the postgrest url conventions or typical verbs. Could adjust it a little, but it makes the interface inconsistent.
Have to find a way to maintain affinity between client and postgrest server under load balancing. Sending the updates in an HTTP/2 request would get around this problem.
Holding transactions open is always expensive because postgresql is an MVCC system. Memory gets used, rows can't be vacuumed, etc. Waiting for all those round-trip HTTP requests to finish holds the transaction open longer than sending the requests in a single HTTP/2 request.

So I still think HTTP/2 is the most elegant way, but it may have problems of its own like client compatibility. What do you think? Do you have solutions for these problems, or perhaps problems I didn't imagine?

begriffs on 10 May 2017

guid alias sounds sufficient
reasonable, if configurable
should be out-of-band. perhaps rpc endpoint
umm...pass
yeah, pass again

How would HTTP/2 handle arbitrary SELECT/UPDATE type transactions?

ephetic on 10 May 2017

I'm thinking the format could look like this:

POST /transaction
Content-Type: application/vnd.pgrst.patch+json
[
  {
    "verb": "DELETE",
    "endpoint": "/dogs?alive=is.false"
  },
  {
    "verb": "POST",
    "endpoint": "/dogs",
    "data": "{\"name\": \"Foo\"}"
  }
]

jmagnusson on 29 Jun 2017

👍3

Aside from the http2 interface to implement this feature, we also need a way to restrict the cost of what could be many expensive queries/mutations in a single transaction and that way avoid the risk of clients ddosing the database.

I think this would be better enforced at the database level with a statement_timeout or perhaps an explain cost maximum, so maybe it would be better to address #249 before this is implemented.

steve-chavez on 16 Aug 2017

Comment from the peanut gallery about request batching: there's a standard application/http content type defined in the HTTP/1.1 spec that represents a sequence of requests or a sequence of responses. Posting that to some sort of "batch operations" resource could serve as an alternative to clients creating, manipulating, and committing transaction resources with a series of requests.

jackfirth on 6 Oct 2017

there is a possibility of manually implementing this by having a trigger on insert/update in the parent table take the values that are meant for the child table and send them there.

This approach seems to be the best for current versions. However it would be great to see something more transactions-friendly in feature.

wildsurfer on 12 Jan 2018

Not in the immediate future but there is a possibility of manually implementing this by having a trigger on insert/update in the parent table take the values that are meant for the child table and send them there.

The approach seems something like https://github.com/tobyhede/postgresql-event-sourcing, but it would need a complete rewrite of the application logic, something that is not always possible.

ilkosta on 13 Jan 2018

Even if the http2 interface is implemented, that still wouldn't solve the problem of INSERTing a row dependent on the id of another row(child/parent table).

I made a proposal for solving that part in https://github.com/PostgREST/postgrest/issues/818#issuecomment-409981816.

steve-chavez on 2 Aug 2018

Just throwing out an idea here:

Implement a /batch or /transaction endpoint. You can POST your array of commands:

[
{ "endpoint": "/authors", "payload": { "author_id" : 1, "first": "Robert", "last": Frost" } },
{ "endpoint": "/works", "payload": { "work_id": 1, "author_id": 1, "title": "Mending Wall" } },
{ "endpoint": "/works", "payload": { "work_id": 2, "author_id": 1, "title": "The Road Not Taken"} }
]

These would all be performed inside of a transaction. Should work for other HTTP requests as well.

Thoughts?

dmulfinger on 29 Aug 2018

❤1

What's the benefit of a transaction interface over exposing an rpc function that takes json objects, arrays and what not? I know it's been said before, but I'm saying it again; you can do it all in your function.

Sure, creating such a function means putting some effort handling json and arrays (of json objects) as arguments. But, implementing an http request to construct a transaction (as proposed above) requires effort too.

I would put my effort in building the Postgres function instead of such an http request.

There's an argument to be made about keeping PostgREST lean because that's more likely to not introduce extra maintenance and bugs. Offload your requirements to Postgres, not PostgREST when possible.

christiaanwesterbeek on 29 Aug 2018

The benefit is that while a json parsing /rpc function would need to be individually coded for every parent/child relationship of every project that uses PostgREST, a transaction interface as proposed above would be coded once into PostgREST. I agree that a json parsing /rpc function can be written, but it's not necessary easy to write one well that handles error cases correctly and reports meaningful error messages. And these /rpc function have a big maintenance problem in that any change to the schema would require changes to the /rpc function. There's probably more to consider regarding how to declare the arguments meaningfully in the OpenAPI spec, but I haven't thought that through.

From the client's perspective, I think it's a minimal effort to batch the calls.

If handling multiple-table relationships transactionally can be somehow be accommodated in PostgREST, I believe it would appeal to a significantly larger user base. As you can see above, others like me would find PostgREST the perfect tool if it could handle this very common use case.

dmulfinger on 29 Aug 2018

@dmulfinger Your proposal doesn't account for auto generated pks(like serial or uuid). I mean, how can you insert a parent's childs rows when you don't know it's id beforehand?

Having to come up with a json syntax that considers these things and then implement it would be brittle design(lots of bug could arise) in comparison to the HTTP2 interface and the flexibility it'd allow(also no json parsing by postgrest, which impacts performance on larger payloads).

Design-wise, I think this issue is solved with this proposal(handles multi-table relationships inserting) plus the HTTP2 interface, only thing that's left it's the implementation.

steve-chavez on 30 Aug 2018

Was this page helpful?

0 / 5 - 0 ratings