I have a schema capturing events and participants:
CREATE TABLE demo.participant(
id BIGINT DEFAULT,
name VARCHAR(255) NOT NULL
);
CREATE TABLE demo.event(
id BIGINT DEFAULT,
title VARCHAR(255) NOT NULL
);
CREATE TABLE demo.event_participant(
event_id BIGINT NOT NULL REFERENCES demo.event(id),
participant_id BIGINT NOT NULL REFERENCES demo.participant(id),
PRIMARY KEY(event_id,participant_id)
);
I can expose these tables directly with 3 postgrest views. However, I required that inserting/updating an event and the participants in that event should be atomic.
I think I need a single view that somehow data from event and event_participant, and is insertable/updateable.
Is there any way to do this with postgrest?
I'm curious that there hasn't been a response to this. How are people handling many-many relationships with postgrest?
Hey sorry for the slow response. I've been away on vacation this week and not on the computer much.
There is currently no general way to batch operations in a single transaction. As a workaround you can create a stored procedure to do complicated updates and call it with the rpc interface.
It would be good to invent a nice interface for batch operations. We could take advantage of HTTP/2 multiplexing and share a single transaction per request. Then the client could send several requests in the same connection to execute atomically.
Thanks for the reply. An interface for batch operations would be awesome.
But apart from the "how can I do this atomically" question, I guess I was wondering if there was a better way to model the data relationship given the capabilities of postgrest now? Given my naive model above, If I want to update an event along with the list of participants associated, I need to:
- PUT the new event value
- DELETE the existing values from event_participant
- POST a csv to bulk insert the new values in event_participant
Perhaps, as you say the RPC approach is the best one.
The RPC is your best bet at the moment, but we really need a batch interface. I can see that being high priority after the upcoming v0.3 release.
Is this still a priority? When this is implemented, I will be able to use PostgREST in many of my projects.
Not in the immediate future but there is a possibility of manually implementing this by having a trigger on insert/update in the parent table take the values that are meant for the child table and send them there.
Our web server supports HTTP/2, so we can do this bulk operation stuff at some point but it'll take a significant amount of work and there are a number of more pressing issues to fix before attempting it.
I'm not sure about the relative stupidity of this suggestion, but couldn't you use custom headers (or reuse semantically appropriate standard ones) to track requests that should be wrapped in a single transaction?
X-TRANSACTION-START: f058ebd6-02f7-4d3f-942e-904344e8cde5
...
X-TRANSACTION-ID: f058ebd6-02f7-4d3f-942e-904344e8cde5
...
X-TRANSACTION-COMMIT: f058ebd6-02f7-4d3f-942e-904344e8cde5
Your suggestion makes sense. I've seen a related scheme recommended on various places on the web: expose an HTTP resource that represents transactions. Let me outline how it could work, then talk about some problems.
The server can expose /transactions which accepts only POST. When you post to it you can include a json object with options like the desired isolation level. It then responds with a transaction id (txid). Using this id you can access another endpoint, /transactions/:txid. DELETE would roll it back, POST (or maybe PUT) would commit it, GET would give statistics about it.
With the txid you could then send regular postgrest requests and either include it in a query param like &txid=foo or as an HTTP header if we find one that is appropriate. The web server thread would then find the existing db connection and resume sending commands on it. After doing regular postgrest requests the client would commit or rollback the transaction with HTTP calls on its resource.
OK that's the general idea, and here are challenges we would face:
txid_current(), but it is a simple number counting upward, easy to guess. We would have to obfuscate it with a big guid alias at the very least.So I still think HTTP/2 is the most elegant way, but it may have problems of its own like client compatibility. What do you think? Do you have solutions for these problems, or perhaps problems I didn't imagine?
How would HTTP/2 handle arbitrary SELECT/UPDATE type transactions?
What about - until a good way of utilizing HTTP2 connections can be established - just supporting a simple /transaction endpoint which handles RFC6902 JSON PATCH-esque payloads with a very specific content-type? This way we'd have a good stable baseline for what I feel is an absolutely essential part of a REST API today, and perhaps especially in this project, considering the transactional nature of PostgreSQL.
I'm thinking the format could look like this:
POST /transaction
Content-Type: application/vnd.pgrst.patch+json
[
{
"verb": "DELETE",
"endpoint": "/dogs?alive=is.false"
},
{
"verb": "POST",
"endpoint": "/dogs",
"data": "{\"name\": \"Foo\"}"
}
]
Aside from the http2 interface to implement this feature, we also need a way to restrict the cost of what could be many expensive queries/mutations in a single transaction and that way avoid the risk of clients ddosing the database.
I think this would be better enforced at the database level with a statement_timeout or perhaps an explain cost maximum, so maybe it would be better to address #249 before this is implemented.
Comment from the peanut gallery about request batching: there's a standard application/http content type defined in the HTTP/1.1 spec that represents a sequence of requests or a sequence of responses. Posting that to some sort of "batch operations" resource could serve as an alternative to clients creating, manipulating, and committing transaction resources with a series of requests.
there is a possibility of manually implementing this by having a trigger on insert/update in the parent table take the values that are meant for the child table and send them there.
This approach seems to be the best for current versions. However it would be great to see something more transactions-friendly in feature.
Not in the immediate future but there is a possibility of manually implementing this by having a trigger on insert/update in the parent table take the values that are meant for the child table and send them there.
The approach seems something like https://github.com/tobyhede/postgresql-event-sourcing, but it would need a complete rewrite of the application logic, something that is not always possible.
Even if the http2 interface is implemented, that still wouldn't solve the problem of INSERTing a row dependent on the id of another row(child/parent table).
I made a proposal for solving that part in https://github.com/PostgREST/postgrest/issues/818#issuecomment-409981816.
Just throwing out an idea here:
Implement a /batch or /transaction endpoint. You can POST your array of commands:
[
{ "endpoint": "/authors", "payload": { "author_id" : 1, "first": "Robert", "last": Frost" } },
{ "endpoint": "/works", "payload": { "work_id": 1, "author_id": 1, "title": "Mending Wall" } },
{ "endpoint": "/works", "payload": { "work_id": 2, "author_id": 1, "title": "The Road Not Taken"} }
]
These would all be performed inside of a transaction. Should work for other HTTP requests as well.
Thoughts?
What's the benefit of a transaction interface over exposing an rpc function that takes json objects, arrays and what not? I know it's been said before, but I'm saying it again; you can do it all in your function.
Sure, creating such a function means putting some effort handling json and arrays (of json objects) as arguments. But, implementing an http request to construct a transaction (as proposed above) requires effort too.
I would put my effort in building the Postgres function instead of such an http request.
There's an argument to be made about keeping PostgREST lean because that's more likely to not introduce extra maintenance and bugs. Offload your requirements to Postgres, not PostgREST when possible.
The benefit is that while a json parsing /rpc function would need to be individually coded for every parent/child relationship of every project that uses PostgREST, a transaction interface as proposed above would be coded once into PostgREST. I agree that a json parsing /rpc function can be written, but it's not necessary easy to write one well that handles error cases correctly and reports meaningful error messages. And these /rpc function have a big maintenance problem in that any change to the schema would require changes to the /rpc function. There's probably more to consider regarding how to declare the arguments meaningfully in the OpenAPI spec, but I haven't thought that through.
From the client's perspective, I think it's a minimal effort to batch the calls.
If handling multiple-table relationships transactionally can be somehow be accommodated in PostgREST, I believe it would appeal to a significantly larger user base. As you can see above, others like me would find PostgREST the perfect tool if it could handle this very common use case.
@dmulfinger Your proposal doesn't account for auto generated pks(like serial or uuid). I mean, how can you insert a parent's childs rows when you don't know it's id beforehand?
Having to come up with a json syntax that considers these things and then implement it would be brittle design(lots of bug could arise) in comparison to the HTTP2 interface and the flexibility it'd allow(also no json parsing by postgrest, which impacts performance on larger payloads).
Design-wise, I think this issue is solved with this proposal(handles multi-table relationships inserting) plus the HTTP2 interface, only thing that's left it's the implementation.
Most helpful comment
What about - until a good way of utilizing HTTP2 connections can be established - just supporting a simple
/transactionendpoint which handles RFC6902 JSON PATCH-esque payloads with a very specific content-type? This way we'd have a good stable baseline for what I feel is an absolutely essential part of a REST API today, and perhaps especially in this project, considering the transactional nature of PostgreSQL.I'm thinking the format could look like this: