Is your feature request related to a problem? Please describe.
As a client, I would like to be able to call an HTTP API, provide a callback URL, register a webhook, and subscribe to article change events such as a new article, an update to an article, a deletion, and others.
Describe the solution you'd like
The feature itself would consist of a few parts: an API call that allows the client to register a webhook; a pub/sub layer to publish events on, dispatch them to listeners and, in this case, deliver them to the webhook callbacks; and a UI in the user's settings where they can see the webhooks.
This description is based on the following principles and it's the product of conversations with @maestromac, @lightalloy and @benhalpern
The API should ideally consist of three endpoints:
GET /api/webhooks/events
which returns a list of events a client can subscribe to:
{
  "articles": [
    "article_created",
    "article_updated",
    "article_removed"
  ]
}
This structure should scale if we add other article events, or other event types such as those related to comments.
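As a rough sketch of how the list of events could be kept extensible, the registry below maps a resource type to its events and is serialized as the response body. The constant `WEBHOOK_EVENTS` and both helper names are hypothetical, not something we have agreed on.

```ruby
require "json"

# Hypothetical registry: resource type => events a client can subscribe to.
WEBHOOK_EVENTS = {
  "articles" => %w[article_created article_updated article_removed]
}.freeze

# Body that GET /api/webhooks/events could return.
def webhook_events_json(registry = WEBHOOK_EVENTS)
  JSON.generate(registry)
end

# True when the event name appears under any resource type.
# Could be used to validate the "events" array of a registration payload.
def known_event?(name, registry = WEBHOOK_EVENTS)
  registry.values.flatten.include?(name)
end
```

Adding comment events later would then only mean adding a `"comments"` key to the registry.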
POST /api/webhooks
to register a new webhook, with a payload that looks something like:
{
  "events": ["article_created"],
  "url": "https://example.com/dev/articles-callback"
}
This way the client's user would subscribe to all article_created events for their own articles and the articles of all organizations they belong to.
In case they wanted to subscribe to a subset of organizations:
{
"events": ["article_created"],
"url": "https://example.com/dev/articles-callback",
"organizations": [123, 456]
}
In case they wanted to subscribe to no organizations and only to the articles they authored themselves (even though they might belong to an org):
{
"events": ["article_created"],
"url": "https://example.com/dev/articles-callback",
"organizations": []
}
Not sure how a user would subscribe only to their personal articles, excluding those belonging to an org, but we can figure it out as we go, maybe by adding an explicit parameter in the payload.
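The three cases above (key omitted, subset, empty array) could be resolved roughly like this. It is only a sketch: the method names are hypothetical, and silently ignoring organizations the user does not belong to is an assumption, since the API could just as well reject such a request.

```ruby
# user_org_ids: ids of the organizations the user belongs to.
# requested:    value of the "organizations" key, or nil when the key is omitted.
def subscribed_org_ids(user_org_ids, requested)
  # Key omitted: subscribe to every organization the user belongs to.
  return user_org_ids if requested.nil?

  # Subset requested; an empty array yields no organizations at all.
  # Assumption: ids of organizations the user is not a member of are dropped.
  requested & user_org_ids
end

# article_org_id is nil for an article that does not belong to an organization.
def covers_article?(article_org_id, user_org_ids, requested)
  return true if article_org_id.nil?

  subscribed_org_ids(user_org_ids, requested).include?(article_org_id)
end
```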
PUT /api/webhooks/:id
to update a subscription, for example when the URL has to change or when you want to start listening to more or fewer events.
GET /api/webhooks/:id
to retrieve a subscription
DELETE /api/webhooks/:id
to delete a subscription and its webhook
All of these API endpoints, even the list of events, should be behind authentication (API key or OAuth 2).
Bonus: maybe the id should be a hash of the URL, so that the client does not need to know the DEV id of the specific webhook?
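A minimal sketch of what such an id could look like, using SHA-256 from Ruby's standard library. The method name is hypothetical, and mixing the user id into the hash is an assumption on top of the idea above, made so that two users registering the same URL do not end up with the same id.

```ruby
require "digest"

# Derives a stable id that the client can recompute from data it already has.
def webhook_id_for(user_id, url)
  Digest::SHA256.hexdigest("#{user_id}:#{url}")
end
```

The client could then call `DELETE /api/webhooks/:id` without storing anything returned by the registration call.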
The web app should be able to deliver events and their payloads asynchronously to the interested registered webhooks. When a new event is dispatched, the app asynchronously queries the webhooks table, fetches the list of interested parties (the rows belonging to the article's user whose events array contains the event the app has just dispatched) and, again asynchronously, delivers the payload via HTTP POST to the callbacks.
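To make the lookup and delivery steps concrete, here is a sketch that uses an in-memory list in place of the webhooks table. The `Webhook` struct and both method names are hypothetical; in the app the selection would be a database query and `deliver` would run inside a background job.

```ruby
require "json"
require "net/http"
require "uri"

# Stand-in for a row of the webhooks table.
Webhook = Struct.new(:user_id, :url, :events, keyword_init: true)

# Rows belonging to the article's user that subscribed to the dispatched event.
def interested_webhooks(webhooks, user_id, event_name)
  webhooks.select { |hook| hook.user_id == user_id && hook.events.include?(event_name) }
end

# Sends the payload to the callback via HTTP POST as JSON.
def deliver(webhook, payload)
  Net::HTTP.post(
    URI.parse(webhook.url),
    JSON.generate(payload),
    "Content-Type" => "application/json"
  )
end
```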
Special attention has to be given to the design of the payload. Since we don't control who the listeners (the webhooks) are, we need to be careful not to change the payload once we have agreed on its content, and ideally make it extendable without breaking existing clients.
A possible payload:
{
  "event_timestamp": "2019-08-14T12:00:00Z",
  "event_name": "article_created",
  "article_created": {
    "title": "Title",
    "body_markdown": "...",
    "...": "..."
  }
}
The payload shall contain a timestamp (we chose a UTC RFC 3339 representation, but maybe we can use Unix epoch), the name of the event, and a sub-payload under the name of the event. This way the name of the event can be used as a key to find the article in the payload. This should "scale" for all types of events, though we might be wrong on this :)
In the case of article_created and article_updated the sub-payload is the whole representation of the article (more or less what we send in the REST API, we guess). In the case of article_destroyed it will either be empty or not be present at all.
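A sketch of how such a payload could be assembled, with the event name doubling as the key of the sub-payload. The method name `build_event_payload` is hypothetical, and omitting the sub-payload key when there is no resource is just one of the two options mentioned above.

```ruby
require "json"
require "time"

# resource is a Hash representation of the article, or nil when there is
# nothing to send (for example for a destroyed article).
def build_event_payload(event_name, resource, at: Time.now)
  payload = {
    "event_timestamp" => at.getutc.iso8601, # UTC, RFC 3339 compatible
    "event_name" => event_name
  }
  payload[event_name] = resource unless resource.nil?
  payload
end
```

A client could then read the article with `payload[payload["event_name"]]` regardless of the event type.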
The payloads for each event will need to be documented.
When the API (or UI) registers a new webhook, a new row is created in the webhooks table. This table should contain at least the following fields:
user: the user that registered the webhook, with a foreign key to the users table
url: the URL of the callback to send the payload to
events: an indexed array of all the events the callback shall receive
The UI would be placed somewhere in the settings and be a simple CRUD interface to the webhooks, replicating the REST API.
There are no particular implementation details here, it'd be a standard CRUD API plus the endpoint to receive the list of available events.
Since we've decided to implement this internally, many possible details and concerns related to having a third-party pub/sub or something based on Redis (which would add at least the latency of the round trip) are out of scope for this issue.
The remaining details are:
the pub part of pub/sub shall use the async delayed job queue:
build the whole system manually or with a gem: possible options are wisper, PostgreSQL's LISTEN/NOTIFY (the advantage here is that we would offload the dispatching to the DB and be sure the data is already stored before notifying the app), ActiveSupport::Notifications or a custom implementation. We should avoid callbacks as much as possible.
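For the "custom implementation" option, the core of an in-process pub/sub can be quite small. The sketch below is only an illustration of that option: the `EventBus` class is hypothetical, and it calls listeners synchronously, whereas in the app a listener would enqueue a background job instead of doing the delivery itself.

```ruby
# Minimal in-process publish/subscribe registry.
class EventBus
  def initialize
    @listeners = Hash.new { |hash, key| hash[key] = [] }
  end

  # Registers a block to be called for every event with the given name.
  def subscribe(event_name, &listener)
    @listeners[event_name] << listener
    self
  end

  # Calls every listener registered for the event, in subscription order.
  def publish(event_name, payload)
    @listeners.fetch(event_name, []).each do |listener|
      listener.call(event_name, payload)
    end
  end
end
```

The model code would then only call `publish`, which keeps the webhooks logic out of ActiveRecord callbacks.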
Other considerations (they don't need to be in the initial version):
If after a while the system doesn't respond to delivery we should register the exception (and ping DEV?), disable the webhook and then contact the owner
while ordering might be a concern with multiple publishers and multiple subscribers for the same event instance, in this first implementation we rely on the timestamp and the ability of the client to discard an event if the combination of event_name and event_timestamp is earlier than an already received event. In addition, the client can at any time call the Articles API to get the current representation of an article
should we store each delivered event? If so, we need a table and a UUID for each event, to be sent in a header with every POST
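On the ordering consideration above, a client could discard stale events with something like the following sketch. The class name is hypothetical, and it tracks the newest timestamp per event name only, as described; tracking it per article would need an identifier in the payload, which is not specified here.

```ruby
require "time"

# Client-side helper that rejects events older than one already received.
class StaleEventFilter
  def initialize
    @latest = {} # event_name => Time of the newest event seen so far
  end

  # Returns true when the event should be processed, false when it is stale.
  def accept?(payload)
    name = payload.fetch("event_name")
    at = Time.iso8601(payload.fetch("event_timestamp"))
    seen = @latest[name]
    return false if seen && at < seen

    @latest[name] = at
    true
  end
end
```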
My main security concern is a third party finding out a callback URL and sending to them a crafted payload to change their own representation of the user's article or telling them "this article has been destroyed" hoping they will destroy it too from their own DB.
We might need to figure out a verification system along these lines: on creation of a webhook we send the client a token. This token is used to generate a hash, different for every call (TOKEN|EVENT_NAME|EVENT_TIMESTAMP|HASH OF THE BODY), and we put this signature in a header. The client can then optionally use the token they have stored to verify that the payload is actually coming from DEV and has not been tampered with.
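One way this could look, as a sketch rather than a decided design: instead of hashing the token together with the other fields, the example uses HMAC-SHA256 with the token as the key, which is a common variant of the same idea. The method names are hypothetical and the header that would carry the signature is left open.

```ruby
require "openssl"
require "digest"

# Signs EVENT_NAME|EVENT_TIMESTAMP|HASH OF THE BODY with the webhook's token.
def sign_event(token, event_name, event_timestamp, body)
  message = [event_name, event_timestamp, Digest::SHA256.hexdigest(body)].join("|")
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), token, message)
end

# Client side: recompute the signature and compare it without
# short-circuiting on the first differing byte.
def valid_signature?(token, event_name, event_timestamp, body, signature)
  expected = sign_event(token, event_name, event_timestamp, body)
  return false unless expected.bytesize == signature.bytesize

  expected.bytes.zip(signature.bytes).reduce(0) { |diff, (a, b)| diff | (a ^ b) }.zero?
end
```

Because the timestamp and the body hash are part of the signed message, the signature changes with every call, and a crafted payload sent by a third party would fail verification without the token.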
Describe alternatives you've considered
We considered building pub/sub on Redis, on Pusher or any other third party system but for the scope of this feature we decided against it.
A bonus would be to decouple the webhooks system from the pub/sub so that, in theory, future subscribers don't need to be only webhooks but can also be internal subscribers or anything else. We don't think we need it now though.
Additional context
Thanks to @maestromac, @lightalloy and @benhalpern for providing precious feedback which resulted in this issue description. Please edit this or add anything else I might have forgotten.
We also welcome input from any community member. We will very likely take the lead on the implementation of this feature, but we'd love to have any feedback.
I have started working on the issue and made the following steps:
I don't have a complete solution for the scopes right now, but I'll share my thoughts to start the discussion.
I suppose we'll want scopes like:
Currently, we have a lot of the API endpoints like analytics, comments, podcast_episodes, etc. Do we need to specify them as separate scopes at the moment?
It seems that we won't need this data for the webhooks integration, and the auth by token will keep the "full access" (according to the users themselves) at first, so we don't need to focus on these at the moment.
I've read about using the doorkeeper scopes and it checks for the authorization based on the actions like this:
before_action only: [:create, :update, :destroy] do
  doorkeeper_authorize! :admin, :write
end
If we have separate scopes for drafts and published articles, we'll need to find out which scopes the user has authorized an app for. I haven't found a way to do it yet, though maybe it's possible. I'll do more research.
I was also worried about a situation when a user gives permission only to the published articles and then unpublishes a post, but it seems like unpublishing is not allowed, am I right?
Webhooks could be also seen as a resource and a doorkeeper scope, but for now, it seems like it's not required
This feature was implemented via the attached pull requests.