Forem: Add pub/sub and webhooks to subscribe to events

Created on 14 Aug 2019 · 3 comments · Source: forem/forem

Is your feature request related to a problem? Please describe.

As a client, I would like to be able to call an HTTP API, provide a callback URL, register a webhook, and subscribe to article change events such as the creation, update or deletion of an article, among others.

Describe the solution you'd like

The feature would consist of a few parts: an API call that allows the client to register a webhook; a pub/sub layer to publish events, dispatch them to listeners and, in this case, deliver them to the webhook callbacks; and a UI in the user's settings where they can see their webhooks.

This description is based on the following principles and is the product of conversations with @maestromac, @lightalloy and @benhalpern.

Feature description

  • in-house implementation of pub/sub and webhooks, as simple and as generic as possible
  • synchronous subscriptions and asynchronous publication and delivery
  • the client can at any time use the Articles API to "refresh" its articles

API

The API should ideally consist of three endpoints:

  • GET /api/webhooks/events which returns a list of events a client can subscribe to:
{
  "articles": [
    "article_created",
    "article_updated",
    "article_removed",
  ]
}

This should scale in case we add other events related to articles, or other event types related to comments or anything else.

  • POST /api/webhooks to register a new webhook, with a payload that looks something like:
{
  "events": ["article_created"],
  "url": "https://example.com/dev/articles-callback",
}

This way the client's user would subscribe to all article_created events for their own articles and the articles of all organizations they belong to.

In case they wanted to subscribe to a subset of organizations:

{
  "events": ["article_created"],
  "url": "https://example.com/dev/articles-callback",
  "organizations": [123, 456]
}

In case they wanted to subscribe to no organizations and only to the articles they authored themselves (even though they might belong to an org):

{
  "events": ["article_created"],
  "url": "https://example.com/dev/articles-callback",
  "organizations": []
}

It's not clear yet how a user would subscribe only to their personal articles, excluding those belonging to an org, but we can figure it out as we go, maybe by adding an explicit parameter to the payload.
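
For illustration, a hedged client-side sketch of registering a webhook against the POST endpoint above, using Ruby's standard Net::HTTP. The "api-key" header and the 201 response are assumptions for this example, not a settled contract:

# Client-side sketch: registering a webhook with the proposed POST /api/webhooks.
# The "api-key" header and the response shape are assumptions.
require "net/http"
require "json"
require "uri"

uri = URI("https://dev.to/api/webhooks")
request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json", "api-key" => ENV["DEV_API_KEY"])
request.body = {
  events: ["article_created"],
  url: "https://example.com/dev/articles-callback"
}.to_json

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  http.request(request)
end
puts response.code # expect something like 201 Created if the subscription was accepted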

  • PUT /api/webhooks/:id to update a subscription, for example when the URL has to change or you want to start listening to more (or fewer) events.

  • GET /api/webhooks/:id to retrieve a subscription

  • DELETE /api/webhooks/:id to delete a subscription and its webhook

All of these API endpoints, even the list of events, should be behind authentication (API key or OAuth 2).

Bonus: maybe the id should be a hash of the URL so that the client does not need to know the DEV id of the specific webhook?
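
As a sketch of how these endpoints could be wired up, assuming a Rails routes file; the namespace and controller names are illustrative, not a decision:

# config/routes.rb (sketch only; final namespacing may differ)
namespace :api, defaults: { format: :json } do
  resources :webhooks, only: %i[create show update destroy] do
    collection do
      get :events # GET /api/webhooks/events — list of subscribable events
    end
  end
end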

PUB

The web app should be able to deliver events and their payloads asynchronously to the interested registered webhooks. When a new event is dispatched, the app asynchronously queries the webhooks table, fetches the list of interested parties (the rows belonging to the article's user whose events array contains the event just dispatched) and, again asynchronously, delivers the payload via HTTP POST to the callbacks.
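
A minimal sketch of that dispatch step, assuming an ActiveJob queue, a WebhookEndpoint model backed by a PostgreSQL array column, and a build_payload helper; all of these names are illustrative, not Forem's actual classes:

# Sketch of the "pub" step: look up interested webhooks and fan out deliveries.
class Webhooks::DispatchEventJob < ApplicationJob
  queue_as :default

  def perform(event_name, article_id)
    article = Article.find_by(id: article_id)
    return unless article

    # Rows belonging to the article's author whose events array contains this event.
    endpoints = WebhookEndpoint.where(user_id: article.user_id)
                               .where("? = ANY(events)", event_name)

    payload = build_payload(event_name, article) # hypothetical helper; sketched under the payload section below
    endpoints.find_each do |endpoint|
      Webhooks::DeliverEventJob.perform_later(endpoint.url, payload.to_json) # delivery job sketched further below
    end
  end
end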

Special attention has to be given to the design of the payload: since we don't control who the listeners (the webhooks) are, we need to be careful not to change the payload once we have agreed on its content, and possibly make it extensible without breaking existing clients.

A possible payload:

{
  "event_timestamp": "20190814T12:00:00Z",
  "event_name": "article_created",
  "article_created": {
    "title": "Title",
    "body_markdown": "...",
    "...": "...",
  }
}

The payload shall contain a timestamp (we chose a UTC RFC3339 representation, but maybe we can use a Unix epoch), the name of the event and a sub-payload keyed by the name of the event. This way the event name can be used as a key to find the article in the payload, and this should "scale" for all types of events (we might be wrong on this :)).

In the case of article_created and article_updated the sub-payload is the whole representation of the article (more or less what we send in the REST API, we guess); in the case of article_removed it will be either empty or not present at all.

The payloads for each event will need to be documented.
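
A minimal sketch of building such a payload under the "event name as key" convention described above; the article attributes included here are assumptions (the real sub-payload would mirror the REST API serializer):

require "time"

# Builds the webhook payload described above. Attribute selection is
# illustrative; article_removed carries no sub-payload.
def build_payload(event_name, article)
  payload = {
    "event_timestamp" => Time.now.utc.iso8601,
    "event_name" => event_name
  }
  unless event_name == "article_removed"
    payload[event_name] = {
      "title" => article.title,
      "body_markdown" => article.body_markdown
    }
  end
  payload
end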

SUB

When the API (or UI) registers a new webhook, a new row is created in the webhooks table. This table should contain at least the following fields (a migration sketch follows the list):

  • user: the user that registered the webhook, with a foreign key to the users table
  • url: the URL of the callback to send the payload to
  • events: an indexed array of all the events the callback shall receive
  • the usual timestamps of the row (created_at and updated_at)
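
A hedged migration sketch for this table, assuming Rails migrations on PostgreSQL; the table name, Rails version and GIN index are assumptions:

# db/migrate/xxxx_create_webhook_endpoints.rb (sketch)
class CreateWebhookEndpoints < ActiveRecord::Migration[5.2]
  def change
    create_table :webhook_endpoints do |t|
      t.references :user, null: false, foreign_key: true # owner of the webhook
      t.string :url, null: false                         # callback URL
      t.string :events, array: true, default: [], null: false
      t.timestamps
    end

    # GIN index so "event = ANY(events)" lookups stay fast as webhooks grow.
    add_index :webhook_endpoints, :events, using: :gin
  end
end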

UI (not really required but nice to have)

The UI would live somewhere in the settings and be a simple CRUD interface to the webhooks, replicating the REST API.

Implementation details

API

There are no particular implementation details here; it'd be a standard CRUD API plus the endpoint that returns the list of available events.
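
For concreteness, a bare-bones controller sketch, assuming the routes sketched above, a current_user set by the API authentication layer and a webhook_endpoints association; class names, the authenticate! hook and the strong-params shape are assumptions:

# app/controllers/api/webhooks_controller.rb (sketch; not Forem's actual controller)
module Api
  class WebhooksController < ApplicationController
    before_action :authenticate! # API key or OAuth 2, as per the spec above

    def create
      endpoint = current_user.webhook_endpoints.create!(webhook_params)
      render json: endpoint, status: :created
    end

    def destroy
      current_user.webhook_endpoints.find(params[:id]).destroy!
      head :no_content
    end

    private

    def webhook_params
      params.permit(:url, events: [])
    end
  end
end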

PUB/SUB

Since we've decided to implement this internally, many possible details and concerns related to using a third-party pub/sub or something based on Redis (which would add at least the latency of the round trip) are out of scope for this issue.

The remaining details are:

  • the pub part of pub/sub shall use the async delayed job queue:

    • dispatching the event has to happen synchronously (we don't want to tell the queue to create a payload before the change is stored in the DB)
    • the actual broadcasting should be done asynchronously (we don't want each event to trigger a SQL query to the webhooks table or a payload to be created in-process with the UI; we don't want to couple the user pressing the SUBMIT button with all of this machinery)
    • the delivery of the payloads has to be asynchronous too. Be mindful of two things: the HTTP client that delivers the payload has to have a timeout, and there has to be a retry system with exponential or Fibonacci backoff. If the receiving system is down, let's not be a source of denial of service (a delivery sketch follows this list)
  • build the whole system manually or with a gem: possible options are wisper, PostgreSQL's LISTEN/NOTIFY (the advantage here is that we would offload the dispatching to the DB and be sure the data is already in before notifying the app), ActiveSupport::Notifications or a custom implementation. We should avoid callbacks as much as possible.
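
A sketch of the delivery side with a timeout and exponential backoff, assuming ActiveJob's retry_on and Ruby's Net::HTTP; the class name and retry parameters are illustrative:

# Delivers a single payload to a single callback URL, with timeouts and
# exponential backoff so an unresponsive endpoint doesn't tie up workers.
require "net/http"

class Webhooks::DeliverEventJob < ApplicationJob
  queue_as :default

  retry_on Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED,
           wait: :exponentially_longer, attempts: 5

  def perform(callback_url, payload_json)
    uri = URI.parse(callback_url)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = uri.scheme == "https"
    http.open_timeout = 5  # seconds to establish the connection
    http.read_timeout = 10 # seconds to wait for the response

    request = Net::HTTP::Post.new(uri.request_uri, "Content-Type" => "application/json")
    request.body = payload_json
    http.request(request)
  end
end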

Other considerations (they don't need to be in the initial version):

  • If after a while the receiving system doesn't respond to deliveries, we should record the exception (and ping DEV?), disable the webhook and then contact the owner

  • While ordering could be a concern when there are multiple publishers and multiple subscribers for the same event instance, in this first implementation we rely on the timestamp and on the client's ability to discard an event whose combination of event_name and event_timestamp is earlier than one already received (a client-side sketch follows this list). In addition, the client can at any time call the Articles API to get the current representation of an article

  • Should we store each delivered event? If so, we need a table and a UUID for each event, to be sent in a header with every POST
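
A client-side sketch of that ordering rule, purely for illustration; the in-memory storage and the class name are assumptions:

require "time"

# Tracks the newest event_timestamp seen per event_name and flags stale events,
# as described in the ordering consideration above.
class StaleEventFilter
  def initialize
    @last_seen = {}
  end

  # Returns true if an event with this name and an equal-or-newer timestamp
  # has already been processed, so the caller can discard the incoming one.
  def stale?(event_name, event_timestamp)
    incoming = Time.parse(event_timestamp)
    last = @last_seen[event_name]
    return true if last && incoming <= last

    @last_seen[event_name] = incoming
    false
  end
end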

Security concerns

My main security concern is a third party finding out a callback URL and sending it a crafted payload to change its representation of the user's article, or telling it "this article has been destroyed" in the hope that it will destroy it in its own DB too.

We might need to figure out a verification system, which could be something like the following: on creation of a webhook we send the client a token; this token is used to generate a signature, different for every call (over TOKEN|EVENT_NAME|EVENT_TIMESTAMP|HASH OF THE BODY), which we put in a header. The client can then optionally use the token they have stored to verify that the payload is actually coming from DEV and has not been tampered with.
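
A hedged sketch of such a signature, assuming HMAC-SHA256; the header name and the exact string-to-sign layout are assumptions, not a decided spec:

require "openssl"
require "active_support/security_utils"

# Server side: sign the delivery with the per-webhook token.
def webhook_signature(token, event_name, event_timestamp, body)
  body_digest = OpenSSL::Digest::SHA256.hexdigest(body)
  string_to_sign = [event_name, event_timestamp, body_digest].join("|")
  OpenSSL::HMAC.hexdigest("SHA256", token, string_to_sign)
end
# e.g. request["X-Webhook-Signature"] = webhook_signature(token, "article_created", ts, payload_json)

# Client side: recompute with the stored token and compare in constant time.
def valid_signature?(token, event_name, event_timestamp, body, received_signature)
  expected = webhook_signature(token, event_name, event_timestamp, body)
  ActiveSupport::SecurityUtils.secure_compare(expected, received_signature)
end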

Resources

Describe alternatives you've considered

We considered building pub/sub on Redis, on Pusher or on any other third-party system, but for the scope of this feature we decided against it.

A bonus would be to decouple the webhooks system from the pub/sub so that, in theory, future subscribers don't need to be only webhooks but can be internal subscribers or anything else. We don't think we need it now though.

Additional context

Thanks to @maestromac, @lightalloy and @benhalpern for providing precious feedback which resulted in this issue description. Please edit this or add anything else I might have forgotten.

We also welcome input from any community member; we will very likely take the lead on the implementation of this feature, but we'd love to have any feedback.

api-v0 webhooks ruby

All 3 comments

I have started working on the issue and have taken the following steps:

  • the webhook endpoints and the API to create and destroy them #3783
  • the simplest solution for dispatching events to the webhook endpoints #3872. Currently WIP but I'll be finishing it soon.

The next steps (besides possible improvements or changes based on reviews) could be:

  • figuring out and implementing the organization subscriptions logic
  • implementing pub/sub in our app (using one of the possible solutions) to separate the events logic from the webhooks logic
  • improving security

    • implementing signatures
    • checking that users authorized via Doorkeeper (and other ways) have the proper API permissions

I don't have a complete solution for the scopes right now, but I'll share my thoughts to start the discussion.
I suppose we'll want scopes like:

  • public profile
  • published articles (read/write? for webhooks we'll only need reading)
  • articles including drafts (read/write)

Currently, we have a lot of API endpoints like analytics, comments, podcast_episodes, etc. Do we need to specify them as separate scopes at the moment?
It seems that we won't need this data for the webhooks integration, and token-based auth will keep "full access" (as far as the users are concerned) at first, so we don't need to focus on these for now.

I've read about using the Doorkeeper scopes; it checks authorization based on the controller actions, like this:

# Doorkeeper example: a token with either the :admin or the :write scope
# is allowed to reach the mutating actions.
before_action only: [:create, :update, :destroy] do
  doorkeeper_authorize! :admin, :write
end

If we have separate scopes for drafts and published articles, we'll need to find out which scopes the user authorized an app for. I haven't found a way to do it yet, though maybe it's possible; I'll do more research.
I was also worried about the situation where a user gives permission only for published articles and then unpublishes a post, but it seems like unpublishing is not allowed, am I right?
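
On finding which scopes the user authorized: a hedged sketch, assuming Doorkeeper's doorkeeper_token controller helper (this is my reading of the Doorkeeper API and the scope name is illustrative, to be verified):

# Inside an API controller authorized via Doorkeeper. doorkeeper_token returns
# the access token for the current request; its scopes reflect what the user
# granted the application.
def can_read_drafts?
  return false unless doorkeeper_token

  doorkeeper_token.scopes.to_a.include?("read_drafts")
end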

Webhooks could also be seen as a resource and a Doorkeeper scope, but for now it seems like that's not required.

This feature was implemented via the attached pull requests.
