Athens: Endpoint for push synchronization

Created on 9 Apr 2018 · 14 Comments · Source: gomods/athens

With push synchronization, an Olympus deployment notifies other deployments about a change.

When Olympus deployment OA receives a push notification from deployment OM, OA does not advance its pull sync pointer for OM.
The next time pull synchronization happens, it is performed from the latest pull sync pointer; changes already received through push notifications are skipped by the deduplication checks.

Example:

T0 - OM has MxV1
T1 - OA joins and pulls from OM
    OA[MxV1], OM[MxV1]
T2 - OM receives MyV1 miss 
    OA[MxV1], OM[MxV1, MyV1]
T3 - OM pushes MyV1 change -- timeout
T4 - OM receives MzV1 miss
T5 - OM pushes MzV1 change
    OA[MxV1, MzV1], OM[MxV1, MyV1, MzV1]
T6 - OA pulls from pointer T1
        pulls OM[MyV1, MzV1] =dedup=> [MyV1] 
T7 - OA[MxV1, MzV1, MyV1], OM[MxV1, MyV1, MzV1]

Order of log entries does not matter.
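
The timeline above can be replayed with a small simulation. This is only a sketch under assumptions: the `Event` and `Deployment` types, the per-peer pointer stored as an index into the peer's log, and all method names are hypothetical and not taken from the Athens code base.

```go
package main

import "fmt"

// Event is one entry in a deployment's event log (hypothetical shape).
type Event struct {
	Module  string
	Version string
}

// Deployment is a toy stand-in for an Olympus deployment.
type Deployment struct {
	Log      []Event
	seen     map[Event]bool
	pointers map[string]int // per-peer pull sync pointer: index into the peer's log
}

func NewDeployment() *Deployment {
	return &Deployment{seen: map[Event]bool{}, pointers: map[string]int{}}
}

// add appends e unless it is already known; it reports whether e was new.
func (d *Deployment) add(e Event) bool {
	if d.seen[e] {
		return false
	}
	d.seen[e] = true
	d.Log = append(d.Log, e)
	return true
}

// ReceivePush stores a pushed event but leaves every pull pointer untouched.
func (d *Deployment) ReceivePush(e Event) bool { return d.add(e) }

// PullFrom reads the peer's log from the stored pointer, deduplicates,
// advances the pointer, and returns the events that were actually new.
func (d *Deployment) PullFrom(name string, peer *Deployment) []Event {
	var added []Event
	for _, e := range peer.Log[d.pointers[name]:] {
		if d.add(e) {
			added = append(added, e)
		}
	}
	d.pointers[name] = len(peer.Log)
	return added
}

func main() {
	om, oa := NewDeployment(), NewDeployment()
	om.add(Event{"Mx", "V1"})          // T0
	oa.PullFrom("OM", om)              // T1
	om.add(Event{"My", "V1"})          // T2; the push at T3 times out
	om.add(Event{"Mz", "V1"})          // T4
	oa.ReceivePush(Event{"Mz", "V1"})  // T5
	fmt.Println(oa.PullFrom("OM", om)) // T6: prints [{My V1}]
	fmt.Println(oa.Log)                // T7: prints [{Mx V1} {Mz V1} {My V1}]
}
```

Because the push at T5 does not move the pointer, the pull at T6 still sees MyV1, which was lost to the timeout at T3, while MzV1 is dropped as a duplicate.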

michalpristas

All 14 comments

Note that https://docs.google.com/document/d/1R1iCuDUF8FPZpJalewVZMjSaqWOZn37MHjssayQRQs4/edit?usp=sharing is important background context for this issue.

arschles on 11 May 2018

What should happen with the push notification? Should it be treated just like a cache miss from the proxy, stored in a separate log, or processed right away?

marpio on 17 May 2018

It shouldn't work exactly like a cache miss from the proxy. Instead, it should be processed just as if it were a pull operation, except that the log pointer shouldn't be updated (the change will be reported as a duplicate on the next pull).

arschles on 18 May 2018

Thanks for the clarification, @arschles!
I have a few more questions:

1. I guess the handler should just create a background job which does all the downloading, storing, preparing CDN links, etc. Am I right?
2. Let's say OM gets a push notification from OA. Should OM pull it from the VCS or download the module from OA? If the latter is the case, how can we guarantee that the push really came from OA? Use a secret like in the case of Github push?
3. The push can happen at the same moment as the pull sync. So theoretically, the same module can be stored twice. Would this be an issue?
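
For the shared-secret idea in the second question, one possible shape is an HMAC over the notification body, which is how webhook signing commonly works. This is only a sketch of that idea, not something the thread settles on; `sign`, `verify`, the secret, and the JSON body are all hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under the shared secret.
// The sending deployment would attach this value to its push notification.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether signature matches body under the shared secret.
// hmac.Equal compares in constant time.
func verify(secret, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("shared-between-OA-and-OM") // assumption: distributed out of band
	body := []byte(`{"module":"My","version":"V1"}`)

	sig := sign(secret, body)
	fmt.Println(verify(secret, body, sig))                  // true
	fmt.Println(verify([]byte("wrong-secret"), body, sig)) // false
}
```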

marpio on 20 May 2018

I would prefer having one worker or a set of workers (with a set, the workers need to coordinate on what is being processed, to avoid processing the same item several times at once). A background job per request might turn out badly and hurt performance.
We did not think this through, but I imagine there could be a worker fed periodically with data pulled from other Olympus deployments and decoupled from them (a background job). Maybe this very same worker could also be fed by push notifications?
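
The coordination a set of workers would need can be sketched as a shared "in flight" set that a worker must claim from before touching an item. The `inFlight` type and its methods are hypothetical names used for illustration, and an in-memory set only covers workers inside a single process.

```go
package main

import (
	"fmt"
	"sync"
)

// inFlight tracks which items some worker is currently processing.
type inFlight struct {
	mu    sync.Mutex
	items map[string]bool
}

func newInFlight() *inFlight { return &inFlight{items: map[string]bool{}} }

// claim reports whether the caller won the right to process key.
func (f *inFlight) claim(key string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.items[key] {
		return false
	}
	f.items[key] = true
	return true
}

// release marks key as no longer being processed.
func (f *inFlight) release(key string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.items, key)
}

func main() {
	f := newInFlight()
	fmt.Println(f.claim("MyV1")) // true: the first worker takes the item
	fmt.Println(f.claim("MyV1")) // false: another worker is already on it
	f.release("MyV1")
	fmt.Println(f.claim("MyV1")) // true: free again after release
}
```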
@arschles

michalpristas on 20 May 2018

Let me think this through some more and get back on Monday

arschles on 20 May 2018

Right, I didn't express myself clearly. By "creating a background job" I meant putting a job on a queue.

marpio on 21 May 2018

Perfect. After thinking about this some more, I realized that's what you meant. In a production environment, we'll have workers running to consume from the job queue, and we can scale them up if the queue grows. Given that, I believe a queue push per request is fine.
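
A minimal in-process sketch of that shape: the handler only enqueues, and a pool of workers drains the queue. A buffered channel stands in for whatever real job queue a deployment would use; `Job`, `enqueue`, and `startWorkers` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical unit of work created for one push notification.
type Job struct{ Module, Version string }

// enqueue puts j on the queue without blocking; false means the queue is full.
func enqueue(queue chan<- Job, j Job) bool {
	select {
	case queue <- j:
		return true
	default:
		return false
	}
}

// startWorkers launches n workers that consume jobs until the queue is closed.
// The returned WaitGroup is done once every worker has exited.
func startWorkers(n int, queue <-chan Job, handle func(Job)) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				handle(j)
			}
		}()
	}
	return wg
}

func main() {
	queue := make(chan Job, 16)
	var mu sync.Mutex
	processed := 0
	wg := startWorkers(3, queue, func(j Job) {
		mu.Lock()
		processed++
		mu.Unlock()
	})
	for _, m := range []string{"Mx", "My", "Mz"} {
		if !enqueue(queue, Job{Module: m, Version: "V1"}) {
			fmt.Println("queue full, dropping", m)
		}
	}
	close(queue)
	wg.Wait()
	fmt.Println("processed:", processed) // prints processed: 3
}
```

Scaling up when the queue grows then amounts to raising the worker count passed to `startWorkers`.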

@michalpristas talked about this earlier today as well.

arschles on 21 May 2018

Also @marpio I have answers to your questions (2) and (3)

Let's say OM gets a push notification from OA. Should OM pull it from the VCS or download the module from OA? If the latter is the case, how can we guarantee that the push really came from OA? Use a secret like in the case of Github push?

OM should download it from OA, but not while it does deduplication and saves module _metadata_ (revision info, name, etc.) to its own event log. In more detail: OM should first deduplicate and save the module metadata in its own log. That metadata will point to OA's source code (in OA's CDN). Then, in a subsequent background job, OM should download the source code from OA's CDN. Once that operation is done, OM should update its log entry to point to its own CDN.
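
The two steps described above could look roughly like this. It is a sketch under assumptions: `Entry`, `EventLog`, `RecordPush`, `Mirror`, the URL layout, and the `copySource` callback are all hypothetical, and the copy is a no-op stand-in rather than a real download.

```go
package main

import "fmt"

// Entry is module metadata in the event log; SourceURL says where the source lives.
type Entry struct {
	Module, Version string
	SourceURL       string
}

// EventLog is a toy in-memory log keyed by module@version.
type EventLog struct {
	entries map[string]*Entry
}

func NewEventLog() *EventLog { return &EventLog{entries: map[string]*Entry{}} }

// RecordPush is step 1: deduplicate, then save metadata pointing at the peer's CDN.
// It returns nil when the module version is already known.
func (l *EventLog) RecordPush(module, version, peerURL string) *Entry {
	key := module + "@" + version
	if _, ok := l.entries[key]; ok {
		return nil
	}
	e := &Entry{Module: module, Version: version, SourceURL: peerURL}
	l.entries[key] = e
	return e
}

// Mirror is step 2, meant to run in a background job: copy the source to our
// own CDN and, only after that succeeds, repoint the log entry.
func Mirror(e *Entry, ownBase string, copySource func(from, to string) error) error {
	to := ownBase + "/" + e.Module + "/" + e.Version + ".zip"
	if err := copySource(e.SourceURL, to); err != nil {
		return err // the entry keeps pointing at the peer's CDN
	}
	e.SourceURL = to
	return nil
}

func main() {
	el := NewEventLog()
	e := el.RecordPush("My", "V1", "https://cdn.oa.example/My/V1.zip")
	fmt.Println(e.SourceURL) // still OA's CDN

	noop := func(from, to string) error { return nil } // stand-in for a real copy
	if err := Mirror(e, "https://cdn.om.example", noop); err != nil {
		fmt.Println("mirror failed:", err)
	}
	fmt.Println(e.SourceURL) // now OM's own CDN
}
```

Until the background job finishes, readers of OM's log are directed to OA's CDN, so the module stays reachable the whole time.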

The push can happen at the same moment as the pull sync. So theoretically, the same module can be stored twice. Would this be an issue?

All deployments must append to the event log sequentially, but that wasn't clearly specified in the document (sorry about that - I'll clear it up). Anyway, if we have sequential appends, then we can do deduplication to avoid the store-twice problem you mentioned.
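
To illustrate why sequential appends make the store-twice problem go away, here is a sketch in which a push and a pull race to append the same module. The `SeqLog` type is hypothetical, and a mutex is only one way to serialize appends inside a single process.

```go
package main

import (
	"fmt"
	"sync"
)

// SeqLog serializes appends so the dedup check and the append happen atomically.
type SeqLog struct {
	mu      sync.Mutex
	seen    map[string]bool
	entries []string
}

func NewSeqLog() *SeqLog { return &SeqLog{seen: map[string]bool{}} }

// Append adds key unless it is already in the log; it reports whether it was added.
func (l *SeqLog) Append(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.seen[key] {
		return false
	}
	l.seen[key] = true
	l.entries = append(l.entries, key)
	return true
}

// Len returns the number of entries in the log.
func (l *SeqLog) Len() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return len(l.entries)
}

func main() {
	l := NewSeqLog()
	var wg sync.WaitGroup
	// One goroutine plays the push handler, the other the pull sync.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Append("MyV1")
		}()
	}
	wg.Wait()
	fmt.Println(l.Len()) // prints 1
}
```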

arschles on 21 May 2018

Thanks @arschles.
I guess implementing all of this would extend the scope of this issue. Would it be OK if I implemented just the handler and a worker that accepts the push and does nothing for now?
Or maybe there's something else to do that makes more sense now?

marpio on 21 May 2018

@marpio yes absolutely. In your PR that does part of this work, just reference this issue so we can keep track of how far along the work is.

arschles on 22 May 2018

Did #181 solve this, @marpio?

robjloranger on 15 Jul 2018

@robjloranger I think so. @michalpristas correct me if I'm wrong please.

marpio on 15 Jul 2018

Yes, #181 solves this. Closing.

michalpristas on 15 Jul 2018
