Faas: Proof of concept: reliable asynchronous processing

Created on 7 Jul 2017 · 11 Comments · Source: openfaas/faas

Asynchronous processing should be possible for long-running functions.

Must have:

Work can be accepted through a new route or a Header

/async/function/<function_name>

Or via Header:

X-etc: async

Work is accepted immediately and a 202 Accepted is returned. This should be handed off to a queue.
One or more (scalable) asynchronous workers read from a queue and call functions
- Should dequeue item atomically
- Upon failure another worker should pick up the item
- For the initial version, the worker should call the function over HTTP, just like the gateway does. The timeout will depend on the function's configuration.
Prometheus metrics to be logged for work queued/processed/outstanding
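
The must-have flow above can be sketched in a few lines. This is only an illustration under stated assumptions, not the gateway's actual implementation: an in-process `queue.Queue` stands in for the real broker, plain counters stand in for Prometheus metrics, `X-etc` is the placeholder header name from this issue, and `accept`, `work_once` and `Metrics` are hypothetical names. It also assumes synchronous calls use a `/function/<function_name>` route.

```python
import queue
from dataclasses import dataclass

ASYNC_PREFIX = "/async/function/"  # route proposed in this issue
SYNC_PREFIX = "/function/"         # assumed synchronous route


@dataclass
class Metrics:
    """Plain counters standing in for Prometheus metrics (assumption)."""
    queued: int = 0
    processed: int = 0

    @property
    def outstanding(self) -> int:
        return self.queued - self.processed


def accept(path, headers, body, work_queue, metrics, async_header="X-etc"):
    """Gateway side: queue the work and return 202, or None for a sync call.

    async_header defaults to the placeholder header name from the issue.
    """
    function_name = None
    if path.startswith(ASYNC_PREFIX):
        function_name = path[len(ASYNC_PREFIX):]
    elif headers.get(async_header) == "async" and path.startswith(SYNC_PREFIX):
        function_name = path[len(SYNC_PREFIX):]
    if not function_name:
        return None
    work_queue.put((function_name, body))  # hand off to the queue
    metrics.queued += 1
    return 202  # accepted immediately; the function has not run yet


def work_once(work_queue, invoke, metrics):
    """Worker side: take one item, call the function, requeue on failure.

    Returns False when the queue is empty, True otherwise.
    """
    try:
        # queue.Queue hands each item to exactly one caller
        item = work_queue.get_nowait()
    except queue.Empty:
        return False
    try:
        invoke(*item)  # in the real system this would be an HTTP call
    except Exception:
        work_queue.put(item)  # make the item available to another worker
        return True
    metrics.processed += 1
    return True
```

Requeueing inside the worker only covers failures the worker survives. If the worker process itself dies between the dequeue and the requeue, the item is lost, which is why a broker with acknowledgements and redelivery is the better fit for the "another worker should pick up the item" requirement.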

Could have:

Watchdog configuration to state whether async/sync is supported
Validation in gateway for invocation method
Retry logic on failure

Nice to have:

Additional logging beyond docker service logs
Callback URL could be specified via header or query-string - this could be called by the framework upon completion
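
As a rough sketch of the callback idea, the gateway could resolve the callback URL from either place before queueing the work. The header name `X-Callback-Url`, the query parameter `callback` and the function `callback_url` are all hypothetical; the issue does not name them.

```python
from urllib.parse import parse_qs, urlsplit


def callback_url(request_url, headers,
                 header_name="X-Callback-Url", query_param="callback"):
    """Return the callback URL from the header or query string, else None.

    header_name and query_param are illustrative names, not part of the
    proposal. The header wins when both are present. The headers dict is
    looked up case-sensitively here for brevity.
    """
    value = headers.get(header_name)
    if not value:
        # parse_qs percent-decodes values and returns a list per key
        values = parse_qs(urlsplit(request_url).query).get(query_param)
        value = values[0] if values else None
    if not value:
        return None
    parts = urlsplit(value)
    # Only accept absolute http(s) URLs
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return value
```

Once the function completes, the worker would send the result to this URL. Rejecting anything that is not an absolute `http` or `https` URL is a design choice of this sketch, not a stated requirement.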

Notes:

I have looked into Kafka; its design looks overly complex for the task at hand.
NATS queuing is not resilient - but NATS Streaming may be suitable.

alexellis

All 11 comments

If you need a beta tester for asynchronous processing, I'm in!

Tofull on 11 Jul 2017

Thanks @Tofull - I've started a quick proof of concept with NATS Streaming.

Have you started using FaaS or creating functions already? Do you have async workloads ready for testing?

alexellis on 11 Jul 2017

Amazing! You rock! :)

I used FaaS to deploy some functions that my machine learning experts made with magic.

FaaS works great for our prediction service, since prediction is a synchronous task.

As some processing functions need time (training our models), we would use async processing and we already have a workflow ready for testing.

Tofull on 11 Jul 2017

That sounds like a great use-case. Is there anything you can share on a blog or on Twitter?

alexellis on 11 Jul 2017

Once our work is more stable, we will be able to talk about it and mention FaaS as the solution we chose, in a tweet or in some of the presentations we give (we work with French industry in the aerospace 🛰️ and Earth observation 🌎 fields).
FaaS with async should completely meet our needs here at the Institute of Technology Saint Exupéry. 😃

Tofull on 12 Jul 2017

It would probably help to map out the use cases for async processing first, as I can think of a couple of different ones that usually require different guarantees and metrics. Does anyone know which use cases the users of this library would favor? That would also make it easier to choose the right queueing option. Kafka, for example, might seem overly complex (and it is complex), but it has its uses; I generally wouldn't choose it for simple response queues like these. NATS is a nice general-purpose option, but I fear it doesn't cover every case. There is also the possibility of a mixed solution: simple pub/sub with a separate log database. Usually you need the recent items, which are in memory, but this way you don't lose old ones and can support cases like "my car was offline for 15 minutes because of bad internet, but it can still get its response". That gives quite a lot of flexibility and isn't that hard to implement.
The problem with using queuing systems like NATS is that you end up with a ton of queues piling up. While this is completely acceptable within your infrastructure for workers and services, where the number is quite limited, it isn't really feasible for response queues. MQTT suffers from similar problems when load gets high. I've seen a couple of implementations that offload MQTT queues to databases, though.

sandrom on 22 Jul 2017

As you have mentioned - the various queue implementations available have their own pros/cons. Ideally it should be easy to swap between different "queue" providers or implementations.
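
One way to picture swappable providers is a small interface that the gateway and workers depend on, with each queue technology supplying its own implementation. This is a sketch only: `WorkQueue`, `InMemoryQueue` and `drain` are hypothetical names, and the in-memory provider is there purely so the example runs without a broker.

```python
from abc import ABC, abstractmethod
from collections import deque


class WorkQueue(ABC):
    """Hypothetical provider interface the gateway and workers code against."""

    @abstractmethod
    def publish(self, function_name: str, body: bytes) -> None:
        """Queue one invocation."""

    @abstractmethod
    def fetch(self):
        """Return the next (function_name, body) pair, or None if empty."""


class InMemoryQueue(WorkQueue):
    """Toy provider for local testing; a broker-backed one would sit beside it."""

    def __init__(self):
        self._items = deque()

    def publish(self, function_name, body):
        self._items.append((function_name, body))

    def fetch(self):
        return self._items.popleft() if self._items else None


def drain(work_queue: WorkQueue, invoke) -> int:
    """Worker loop that only knows the interface, not the provider."""
    handled = 0
    while True:
        item = work_queue.fetch()
        if item is None:
            return handled
        invoke(*item)
        handled += 1
```

Swapping providers then means passing a different `WorkQueue` implementation to the same worker code.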

This initial branch / work is based around a NATS Streaming queue, which does have persistence and resilience.

You can see the progress here:

https://github.com/alexellis/faas/tree/async_nats

Guide to testing the branch:

https://gist.github.com/alexellis/62dad83b11890962ba49042afe258bb1

alexellis on 22 Jul 2017

Ah, I must have missed that idea - that sounds like the best possible outcome, yes :)

sandrom on 22 Jul 2017

Hey @Tofull do you have a draft or published blog yet?

alexellis on 8 Aug 2017

Please see changes in #131