serving 🚀 - Feature Request: Allow processing of async requests in Knative

/cc @duglin

nimakaviani on 25 Jun 2019

Question is, why would you want to use knative for this?
Most of what you described is what your app has to do.
I presume, you don't really expect Knative to provide a generic state machine/workflow engine?
Isn't everything you need just minScale=1, to permit async processing without scaling to zero?

vagababov on 25 Jun 2019

I always thought async workloads are more of a territory of Knative Eventing in conjunction with Knative Serving. After all, the protocol for Knative Serving currently is request/response based and everything revolves around that pretty much.

Any thoughts on why Eventing is not a suitable solution for async workloads like the ones you mentioned?

markusthoemmes on 25 Jun 2019

As Markus says, this is where eventing comes in, perhaps even eventing delivering events to something _other_ than serving.

cc @evankanderson

mattmoor on 25 Jun 2019

@vagababov I think setting minScale=1 is an anti-pattern to support async. Particularly for the case of infrequent async jobs, I don't think the expectation of having resources sitting idle is ideal.

@markusthoemmes good question. I think this is the _conjunction_ of Knative Eventing and Knative Serving that is in fact problematic. Knative's presumption of short-lived requests with persistent connection to the client goes against what is expected from an async job, regardless of whether the request is initiated by an outside client or an internal _Cloud Event_. This is where I think @mattmoor 's point regarding having _"something other than serving"_ deal with async requests comes into play. But does an app developer really want to have a separate app to send batch emails or perform data transformations? or even worse, to support separate code bases or deployment models?

so to @vagababov's question, I don't really expect Knative to provide a generic state machine or workflow engine, but for it to close the loop on common requirements in a developer's workflow without the developers having to go up and down in the stack to have different pieces of their application deployed.

nimakaviani on 26 Jun 2019

This is where I think @mattmoor 's point regarding having "something other than serving" deal with async requests comes into play

To be clear, I mean having something other than serving dealing with the async aspect, but that may result in the Eventing system having a synchronous call to Serving. For sufficiently long-running (a la batch) cases, Serving may be ill-suited for this (or the scaling model may not be right), and so it may make sense to compose eventing with other forms of addressable compute that aren't _currently_ covered by knative.

mattmoor on 26 Jun 2019

Another vote for using Eventing to handle this (turning a sync request into an async request).

Also, there is room for serverless products for handling long-running jobs, but currently I think this is outside the charter of knative/serving.

mikehelmick on 2 Jul 2019

I'm not following the relationship to eventing. The connection between a client and the KnService is not an event. It shouldn't be morphed into a CloudEvent, or go thru brokers, triggers, fanned-out, etc. I also wouldn't expect the client to change how it interacts with a KnService based on whether it's doing it via sync or async - meaning I would not expect the URL of the KnService to change, which is what I think using eventing for async would require. People may be interacting in both modes at the same time.

I don't see the internal flow or structure needed by async being that different from sync - with the exception of how responses are handled, so it's hard for me to see why this is out of scope for serving (yes I know there's a bit more to it, but I think the response handling is the biggie). Async is a well established model in the serverless space so it would be odd for us to exclude support for it.

duglin on 2 Jul 2019

Thinking more about the tie-in with eventing... if what's meant is that something in the normal flow detects the "run this in async mode" flag, and as a result redirects the request to some eventing component because it knows how to store responses in some kind of persistence, then that's just an impl detail. But converting it to a CloudEvent seems odd.

In the end though, a KnService is called and I think the only connection that we need to ensure stays open the entire time is the one between the user container and the queue proxy - and I'm not sure eventing helps with that since it assumes persistent connections from channels, no? Although, if we combined this support with how eventing manages responses (meaning, instead of the channel waiting for a response, the response is sent via a call-back managed by the queue proxy) then I think those two worlds might be more aligned than I originally considered. But that's all impl details, and the user should be unaware of it.

duglin on 2 Jul 2019

I think this is outside the charter of knative/serving.

@mikehelmick can you pls point me to that charter?

mbehrendt on 2 Jul 2019

re 'handling async via eventing' , adding to what @duglin said above: if we somehow magically handled it via eventing, you still have the issue that somehow under the cover the knservice gets called synchronously. I.e. you're bound to the max execution time associated with synchronous calls, and to the resource consumption implied by potentially keeping 1000's of http connections open.

mbehrendt on 2 Jul 2019

other forms of addressable compute that aren't currently covered by knative.

@mattmoor can you pls elaborate on which other forms of addressable compute you're referring to? Are there special semantics behind you emphasizing the _currently_? E.g. is there sth cooking behind the scenes?

mbehrendt on 2 Jul 2019

my understanding of @mattmoor 's suggestion was that it needs to be handled through other Kubernetes controllers (e.g., deployments, etc.). If that's right then, back to my original point, it won't be a great user experience if the _developers have to go up and down in the stack to have different pieces of their application deployed_.

Another vote for using Eventing to handle this (turning a sync request into an async request).

For the above, or like @markusthoemmes suggested, bringing serving and eventing somehow together, requests will have to come back to the Kn app at some point for processing. With Kn requiring "persistent time-limited http connections", the problem remains, like @mbehrendt and @duglin mentioned. Unless we modify eventing to support long-running workloads.

nimakaviani on 2 Jul 2019

Also I updated the original requirements with the following item:

Same endpoint should allow for both blocking and non-blocking async requests

nimakaviani on 2 Jul 2019

it might be good to separate out UX from impl details/requirements.

From a UX perspective:

the endpoint for a KnService should be the same regardless of whether it is invoked synchronously or asynchronously
while it may be possible for a KnService to declare itself as an async service (and therefore all requests are treated as such and a 202 is returned immediately) that is not the case we're most interested in. We're looking for the one where the person invoking the KnService chooses whether the processing is done async or not via some flag (e.g. perhaps a query param or http header)
user can then get the response metadata via some mechanism

From an impl perspective:

I actually think trying to have eventing and serving leverage shared components makes a lot of sense. My initial push-back was because I didn't see the existing eventing infrastructure being a good fit and I was interpreting the "use eventing" statements as "expose something new to the end user" - and that breaks the first bullet point above
in order for this sharing to happen though eventing would need to do basically what this issue is proposing.... modify the serving side of things to support async invocations, then it could leverage that.
how we then store the responses and the metadata associated with it could be done via eventing or anything else - that's just an impl detail that (for the most part) isn't seen by the end user

duglin on 2 Jul 2019

Some questions I have about the proposal:

At a high-level, how does the tracking and management of async tasks not turn into a workflow manager like Apache Airflow ( https://github.com/apache/airflow )? Why should this be built into Knative Serving rather than orchestrated on-top?
Knative Serving today relies on HTTP information to provide the "serverless magic" of autoscaling both up and down. We rely on the closing or timeout of the HTTP connection to determine that a container has finished. From the comments above it sounds like the desire is for the HTTP connection to be closed, so how do you propose that we determine that an async container has finished? When finished, how is the response/output of the async job captured by the control plane?
How do you imagine the runtime contract for async tasks to differ from synchronous tasks? Is an HTTP endpoint still required?
How do you propose our settings of container concurrency work with async requests?

dgerd on 2 Jul 2019

My thoughts in the same order as the questions above:

I cannot quite establish the link between the workflow manager and async requests. It's not the flow that matters but instead, the processing of a single request and storing the results for later retrieval without the persistent external connection. how does the flow help? what am I missing?
Busyness for a pod and whether or not a request is internally finished is determined and tracked by the queue proxy based on open connections to the user container. It can (and should) be kept independent of whether there exists a corresponding external HTTP connection from the client. Queue proxy can track async requests too, and do the corresponding bookkeeping.
The runtime contract won't change, nor will the http endpoint.
container concurrency is also determined by the queue proxy. Given that queue proxy can track async requests too, there won't be any change to container concurrency settings.
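The bookkeeping argued for in these answers can be sketched as a counter of calls currently open to the user container, kept independently of whether a client connection still exists. This is not queue-proxy code; `InFlightTracker` and its methods are hypothetical names used only to illustrate the idea.

```python
import threading
from contextlib import contextmanager


class InFlightTracker:
    """Counts calls currently open to the user container (hypothetical helper)."""

    def __init__(self, container_concurrency):
        self._limit = container_concurrency
        self._lock = threading.Lock()
        self._in_flight = 0

    @contextmanager
    def track(self):
        # Sync and async requests take a slot in exactly the same way;
        # nothing here looks at the client-facing connection.
        with self._lock:
            if self._in_flight >= self._limit:
                raise RuntimeError("container concurrency limit reached")
            self._in_flight += 1
        try:
            yield
        finally:
            with self._lock:
                self._in_flight -= 1

    @property
    def busy(self):
        """True while at least one call to the user container is open."""
        with self._lock:
            return self._in_flight > 0
```

Under this model a pod stays "busy" for as long as the user container is working, even if the original caller received a 202 and disconnected.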

nimakaviani on 2 Jul 2019

This contradicts the model we've been operating with.

It can (and should) be kept independent of whether there exists a corresponding external HTTP connection from the client. Queue proxy can track async requests too, and do the corresponding bookkeeping.

and

container concurrency is also determined by the queue proxy. Given that queue proxy can track async requests too, there won't be any change to container concurrency settings.

I don't see how Q-P can achieve that.

How would it determine which request is async and which one is sync, since according to your model, async is the fire-and-forget kind? This needs request annotation of sorts.
Currently we reason about the load, presuming each request costs about the same (cost being 1 request in flight). Async requests may/will vary in load generated and even if we somehow teach QP to discern one from another, we can't equalize them. Which means all our Autoscaling logic will be incorrect.
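The cost assumption can be made explicit with a deliberately simplified model. This is not the actual autoscaler algorithm (which also averages over time windows); `desired_pods` is a hypothetical function that only shows what "each request costs 1" means for the arithmetic.

```python
import math


def desired_pods(in_flight_requests, target_concurrency_per_pod):
    """Pods needed if every in-flight request counts as one unit of load."""
    if target_concurrency_per_pod <= 0:
        raise ValueError("target concurrency must be positive")
    return math.ceil(in_flight_requests / target_concurrency_per_pod)


# 250 requests in flight with a target of 100 per pod asks for 3 pods,
# whether each request is a 50 ms lookup or a multi-hour transcode.
print(desired_pods(250, 100))  # 3
```

The concern raised above is exactly that the formula cannot tell those two kinds of request apart.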

The runtime contract won't change, nor will the http endpoint.

Determining async=sync at QP level would require some flag/header/X to be set -- this _is_ RTC change.

I cannot quite establish the link between the workflow manager and async requests. It's not the flow that matters but instead, the processing of a single request and storing the results for later retrieval without the persistent external connection.

You have to run a stateful load for that. You might be interested in what LightBend folk are doing. As such just supporting arbitrary stateful loads is probably not what we're aiming for right now...

vagababov on 2 Jul 2019

Determining async=sync at QP level would require some flag/header/X to be set -- this is RTC change.

correct. like @duglin suggested earlier

(e.g. perhaps a query param or http header)

and sure there is a change in RTC too. But it is an additive change and backward compatible.

Currently we reason about the load, presuming each request cost about the same (cost being 1 request in flight). Async requests may/will vary in load generated and even if we somehow teach QP to discern one from another, we can't equalize them. Which means all our Autoscaling logic will be incorrect.

I am not sure if I understand the above. KPA works based on the number of requests and last time I checked, it supports scaling on cpu too. Even now, revision.spec.timeoutSeconds has a default value of 5m which is 3000x larger than an average 100ms request time. I didn't see any specifics on how Autoscaling presumes each request would cost about the same and what the same implies where it ranges from milliseconds to 5 minutes. Even if we set aside the termination grace period, it is still 90sec before terminating an instance.

You have to run a stateful load for that. You might be interested in what LightBend folk are doing.

I am not sure if stateful load helps. there's nothing in the load that is stateful. It is more of a stateful response if anything.

This contradicts the model we've been operating with.

The only place where I see it impact the model, is the assumption of having an external connection from the client. I am not sure if loosening the assumption would go down as contradiction. Particularly if QP continues to do proper bookkeeping of connections to the user container.

nimakaviani on 2 Jul 2019

KPA never scaled based on CPU. Only on concurrency.

vagababov on 3 Jul 2019

Long processing times are a feature request that I've heard from many customers.
This is likely due to the fact that Cloud Run and Cloud Run on GKE have a maximum request timeout of 15 minutes that can be limiting. As mentioned in the first comment, the use cases are often about compute intensive tasks and data transformation.

Assuming that our goal is to allow customers to do long processing (O(a few hours)), it makes sense to me to explore another developer experience and variations to our runtime container contract. Indeed, I doubt the synchronous request/response model is the right one, as it forces clients to keep the connection open.

I also agree that going async should not mean getting rid of the other benefits of knative/serving:

I still want to think in terms of "Service"
I still want one stable endpoint
I still want to manage my revisions and traffic split between them.
I still want to provide a container, specify env vars...

I am supportive of exploring an alternate container contract tailored to async use cases.
I am not sure that the one suggested in the first comment is what we should adopt (I could also see the developer having to call an internal endpoint to signal the end of the processing).

steren on 3 Jul 2019

@mattmoor can you pls elaborate on which other forms of addressable compute you're referring to? Are there special semantics behind you emphasizing the currently? E.g. is there sth cooking behind the scenes?

Nothing behind the scenes, the whole point of Addressable is to enable Eventing to compose with things other than Serving. Today Channel is Addressable and can deliver events to a K8s Service over a Deployment. I have also built several other PoCs (all OSS on github) that implement Addressable and compose with Eventing, but nothing in this category (nor secret).

I honestly think it would take a day to prototype this on top of Serving and Eventing, assuming I understand the desired requirements.

I'd have the Service receive the request, wrap it in a cloud event with a uuid in the payload, post it to a channel and return the uuid. The channel would have a subscription that delivered the event to some long-timeout Service for processing (the max timeout is a knob in config-defaults now), and the subscription would send the response from that Service to something that persisted it associated with that uuid for later retrieval.
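As a rough illustration of that flow, the sketch below simulates it in memory. Everything in it is a stand-in chosen for the example: a `queue.Queue` plays the channel, a plain dict plays the event (it is not a real CloudEvent envelope), another dict plays the persistence layer, and the function names are hypothetical.

```python
import queue
import uuid

channel = queue.Queue()   # stands in for an Eventing channel
results = {}              # stands in for whatever persists the responses


def front_service(request_body):
    """Accept the request, enqueue an event carrying a uuid, return the uuid."""
    request_id = str(uuid.uuid4())
    channel.put({"id": request_id, "data": request_body})
    return request_id


def long_timeout_service(data):
    """Placeholder for the long-running work done by the second Service."""
    return data[::-1]


def deliver_one():
    """Subscription: deliver one event, then persist the reply under its uuid."""
    event = channel.get()
    results[event["id"]] = long_timeout_service(event["data"])


request_id = front_service("hello")
deliver_one()
print(results[request_id])  # olleh
```

Note that the caller talks to `front_service` and later to the result store, which is where the "one stable endpoint" question raised elsewhere in this thread comes in.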

There are other variants of this that would also work that involve delegating the compute to K8s Jobs / Pods. Am I missing something?

The scope creep of this request is considerable, and among other things entails Serving taking on responsibility for durably(?) storing results (how much? how long?), and providing access to those results later (how is that authenticated?). If nothing else, it is a big jump in the complexity of Serving, which I don't take lightly.

Have you looked at what it would take to implement this on-top of Serving and Eventing? If so, what's missing? If not, then what is undesirable about a new layer for this abstraction?

While we don't have other forms of directly addressable compute in Knative today, it doesn't mean we won't. Perhaps put together a PoC and we should talk about knative/asyncing?

mattmoor on 3 Jul 2019

Many of the questions you ask can be asked of just about anything in Kn. For example your question about durability of the responses applies to durability of events in brokers/channels. We solve this via configuration and pluggability of components so people can choose how to answer the question for their own needs. We don't have to have a one-size-fits-all answer for everything.

While I think serving support for async directly would be the best option for the project, I don't necessarily think supporting it on top of serving is a horrible option. However, in order to do that there would still need to be changes made to serving. For example, determining "busyness" of an instance based on the presence of a connection back to the client.

I think part of this issue (and #4098) comes down to whether Knative is a platform on which to build multiple offerings where each can have their own differentiating value proposition (while still having a core consistency), or whether Kn is going to be parental/prescriptive and only allow for one view of how things should work (even if that differs from what many similar offerings do today).

re: PoC - we do have one and @nimakaviani can demo it if people would like.

duglin on 3 Jul 2019

I do concur that this is more like a different product: async batch, or whatever you want to call it.

Besides durability there are questions like checkpointing, restarting/retrying, etc.

This all just feels like a different product in general.

vagababov on 3 Jul 2019

I'm wondering why things like batch, checkpoints, restarting... are mentioned as features when they are not part of the proposal. If a pod running async dies it has the same semantics as a pod running sync - it dies and life goes on. If features like workflow or orchestration are a concern then those questions should be asked of the eventing side of Kn since pipelines, broker/channels w/responses are all much closer to those features than this issue is proposing.

This proposal is asking for a much more focused design request... allow for long running function calls. Something many FaaS support today.

duglin on 3 Jul 2019

So the suggestion is just to fire and forget? No guarantees of
execution/result?

I think what this proposal misses is a real-life example to see where
you're coming from and where you're going to.

On Wednesday, July 3, 2019, Doug Davis notifications@github.com wrote:

I'm wondering why things like batch, checkpoints, restarting... are
mentioned as features when they are not part of the proposal. If a pod
running async dies it has the same semantics as a pod running sync - it
dies and life goes on. If features like workflow or orchestration are a
concern then those questions should be asked of the eventing side of Kn
since pipelines, broker/channels w/responses are all much closer to those
features than this issue is proposing.

This proposal is asking for a much more focused design request... allow
for long running function calls. Something many FaaS support today.


vagababov on 3 Jul 2019

Not really fire-n-forget. maybe... fire-n-query-response-later :-)

As far as a use case goes, I think the first comment covers that. But to me I've always liked the ones along the lines of:
-invoke("url-to-data",async=true) and the function processes the data at that url (e.g. video for transcoding, or data for migration.... something that will take a long time). And fn-id is returned.

the user will getStatus(fn-id) to find out when it's done and any possible response.

from this other higher-level things could be built, but I think those would sit on top of this core feature.
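From the caller's side, the invoke-then-getStatus pattern amounts to a polling loop. The sketch below assumes a status record with a `state` field; `wait_for_result` and `FakeBackend` are hypothetical names, and the backend is an in-memory stand-in rather than a real service.

```python
import time


def wait_for_result(get_status, fn_id, interval=0.01, timeout=1.0):
    """Poll get_status(fn_id) until it reports completion or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(fn_id)
        if status["state"] == "completed":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{fn_id} still running after {timeout}s")
        time.sleep(interval)


class FakeBackend:
    """In-memory stand-in that reports completion on the Nth poll."""

    def __init__(self, polls_until_done):
        self.polls_until_done = polls_until_done
        self.polls = 0

    def invoke(self, url, run_async=True):
        return "fn-1"  # the id handed back to the caller

    def get_status(self, fn_id):
        self.polls += 1
        if self.polls >= self.polls_until_done:
            return {"state": "completed", "http_status": 200}
        return {"state": "running"}


backend = FakeBackend(polls_until_done=3)
fn_id = backend.invoke("url-to-data", run_async=True)
print(wait_for_result(backend.get_status, fn_id)["state"])  # completed
```

Retries, callbacks, or orchestration would be layered on top of a loop like this rather than built into it.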

duglin on 3 Jul 2019

I agree there is a significant complexity cost associated with serving exclusively taking ownership of this feature. To me, this mainly comes down to how would we support the stateful aspects of this which have been mentioned (a database dependency?). The idea of leaning on eventing here is very appealing to me for a few reasons:

it is already a stateful system (queues, although lookup of requests is still lacking here but more on that in a bit).
The abstraction around queues is very nice to have and reuse as an upstream
long running workers dispatched by $queue is a somewhat common pattern
I'd expect eventing to want to support running long-running workers regardless (see above point)
there's prior art for a very similar request -> queue -> async response pattern already (openwhisk)
there is an existing plugin system (event sources I believe) which could be used to isolate communication with $state_tracking.

Have you looked at what it would take to implement this on-top of Serving and Eventing? If so, what's missing? If not, then what is undesirable about a new layer for this abstraction?

The main challenges I see here are:

"I still want one stable endpoint." There are some ways to work around this: q-p routing requests to eventing was mentioned (although we'd also want activator involved if we go this route). Mainly, I can't think of a way we could implement this while preserving traffic splitting behavior without some minimal changes in serving.
"I still want to think in terms of "Service"." There's lots of mention about using deployments or some other compute and delivery mechanism to run workers for events. All of these concern me because one of the core things we are after is a unified experience for a user - they do not need to understand a different app lifecycle (revisions), make a new deployment, etc for the async version of their application. IMO, any solution needs to either build on knserving for compute or we need to come up with a way to share knserving service, config, revision machinery for this reason.

greghaynes on 3 Jul 2019

Sure, I had in mind about the same idea. But for this to work properly you
need the whole shebang: checkpointing, resuming, retries. Otherwise, it's
just a very poor end user proposition. I personally would not code against
a system that provides async request capability but does not provide any
mechanism for the request to be picked up and resumed if the backend
machine/node/pod fails.

On Wed, Jul 3, 2019 at 9:12 AM Doug Davis notifications@github.com wrote:

Not really fire-n-forget. maybe... fire-n-query-response-later :-)

As far as a use case goes, I think the first comment covers that. But to me I've
always liked the ones along the lines of:
-invoke("url-to-data",async=true) and the function processes the data at
that url (e.g. video for transcoding, or data for migration.... something
that will take a long time). And fn-id is returned.

the user will getStatus(fn-id) to find out when it's done and any
possible response.

from this other higher-level things could be built, but I think those
would sit on top of this core feature.


vagababov on 3 Jul 2019

I think the retry can be left up to the user. As long as they can query and see the results (whether success or failure), they can decide on whether to re-trigger. No need for the retry logic to be baked in and add extra complexity imo.

nimakaviani on 3 Jul 2019

In the synchronous case, the caller is waiting on the line for the response, and gets an immediate success or indication as to whether the request should be retried.

In the case where the result is deferred, error handling becomes more difficult, because the caller needs to keep track of the ongoing request and re-enqueue it (possibly hours later) if the initial request fails. It would be much simpler to have the long-running case durably accept the event, and retry the computation if the first container fails (is rescheduled, kernel is upgraded, etc).

A concrete example of where deferred failure semantics would break existing contracts:

In Knative Eventing, we assume that a 2xx response means that the event has been accepted and stored by the next stage, and the current stage can forget about the Event. A non-2xx response means that we should attempt redelivery (after a backoff, etc). Extending the delivery contract to be responsible for tracking and re-delivering the event in the case that a continuation failed would be a substantial additional burden on every system that aims to deliver events.
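The delivery contract described here can be sketched as a small retry loop. This is an illustration of the 2xx / non-2xx rule only, not Eventing's implementation; `deliver`, its parameters, and the exponential backoff schedule are assumptions for the example.

```python
import time


def deliver(send, event, max_attempts=5, base_delay=0.01):
    """Call send(event) until it returns a 2xx status, backing off between tries.

    Returns the number of attempts used, or raises if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        status = send(event)
        if 200 <= status < 300:
            return attempt  # the next stage has accepted the event
        if attempt < max_attempts:
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"delivery failed after {max_attempts} attempts")


responses = iter([503, 500, 202])
print(deliver(lambda event: next(responses), {"id": 1}))  # 3
```

The point of the objection is visible in the loop: once an async service answers 202, the sender stops tracking the event, so a failure hours later is invisible to it unless the contract is extended.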

Adding retries to the system is not entirely trivial. A few questions from the top of my head:

How do traffic splits interact with retries?
Typically, POST events are not idempotent or retry-able. In this case, we probably do want to retry them. How to keep the streams separate?
Right now, there is no HTTP-addressable namespace apart from the application itself. Callers will want to be able to check status of their activations, which means we need an additional namespace for activation ids.
Right now, all our serving is edge-triggered. What triggers the retries, and how are they coordinated?
The initial design suggests that the caller chooses whether the call is blocking or non-blocking. I'd be concerned about TCP streams which need to live longer than 2-3 hours; additionally, if a caller chooses the request is blocking, HTTP gives no good way to return the activation ID if the connection is closed early as a call-me-back address.
How are headers, etc handled? Is the contract that the same headers (including things like cache-control headers) simply replayed X hours later?

Working out the mapping of the service to HTTP would be very interesting -- for a FaaS use-case with a single endpoint, it would be easy to have "/activate" and "/check/". You might be able to use a single endpoint with some sort of PUT-if-not-exists contract, and have the client be responsible for generating idempotent IDs. This seems like something which would be worth documenting in a calling contract for the async behavior, which feels like it should be something the Service explicitly opts in to.
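A PUT-if-not-exists contract with client-generated IDs could look roughly like the sketch below. The class, method names, and status codes returned are assumptions chosen for illustration, backed by an in-memory dict rather than a real store.

```python
class ActivationStore:
    """In-memory stand-in for a store keyed by client-generated activation IDs."""

    def __init__(self):
        self._activations = {}

    def put_if_absent(self, activation_id, request_body):
        """Return (status, record): 201 on the first PUT, 200 on a repeat."""
        existing = self._activations.get(activation_id)
        if existing is not None:
            return 200, existing  # a retried PUT does not start new work
        record = {"state": "accepted", "request": request_body}
        self._activations[activation_id] = record
        return 201, record

    def check(self, activation_id):
        """Return the stored record, or None if the ID was never activated."""
        return self._activations.get(activation_id)
```

Because the client owns the ID, it can safely repeat the PUT after a dropped connection and then use the same ID to check on the activation.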

One possible design (not sure if it is the best) would be to have a "sidecar" that sits in the request path (like the queue-proxy), and which persists all requests to stable storage and manages the storage. A few requirements for the proxy:

Write the initial request to storage (probably want a durable, scannable key-value store, may need to be pluggable)
Return the storage address (as a URL) so that a client can handle querying, etc.
Maintain a "lease" on the request (periodically updates the request in storage with "Pod X is working on this until time Y")
Copies the result from the user-container to storage and marks the request as completed.

In addition, you would need a system to periodically sweep the storage system and:

Retry requests whose lease has expired
Close out (delete) information for requests that completed more than X hours/days/weeks ago (configurable)
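The proxy requirements and the sweeper can be sketched together as one small store. This is an in-memory illustration under stated assumptions: `LeaseStore`, its method names, the record fields, and the `/requests/<id>` address format are all hypothetical, and a real version would sit on durable storage.

```python
import time


class LeaseStore:
    """Requests with leases, plus a sweep that retries and expires them."""

    def __init__(self, lease_seconds, retention_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.retention_seconds = retention_seconds
        self.clock = clock
        self.requests = {}  # request id -> record

    def accept(self, request_id, body):
        """Persist the initial request and return its storage address."""
        self.requests[request_id] = {
            "body": body, "state": "pending", "owner": None,
            "lease_until": 0.0, "result": None, "completed_at": None,
        }
        return f"/requests/{request_id}"

    def renew(self, request_id, pod):
        """Record that `pod` is working on the request until the lease expires."""
        self.requests[request_id].update(
            state="running", owner=pod,
            lease_until=self.clock() + self.lease_seconds)

    def complete(self, request_id, result):
        """Copy the result into storage and mark the request completed."""
        self.requests[request_id].update(
            state="completed", result=result, completed_at=self.clock())

    def sweep(self):
        """Return ids whose lease expired; delete results past retention."""
        now = self.clock()
        retry = []
        for request_id, rec in list(self.requests.items()):
            if rec["state"] == "running" and rec["lease_until"] < now:
                rec.update(state="pending", owner=None)
                retry.append(request_id)
            elif (rec["state"] == "completed"
                  and now - rec["completed_at"] > self.retention_seconds):
                del self.requests[request_id]
        return retry
```

A pod that dies simply stops renewing its lease, and the next sweep hands the request back for another attempt.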

evankanderson on 3 Jul 2019

re: retries... @vagababov - to me the need for a retry is not exclusive to async requests - they could easily apply to sync as well. Of course detecting a failure for sync is easier, but from a conceptual perspective that's just an impl detail - the need (or lack of one) is the same.

As @nimakaviani mentioned, the user could do the retry. However, once we have the base infrastructure in place in serving to even support the idea of an async request people (or Kn Serving if it really wanted) could add-on things like retries on top of it. And I believe those components (or the user) could get their job done w/o additional changes to serving - iow, they could be done as extensions or via existing features. This provides the core components, or changes, necessary to allow that future growth on top of serving - while also filling the immediate need of "long running calls".

duglin on 3 Jul 2019

it might be good to separate out UX from impl details/requirements.

From a UX perspective:

the endpoint for a KnService should be the same regardless of whether it is invoked synchronously or asynchronously

I'm actually concerned about this -- having the client be able to choose sync or async for every endpoint and method regardless of the server's intent seems like it could interfere with existing software. The safest might be to choose a header such as Proxy-Knative-Async which would be in the forbidden header name set.
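For context on why a `Proxy-` name helps: browser fetch rules treat any request header whose name starts with `Proxy-` or `Sec-` as forbidden, so page scripts cannot set it. The check below covers only that prefix rule (the standard also lists specific forbidden names, omitted here), and the function name is a hypothetical helper.

```python
FORBIDDEN_PREFIXES = ("proxy-", "sec-")


def is_forbidden_by_prefix(header_name):
    """True if a browser script would be blocked from setting this header."""
    return header_name.lower().startswith(FORBIDDEN_PREFIXES)


print(is_forbidden_by_prefix("Proxy-Knative-Async"))  # True
print(is_forbidden_by_prefix("X-Knative-Async"))      # False
```

With such a header, existing browser-based clients could not accidentally or maliciously switch an endpoint into async mode.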

Some thought might also be needed around interaction with HTTP caching, HSTS and the rest of the HTTP stack as it exists today.

evankanderson on 3 Jul 2019

In other words if my O(hours) video transcoding job failed 2 minutes before
completion due to node/pod failure, and even if I (somehow, it's not clear
to me how you would report error/failure in this case) learn that the job failed, the
suggested mode of operation is to restart O(hours) from scratch?

On Wed, Jul 3, 2019 at 10:21 AM Doug Davis notifications@github.com wrote:

re: retries... @vagababov https://github.com/vagababov - to me the need
for a retry is not exclusive to async requests - they could easily apply to
sync as well. Of course detecting a failure for sync is easier, but from a
conceptual perspective that's just an impl detail - the need (or lack of
one) is the same.

As @nimakaviani https://github.com/nimakaviani mentioned, the user
could do the retry. However, once we have the base infrastructure in place
in serving to even support the idea of an async request people (or Kn
Serving if it really wanted) could add-on things like retries on top of it.
And I believe those components (or the user) could get their job done w/o
additional changes to serving - iow, they could be done as extensions or
via existing features. This provides the core components, or changes,
necessary to allow that future growth on top of serving - while also
filling the immediate need of "long running calls".


vagababov on 3 Jul 2019

Now, to lean on the points of agreement:

I think async processing is super-useful. I'd love to have it in Knative as another model.
I think async processing has a lot of similarity in rollout with Service/Route/Revision, and that model seems generally successful. Retry semantics seem important to work out, along with the model for triggering a request idempotently and querying the async result. This seems to be "undifferentiated heavy lifting" that's ideal for the platform to provide.
Can you mix async and sync in the same container/Service? I don't know; it seems interesting on one hand, but often async has different flexibility than sync workloads (think queueing and "filling in troughs" using existing cluster capacity).

WRT @vagababov's last comment -- sometimes you have to do that, because your transcoding library doesn't support checkpointing. Something like CRIU snapshots might help there, but I also suspect that most of the interesting cases won't work that way.

evankanderson on 3 Jul 2019

@greghaynes : re-use of eventing - I like the idea of reusing components when it makes sense. However, I'm not sure about sending async requests thru eventing and queues as that will impact the latency of things. Additionally, I don't think we need the features of eventing for the request side of the flow - it adds no value right now. And for the response flow, all we're really talking about is a POST to some URL, so while I think the specification of the destination can look the same as what eventing does (for UX consistency), bringing in eventing just for a few lines of code for an HTTP POST might be a bit much. Then of course, we have the whole "it's not a CloudEvent" aspect of it. I'd like to keep the MVP of this simple and try to not touch the inbound side of this flow at all if possible and just worry about the response side of it. As I mentioned previously, and people will see in the demo next week, we didn't need to change anything in Serving for the request flow until it hit the queue-proxy, and that was nice.
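
To make the "POST to some URL" response flow concrete, here is a minimal Python sketch. It is not how the demo works: the function names, the JSON payload shape, and the idea that the caller supplies a callback URL are all assumptions made for illustration.

```python
import json
import urllib.request


def build_callback_request(callback_url, request_id, status_code, body):
    """Wrap a finished response in a POST aimed at a caller-supplied URL.

    The payload fields ("id", "status", "body") are hypothetical.
    """
    payload = json.dumps({
        "id": request_id,
        "status": status_code,
        "body": body,
    }).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deliver(request, timeout=10.0):
    """Send the POST and return the status code the callback endpoint answered with."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Building the request separately from sending it keeps the delivery step small, which is roughly the point made above: the response side needs little more than an HTTP client.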

@vagababov : re: error reporting - the getStatus(id) type of thing would return metadata about the response - for example, status (running, completed), the http response code, error messages, etc... basically the same thing you'd expect to see for the sync version of things. And I think that's a key requirement. Aside from the mechanism by which someone gets the response, there really should be no difference between async and sync calls. That's one of the reasons when people mention things like batching, retries or orchestration I think we've gone off on a tangent because all of those might be needed regardless of sync or async - plus I see those as higher level tooling.
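
As a rough illustration of the metadata a getStatus(id) call might return, the Python sketch below models the fields named above (status, HTTP response code, error message). The class and method names are hypothetical, and the in-memory dictionary stands in for whatever storage a real implementation would use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class AsyncStatus:
    state: State
    http_status: Optional[int] = None  # only known once the call has finished
    error: Optional[str] = None        # error message, if the call failed


class StatusStore:
    """Hypothetical in-memory registry of async requests, keyed by request id."""

    def __init__(self) -> None:
        self._records: Dict[str, AsyncStatus] = {}

    def start(self, request_id: str) -> None:
        self._records[request_id] = AsyncStatus(State.RUNNING)

    def finish(self, request_id: str, http_status: int,
               error: Optional[str] = None) -> None:
        self._records[request_id] = AsyncStatus(State.COMPLETED, http_status, error)

    def get_status(self, request_id: str) -> Optional[AsyncStatus]:
        # None means the id is unknown to this store.
        return self._records.get(request_id)
```

The record carries the same information a sync caller would see in the response (status code and error), so only the retrieval mechanism differs.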

Now having said that... we could look at adding things like retries within Kn if people really wanted, but I think that's a post-MVP thing. I'd prefer to get the basic function working really well before we get fancy. :-) Plus, as @evankanderson mentions, that might be a big change to the runtime contract, which I'd like to avoid for MVP.

@evankanderson : re: getStatus() stuff... this is one part I think we'll need more discussions on. In the demo that @nimakaviani will show we have a solution but it's not clear to me that's the final answer - but IMO the exact retrieval mechanism isn't the thing the demo is meant to show-off. The key thing is the impact to serving to deal with the sync -> async shift.

re: side-car for storage/retry.... if we did want to look at retries (which I really want to avoid right now :-) ) ... I don't think the pod is the right spot since the pod might die/go-away and the retry might need to be sent to a new one. So the retry needs to be at a higher level. This is another reason why if we do explore it, I view it as a bit of an add-on/higher-level bit of functionality.

duglin on 3 Jul 2019

What @nimakaviani and @duglin described above is very much like a typical FaaS (Function as a Service) mechanism. When we need to do some AI training, it's necessary to allow processing of async requests because it may take a long time to get the result.
I am now migrating my FaaS onto Knative, so I am interested in this feature and especially in the demo @nimakaviani gave. From my perspective, we could implement a 'watchdog', like the one OpenFaaS has (https://docs.openfaas.com/architecture/watchdog/), between the queue-proxy and the function container to take over the container's input and output and also serve getStatus(). Besides that, we would also need a callback-receiver that stores each result, keyed by its callback-id, in some database for persistence. But I'm not sure that's the final answer, and I'm interested in your solution. Could you please tell me where I can watch the demo or find the slides, if they can be made public?
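
The callback-receiver idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite stands in for "some database", and the table layout, class name, and method names are hypothetical rather than part of Knative or OpenFaaS.

```python
import sqlite3


class CallbackReceiver:
    """Hypothetical receiver that persists each result under its callback-id."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "callback_id TEXT PRIMARY KEY, http_status INTEGER, body BLOB)"
        )

    def receive(self, callback_id, http_status, body):
        # INSERT OR REPLACE keeps a redelivered callback from failing on the primary key.
        self.conn.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
            (callback_id, http_status, body),
        )
        self.conn.commit()

    def lookup(self, callback_id):
        # Returns None when no result has been stored for this id yet.
        row = self.conn.execute(
            "SELECT http_status, body FROM results WHERE callback_id = ?",
            (callback_id,),
        ).fetchone()
        return None if row is None else {"http_status": row[0], "body": row[1]}
```

A call such as `CallbackReceiver(sqlite3.connect(":memory:"))` is enough to try it out; a file-backed or external database would be needed for results to survive a restart.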

ZhengYangTong on 10 Oct 2019

AI training seems like more of an offline job, doesn't it? So wouldn't a
standard k8s Job be better suited for this?

vagababov on 10 Oct 2019

@ZhengYangTong here are some references to the resources you asked for. Like @vagababov mentioned, whether and how knative should support the async work is still an open question. But hopefully this will give you some ideas as to how the demo was implemented.

The demo video of the async work on top of knative serving: https://www.youtube.com/watch?v=ce70Tp8uQcE

Repo/branch for the modified knative serving work: https://github.com/nimakaviani/serving/tree/async-scaling (this was built on top of knative v0.4 and I haven't been actively updating it, so it is quite outdated).

nimakaviani on 10 Oct 2019

@vagababov Suppose you have trained a model and need to do something like regeneration testing, all you want then is just a scalable running application to receive requested parameters and give back response results, which may take a long time. Thus both sync and async requests should be handled properly in this case.
@nimakaviani Thank you very much.

ZhengYangTong on 11 Oct 2019

+1 to @ZhengYangTong's comment. I wouldn't want someone who is creating a piece of code to have to think too much about whether the processing of each request will be short-lived or not, and based on that decision use Knative vs some K8s thingy. What if it starts out being short-lived but over time some requests end up taking 15 minutes? Should they really need to change their deployment model or find a way to detect the long-running ones and route them to a K8s thing? I'm hoping not. I know not everyone has gone through the mind-meld yet :-) but I'm still hoping to get us to the point where Kn says "give us your code and we'll host it for you" regardless of these types of runtime characteristics. Then we can save the K8s deployment mechanisms for the edge cases that require more advanced tuning.

duglin on 11 Oct 2019

I asked from a slightly different point of view. You can do many things with knative, but it doesn't really mean you should ;-)

vagababov on 11 Oct 2019

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

knative-housekeeping-robot on 10 Jan 2020

Stale issues rot after 30 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.
Rotten issues close after an additional 30 days of inactivity.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle rotten

knative-housekeeping-robot on 9 Feb 2020

/remove-lifecycle rotten

duglin on 9 Feb 2020

FWIW, there seems to even be a standard header to indicate from the client that the server shall handle things asynchronously: https://tools.ietf.org/html/rfc7240#page-8
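
For reference, the preference RFC 7240 defines for this is `respond-async`, and a server that honors it can answer with 202 Accepted. The Python sketch below shows one way a server could detect it. The function names are hypothetical, the parser is deliberately simplified (it does not handle commas inside quoted values), and the header lookup assumes a plain dict with the exact key "Prefer".

```python
def parse_prefer(header_value):
    """Parse a Prefer header value into {preference: value}.

    Simplified: parameters after ";" are dropped and quoted commas are not handled.
    """
    prefs = {}
    for item in header_value.split(","):
        token = item.split(";", 1)[0].strip()
        if not token:
            continue
        name, _, value = token.partition("=")
        # Preference names are case-insensitive; keep the first occurrence only
        # (my reading of RFC 7240, section 2).
        prefs.setdefault(name.strip().lower(), value.strip().strip('"'))
    return prefs


def wants_async(headers):
    """True when the client sent 'Prefer: respond-async'."""
    return "respond-async" in parse_prefer(headers.get("Prefer", ""))


def initial_status_code(headers):
    """202 Accepted when async handling was requested, otherwise a plain 200."""
    return 202 if wants_async(headers) else 200
```

For example, `initial_status_code({"Prefer": "respond-async, wait=10"})` takes the async branch, while a request without the header is handled synchronously as before.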

markusthoemmes on 27 Mar 2020

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

knative-housekeeping-robot on 26 Jun 2020

/remove-lifecycle stale

/cc @beemarie

duglin on 26 Jun 2020

There has been a similar discussion in OpenFaaS:

https://github.com/openfaas/faas/issues/657

I agree with @duglin that the complexity of code can change over time, and having to switch deployment methods isn't a great UX.

lukasheinrich on 2 Jul 2020

FYI we created #async-requests on slack.knative.dev for further discussion and scheduling of follow-ups.

mattmoor on 7 Jul 2020

Following the process for feature request - created our feature proposal document to capture the decisions & direction so far:
https://docs.google.com/document/d/1a8f6mVlqQsr0VttWTRLcFT1PtnOHF9dKZXZEos9NSBA/edit?usp=sharing

beemarie on 10 Aug 2020

I have an integration with several Knative services, where each ksvc is connected to the next via an InMemoryChannel. Suppose the first ksvc sends a message over the channel to the second ksvc while the second ksvc is down. When the second ksvc comes back up, will it process the message?

pjcubero on 28 Aug 2020

@pjcubero yes it should be. This particular issue isn't related to that sort of asynchronous processing. If you have deeper questions about the in-memory channel's delivery/durability semantics, I'd suggest raising them in knative/eventing.

mattmoor on 28 Aug 2020

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] on 27 Nov 2020

/remove-lifecycle stale

duglin on 27 Nov 2020

You could refer to the OpenFaaS project.
It allows us to call functions asynchronously as follows:
curl http://gateway/**async-func**/function/xxxxxx
and synchronously like this:
curl http://gateway/**func**/function/xxxxxx

OpenFaaS documentation: https://docs.openfaas.com/reference/async/

junneyang on 27 Nov 2020
