Azure-functions-host: Cold start taking a long time in Consumption mode for C# Azure Function

Created on 27 Oct 2016 · 61 Comments · Source: Azure/azure-functions-host

This is our first use of Azure Functions and we are happy with the idea so far. We have a couple of functions set up that get hit randomly throughout the day, so they will never be warmed up when called. The issue we are having is that the cold startup time is 30s+.

Here is an example:
The Monitor logs say the function ran for only 2,640ms. On our end we saw 30.30 seconds. This request happened at 18:34 on Thursday, October 27, 2016 (UTC).

If we call it again right after the first request, the response time is 200ms. If we wait 10 minutes, it takes 30+ seconds again.

I've created a dummy app name if you want to look into this: dumm2345

Is this 30 second startup time expected?

lemkepf

👍12 😕8 👎1

Most helpful comment

Thanks for the feedback. We'll have to see what we can achieve. Definitely, scenarios where 2 seconds is not good enough are tough right now. For context, see how this issue started: it was 30 seconds at the time. So some progress was made :)

davidebbo on 5 Apr 2018

👍5

All 61 comments

Thanks for the info, we will investigate. Request was from RestSharp, right?

davidebbo on 27 Oct 2016

Yup. One solution we are starting with right now is to have a timer function run every 30 seconds. It literally does nothing but keeps the service alive and waiting for our random web calls. It's a terrible solution, but it's a hack we can live with for now.

lemkepf on 27 Oct 2016
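
A minimal sketch of the keep-warm timer workaround described above, assuming the timer is declared through a function.json timer trigger with a six-field NCRONTAB schedule (seconds first). The binding name keepWarmTimer is hypothetical, and the function body itself would do nothing.

```python
import json


def timer_function_json(schedule: str = "*/30 * * * * *") -> str:
    """Build a function.json for a timer-triggered function.

    The default schedule fires every 30 seconds, matching the workaround
    described in the thread. "keepWarmTimer" is a hypothetical binding name.
    """
    config = {
        "bindings": [
            {
                "name": "keepWarmTimer",
                "type": "timerTrigger",
                "direction": "in",
                "schedule": schedule,
            }
        ]
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # A five-minute schedule would be "0 */5 * * * *".
    print(timer_function_json())
```

Note that every timer execution is itself a billed invocation, so the schedule is a trade-off between cost and how often the app is allowed to go idle.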

👍1

Is it ok to discuss function names here as long as we keep the Web App name private (which is the host name)?

davidebbo on 27 Oct 2016

Yup. Go for it.

lemkepf on 27 Oct 2016

So initial finding is that I see the following sequence of requests:

18:36:08: Requests for / coming from ReadyForRequest/1.0. Takes 26.5 seconds
18:36:09: Requests for / coming from ReadyForRequest/1.0. Takes 1.3 seconds
18:36:11: Request for /api/CreateZipLink from RestSharp/105.2.3.0. Takes 29.4 seconds.
18:36:15: Request for /api/GetZipFile/[guid] from Chrome. Takes 3.9 seconds and returns a large response (1MB)

Though probably the first one is the relevant one that kicks in the cold start and that we need to understand.

davidebbo on 27 Oct 2016

Yup: 18:36:11: Request for /api/CreateZipLink from RestSharp/105.2.3.0. Takes 29.4 seconds.
That's the one that we care about. All it does is put a document into documentdb. But I'm pretty sure it's the cold start that's the issue. Thanks for looking into this!

lemkepf on 27 Oct 2016

Assigning to @davidebbo for an update

lindydonna on 14 Nov 2016

We have done a number of optimizations, and you should be seeing better cold start time than before. We're still working on additional optimizations that will make it yet faster.

davidebbo on 17 Jan 2017

We have a similar problem with cold start.

We have an Azure Function used as a webhook for a Slash Command on Slack. One of the requirements of the command is that the response must be served within 3000ms. Every time the command is used after some period of inactivity, there is a timeout.

You can check the source code here. I tried to strip out everything from it but the plain response.

Is there any way to keep a function "warm" in the Consumption plan? Is pinging it with some scheduler OK?

jenyayel on 17 Jan 2017

@davidebbo do these optimizations have implications for .NET functions as a whole? (We're using F# Azure Functions.)

mjgpy3 on 17 Jan 2017

They apply to all function apps. Previously, there would be occasional 30s+ cold starts even for a trivial Function (see first entry in issue). Now those peaks have been eliminated, and the cold start is typically 4 to 10s in the trivial case.

davidebbo on 17 Jan 2017

Do you have plans on improving it below 4s?

jenyayel on 17 Jan 2017

Yes, see my earlier comment: We're still working on additional optimizations that will make it yet faster. No ETA at this point, but active work is happening now.

davidebbo on 17 Jan 2017

👍3

I have just created my first Azure function in order to compare it to AWS Lambda. All the function does is return a random entry from a static List<string> that contains around 100 elements. When calling the function with a 'cold start' it sometimes takes around 35 secs to respond. Subsequent requests take about 0.2 seconds.

My concern is that one of the reasons for exploring serverless functions is to avoid paying for the times that the function _isn't_ being called. Having to periodically call the function in order to avoid cold starts seems to defeat that purpose.

DavidBrower on 27 Jan 2017

@DavidBrower 35s does not sound right, as the current expectation is 4s to 10s (aside from the large npm tree issue). Can you share the name of your app and the time when you saw such timing?

davidebbo on 30 Jan 2017

@davidebbo the name of the function is:

https://functionsa7b1a72e.azurewebsites.net/api/ObliqueStrategiesV3.

Really, most of the time when I haven't called the function for about 5 minutes I then get the delayed response. I'm simply calling:

curl -sk https://functionsa7b1a72e.azurewebsites.net/api/ObliqueStrategiesV3 from my desktop.

DavidBrower on 31 Jan 2017

@DavidBrower it really helps to get the UTC time of one such incident, so we are looking at the same thing. I looked at all requests to that function today (i.e. 1/31 UTC) and the longest it took was 7s.

davidebbo on 31 Jan 2017
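
Since matching a slow request against the server-side logs needs the UTC time of the incident, a small client-side probe can record both the latency and the UTC timestamp of each call. This is only a sketch: the function names, the 120-second timeout and the URL placeholder in the comment are assumptions, not anything used in the thread.

```python
import time
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Dict


def http_get(url: str) -> None:
    """Issue a GET and read the whole body (the 120 s timeout is arbitrary)."""
    with urllib.request.urlopen(url, timeout=120) as response:
        response.read()


def timed_request(
    url: str, do_request: Callable[[str], None] = http_get
) -> Dict[str, object]:
    """Call the URL once and return the UTC start time and elapsed milliseconds."""
    started_utc = datetime.now(timezone.utc).isoformat(timespec="seconds")
    start = time.perf_counter()
    do_request(url)
    elapsed_ms = round((time.perf_counter() - start) * 1000)
    return {"started_utc": started_utc, "elapsed_ms": elapsed_ms}


# Example (placeholder URL, not a real app):
# print(timed_request("https://<your-app>.azurewebsites.net/api/<function>"))
```

Appending each result to a file gives the audit trail that plain curl calls lack.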

Interestingly, I have not had the very long response times since the weekend. I'm down to 5-6 secs for my cold starts.

I've checked the Azure Functions Invocation Log and I'm afraid that it doesn't seem to keep invocations for more than a day. The best I can do is to say that the first of the long response times started around 16:00 on 30 January UTC. Sorry that I can't give you exact times. I have been running my tests using curl, so I don't have an audit trail of function calls.

DavidBrower on 1 Feb 2017

Ok, that's good news. Let's keep observing, and let us know if you hit some extreme slow starts again.

davidebbo on 1 Feb 2017

Thanks @davidebbo

DavidBrower on 1 Feb 2017

$ time os
Mechanicalise something idiosyncratic
real 0m13.450s
user 0m0.105s
sys 0m0.652s

Just received 13.45 secs with a cold start at 3 February 18:35 UTC. os is just the alias I am using in Bash to call my Azure function.

DavidBrower on 3 Feb 2017

Going to chime in here and say that slow start-up, even for simple C# Functions, is impacting key functionality we are using them for. We're on standard consumption plans with Always On switched on, but still see substantial ramp-up on the initial invocation after some period (I assume 5 minutes). I've resorted to putting a timer-triggered app into our App Service just to try and keep it all warm and toasty.

sjwaight on 21 Feb 2017

@sjwaight have you found that then increases your costs on Azure?

DavidBrower on 21 Feb 2017

@sjwaight what do you mean by 'standard consumption plans'? There is Consumption, and there is App Service Plan (which includes Standard), so the two terms are mutually exclusive.

Note that I'd like to focus this thread on the issue in Consumption mode. Outside of Consumption, with Always On, it should always stay hot. If it doesn't for someone, let's open a separate issue and investigate why Always On is not working.

davidebbo on 21 Feb 2017

89663 ms cold start at 2017-07-02 9:55 UTC for a GET on https://cbtraindev.azurewebsites.net/api/bert which is a node function with HTTP triggers & storage input. Around 2017-07-02 11:35 it took 120464 ms to get a response. Pure execution time according to the functions monitor was 117,492 ms

Consecutive _warm_ runs are mostly ~100ms sometimes going near 1000ms which is all good.

Are there lately any issues with coldstart perf compared to the previous comments?

anoff on 2 Jul 2017

We have the same issue, some cold starts take 20s and we require <1s.
I logged a feature request to add the always on option to consumption plans:
https://feedback.azure.com/forums/355860-azure-functions/suggestions/31332643-allow-always-on-mode-on-the-consumtion-plan

LTsLlama on 11 Sep 2017

Everyone should vote for the feature request @DL-LiveTiles made; it will benefit all of us :)

touseefbsb on 21 Sep 2017

Is it still suggested to have a 5min timer to keep things warm?

And when doing so, is it enough to just have the timer, or is it necessary to make an HTTP request to keep web triggers warm?

ricklove on 4 Oct 2017

We have optimizations that are partially deployed, and should be fully deployed in the next month. They take the cold start down to about 2 seconds. You can try it right now in the West US 2 region.

davidebbo on 5 Oct 2017

I have the same issue. I have a test Function App on the Consumption plan in South Central US. I set up a bi-hourly ping and it takes ~30 seconds to respond.

kchanlee on 25 Oct 2017

I know this thread is about c# but I ended up here with my javascript app as well. So for anyone having issues on JS: Using the https://github.com/Azure/azure-functions-pack on JS significantly helps!

anoff on 25 Oct 2017

👍1

Has anyone figured out any solutions other than recurring calls? Will switching to non-consumption plan help?

khalid-halo on 11 Jan 2018

@khalid-halo Using the non-consumption plan with always on turned on is a solution.

LTsLlama on 11 Jan 2018

It seems the amount of time necessary to wait for a response to an HTTP function invocation is often a lot longer than what shows up in the Azure portal logs. Cold start times seem to be under 2 seconds in the logs, but some HTTP requests take > 12 seconds to receive a response. This is on the v2 pre-release.

0xacf on 4 Apr 2018

👍1

@0xacf you are correct that the time shown in the logs does not include the cold start of the Function runtime.

Also, note that the cold start optimization mentioned above (which uses warm 'placeholder' sites) is currently only for v1, so your v2 Preview app does not benefit from it. We will bring this optimization to v2 as it gets closer to becoming GA (sorry, no clear ETA yet).

davidebbo on 4 Apr 2018
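
To make the gap between logged and observed time concrete, here is the arithmetic for two reports quoted earlier in this thread. The helper name is hypothetical, and the difference is only an upper bound on the cold start: it also contains network latency and anything else that happens outside the function body.

```python
def unaccounted_ms(observed_ms: int, logged_ms: int) -> int:
    """Client-observed latency minus the execution time shown in the monitor."""
    return observed_ms - logged_ms


# First report in the thread: 30.30 s observed, 2,640 ms logged.
print(unaccounted_ms(30_300, 2_640))     # 27660

# Node function report: 120464 ms observed, 117,492 ms logged.
print(unaccounted_ms(120_464, 117_492))  # 2972
```

In the second case most of the delay is inside the logged execution time, so the slow response there cannot be explained by runtime cold start alone.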

@davidebbo Just to confirm:

By "warm placeholder sites... only for v1", are you saying that using a 5 minute trigger to keep a function server warm will not work with v2 yet?

ricklove on 4 Apr 2018

Ah, interesting. Even the logs show up to a 1.7 second execution time for the first invocation of a function after a long period of time. Subsequent executions take ~30ms. This function accesses Azure table storage, but creates a new storage client on each invocation. A function which does nothing except return a string literal exhibits similar behaviour, but only seems to get up to about 300ms in the logs.

0xacf on 4 Apr 2018

@ricklove no, that would work as well, as that keeps an instance alive. What I'm referring to is the case where you don't currently have any running instance, and one needs to be started. That's when placeholder comes into play to avoid paying for a full cold start.

davidebbo on 4 Apr 2018

Out of curiosity, have you considered adding an "always on" feature for the consumption plan, obviously with some surcharge? It seems like you can achieve a similar effect with a 5 minute trigger, but that also seems a much less clean solution. Also, using the "5 minute trigger" method, is there a guarantee that no function invocations will cause a cold start? For example, could a second instance be created for scaling, causing a 2 second latency for one request?

Perhaps it's just my use case, but even 2 seconds seems like an unusably long latency for an API request.

0xacf on 4 Apr 2018

@0xacf There have been talks of a potential hybrid mode between App Service plan and Consumption. You'd have some dedicated VMs, but if the app needs to scale beyond that for some request bursts, it would move into Consumption mode. But this is just an idea at this point.

And yes, with the timer, you only warm up one instance.

davidebbo on 4 Apr 2018

👍1

@davidebbo Sorry, I'm not familiar with the placeholder you mentioned and I didn't find it in the comments above.

What is a "placeholder"? How do you achieve that and what effect will it have for v2 once ready?

ricklove on 4 Apr 2018

@ricklove placeholders are an internal optimization that works as follows:

Instances of the Function Runtime are kept warm, in an unassigned state (i.e. not attached to a specific user app).
When a user app needs to cold start, instead of doing a full cold start, we take a placeholder instance and 'specialize' it to the user app, which is quite a bit faster.

You don't need to do anything special to use it. It just makes cold start faster, but only for v1 at this point. Once it becomes available for v2, it will start helping without you needing to change anything.

davidebbo on 4 Apr 2018
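
The placeholder idea described above can be pictured with a toy model. This is not the actual host implementation: the class and method names are hypothetical, and the two timing constants are assumptions picked loosely from figures mentioned in this thread (up to 10 seconds for a full cold start, about 2 seconds with the optimization).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

FULL_COLD_START_S = 10.0  # assumed cost of starting a runtime from scratch
SPECIALIZE_S = 2.0        # assumed cost of attaching an app to a warm runtime


@dataclass
class Instance:
    app: Optional[str] = None  # None means an unassigned placeholder


@dataclass
class PlaceholderPool:
    warm: List[Instance] = field(default_factory=list)

    def prewarm(self, count: int) -> None:
        """Keep `count` extra runtime instances warm and unassigned."""
        self.warm.extend(Instance() for _ in range(count))

    def start_app(self, app: str) -> Tuple[Instance, float]:
        """Return an instance for `app` and the modelled start-up cost."""
        if self.warm:
            instance = self.warm.pop()
            instance.app = app  # 'specialize' the placeholder
            return instance, SPECIALIZE_S
        return Instance(app=app), FULL_COLD_START_S


if __name__ == "__main__":
    pool = PlaceholderPool()
    pool.prewarm(1)
    print(pool.start_app("app-a")[1])  # 2.0, a placeholder was available
    print(pool.start_app("app-b")[1])  # 10.0, pool empty, full cold start
```

The model also shows why the optimization is invisible to users: the app does nothing different, it simply lands on an instance that is already running.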

For what it's worth, my personal preference would be purchasing capacity at the sandbox rather than the VM level. I would happily pay the $0.000016/GB-s rate or some multiple of it continuously to eliminate this cold start time, and have additional sandbox instances created in advance of when their capacity is required. I think this latency by itself prevents Functions from being useful for a large segment of applications (ie, anything that needs to respond in less than 2 seconds). Using dedicated VMs for low latency is a solution, but it kind of undermines one of the key benefits of the serverless paradigm (fine-grained scaling).

Just some thoughts. Thanks for the information, and the optimizations!

0xacf on 5 Apr 2018

👍3

I agree with @0xacf. We want a simple serverless solution that gives us flexibility and low start-up times without having to worry about how it works or the cost implications of different tiers. Adding an always-on setting per function would be the perfect solution for us.

LTsLlama on 5 Apr 2018

Thanks for the feedback. We'll have to see what we can achieve. Definitely, scenarios where 2 seconds is not good enough are tough right now. For context, see how this issue started: it was 30 seconds at the time. So some progress was made :)

davidebbo on 5 Apr 2018

👍5

Looking forward to the hybrid approach between consumption and app-service plans.

zeeshanejaz on 9 Apr 2018

👍1

I am having the same issue with slow cold starts which led me to set up a function to call two other functions, time response time, and notify me if either function takes longer than five seconds to respond (I wasn't sure if the warm-up needed to occur per-function or per-function app).
I started with triggering this "health check" to be called every five minutes and eventually turned it down to three, but I can still experience hours and hours with cold-start times of 10+ seconds on every single request.
The only input binding on the slow function is to Azure Table Storage, which returns a single row.
Can a function still "go cold" when called every three minutes? Do functions need to be "warmed up" at the function level or at the function app level?
By the way, I implemented this function app in the portal thinking that I could prototype there and get an idea of performance before implementing as a precompiled function in source control from VS.

marcofalconi on 8 May 2018
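
The health check described above could look roughly like the following. It is a sketch under assumptions: the function names are hypothetical, the five-second threshold comes from the comment, and how the notification is sent is left out.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def measure_seconds(url: str) -> float:
    """Time one GET request, including reading the body (60 s timeout is arbitrary)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=60) as response:
        response.read()
    return time.perf_counter() - start


def find_slow(
    urls: Iterable[str],
    threshold_s: float = 5.0,
    measure: Callable[[str], float] = measure_seconds,
) -> Dict[str, float]:
    """Return the endpoints whose response time exceeded the threshold."""
    timings = {url: measure(url) for url in urls}
    return {url: t for url, t in timings.items() if t > threshold_s}


# Example with placeholder URLs (not real apps):
# slow = find_slow([
#     "https://<your-app>.azurewebsites.net/api/<function-one>",
#     "https://<your-app>.azurewebsites.net/api/<function-two>",
# ])
# if slow:
#     print("Slow responses:", slow)  # replace with a real notification
```

Run from a timer, a check like this doubles as a keep-warm ping, although as noted elsewhere in the thread it only warms a single instance.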

@davidebbo We use C# precompiled functions and function proxies that we are deploying using ARM and VSTS. We were nowhere near the 2s cold start response time with a simple function. For instance, making a request to a function proxy using an unknown route took the proxy about 17-18s to respond with 404.

I did some testing and found out that disabling PHP on a function app added about 12-14s to the cold start response time. Setting PHP to v5.6 improved the cold start performance considerably. We disabled PHP in our ARM templates because we do not use it and will turn it back on now.

romandres on 6 Jun 2018

👍2

@romandres indeed, there is a known issue that hurts the cold start when PHP is removed. Note that this PHP setting is actually meaningless for Function Apps, so there is no harm in leaving it the way it is by default. We will work to address the issue, but for now just remove all traces of that setting from your templates.

davidebbo on 6 Jun 2018

👍1

@davidebbo Good to know, thanks. Are there any other settings that are known to impact the cold start performance at the moment?

EDIT: I'm asking because still there are some seconds in the response times that I cannot account for. I started to make some tests without Application Insights (removed the app setting) and that seemed to have an impact of a few seconds. One app cold starts about 6-7s slower with App Insights enabled.

romandres on 7 Jun 2018

@romandres sorry, missed this one earlier. I would not expect App Insights to affect cold start. If you see this consistently (fast without, slow with), can you open a separate issue so we can better track?

davidebbo on 16 Jun 2018

I am going to add the information I found on this topic:

To speed up the cold start process, run your functions as a package file when possible.
https://docs.microsoft.com/en-us/azure/azure-functions/run-functions-from-deployment-package

Azure Functions Pack (JS) packs dependencies for fast cold start
https://github.com/Azure/azure-functions-pack

A blog on cold starts
https://blogs.msdn.microsoft.com/appserviceteam/2018/02/07/understanding-serverless-cold-start/

delcon on 13 Nov 2018

👍2

@davidebbo I too have this problem. To me, the hybrid approach doesn't sound too great because I'd need to buy dedicated VMs for most of my expected consumption (with low latency) and use serverless functions only for the bursts (and of course the bursts will take a long time). This almost defeats the purpose of going truly serverless. Also, I was content with using the Web API framework, as Functions seems to lack quite a few features, but Functions was a better choice to shave off costs. If I'm going to use actual VMs, I'd rather go back to Web APIs and have access to an actual HTTP pipeline and things like [Authorize] attributes.

Having said that, I may be misunderstanding the hybrid approach here. Why not use a dedicated VM only for the bursts instead? If there are no warm instances available on the consumption plan, then use the dedicated VM while the infrastructure warms up another instance behind the scenes.

adrianknight89 on 18 Dec 2018

@adrianknight89 I'm no longer in this project, so I will let others respond (e.g. @paulbatum). But this is an old issue and things have improved quite a bit since. I suggest that you add details on your scenario: e.g. what language, what OS, what runtime version, are you using Run As Package, what kind of timing you are observing, etc...

davidebbo on 18 Dec 2018

@paulbatum I’m using C#, Windows 10 (idk why OS matters though), v2, and zip deployment on the consumption tier.

My biggest complaint at the moment is every time I publish new changes (which I do very frequently), everything seems to be deployed to another server, and I incur the full cold start penalty.

I’ve read somewhere that v2 wasn’t as optimized as V1, but this was when it was still in preview. Do you still believe v2 lacks optimization compared to the older version?

As for dependencies, this is probably not the greatest idea, but why don’t you install some of the most popular libraries on your servers (say, the last 5-10 releases of each) and fall back to copying the files on the fly in case they don’t exist on the destination server? A lot of people use Newtonsoft, EF, Azure client SDKs, etc. If the source and the target have the same dependencies and versions, you could skip moving them over the wire.

adrianknight89 on 19 Dec 2018

I'm also experiencing this issue when using Azure function as http webhook. When a cold start happens, the caller (Azure Monitor Log Search Alert) times out and retries the call, resulting in the function being called twice. The function is a compiled C# function using V1 runtime and deployed using WEBSITE_RUN_FROM_PACKAGE.
It's unfortunate that the only solution seems to be switching to standard App Service plan away from Consumption mode

dennis-yemelyanov on 6 Jan 2019

@adrianknight89 By OS, David meant whether you're running your function app on Windows or Linux, as there are some important differences in terms of how cold start is handled between the two.

Typically, changes are not deployed to a new server/vm. If you make a change locally, deploy (such as with VS or VS Code), then hit the endpoint, most of the time that request will be routed to the same server that was running your app prior to deployment.

Instead, what I suspect you are hitting is the fact that when you make an assembly change or an app setting change, we restart the process to pick up that change, and when we do this process restart, we do not reuse the 'placeholder' infrastructure that is used to optimize normal cold starts. This is a possible area of improvement that we have not had a chance to investigate yet. Let me try to chat with some folks and report back. We might want to file a new issue to discuss this scenario because it is really a "development" cold start instead of a "production" cold start.

Regarding common libraries: there are several factors that impact cold start time, but getting the necessary libraries onto the target server is not really one of them. These types of optimizations can significantly help with build time (for example, having a cache of NuGet packages to make NuGet install complete faster), but that is a different issue.

paulbatum on 9 Jan 2019

I'm experiencing cold startup times exceeding 20 seconds in Consumption plan for my trivial C# function. I ping it every 5 minutes using App Insights to keep it warm, but it still shows sporadic >20s responses. What do I do?

ogvolkov on 8 Apr 2019

👍1

@ogvolkov I've started to experience the same issue too.

I have an availability test, that pings my API Management, that calls my AF v2

vitalybibikov on 17 Apr 2019

BTW, the same function showed a 10s cold start 10 days ago, instead of 20 seconds.

Normally, the AF host is killed after 20 minutes of inactivity.

Even when it is kept warm, I have seen a 20s cold start every 60-70 minutes since the start of April. Before that it was around 10 seconds from time to time.

It looks like the Consumption plan was changed now that we have the Premium plan.

vitalybibikov on 17 Apr 2019

This issue has been open for a while and has covered a few different scenarios over the years. To make sure we investigate newly reported issues (as they won't have the same root cause), please open new issues so we have the right context and things are not missed.

Closing this.