While investigating another issue I stumbled upon a lot of these 409s coming from renewing the primary host lock. I did some investigation and I believe that, because we do a 'fire and forget' with the host disposal, it's possible that a new host comes up and acquires the lock just before the old host releases it. That means that the next attempt to renew fails, as the lock has been released.
The error looks like: Singleton lock renewal failed for blob 'locks/AppName/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2017-09-02T18:38:31.526Z (111006 milliseconds ago) with a duration of 57508 milliseconds. The lease period was 15000 milliseconds.
I think it's happening with something like below.
Note 1 -- this is all on a single instance and references to hosts are about the hosts being cycled via the ScriptHostManager.
Note 2 -- All of the host locks use the current machine id as the proposed lease ID, which allows multiple hosts on the same machine to modify the lease.
Host1 -> acquire lock
Host1 -> renew lock
something triggers host recycle and creates a new host while disposing of the old host
Host2 -> acquire lock
Host1 -> release lock
Host2 -> renew lock -> throw
It's actually possible for several other combinations to happen (I've seen a race where Host1 renews right as it releases, also causing a throw).
Ultimately, these errors seem benign, as things eventually recover and all will be well, but they cause a lot of errors in the logs (I see 100k+ 409 logs involving the host lock over the last week).
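To make that sequence concrete, here's a rough sketch of the shared-lease-ID behaviour, assuming the legacy Microsoft.WindowsAzure.Storage SDK and the local storage emulator. This is not the actual host code -- the container/blob names and the lease ID are illustrative:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class LeaseRaceSketch
{
    static async Task Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var container = account.CreateCloudBlobClient().GetContainerReference("locks");
        await container.CreateIfNotExistsAsync();

        var blob = container.GetBlockBlobReference("AppName/host");
        if (!await blob.ExistsAsync())
        {
            await blob.UploadTextAsync(string.Empty);
        }

        // Both "hosts" use the same proposed lease ID (standing in for the machine/instance ID).
        string sharedLeaseId = Guid.NewGuid().ToString();

        // Host1 -> acquire lock, renew lock.
        await blob.AcquireLeaseAsync(TimeSpan.FromSeconds(15), sharedLeaseId);
        await blob.RenewLeaseAsync(AccessCondition.GenerateLeaseCondition(sharedLeaseId));

        // Host recycle: Host2 -> acquire lock. This succeeds because acquiring with the
        // same proposed lease ID as the active lease is treated as a renewal.
        await blob.AcquireLeaseAsync(TimeSpan.FromSeconds(15), sharedLeaseId);

        // Host1's fire-and-forget disposal -> release lock.
        await blob.ReleaseLeaseAsync(AccessCondition.GenerateLeaseCondition(sharedLeaseId));

        // Host2 -> renew lock -> throws 409 LeaseIdMismatchWithLeaseOperation,
        // because the lease was released out from under it.
        await blob.RenewLeaseAsync(AccessCondition.GenerateLeaseCondition(sharedLeaseId));
    }
}
```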
For now, the workaround is to add a settings.job file with
{ "is_singleton": true }
It will not scale, but at least it will not fail.
@brettsam You said "acquires the lock just before the old host releases it" - how can that happen? The lease can only be held by one host at a time. Perhaps what you're saying is something like:
However, PrimaryHostCoordinator.AcquireOrRenewLeaseAsync seems to already handle this case correctly - if it doesn't have the lease and fails to acquire it, no error is thrown.
@mathewc, the lease can be held by multiple hosts inside a single instance, as we use the same leaseId (instanceId) to acquire the lease. It's the flow @brettsam mentioned:
Host1 -> acquire lock
Host1 -> renew lock
something triggers host recycle and creates a new host while disposing of the old host
Host2 -> acquire lock
Host1 -> release lock
Host2 -> renew lock -> throw
I see, thanks for clarifying. If the problem is that the renewal fails because we already have the lock then can't we handle that condition by ignoring that specific error?
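Roughly what I have in mind, as an illustrative sketch only (not the actual PrimaryHostCoordinator code -- the method, blob, and leaseId names are placeholders), against the legacy Microsoft.WindowsAzure.Storage SDK:

```csharp
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class LeaseRenewalSketch
{
    public static async Task RenewIgnoringOwnLeaseMismatchAsync(CloudBlockBlob lockBlob, string leaseId)
    {
        try
        {
            await lockBlob.RenewLeaseAsync(AccessCondition.GenerateLeaseCondition(leaseId));
        }
        catch (StorageException ex)
            when (ex.RequestInformation?.HttpStatusCode == 409 &&
                  ex.RequestInformation?.ExtendedErrorInformation?.ErrorCode == "LeaseIdMismatchWithLeaseOperation")
        {
            // The lease is already held (or was just released) under the same proposed lease ID
            // by another host on this instance, so treat the renewal as a no-op instead of
            // logging an error; the next renewal/acquisition attempt will recover.
        }
    }
}
```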
Work has been merged in 1.x. Need to make sure this isn't an issue in 2.0.
Hi @mathewc @fabiocav, are there any plans to fix this issue on Function App v1, or is the only solution to move to v2?
It's not clear whether this issue exists in v2 or not; however, because this issue is so old and we haven't heard any complaints for v2, it likely doesn't. @itshemant09 are you seeing lots of these? The errors are benign, so they shouldn't cause functional issues for you - just log noise.
Same issue in v2 java function app.
Singleton lock renewal failed for blob 'xxx/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 0001-01-01T00:00:00Z (-2147483648 milliseconds ago) with a duration of 0 milliseconds. The lease period was 15000 milliseconds.
Runtime version: 2.0.12507.0 (beta)
I am also getting this error in a v2 C# function app.
Singleton lock renewal failed for blob 'xxx/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2019-09-14T04:18:13.801Z (40380 milliseconds ago) with a duration of 7 milliseconds. The lease period was 15000 milliseconds.
Is there something we should be doing to prevent this? Or should we ignore this error?
I'm getting this error but it is causing my application to crash with an unhandled exception so I cannot ignore it.
We're also seeing this error. Any help would be great.
10/01/2019T08:21:52.000 -0500
Host lock lease acquired by instance ID 'a6e320715da18613c2cfb13182ae9d01'.
10/01/2019T08:22:00.000 -0500
Singleton lock renewal failed for blob 'aqinn1234-masterentitydata/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2019-10-01T13:21:35.637Z (24167 milliseconds ago) with a duration of 8 milliseconds. The lease period was 15000 milliseconds.
10/01/2019T08:22:00.000 -0500
Failed to renew host lock lease: Another host has acquired the lease. The last successful renewal completed at 2019-10-01T13:21:35.637Z (24449 milliseconds ago) with a duration of 8 milliseconds.
These logs are mostly noise and shouldn't be causing any crashes. It just means that the host lease was lost temporarily and it will be re-acquired shortly.
@hisuwh if you can share your exact error message with us, that would be helpful.
For anyone seeing this, if you could share your Function App name (either explicitly or privately), I can see if I find anything from the backend logs.
@brettsam This is the error I'm getting:
[09:44:18 ERR] Singleton lock renewal failed for blob '6aaf581a4d74d37580b1ebda440a1171/Company.Application.AppProcess.CleanupFunction.Cleanup.Listener' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2019-10-02T09:41:14.402Z (184439 milliseconds ago) with a duration of 26 milliseconds. The lease period was 60000 milliseconds.
Unhandled Exception: Microsoft.WindowsAzure.Storage.StorageException: The lease ID specified did not match the lease ID for the blob.
at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteAsyncInternal[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext, CancellationToken token)
at Microsoft.Azure.WebJobs.Host.StorageBaseDistributedLockManager.SingletonLockHandle.RenewAsync(ILogger logger, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host.Storage\Singleton\BlobLeaseDistributedLockManager.cs:line 347
at Microsoft.Azure.WebJobs.Host.SingletonManager.RenewLeaseCommand.ExecuteAsync(CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Singleton\SingletonManager.cs:line 332
at Microsoft.Azure.WebJobs.Host.Timers.TaskSeriesTimer.RunAsync(CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Timers\TaskSeriesTimer.cs:line 147
at System.Threading.Thread.ThreadMain_ThreadStart()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
I'm not actually running in Functions but using the Azure WebJobs SDK, running in Kubernetes. Still, as you say, I wouldn't expect this error to kill my process, but it is.
@hisuwh -- That's a different issue (note that the blob name is /...Listener, rather than /host). There's an old, long issue about that here: https://github.com/Azure/azure-webjobs-sdk/issues/822. For that issue, there's no real fix we can make without a significant architectural change to run some of our host operations in a separate process. The cause is almost always functions blocking threads, which prevents our lease renewal timers from firing, so the leases expire unexpectedly. My recommendation for that issue is to evaluate your function code for potential threading issues -- make sure it's using async everywhere possible and not blocking any threads.
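For illustration, this is the kind of pattern to look for -- a hypothetical timer-triggered function (the names, trigger schedule, and URL are made up), first with a blocking call that can starve the renewal timers, then rewritten to await:

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class CleanupFunctions
{
    private static readonly HttpClient HttpClient = new HttpClient();

    // Problematic: .Result blocks a thread-pool thread for the whole call, and enough of
    // these can keep the host's lease-renewal timers from firing on time.
    [FunctionName("CleanupBlocking")]
    public static void Run([TimerTrigger("0 */5 * * * *")] TimerInfo timer, ILogger log)
    {
        string payload = HttpClient.GetStringAsync("https://example.com/cleanup").Result;
        log.LogInformation("Fetched {Length} characters", payload.Length);
    }

    // Better: awaiting frees the thread while the call is in flight.
    [FunctionName("CleanupAsync")]
    public static async Task RunAsync([TimerTrigger("0 */5 * * * *")] TimerInfo timer, ILogger log)
    {
        string payload = await HttpClient.GetStringAsync("https://example.com/cleanup");
        log.LogInformation("Fetched {Length} characters", payload.Length);
    }
}
```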
@brettsam it's crashing at a point in time when none of the functions are actually firing, so I'm not sure how reviewing my code is going to help this issue.
I've pinged you on the other thread since your issue is not with the primary host lock. Let's continue there so we don't confuse this one.
I'm seeing this error with a long-running, scaled-up set of the latest 2.0 azure node functions runtime docker image using a blob trigger:
Singleton lock renewal failed for blob '***-functions/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2019-10-27T16:12:10.853Z (17048 milliseconds ago) with a duration of 4 milliseconds. The lease period was 15000 milliseconds.
Just after seeing this message, blob trigger processing of new blobs stops. Restarting 1 of the 8 functions containers did not cause processing to resume, though the restarted container did not issue any error messages, about locks or otherwise. Picking another 3 of the 8 functions containers, effectively at random, and restarting them caused new blob processing to resume.
Related to Azure/azure-webjobs-sdk-extensions#249 or #4784?
Anyone have any idea what may be causing this? We're still seeing it. It appears to occur pretty consistently and once we see the error, processing in our 6 concurrently running containers stops, usually for many hours.
I am also getting this error in a v1 C# function app.
Singleton lock renewal failed for blob 'locks/bingnrthawkeyealertingtest/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2019-11-18T02:35:14.847Z (12014 milliseconds ago) with a duration of 0 milliseconds. The lease period was 15000 milliseconds.
I followed this troubleshooting guide, but the error is still there.
Is there something we should be doing to prevent this? Any insight you have would be greatly appreciated, thanks in advance for your help!
@XimenaWang -- Is this causing issues? The "primary host" should be able to switch between different instances and I don't believe you'll have any issues. See my comment above: https://github.com/Azure/azure-functions-host/issues/1864#issuecomment-537067586.
I believe @derekrprice is seeing something unrelated that is causing the polling for blobs to deadlock (we're following up elsewhere).
@brettsam
We are seeing the same error when using ServiceBus triggered AZ functions (v2) in the consumption tier (EAST US).
Singleton lock renewal failed for blob 'xx/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2019-11-18T21:32:58.201Z (11992 milliseconds ago) with a duration of 10 milliseconds. The lease period was 15000 milliseconds.
Not sure if this is of significance, but we see some signs that it occurs after the functions are done processing the messages on the queue.
Any lease renewals that end with /host are likely nothing to worry about. Instances can swap between "primary" hosts without affecting the running functions. Let me know if you see anything that may be leading to failures and I can investigate.
We've been seeing this error logged in our function apps (C# v2) fairly regularly. It doesn't appear to cause any issues, but assuming there is nothing to worry about it's just unhelpful noise. Are there any plans to 'fix' this or change the logging level so that it doesn't appear as an error?
Adding on to the issues people are seeing: I am still on v1 and just had a function crash because of it. The issue is not benign on v1.
Singleton lock renewal failed for blob 'xx/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2020-01-14T16:30:56.506Z (16435 milliseconds ago) with a duration of 0 milliseconds. The lease period was 15000 milliseconds.
The remote server returned an error: (409) Conflict. The remote server returned an error: (409) Conflict.
An unhandled exception has occurred. Host is shutting down.
Any lease renewals that end with /host are likely nothing to worry about. Instances can swap between "primary" hosts without affecting the running functions. Let me know if you see anything that may be leading to failures and I can investigate.
Do you mean like "Singleton lock renewal failed for blob 'func-test-dev-001/host'"? I am seeing this after I added a deployment slot.
I also see this from time to time and would like to keep my errors to 0.
It doesn't happen often, but I have 46 function apps running, so across all of them it's constant. I think it happens during releases and slot swaps.
And one function app cannot be swapped from Azure DevOps for some reason and this error is happening most often for that function app. I am not sure about causality.
But I think there is cause for getting to the bottom of this error; even if it does seem benign, it could be causing unexpected side effects.
@brettsam
Any lease renewals that end with /host are likely nothing to worry about. Instances can swap between "primary" hosts without affecting the running functions. Let me know if you see anything that may be leading to failures and I can investigate.
2020-04-15 09:35:27,550 Host.Singleton RD00155DE250FA ERROR Singleton lock renewal failed for blob 'fr911prodadapterprocessorfunc/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2020-04-15T09:34:52.747Z (34714 milliseconds ago) with a duration of 140 milliseconds. The lease period was 15000 milliseconds.
Overnight I saw this error a number of times, but it caused enough of a delay (5-10mins!) in consumption of ServiceBus messages that it triggered several application-level errors for us as processing these messages in a timely fashion is critical. Currently these are v2 functions with ServiceBus triggers deployed to an AppService that is scaled out on 2-N instances.
Paul
Disregard my previous - after further investigation these errors were just coincidental to another network issue outside of Azure.
I'm also seeing this error. This is the most recent instance:
Singleton lock renewal failed for blob 'myfunctionhere/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2020-04-26T09:50:57.017Z (18690 milliseconds ago) with a duration of 4 milliseconds. The lease period was 15000 milliseconds
I'm running on Functions v2 and all of the functions inside this app are HTTP triggered and short living.
I agree with the other people in this thread. If this isn't an issue, it shouldn't be logged as an error. Errors are something that we need to look at and not simply ignore (IMO).
Idea! Maybe use structured logging properties for the 2020-04-26T09:50:57.017Z (18690 milliseconds ago) part of the message. This would make it a lot easier for people to ignore errors of this type 👍
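For example, a rough sketch of what that could look like with ILogger message templates (the method and variable names here are made up, not the host's actual logging code) -- the timestamp and age become properties rather than being baked into the message text:

```csharp
using System;
using Microsoft.Extensions.Logging;

public static class SingletonLogging
{
    public static void LogRenewalFailure(ILogger logger, string blobName, string errorCode,
        DateTimeOffset lastRenewal, double ageMs, double leasePeriodMs)
    {
        // The template stays constant, so every occurrence shares one message signature in
        // Application Insights; the changing values are attached as custom properties.
        logger.LogWarning(
            "Singleton lock renewal failed for blob {BlobName} with error code 409: {ErrorCode}. " +
            "The last successful renewal completed at {LastRenewal} ({AgeMs} ms ago). " +
            "The lease period was {LeasePeriodMs} ms.",
            blobName, errorCode, lastRenewal, ageMs, leasePeriodMs);
    }
}
```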
I have a web app running Azure Functions, and I am getting this every time I restart the web app or swap slots.
2020-09-15T10:01:12Z [Error] Singleton lock renewal failed for blob '.../host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2020-09-15T10:00:47.566Z (24432 milliseconds ago) with a duration of 3 milliseconds. The lease period was 15000 milliseconds.
@wpitallo any other side effect? This may happen if the primary host changes and the lease is stolen by another instance.
Not that I am aware of at this stage.
Same issue here with an Azure Functions (~3) timer-triggered function:
Singleton lock renewal failed for blob '.../host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 0001-01-01T00:00:00Z (-2147483648 milliseconds ago) with a duration of 0 milliseconds. The lease period was 15000 milliseconds.
The last successful lock time on this is weird. It doesn't seem to cause problems, it just fills Application Insights with sev 3 errors.
Hi All
I am facing same issue in my Azure Function app.
It seems like the real problem is that in non-prod environments we use a single blob storage account shared by multiple Azure function apps. When one of the function apps performs the renew-lease operation, the sharing causes the "Singleton lock renewal failed for blob" error, which in turn leads to a restart of the job host, so the Azure function stops abruptly without any function error.
I base this conclusion on my investigation: our production environment doesn't share a single blob storage account across multiple function apps, and we have never seen the "Singleton lock renewal failed for blob" error in that environment.
So, as per my understanding, the solution to this error is to avoid sharing a single blob storage account across multiple Azure function apps.
TIA
I can confirm that this happens in environments with individual storage accounts for each function app as well. I always create a new storage account for every new function app I create. Sharing storage accounts introduces some other problems, which is why I wouldn't recommend sharing storage accounts between function apps, ever 👍
I'll be following this, since we also see this problem with a consumption plan Azure Function App