Tracking this information as an issue here from @olitomlinson. Quotes below from him:
I've got an open support case (120031225000535) because my DF app failed to start up for approx 30 minutes and then eventually corrected itself without intervention.
The error message during startup was :
Failed to bind to address http://127.0.0.1:17071: address already in use.
Only one usage of each socket address (protocol/network address/port) is normally permitted
My non-scientific googling has brought me to this GitHub issue. I don't really know if this work item impacts my particular issue (it probably doesn't), but are you in a position to shed any light? No worries if not, I'll just keep following up with support to get to the root cause. Thanks in advance!
@olitomlinson
This is the port we are opening. We should be properly shutting it down, so it is concerning you are seeing this cause stalling on restarting your application.
That being said, I find it interesting that you would even have this code starting up. We only try to turn it on for non-.NET apps, as it is only used in our out-of-proc language SDKs (i.e. JavaScript and soon Python).
To mitigate in the meantime, can you make sure your application has FUNCTIONS_WORKER_RUNTIME set to dotnet? That should prevent us from starting up any listener on this port. The other option that should help mitigate is to set localRpcEndpointEnabled to false in your host.json.
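For anyone following along, here is a minimal sketch of the failure mode itself, written in Python rather than against the Functions host. It only shows that a second listener on a loopback port that is already bound gets rejected by the OS, which is the condition the error message above reports. The helper names `try_bind` and `second_bind_fails` are hypothetical and are not part of any Azure SDK.

```python
import errno
import socket


def try_bind(host: str, port: int) -> socket.socket:
    """Bind and listen on (host, port); raises OSError if the address is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def second_bind_fails(host: str = "127.0.0.1") -> bool:
    """Return True if a second listener on an already-bound port is rejected."""
    first = try_bind(host, 0)  # port 0: let the OS pick any free port
    port = first.getsockname()[1]
    try:
        try_bind(host, port).close()
        return False
    except OSError as exc:
        # EADDRINUSE on Linux/macOS; 10048 is WSAEADDRINUSE on Windows
        return exc.errno in (errno.EADDRINUSE, 10048)
    finally:
        first.close()
```

The first listener here stands in for whichever process already holds the port; the second bind is what the host attempts on startup.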
@ConnorMcMahon thanks for the swift reply!
Yes it appears that I don't have FUNCTIONS_WORKER_RUNTIME set in my configuration at all. I will apply it.
Btw is this configuration mandatory? I guess I'm confused because it doesn't _feel_ mandatory, but it kinda actually is?
I know it is something we flag as a warning in our internal tooling for investigating customer issues. That being said, I'm not sure if it is explicitly required, because we are often able to infer based on the code we end up seeing.
In general, I think our tooling now sets this automatically when you select your language, but I may be incorrect about that. Assuming your function application has been around for a long time, it could be from before we started setting that in the default app creation in our tools.
It just occurred to me that this listener behavior could be very problematic for apps that use slots because of how multiple instances may be running simultaneously on the same VM. We probably need to prioritize making the listener port selection dynamic like @anthonychu suggested to avoid these kinds of problems.
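As a rough illustration of what dynamic port selection could look like, the Python sketch below tries a preferred port first and falls back to an OS-assigned one if that port is taken. This is an assumption about the general technique, not the extension's actual implementation; `open_listener` and `DEFAULT_PORT` are hypothetical names.

```python
import socket
from typing import Tuple

DEFAULT_PORT = 17071  # the port from the error message in this thread


def open_listener(host: str = "127.0.0.1",
                  preferred_port: int = DEFAULT_PORT) -> Tuple[socket.socket, int]:
    """Listen on preferred_port if it is free, otherwise on any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, preferred_port))
    except OSError:
        # Preferred port is taken: start over and ask the OS for any free port.
        sock.close()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
    sock.listen()
    # getsockname() reports the port that was actually bound.
    return sock, sock.getsockname()[1]
```

With this approach, whatever talks to the listener has to be told the chosen port at runtime instead of assuming a fixed one, which is the main design cost of making the port dynamic.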
@cgillum @ConnorMcMahon We had this again last night, my fault for not getting the config update rolled out to prod in time.
However, this time we didn't get any traces from the host trying to start up, just exceptions. This Function App was therefore not consuming the messages that had been building up on Service Bus for the last 9 hours.
It seems weird that we've had 2 instances of this failure in the last few days. Has something changed in the underlying host environment that would make this issue more likely to happen?
This is being addressed in #1307.
@cgillum @ConnorMcMahon
I have version 2.2.1 installed in my function app and I am using slots.
Just got the error:
Failed to bind to address http://127.0.0.1:17071: address already in use. Only one usage of each socket address (protocol/network address/port) is normally permitted. Only one usage of each socket address (protocol/network address/port) is normally permitted.
I am using NodeJS for my functions code.
@francescopersico,
Hmm, that is very curious.
Do you have an application name you can share with us publicly (or privately)?
Also, a rough timestamp in UTC of when this error occurred would be helpful.
I am having this issue as well. A Node durable function app that has been running for months suddenly went dead in the water and cannot start due to a port-in-use error.
@CastleArg does it stay in that state or does it recover after a minute or so?
Nope, it can't start at all. I even deleted and redeployed the app, with no result. We have a prod environment set up in an identical way, and funnily enough that one is still running.
2020-05-28T08:21:29.406 [Error] A host error has occurred during startup operation '629af924-5cfd-4642-9647-847b8124ac55'.
System.IO.IOException : Failed to bind to address http://127.0.0.1:17071: address already in use. ---> Microsoft.AspNetCore.Connections.AddressInUseException : Only one usage of each socket address (protocol/network address/port) is normally permitted. ---> System.Net.Sockets.SocketException : Only one usage of each socket address (protocol/network address/port) is normally permitted.
at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error,String callerName)
at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot,SocketAddress socketAddress)
at System.Net.Sockets.Socket.Bind(EndPoint localEP)
at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketConnectionListener.Bind()
End of inner exception
at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketConnectionListener.Bind()
at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketTransportFactory.BindAsync(EndPoint endpoint,CancellationToken cancellationToken)
at async Microsoft.AspNetCore.Server.Kestrel.Core.KestrelServer.<>c__DisplayClass21_0`1.<StartAsync>g__OnBind|0[TContext](??)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.AspNetCore.Server.Kestrel.Core.Internal.AddressBinder.BindEndpointAsync(ListenOptions endpoint,AddressBindContext context)
End of inner exception
at async Microsoft.AspNetCore.Server.Kestrel.Core.Internal.AddressBinder.BindEndpointAsync(ListenOptions endpoint,AddressBindContext context)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.AspNetCore.Server.Kestrel.Core.ListenOptions.BindAsync(AddressBindContext context)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.AspNetCore.Server.Kestrel.Core.Internal.AddressBinder.AddressesStrategy.BindAsync(AddressBindContext context)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.AspNetCore.Server.Kestrel.Core.Internal.AddressBinder.BindAsync(IServerAddressesFeature addresses,KestrelServerOptions serverOptions,ILogger logger,Func`2 createBinding)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.AspNetCore.Server.Kestrel.Core.KestrelServer.StartAsyncTContext at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7839
at DryIoc.Scope.GetOrAdd(Int32 id,CreateScopedValue createValue,Int32 disposalOrder) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7824
at DryIoc.Factory.ApplyReuse(Expression serviceExpr,Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6594
at DryIoc.Factory.GetExpressionOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6554
at DryIoc.Factory.GetDelegateOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6624
at DryIoc.DelegateFactory.GetDelegateOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7730
at DryIoc.Container.DryIoc.IResolver.Resolve(Type serviceType,Object serviceKey,IfUnresolved ifUnresolved,Type requiredServiceType,Request preResolveParent,Object[] args) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 289
at lambda_method(Closure ,IResolverContext )
at DryIoc.Factory.<>c__DisplayClass26_0.
at DryIoc.Scope.TryGetOrAdd(ImMap`1 items,Int32 id,CreateScopedValue createValue,Int32 disposalOrder) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7839
at DryIoc.Scope.GetOrAdd(Int32 id,CreateScopedValue createValue,Int32 disposalOrder) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7824
at DryIoc.Factory.ApplyReuse(Expression serviceExpr,Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6594
at DryIoc.Factory.GetExpressionOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6554
at DryIoc.Factory.GetDelegateOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6624
at DryIoc.Container.ResolveAndCacheDefaultFactoryDelegate(Type serviceType,IfUnresolved ifUnresolved) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 209
at DryIoc.Container.DryIoc.IResolver.Resolve(Type serviceType,IfUnresolved ifUnresolved) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 194
at Microsoft.Azure.WebJobs.Script.WebHost.DependencyInjection.JobHostServiceProvider.GetService(Type serviceType,IfUnresolved ifUnresolved) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\JobHostServiceProvider.cs : 99
at Microsoft.Azure.WebJobs.Script.WebHost.DependencyInjection.JobHostServiceProvider.GetRequiredService(Type serviceType) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\JobHostServiceProvider.cs : 82
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider provider,Type serviceType)
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService[T](IServiceProvider provider)
at Microsoft.Azure.WebJobs.WebJobsServiceCollectionExtensions.<>c.<AddWebJobs>b__1_4(IServiceProvider p) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Hosting\WebJobsServiceCollectionExtensions.cs : 87
at DryIoc.Microsoft.DependencyInjection.DryIocAdapter.<>c__DisplayClass3_0.<RegisterDescriptor>b__0(IResolverContext r) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\DryIocAdapter.cs : 156
at DryIoc.Registrator.<>c__DisplayClass27_0.<RegisterDelegate>b__0(IResolverContext r) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 4540
at lambda_method(Closure ,IResolverContext )
at DryIoc.Factory.<>c__DisplayClass26_0.<ApplyReuse>b__2() at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6595
at DryIoc.Scope.TryGetOrAdd(ImMap`1 items,Int32 id,CreateScopedValue createValue,Int32 disposalOrder) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7839
at DryIoc.Scope.GetOrAdd(Int32 id,CreateScopedValue createValue,Int32 disposalOrder) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7824
at DryIoc.Factory.ApplyReuse(Expression serviceExpr,Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6594
at DryIoc.Factory.GetExpressionOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6554
at DryIoc.Factory.GetDelegateOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6624
at DryIoc.DelegateFactory.GetDelegateOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7730
at DryIoc.Container.DryIoc.IResolver.Resolve(Type serviceType,Object serviceKey,IfUnresolved ifUnresolved,Type requiredServiceType,Request preResolveParent,Object[] args) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 289
at lambda_method(Closure ,IResolverContext )
at DryIoc.Factory.<>c__DisplayClass26_0.
at DryIoc.Scope.TryGetOrAdd(ImMap`1 items,Int32 id,CreateScopedValue createValue,Int32 disposalOrder) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7839
at DryIoc.Scope.GetOrAdd(Int32 id,CreateScopedValue createValue,Int32 disposalOrder) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7824
at DryIoc.Factory.ApplyReuse(Expression serviceExpr,Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6594
at DryIoc.Factory.GetExpressionOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6554
at DryIoc.ReflectionFactory.CreateExpressionOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7073
at DryIoc.Factory.GetExpressionOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6544
at DryIoc.ReflectionFactory.CreateExpressionOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 7073
at DryIoc.Factory.GetExpressionOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6544
at DryIoc.Factory.GetDelegateOrDefault(Request request) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 6624
at DryIoc.Container.ResolveAndCacheDefaultFactoryDelegate(Type serviceType,IfUnresolved ifUnresolved) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 209
at DryIoc.Container.DryIoc.IResolver.Resolve(Type serviceType,IfUnresolved ifUnresolved) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\DryIoc\Container.cs : 194
at Microsoft.Azure.WebJobs.Script.WebHost.DependencyInjection.JobHostServiceProvider.GetService(Type serviceType,IfUnresolved ifUnresolved) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\JobHostServiceProvider.cs : 99
at Microsoft.Azure.WebJobs.Script.WebHost.DependencyInjection.JobHostServiceProvider.GetService(Type serviceType) at D:\a\1\s\src\WebJobs.Script.WebHost\DependencyInjection\JobHostServiceProvider.cs : 77
at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetServiceT
at async Microsoft.Azure.WebJobs.Script.WebHost.WebJobsScriptHostService.UnsynchronizedStartHostAsync(ScriptHostStartupOperation activeOperation,Int32 attemptCount,JobHostStartupMode startupMo
Thanks for the info.
Which version of the Durable extension are you using? The latest version is supposed to select a different port number based on availability.
In any case, we added a kill switch to this feature just in case it caused problems in unexpected scenarios. You can disable it by setting the localRpcEndpointEnabled to false in host.json.
Here is an example:
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "localRpcEndpointEnabled": false
    }
  }
}
Try that and let us know if it resolves the issue. The side effect of this is to revert the durableClient to the old behavior of invoking the external-facing management APIs instead of using the internal ones on the local machine. In most cases, you should only notice a slight performance degradation for durableClient API calls.
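If you would rather apply that setting from a script than edit the file by hand, something like the following Python sketch could do it. It assumes host.json is plain JSON without comments and that rewriting the file with two-space indentation is acceptable; `disable_local_rpc_endpoint` is a hypothetical helper name, not part of any Azure tooling.

```python
import json
from pathlib import Path


def disable_local_rpc_endpoint(host_json: Path) -> dict:
    """Set extensions.durableTask.localRpcEndpointEnabled to false in host.json."""
    config = json.loads(host_json.read_text(encoding="utf-8"))
    # Create the nested sections if they are missing (e.g. bundle-only configs).
    durable = config.setdefault("extensions", {}).setdefault("durableTask", {})
    durable["localRpcEndpointEnabled"] = False
    host_json.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Existing keys such as "version" or "extensionBundle" are left untouched; only the one flag is added or overwritten.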
Thanks, will give this a try.
These are Functions runtime v2 apps. Should I update to runtime v3?
I'm simultaneously burnt by not being able to run locally, as I have Node 12 installed.
{
  "version": "2.0",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[1.*, 2.0.0)"
  }
}
I am using extension bundle like so.
@cgillum setting "localRpcEndpointEnabled": false allowed it to start again. Thanks for the quick advice.
This bug still affects my production Durable Functions v3 instances! Earlier today all backend operations died because some other function app on my App Service plan randomly grabbed port 17071, leaving my production Functions instance unable to start. I have been seeing this behavior on Azure since May 28.
@kepikoi, did you try disabling the feature in your application as recommended above?
The problem for apps using Node is that you are using extension bundles, which operate on v1.x of the extension. This feature was introduced in v1.8.5 of the extension, which likely rolled out in extension bundles recently.
The fix for this is currently only on v2 of the extension. You can manually install v2 of the extension by not using extension bundles and installing via the CLI, or when version 2 of the extension bundles rolls out in the near future, you can update to that.
I am reopening the issue until we port this fix into v1 of the extension and include it in the extension bundle release.
@ConnorMcMahon I am up and running again using
"extensions": {
  "durableTask": {
    "localRpcEndpointEnabled": false
  }
}
Good to know that manual extension install fixes it. Looks like I need to move away from extension bundles to regain control over my environments.
@ConnorMcMahon @cgillum Since getting an updated bundles release out has a bit of lead time, wondering if we can prioritize backporting this to v1 so we can get it out ASAP.
The backport is merged and I am hoping to release 1.8.6 today, and update the extension bundles repo so it will go out on the next train.
Is there any update on this, when we can expect it?