Micrometer: StatsdMeterRegistry stops sending measurements

Created on 6 Jul 2020  路  9Comments  路  Source: micrometer-metrics/micrometer

Hi,

it seems that https://github.com/micrometer-metrics/micrometer/issues/2138 didn't fix the underlying issue.

seems that there's a bug there that causes FD to be opened and never closed.
here are some stack traces

1:

"udp-nio-1" #342 daemon prio=5 os_prio=0 tid=0x00007f17c968d800 nid=0x75a runnable [0x00007f1780a2d000]
   java.lang.Thread.State: RUNNABLE
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.DatagramSocket.checkOldImpl(DatagramSocket.java:308)
    at java.net.DatagramSocket.<init>(DatagramSocket.java:211)
    at sun.nio.ch.DatagramSocketAdaptor.<init>(DatagramSocketAdaptor.java:69)
    at sun.nio.ch.DatagramSocketAdaptor.create(DatagramSocketAdaptor.java:75)
    at sun.nio.ch.DatagramChannelImpl.socket(DatagramChannelImpl.java:169)
    - locked <0x00000006e0112a38> (a java.lang.Object)
    at io.micrometer.shaded.io.netty.channel.socket.nio.NioDatagramChannelConfig.<init>(NioDatagramChannelConfig.java:117)
    at io.micrometer.shaded.io.netty.channel.socket.nio.NioDatagramChannel.<init>(NioDatagramChannel.java:151)
    at io.micrometer.shaded.io.netty.channel.socket.nio.NioDatagramChannel.<init>(NioDatagramChannel.java:118)
    at sun.reflect.GeneratedConstructorAccessor174.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at io.micrometer.shaded.io.netty.channel.ReflectiveChannelFactory.newChannel(ReflectiveChannelFactory.java:44)
    at io.micrometer.shaded.io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:310)
    at io.micrometer.shaded.io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:155)
    at io.micrometer.shaded.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider.lambda$acquire$0(NewConnectionProvider.java:73)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider$$Lambda$604/25764044.accept(Unknown Source)
    at io.micrometer.shaded.reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57)
    at io.micrometer.shaded.reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
    at io.micrometer.shaded.reactor.core.publisher.FluxRetryWhen.subscribe(FluxRetryWhen.java:79)
    at io.micrometer.shaded.reactor.core.publisher.MonoRetryWhen.subscribeOrReturn(MonoRetryWhen.java:46)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4204)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribeWith(Mono.java:4330)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4190)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4126)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4073)
    at io.micrometer.statsd.StatsdMeterRegistry.retryReplaceClient(StatsdMeterRegistry.java:272)
    at io.micrometer.statsd.StatsdMeterRegistry.connectAndSubscribe(StatsdMeterRegistry.java:261)
    at io.micrometer.statsd.StatsdMeterRegistry.lambda$prepareUdpClient$7(StatsdMeterRegistry.java:233)
    at io.micrometer.statsd.StatsdMeterRegistry$$Lambda$576/1161148117.accept(Unknown Source)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn.lambda$onStateChange$1(UdpClientDoOn.java:71)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn$$Lambda$1568/2015065879.dispose(Unknown Source)
    at io.micrometer.shaded.reactor.netty.DisposableChannel$$Lambda$1570/650154424.run(Unknown Source)
    at io.micrometer.shaded.reactor.core.publisher.LambdaMonoSubscriber.onComplete(LambdaMonoSubscriber.java:132)
    at io.micrometer.shaded.reactor.core.publisher.Operators.complete(Operators.java:135)
    at io.micrometer.shaded.reactor.core.publisher.MonoEmpty.subscribe(MonoEmpty.java:45)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4219)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribeWith(Mono.java:4330)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4190)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4126)
    at io.micrometer.shaded.reactor.netty.DisposableChannel.onDispose(DisposableChannel.java:134)
    at io.micrometer.shaded.reactor.netty.Connection.onDispose(Connection.java:268)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.onDispose(ChannelOperations.java:192)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn.onStateChange(UdpClientDoOn.java:71)
    at io.micrometer.shaded.reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:514)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider$NewConnectionObserver.onStateChange(NewConnectionProvider.java:207)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:427)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.onInboundClose(ChannelOperations.java:385)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:76)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
    at io.micrometer.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.micrometer.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.micrometer.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.micrometer.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

2:

"udp-nio-1" #342 daemon prio=5 os_prio=0 tid=0x00007f17c968d800 nid=0x75a runnable [0x00007f1780a2d000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.Net.socket0(Native Method)
    at sun.nio.ch.Net.socket(Net.java:411)
    at sun.nio.ch.DatagramChannelImpl.<init>(DatagramChannelImpl.java:111)
    at sun.nio.ch.SelectorProviderImpl.openDatagramChannel(SelectorProviderImpl.java:42)
    at io.micrometer.shaded.io.netty.channel.socket.nio.NioDatagramChannel.newSocket(NioDatagramChannel.java:87)
    at io.micrometer.shaded.io.netty.channel.socket.nio.NioDatagramChannel.<init>(NioDatagramChannel.java:118)
    at sun.reflect.GeneratedConstructorAccessor174.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at io.micrometer.shaded.io.netty.channel.ReflectiveChannelFactory.newChannel(ReflectiveChannelFactory.java:44)
    at io.micrometer.shaded.io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:310)
    at io.micrometer.shaded.io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:155)
    at io.micrometer.shaded.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider.lambda$acquire$0(NewConnectionProvider.java:73)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider$$Lambda$604/25764044.accept(Unknown Source)
    at io.micrometer.shaded.reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57)
    at io.micrometer.shaded.reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
    at io.micrometer.shaded.reactor.core.publisher.FluxRetryWhen.subscribe(FluxRetryWhen.java:79)
    at io.micrometer.shaded.reactor.core.publisher.MonoRetryWhen.subscribeOrReturn(MonoRetryWhen.java:46)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4204)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribeWith(Mono.java:4330)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4190)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4126)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4073)
    at io.micrometer.statsd.StatsdMeterRegistry.retryReplaceClient(StatsdMeterRegistry.java:272)
    at io.micrometer.statsd.StatsdMeterRegistry.connectAndSubscribe(StatsdMeterRegistry.java:261)
    at io.micrometer.statsd.StatsdMeterRegistry.lambda$prepareUdpClient$7(StatsdMeterRegistry.java:233)
    at io.micrometer.statsd.StatsdMeterRegistry$$Lambda$576/1161148117.accept(Unknown Source)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn.lambda$onStateChange$1(UdpClientDoOn.java:71)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn$$Lambda$1568/2015065879.dispose(Unknown Source)
    at io.micrometer.shaded.reactor.netty.DisposableChannel$$Lambda$1570/650154424.run(Unknown Source)
    at io.micrometer.shaded.reactor.core.publisher.LambdaMonoSubscriber.onComplete(LambdaMonoSubscriber.java:132)
    at io.micrometer.shaded.reactor.core.publisher.Operators.complete(Operators.java:135)
    at io.micrometer.shaded.reactor.core.publisher.MonoEmpty.subscribe(MonoEmpty.java:45)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4219)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribeWith(Mono.java:4330)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4190)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4126)
    at io.micrometer.shaded.reactor.netty.DisposableChannel.onDispose(DisposableChannel.java:134)
    at io.micrometer.shaded.reactor.netty.Connection.onDispose(Connection.java:268)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.onDispose(ChannelOperations.java:192)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn.onStateChange(UdpClientDoOn.java:71)
    at io.micrometer.shaded.reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:514)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider$NewConnectionObserver.onStateChange(NewConnectionProvider.java:207)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:427)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperations.onInboundClose(ChannelOperations.java:385)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:76)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
    at io.micrometer.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.micrometer.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.micrometer.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.micrometer.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

3: after a couple of more minutes

"udp-nio-1" #342 daemon prio=5 os_prio=0 tid=0x00007f17c968d800 nid=0x75a runnable [0x00007f1780a2d000]
   java.lang.Thread.State: RUNNABLE
    at sun.misc.Unsafe.unpark(Native Method)
    at java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264)
    at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.offer(ScheduledThreadPoolExecutor.java:1027)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.add(ScheduledThreadPoolExecutor.java:1037)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.add(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:328)
    at java.util.concurrent.ScheduledThreadPoolExecutor.scheduleAtFixedRate(ScheduledThreadPoolExecutor.java:573)
    at io.micrometer.shaded.reactor.core.scheduler.Schedulers.workerSchedulePeriodically(Schedulers.java:1263)
    at io.micrometer.shaded.reactor.core.scheduler.ExecutorServiceWorker.schedulePeriodically(ExecutorServiceWorker.java:63)
    at io.micrometer.shaded.reactor.core.publisher.FluxInterval.subscribe(FluxInterval.java:69)
    at io.micrometer.shaded.reactor.core.publisher.Flux.subscribe(Flux.java:8325)
    at io.micrometer.shaded.reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:418)
    at io.micrometer.shaded.reactor.core.publisher.FluxArray$ArraySubscription.slowPath(FluxArray.java:126)
    at io.micrometer.shaded.reactor.core.publisher.FluxArray$ArraySubscription.request(FluxArray.java:99)
    at io.micrometer.shaded.reactor.core.publisher.FluxFlatMap$FlatMapMain.onSubscribe(FluxFlatMap.java:363)
    at io.micrometer.shaded.reactor.core.publisher.FluxMerge.subscribe(FluxMerge.java:69)
    at io.micrometer.shaded.reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
    at io.micrometer.shaded.reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:54)
    at io.micrometer.shaded.reactor.core.publisher.Flux.subscribe(Flux.java:8325)
    at io.micrometer.shaded.reactor.netty.channel.MonoSendMany.subscribe(MonoSendMany.java:82)
    at io.micrometer.shaded.reactor.core.publisher.Mono.subscribe(Mono.java:4219)
    at io.micrometer.shaded.reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172)
    at io.micrometer.shaded.reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56)
    at io.micrometer.shaded.reactor.core.publisher.FluxRetryWhen.subscribe(FluxRetryWhen.java:79)
    at io.micrometer.shaded.reactor.core.publisher.MonoRetryWhen.subscribeOrReturn(MonoRetryWhen.java:46)
    at io.micrometer.shaded.reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:57)
    at io.micrometer.shaded.reactor.netty.udp.UdpClient.lambda$handle$3(UdpClient.java:276)
    at io.micrometer.shaded.reactor.netty.udp.UdpClient$$Lambda$575/811207775.accept(Unknown Source)
    at io.micrometer.shaded.reactor.netty.udp.UdpClientDoOn.onStateChange(UdpClientDoOn.java:66)
    at io.micrometer.shaded.reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:514)
    at io.micrometer.shaded.reactor.netty.resources.NewConnectionProvider$NewConnectionObserver.onStateChange(NewConnectionProvider.java:207)
    at io.micrometer.shaded.reactor.netty.channel.ChannelOperationsHandler.channelActive(ChannelOperationsHandler.java:65)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895)
    at io.micrometer.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:305)
    at io.micrometer.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:249)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1342)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:548)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:533)
    at io.micrometer.shaded.io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:517)
    at io.micrometer.shaded.io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:978)
    at io.micrometer.shaded.io.netty.channel.AbstractChannel.connect(AbstractChannel.java:253)
    at io.micrometer.shaded.io.netty.bootstrap.Bootstrap$3.run(Bootstrap.java:250)
    at io.micrometer.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.micrometer.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.micrometer.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.micrometer.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.micrometer.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

bug statsd

Most helpful comment

After a lot of investigation and debugging with the help of @violetagg, I think we understand what is happening and have an idea for a fix, which we will need some people experiencing this to test out with snapshots once available. I have no way of reliably triggering this issue, despite much effort. To state what we think we know about the issue:

  • micrometer-registry-statsd with UDP from version 1.5.2 can be affected
  • It is triggered by an exception (other than PortUnreachableException) in sending StatsD metrics
  • Logic to reconnect has a bug that results in an infinite loop of opening and closing connections

An exception other than PortUnreachableException when sending StatsD metrics is the part I do not know how to reliably reproduce. All micrometer-registry-statsd users may not experience it either. For those that do experience it, it is a severe issue and it is high priority for us to get a fix out for this.

All 9 comments

For what it is worth, we are observing the same issue. udp-nio-1 soon after a restart begins to consume 100% available CPU capacity.

Same here: thread udp-nio-1 goes into endless loop opening and closing DatagrammChannel. For every two minutes this activity allocates ~30Gb of memory, which is then released after ~1 second GC pause.
image

image

I'm not attaching stack traces, since they seem to be exactly the same.

Here is heap histogram with ~1.2 millions of datagram channels created:
image

This is pretty serious issue. We will have to stop publishing any metrics because of this...

Yes, I also found the same issue here. This is pretty serious issue for me.
Thank you

After a lot of investigation and debugging with the help of @violetagg, I think we understand what is happening and have an idea for a fix, which we will need some people experiencing this to test out with snapshots once available. I have no way of reliably triggering this issue, despite much effort. To state what we think we know about the issue:

  • micrometer-registry-statsd with UDP from version 1.5.2 can be affected
  • It is triggered by an exception (other than PortUnreachableException) in sending StatsD metrics
  • Logic to reconnect has a bug that results in an infinite loop of opening and closing connections

An exception other than PortUnreachableException when sending StatsD metrics is the part I do not know how to reliably reproduce. All micrometer-registry-statsd users may not experience it either. For those that do experience it, it is a severe issue and it is high priority for us to get a fix out for this.

I hit a similar wall with this issue as well, and while I was not able to get a repro in a test env outside of my macbook, I think this issue might be easily solved by using the DatagramChannel directly. It seems like sending metrics in general is specific to every backend impl, so nothing is keeping MM on the reactive netty client specifically? I'd also recommend looking at https://github.com/DataDog/java-dogstatsd-client, as they've done some good work on their UDP sender

I have merged a fix for this issue, including the infinite loop of creating connections. It would be greatly appreciated if a few users affected by this could try out the latest 1.5.4-SNAPSHOT version published and let us know if the issue is resolved with the changes made.
Also @chengzp please test with the latest snapshots because the changes were reverted for #2151 but the underlying issue should still be fixed.

We checked the 1.5.4-SNAPSHOT for the last 24 hours and the issue seems to be fixed.
The previous errors (sending duplicate metrics / sending no metrics at all) usually appeared every 2-4 hours but now everything looks fine.

@garry-mcfly thank you so much for trying out snapshots and reporting back. This is very helpful. We'll work on getting another patch release out soon with the fix unless someone else reports a problem before then.

Was this page helpful?
0 / 5 - 0 ratings