Micrometer: Fails to push metrics to Stackdriver with short step interval and large batch size

Created on 8 Mar 2019  路  12Comments  路  Source: micrometer-metrics/micrometer

Hi, I'm using micrometer version 1.1.3 and spring boot 2.1.0.RELEASE. StackdriverMeterRegistry is defined as a spring bean :

@Bean
  public static StackdriverMeterRegistry stackdriver() {
    return StackdriverMeterRegistry.builder(new StackdriverConfig() {
      @Override
      public String projectId() {
        return "test-project";
      }

      @Override
      public String get(String key) {
        return null;
      }

      @Override
      public Duration step() {
        return Duration.ofSeconds(5);
      }
    }).build();
  }

However, when I ran the app on GCP I keep seeing the following warning without any stacktrace :
failed to send metrics to stackdriver: INTERNAL: http2 exception

It's strange because I can push metrics from the same application and environment using just google-cloud-monitoring library:
https://cloud.google.com/monitoring/docs/reference/libraries#client-libraries-usage-java

Could you please let me know if I'm missing something in configuration etc.

Thanks,
Gleb

bug stackdriver

All 12 comments

@mglebka Thanks for the report! I didn't look into this but no stack trace is not aligned with other meter registries, so I created #1269 to resolve it.

I reproduced the error with a sample and the stack trace was as follows:

2019-03-09 05:23:56.139  WARN 82634 --- [trics-publisher] i.m.s.StackdriverMeterRegistry           : failed to send metrics to stackdriver

com.google.api.gax.rpc.InternalException: io.grpc.StatusRuntimeException: INTERNAL: http2 exception
    at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:67) ~[gax-1.34.0.jar:1.34.0]
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72) ~[gax-grpc-1.34.0.jar:1.34.0]
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60) ~[gax-grpc-1.34.0.jar:1.34.0]
    at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97) ~[gax-grpc-1.34.0.jar:1.34.0]
    at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68) ~[api-common-1.7.0.jar:na]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1123) ~[guava-20.0.jar:na]
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435) ~[guava-20.0.jar:na]
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:900) ~[guava-20.0.jar:na]
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:811) ~[guava-20.0.jar:na]
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:675) ~[guava-20.0.jar:na]
    at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:507) ~[grpc-stub-1.15.0.jar:1.15.0]
    at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482) ~[grpc-stub-1.15.0.jar:1.15.0]
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:403) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.15.0.jar:1.15.0]
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) ~[grpc-core-1.15.0.jar:1.15.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
    Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
        at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57) ~[gax-1.34.0.jar:1.34.0]
        at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112) ~[gax-1.34.0.jar:1.34.0]
        at com.google.cloud.monitoring.v3.MetricServiceClient.createTimeSeries(MetricServiceClient.java:1156) ~[google-cloud-monitoring-1.50.0.jar:1.50.0]
        at io.micrometer.stackdriver.StackdriverMeterRegistry.publish(StackdriverMeterRegistry.java:163) ~[micrometer-registry-stackdriver-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[na:1.8.0_101]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_101]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[na:1.8.0_101]
        ... 3 common frames omitted
Caused by: io.grpc.StatusRuntimeException: INTERNAL: http2 exception
    at io.grpc.Status.asRuntimeException(Status.java:526) ~[grpc-core-1.15.0.jar:1.15.0]
    ... 23 common frames omitted
Caused by: io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception$HeaderListSizeException: Header size exceeded max allowed size (8192)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.headerListSizeError(Http2Exception.java:171) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2CodecUtil.headerListSizeExceeded(Http2CodecUtil.java:228) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.HpackDecoder$Http2HeadersSink.finish(HpackDecoder.java:541) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.HpackDecoder.decode(HpackDecoder.java:128) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2HeadersDecoder.decodeHeaders(DefaultHttp2HeadersDecoder.java:127) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$HeadersBlockBuilder.headers(DefaultHttp2FrameReader.java:745) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$2.processFragment(DefaultHttp2FrameReader.java:483) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:491) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:254) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:390) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:450) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1407) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
    ... 1 common frames omitted

There were 66 metrics and no problem with the default step size, 1 mintue. I changed it to 5 seconds to reproduce it. If I changed batch size to 10, it works with 5 seconds step size although there are some unrelated failures when sending metrics.

I didn't look into how it contributes to HTTP header size yet.

@mglebka Could you try it with a smaller batch size?

@izeye , thanks for reproducing it, the nearest I can try is Monday, I will let you know if it works.

@izeye , I tried default step size of 1 minute and exception does not occur anymore, however I keep seeing few other exceptions :
failed to send metrics to stackdriver: INVALID_ARGUMENT: Field timeSeries[44].points[0].distributionValue had an invalid value: Distribution bucket_counts(43) has a negative count.
and
i.m.s.StackdriverMeterRegistry : failed to send metrics to stackdriver: INTERNAL: An internal error occurred.

Im also facing this issue, one of my buckets is negative. I did a little debugging like so:


 if (latencyNanos < 0) {
        print(s"got latency < 0 ${latencyNanos} ${name}" )
      }

      Timer
        .builder("akka.streams.upstreamlatency")
        .tags("step", name)
        .sla(Micrometer.timerPercentiles: _*)
        .description("Difference between a push and the last pull, gives a reading of how much time the upstream takes to produce an element after it's been requested")
        .register(micrometer)
        .record(latencyNanos, TimeUnit.NANOSECONDS)

And im definately not sending negative values in, somehow in the conversions the values are becoming negative

@LukeDefeo @mglebka I was facing the same problem, and created a separate issue for this as it seems to be unrelated to the http2 exception from your original post.
As a local workaround I copied the StackdriverMeterRegistry class and changed a line, which could be useful for you. here is the opened issue if you're interested: #1325

I think we need to understand what affects the header size so we can prevent making a batch of metrics to send that exceeds the allowed header size. It is strange to me though that the step interval seems to affect this. I would think the batch size alone would affect the header size.

Is there a known workaround for this? Either by changing the batch size or the step?

What was the filter name you have used for finding micrometer latency(@timed) metric in below screen.
Screenshot (86)

Is there a known workaround for this? Either by changing the batch size or the step?

@pedro-carneiro As mentioned in previous comments, it seems increasing the step size (say to the default of 1 minute) and/or reducing the batch size to something smaller works around the error. We'll still need to get to the bottom of why this is happening at all.

I had tried different combinations, yet I was only able to _reduce_ the number of errors, not avoid them entirely...

Was this page helpful?
0 / 5 - 0 ratings