Maybe this jar is causing the OOM problem.
It appears after my application has been running for 24 hours.
Dumping the memory, almost all objects are instances of org/LatencyUtils/LatencyStats$PauseTracker.
Could you report that issue to https://github.com/micrometer-metrics/micrometer ?
Possibly related to #11333
Possibly related to #11333
Only vaguely in that they are related to the same mechanism.
@chenjinkai Are you super positive that this is running on Boot 2 M7 and that you have a version of micrometer >= 1.0.0-rc.4 on your runtime classpath?
We had a memory leak in 1.0.0-rc.3 (Boot 2 M6) that was fixed in rc4. It's not impossible that there is still a leak, but I'm having a difficult time reproducing it.
One idea: perhaps you have a custom metric that is running up the cardinality on a particular tag? In effect, you create a separate Timer for every unique combination of name and tags. Since, by default, 5 histograms are kept in a ring buffer for every Timer, with 1 LatencyStats and therefore 1 PauseDetector per histogram, in such a scenario it might appear that PauseDetector is the dominator. If you have 1/5 as many Timer instances as PauseDetectors when the OOM occurs, we can be confident this is a tag cardinality problem and not a general leak in Timer.
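To make that relationship concrete, here is a minimal sketch (the metric name and tag values are illustrative, not taken from the report) showing that every distinct name + tag combination registers its own Timer:

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class TagCardinalityDemo {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();

        Timer a = registry.timer("http.client.requests", "uri", "/test/{id}");
        Timer b = registry.timer("http.client.requests", "uri", "/test/{id}");
        Timer c = registry.timer("http.client.requests", "uri", "/test/123");

        // Same name + same tags returns the same Timer instance...
        System.out.println(a == b);                      // true
        // ...but a new tag value registers a brand-new Timer with its own histogram state.
        System.out.println(a == c);                      // false
        System.out.println(registry.getMeters().size()); // 2 distinct timers
    }
}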
Here are some plots from a test app running with micrometer-core:1.0.0-rc.4 (~4 weeks) and rc.5 (~11 hours). No problems observable. Note that this is with a steady number of meters.
Now, as Jon said, if you produce Timers on a steady basis with changing tag values, so that each construction actually creates a distinct metric, you can quickly get to an OOM. A test app run with -Xmx64m, creating a Timer with a distinct tag every second, does lead to an OOM within minutes.
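A rough reproduction sketch along those lines (heap size, metric name and tag key are illustrative; how quickly it fails depends on the Micrometer version and how much state each Timer keeps):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.time.Duration;

// Run with a small heap, e.g. java -Xmx64m TimerLeakRepro
public class TimerLeakRepro {
    public static void main(String[] args) throws InterruptedException {
        MeterRegistry registry = new SimpleMeterRegistry();
        long i = 0;
        while (true) {
            // A distinct tag value each second means a brand-new Timer (plus its
            // histogram state) is registered every second and never released.
            registry.timer("demo.timer", "iteration", String.valueOf(i++))
                    .record(Duration.ofMillis(1));
            Thread.sleep(1000);
        }
    }
}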
Observed instance counts: (screenshots)
I'd second @jkschneider regarding a tag cardinality "problem" on the user side.
Thanks @mweirauch !
@chenjinkai, let us know if you can reproduce this consistently with a recent version and/or whether it's actually about creating multiple Timers for name/tag combinations.
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.
Looks like we encountered the same issue when we migrated our application from M5 to M7. It reproduces reliably, roughly twice per day, so I can try to provide more data, but I'm not sure we will be able to reproduce it in a more controlled environment.
Anyway, I think the issue itself should be reopened and addressed.
Some additional information: we have two services on M7 under the same load. The interesting thing is that the second service is fine, while the first one is steadily leaking.
The main difference is that the first service uses RestTemplate to call the second service.
@erlioniel that's interesting. Could you please give it a try with 2.0.0.BUILD-SNAPSHOT? We've upgraded to micrometer 1.0.0-rc.7 in the meantime with specific changes around RestTemplate.
@erlioniel Also, can you verify the cardinality of all meters you have in your app? A runaway tag value on a timer will cause an OOM eventually.
@snicoll okay, sounds a bit risky but we will try tomorrow.
@jkschneider we are not using any custom meters in our application, and I don't see any suspicious objects in the service heap. But maybe you can describe what I should check to be sure?
If you have 5,850 LatencyStats objects while accepting the defaults elsewhere, you have 1,170 distinct timers living in the app. Assuming this is somehow caused by RestTemplate, let's inspect the tags of that metric:
http://YOURSERVICE/actuator/metrics/http.client.requests
You'll have to specifically enable that actuator endpoint in Boot 2.
Just a guess, but I suspect you have a lot of tags because the app is pre-filling path variables:
restTemplate.getForObject("/test/{id}", String.class, 123); // good: the uri tag stays "/test/{id}"
restTemplate.getForObject("/test/123", String.class);       // bad: every id becomes a new uri tag value
@snicoll We currently have a MeterFilter#maximumAllowableMetrics(int) concept for an application-wide max. But it would be trivial to add an additional MeterFilter#maximumAllowableMetrics(String, int) to limit tag explosion on a given metric name. And we could add some default cap in RestTemplateMetricsConfiguration to prevent instrumentation from causing memory problems when the RestTemplate interface is misused.
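For reference, a minimal sketch of applying the existing application-wide cap mentioned above (the limit of 1000 is an arbitrary example, and a Boot app would more likely apply the filter through a MeterRegistryCustomizer bean):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class MeterCapExample {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        // Once the registry holds this many meters, further registrations are denied
        // instead of accumulating without bound.
        registry.config().meterFilter(MeterFilter.maximumAllowableMetrics(1000));
    }
}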
@adriancole FYI. We might consider a similar option in Brave.
when the RestTemplate interface is misused.
I am not sure I agree with that statement. Perhaps we should make the instrumentation of RestTemplate completely optional and clarify in the doc what is required to avoid those side effects?
@jkschneider Yes, your assumption was correct. It looks like we are using RestTemplate the wrong way, and Spring creates a new tag value for every single request.
// The fully expanded URI (query string included) is what the instrumentation sees,
// so every distinct "value" produces a new uri tag value.
URI uri = fromUri(path)
        .queryParam("param", value)
        .build().encode().toUri();
return restTemplate.getForObject(uri, Entity.class);
Should I check the SNAPSHOT version, or is this the correct behavior of the framework? If it is, I think there should be a clearer way to avoid a problem like this, because it isn't easy to debug and fix.
@jkschneider @snicoll
Btw, we are using UriComponentsBuilder because we have to pass an array as a GET parameter to the endpoint. So it might be useful to add an overload of getForObject() that takes a UriComponentsBuilder (instead of a URI), so that instrumentation and statistics collection still work in cases like ours.
I will leave my solution here for devs who are looking for how to disable or change the tags for RestTemplate metrics, or who are fighting the same issue.
@Bean
public RestTemplateExchangeTagsProvider restTemplateTagConfigurer() {
    return new CustomRestTemplateExchangeTagsProvider();
}

private static class CustomRestTemplateExchangeTagsProvider implements RestTemplateExchangeTagsProvider {

    @Override
    public Iterable<Tag> getTags(String urlTemplate, HttpRequest request, ClientHttpResponse response) {
        // Drop the query string so distinct parameter values don't become distinct uri tags.
        // substringBefore is org.apache.commons.lang3.StringUtils; asList is java.util.Arrays.
        String url = substringBefore(urlTemplate, "?");
        return asList(
                RestTemplateExchangeTags.method(request),
                RestTemplateExchangeTags.uri(url),
                RestTemplateExchangeTags.status(response),
                RestTemplateExchangeTags.clientName(request));
    }
}
P.S. Will not work with path variables
@erlioniel So, in your case, it was the query parameters that were causing the problem rather than path variables?
If it is, I think there should be a clearer way to avoid a problem like this, because it isn't easy to debug and fix.
Agreed. My plan is to add a MeterFilter limiting the tag cardinality of http.client.requests.
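As a rough illustration only (this is not the filter being proposed, just one possible shape of such a cap), a custom MeterFilter could deny new uri tag values on http.client.requests beyond a fixed limit:

import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.config.MeterFilterReply;

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class UriTagCapFilter implements MeterFilter {

    private static final int MAX_URI_TAG_VALUES = 100; // arbitrary example limit
    private final Set<String> seenUris = ConcurrentHashMap.newKeySet();

    @Override
    public MeterFilterReply accept(Meter.Id id) {
        if (!"http.client.requests".equals(id.getName())) {
            return MeterFilterReply.NEUTRAL;
        }
        String uri = id.getTag("uri");
        if (uri == null) {
            return MeterFilterReply.NEUTRAL;
        }
        // Allow uri values we've already seen, and new ones until the cap is reached;
        // deny anything past that so a runaway template can't exhaust memory.
        if (seenUris.contains(uri) || seenUris.size() < MAX_URI_TAG_VALUES) {
            seenUris.add(uri);
            return MeterFilterReply.NEUTRAL;
        }
        return MeterFilterReply.DENY;
    }
}

Whether a real filter denies new series, maps them onto a fallback tag value, or just logs is a design choice; denying keeps memory bounded at the cost of dropping data.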
@wilkinsona yes
@jkschneider nice plan, btw. Is there an issue for that? Just so I can follow it.
Also I will leave this link here :) https://github.com/spring-cloud/spring-cloud-netflix/commit/5a127d07002daa6e7cb8ef8d4e9ce302aa49e84c
@erlioniel Here is the associated Micrometer issue: https://github.com/micrometer-metrics/micrometer/issues/363
We need to port https://github.com/spring-projects/spring-boot/pull/11776/commits/1a9c268b6baecaa44ffb770e736c40960c625cd3, but we first need at least some of #11800.