Hi there.
If you add the spring-boot-starter-actuator as dependency, after the first http call, the app will consume 6%-7% CPU forever (on a few threads). Even when there is no active http connection nor application background thread working.
If I remove this dependency, the app is idle at 0% as expected.
Edit
Spring Boot version 2.0.0.0M7
Java 8
Archive.zip
The demo app have been generated with https://start.spring.io/
Steps to reproduce the issue
1) Build the app
2) Launch it using java -jar
3) Monitor CPU usage using htop => Nothing special is seen
4) Open a web browser and browse to localhost:8080 => error page is shown
5) Check CPU usage on htop => 6-7% used endlessly by the app
@dgsqrs What version of Spring Boot are you using? Can you share a small sample that shows the issue?
Hi @philwebb .
I updated my initial post with the requested information.
I can reproduce this. After the first request, the CPU seems to be busy:

VisualVM shows this:

This behaviour seems to be related with the new metrics support, since excluding the autoconfiguration with @SpringBootApplication(exclude = MetricsAutoConfiguration.class) "fixes" this.
@jkschneider is this related to the GC pause detection or some specific instrumentation? Is the cost constant even when the application is busy or is this just something to keep the VM busy on purpose?
Is this related to the GC pause detection
Yes, I believe so (and other system-level pauses unrelated to GC).
Is the cost constant even when the application is busy
Yes.
I'm not sure if it's possible to improve this mechanism to use no CPU. We could offer a config option to tradeoff monitoring accuracy for CPU consumption, though it's unclear to me what the default should be.
More research is in order. Perhaps there is a way to accomplish accurate pause detection and indiscernible CPU consumption.
My initial feeling is that we should set the default to not consume CPU. I expect that GC pause detection is probably quite an advanced problem for most users and by the time that they need it they'll be quite a long way into their journey with metrics. Constant CPU usage, however, is probably going to get noticed.
tldr: I think this should be configurable and by default off. In the docs we should call it out.
I don't necessarily disagree with the default if the mechanism can't be improved, but this kind of reasoning really troubles me:
by the time that they need it they'll be quite a long way into their journey with metrics
Unless we compensate for pauses, they are basically looking at garbage. This lures folks into a false sense of security, the "blindfold with a lollipop" state of being.
I saw a study once (conveniently can't remember where) that suggested something like 20% of Java apps exhibited periodic pauses of 5 sec or greater. Even if both of these figures were far lower, it's still frightening. If nothing else, I want the default set of metrics people get to guide them to responsible operationalization, e.g. you should probably run more than one instance in prod and have some circuit breaker mechanism to control SLA boundaries.
I understand the temptation to view this as a day 2 problem. It's just not clear to me that many organizations ever even realize there is a problem in the first place.
At a cursory glance, I'm pretty confident we can implement a pause detector that trades off pause-length granularity for apparent CPU consumption. Then we can select a default that still captures well the eye-popping GC pauses.
I found that setting the system property LatencyUtils.useActualTime to false disables the SimplePauseDetector but keeps the rest of micrometer running.
It switches LatencyUtils into a test mode such that TimeServices#sleepNanos will call wait() on a monitor that never fires.
I found that setting the system property
LatencyUtils.useActualTimetofalse
Wow, the hacking prize goes to you today, @osialr :). Thanks for the workaround option.
@jkschneider do you have a micrometer issue that we can track?
Testing the fix this morning. Here is the issue: https://github.com/micrometer-metrics/micrometer/issues/328
Summarizing the changes coming in Micrometer 1.0.0-rc.6:
The pause detector is configurable to an extent via MeterRegistry#config(). The default pause detector will still be a clock-drift detector which needs to wake up periodically to check for drift. By configuring the sleep interval (time between checks), you can control CPU consumption. We've selected a default in rc6 that reduces the CPU consumption to about 0.1% while still providing decent compensation for long pauses.
You can change the intervals on the clock-drift detector like so:
@Bean
MeterRegistryConfigurer registryConfig() {
return r -> r.config().pauseDetector(new ClockDriftPauseDetector(sleepInt, pauseInt));
}
We also provide an alternate, noop implementation. Unless you are concerned about even the slightest amount of CPU use, this is not recommended:
@Bean
MeterRegistryConfigurer registryConfig() {
return r -> r.config().pauseDetector(new NoPauseDetector());
}
It's possible that we'll find alternate pause detector implementations in the future. Perhaps the JDK will someday provide an event-based interface for GC events. Or we could probably provide a detector limited to GC events by scraping GC logs.
I've tried another round of testing using the latest Spring Boot snapshots; I'm not really seeing an improvement here and even seeing conflicting results.
Note that I'm looking at CPU usage on the PID line, which is the amount of CPU used by the process last time it was scheduled on a CPU. So "1%" means "1% on the CPU it was scheduled on", not "1% of the host capacity". This might be biased, but I'm trying to ignore other processes running on the host.
Spring Boot and Actuator:

Spring Boot and Actuator, after first request:

Spring Boot and Actuator, NoPauseDetector:

Spring Boot and Actuator, NoPauseDetector after first request:

Spring Boot without Actuator:

Spring Boot with Actuator, but Metrics auto-configuration ignored:

NoPauseDetector doesn't seem to make a big difference in my testsThere might be something else at play here and we need to look deeper.
Any idea @jkschneider @osialr ?
Fixed a bug in micrometer-core that prevented the configured detector from being used
(https://github.com/micrometer-metrics/micrometer/issues/350).
@jkschneider Is there a SNAPSHOT available that we can try? I'd like to test this before you cut RC8
@philwebb Yes. Snapshots are published on every successful build to repo.spring.io
Thanks @jkschneider. Did you have a PR lined up for this as well? I can't remember what we discussed and if the upgrade is easy or not.
I've hacked a draft PR here.
I'm not sure about a few of those changes, especially in our test suite where we were previously counting measures (if I'm not mistaken). But the test suite is green.
@bclozel Oops, I guess you got a little ahead of me.
I'll be piling more commits into: https://github.com/spring-projects/spring-boot/pull/11776
OK so the update to the snapshot has fixed the issue for me. @dgsqrs if you can give 2.0.0.BUILD-SNAPSHOT a try that would be great.
Most helpful comment
I found that setting the system property
LatencyUtils.useActualTimetofalsedisables theSimplePauseDetectorbut keeps the rest of micrometer running.It switches LatencyUtils into a test mode such that
TimeServices#sleepNanoswill callwait()on a monitor that never fires.