Spring-boot: Nullpointer in SimpleInMemoryRepository

Created on 3 Jun 2016  路  7Comments  路  Source: spring-projects/spring-boot

We're using spring boot 1.3.2 with Angel.SR6 for spring cloud. The service is a reverse proxy with zuul and occasionally we get this exception in the log:

2016-05-26T13:33:09.221+0200;1.3;0.0.0;http-nio-10.5.31.21-8080-exec-74:140;-;-;WARN ;org.springframework.boot.actuate.autoconfigure.MetricsFilter;[Unable to submit counter metric 'status.500.userservice-rest.star-star']
    java.lang.NullPointerException: null
    at org.springframework.boot.actuate.metrics.util.SimpleInMemoryRepository.update(SimpleInMemoryRepository.java:44)
    at org.springframework.boot.actuate.metrics.repository.InMemoryMetricRepository.increment(InMemoryMetricRepository.java:51)
    at org.springframework.boot.actuate.metrics.writer.DefaultCounterService.increment(DefaultCounterService.java:44)
    at org.springframework.boot.actuate.autoconfigure.MetricsFilter.incrementCounter(MetricsFilter.java:205)
    at org.springframework.boot.actuate.autoconfigure.MetricsFilter.recordMetrics(MetricsFilter.java:138)
    at org.springframework.boot.actuate.autoconfigure.MetricsFilter.doFilterInternal(MetricsFilter.java:110)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1096)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)

It usually occurs at night on a test system with not much traffic.

bug

Most helpful comment

It's a race condition in SimpleInMemoryRepository when the app is under GC pressure. It uses a reference map internally, and the javadocs for that say that under GC pressure values can be removed from the map at any time. We could turn the if() into a while(). At the point where the pressure is applied the metrics repository will basically be unreliable though, so it might be good to log something as well.

All 7 comments

It's a race condition in SimpleInMemoryRepository when the app is under GC pressure. It uses a reference map internally, and the javadocs for that say that under GC pressure values can be removed from the map at any time. We could turn the if() into a while(). At the point where the pressure is applied the metrics repository will basically be unreliable though, so it might be good to log something as well.

This is a bit wonky too:

Object lock = this.locks.putIfAbsent(name, new Object());
if (lock == null) {
    lock = this.locks.get(name);
}

The recommended approach is to perform a get first to check if the map already contains a value. Only if that returns null is putIfAbsent then called. This avoids always creating the new Object, thereby reducing unnecessary garbage. Races are handled by checking for putIfAbsent returning a non-null value.

At the point where the pressure is applied the metrics repository will basically be unreliable though, so it might be good to log something as well.

Given that the reference map is being used to store lock objects, the repository isn't thread-safe under GC pressure. I think we should fix that rather than just logging a message.

@dsyer Is the use of a reference map really necessary? We won't save any memory for the keys as they're also the keys in the values map so. That just leaves each Object instance that's used as a lock. Given the difficulty of making a reference map-based implementation thread-safe and the fairly small memory saving that it offers, my preference would be to change it to a "normal" map.

I guess maybe you are right. You are sometimes willing to forgo metric accuracy in return for not crashing the app. But what we have now is inaccurate and crashes, so not really a great compromise. What would be better would be an active map with TTL expiry, so maybe we could make the map instance configurable (injectable) and go with a normal map as the default.

I don't see how we could allow a Map to be injected and get the required thread-safety guarantees. For example, we can use a ConcurrentHashMap and not need any locking. We can't do that if someone gives us a plain old HashMap. If we really want to make it pluggable, I think a new strategy interface that returns a lock object for a particular name would be a better solution. Feels too much for 1.3.x, though.

ConcurrentMap is an interface. Could we accept one of those as a plugin? If not, then let's just fix it.

I haven't managed to convince myself that we can just accept a ConcurrentMap as a plugin. It might well be possible to implement TTL expiry without requiring additional locking, but until I can convince myself that's the case I'd rather not give people the opportunity to shoot themselves in the foot.

In short, I've gone for the let's just fix it option.

Was this page helpful?
0 / 5 - 0 ratings