Micronaut-core: Performance: handleRouteMatch

Created on 25 Nov 2019 · 12 Comments · Source: micronaut-projects/micronaut-core

Profiling:

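import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.runtime.Micronaut;

import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class App {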

  @Controller("/service")
  public static class Control {

    @Get
    public CompletionStage<String> handle(Optional<String> str) {
      return CompletableFuture.completedFuture(str.orElse("nop"));
    }
  }

  public static void main(String[] args) {
    Micronaut.run(App.class, args);
  }
}

With:

ab -c 10 -n 10000000 -k http://localhost:8080/service?str=asd

JProfiler with full instrumentation shows:

36% of CPU time is spent in channelRead0, with:
- 26% spent in handleRouteMatch
  - 14% in RequestArgumentSatisfier.fulfillArgumentRequirements
  - 10% in RouteMatch.execute
  - 2% in RoutingInBoundHandler.prepareRouteForExecution
- 3% spent in AbstractNettyHttpRequest.getPath (2% from getContentType)
- 2% spent in findAllClosest
23% of CPU time is spent in the ExecutorScheduler reactive call stack (see: https://github.com/micronaut-projects/micronaut-core/issues/2338)

This is just an example, of course.
The real use case is that I am trying to upgrade a pure Netty app to Micronaut. The service is very lightweight: it only has a few GET endpoints and returns data cached in memory. I obviously expect some overhead from Micronaut when "upgrading" from pure Netty, but CPU usage (more than) doubling, with a CPU distribution very similar to the one presented here, isn't acceptable.

Using Micronaut 1.3.0.M1 and Oracle JDK 11.

improvement


fabienrenaud

Most helpful comment

With https://github.com/micronaut-projects/micronaut-core/commit/4c6ffe3a4d3bd30638067018d1cd14f17760a5e4 we're currently looking at a 125% performance improvement in RequestArgumentSatisfier.fulfillArgumentRequirements.

graemerocher on 16 Dec 2019


All 12 comments

[Screenshot, 24 Nov 2019, 5:21:14 PM]

fabienrenaud on 25 Nov 2019

Any code change suggestions are welcome.

jameskleeh on 26 Nov 2019

I wouldn't mind, but this seems like a rather big change (there is no single hot function), and unfortunately I don't have the time to deep-dive into the project. Detailed architecture docs would help in understanding how it is set up and why.

I also question the heavy use of streams for core functionality in this project. In my experience, using streams over tiny collections does not perform well at scale.
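A minimal illustration of the pattern under discussion, not taken from Micronaut's code base: the two hypothetical methods below return the same result for a small list, but the stream version builds a fresh pipeline (stream source, filter and map stages, collector) on every call, while the loop skips those objects. Whether that difference matters depends on the workload and on what the JIT manages to eliminate; only profiling the real hot path can tell.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class StreamVsLoop {

    // Stream version: creates a new pipeline (source, filter/map stages, collector) per call.
    static List<String> upperNonEmptyStream(List<String> values) {
        return values.stream()
                .filter(v -> !v.isEmpty())
                .map(v -> v.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Loop version: same result without the pipeline objects.
    // Both versions still allocate the result list and the upper-cased strings.
    static List<String> upperNonEmptyLoop(List<String> values) {
        List<String> result = new ArrayList<>(values.size());
        for (String v : values) {
            if (!v.isEmpty()) {
                result.add(v.toUpperCase(Locale.ROOT));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> headers = List.of("accept", "", "host");
        System.out.println(upperNonEmptyStream(headers)); // [ACCEPT, HOST]
        System.out.println(upperNonEmptyLoop(headers));   // [ACCEPT, HOST]
    }
}
```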

fabienrenaud on 26 Nov 2019


I agree about the fundamental use of streams. They are not meant to replace simple iteration, especially over small collections whose data is already available.

Klaus-Nji-sp on 26 Nov 2019

If there are places where you think a stream could be replaced with iteration without hurting the readability or maintainability of the code, and it can be shown to be partly responsible for performance issues, please let me know so we can change it.
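As a hedged sketch of the kind of replacement being invited here, a stream lookup with `filter` plus `findFirst` can usually become a loop with an early return that reads almost the same. The `Route` class and both `find*` methods are hypothetical stand-ins, not Micronaut types; both versions stop at the first match, and the loop simply avoids building the stream pipeline.

```java
import java.util.List;
import java.util.Optional;

public class FirstMatch {

    // Hypothetical stand-in for a route entry; not a Micronaut type.
    static final class Route {
        final String method;
        final String path;

        Route(String method, String path) {
            this.method = method;
            this.path = path;
        }
    }

    // Stream version: filter + findFirst (short-circuits on the first match).
    static Optional<Route> findStream(List<Route> routes, String method, String path) {
        return routes.stream()
                .filter(r -> r.method.equals(method) && r.path.equals(path))
                .findFirst();
    }

    // Loop version: same semantics, returns as soon as a match is found.
    static Optional<Route> findLoop(List<Route> routes, String method, String path) {
        for (Route r : routes) {
            if (r.method.equals(method) && r.path.equals(path)) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Route> routes = List.of(new Route("GET", "/service"), new Route("POST", "/service"));
        System.out.println(findLoop(routes, "POST", "/service").map(r -> r.method).orElse("none")); // POST
        System.out.println(findStream(routes, "GET", "/other").isPresent()); // false
    }
}
```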

jameskleeh on 26 Nov 2019

I will look at this when I get back next week. I have some ideas, and this area was on my to-do list to optimize anyway.

graemerocher on 27 Nov 2019

It's a pretty small optimization, but just overriding the methods in EmptyAnnotationMetadata results in a 50% performance improvement for RequestArgumentSatisfier.fulfillArgumentRequirements.
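The comment does not show the change itself. As an illustration of the general idea only, assuming the gain comes from an empty implementation answering common queries with constants instead of running shared default-method logic, the sketch below uses a hypothetical `Metadata` interface; it is not Micronaut's `AnnotationMetadata` API, and the class names are invented.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

public class EmptyOverrideSketch {

    // Hypothetical interface (not Micronaut's AnnotationMetadata):
    // one primitive lookup plus convenience defaults built on top of it.
    interface Metadata {
        Map<String, Object> values(String annotation);

        // Default: runs the generic map lookup on every call.
        default Optional<Object> value(String annotation, String member) {
            return Optional.ofNullable(values(annotation).get(member));
        }

        default boolean hasAnnotation(String annotation) {
            return !values(annotation).isEmpty();
        }
    }

    // Empty implementation providing only the primitive: every default still runs its generic logic.
    static final class NaiveEmpty implements Metadata {
        @Override
        public Map<String, Object> values(String annotation) {
            return Collections.emptyMap();
        }
    }

    // Empty implementation that overrides the defaults with constant answers,
    // so hot-path callers skip the generic code entirely.
    static final class FastEmpty implements Metadata {
        static final FastEmpty INSTANCE = new FastEmpty();

        private FastEmpty() {
        }

        @Override
        public Map<String, Object> values(String annotation) {
            return Collections.emptyMap();
        }

        @Override
        public Optional<Object> value(String annotation, String member) {
            return Optional.empty();
        }

        @Override
        public boolean hasAnnotation(String annotation) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(new NaiveEmpty().hasAnnotation("Get")); // false
        System.out.println(FastEmpty.INSTANCE.value("Get", "value").isPresent()); // false
    }
}
```

Both implementations give identical answers; the override only shortens the code path taken for metadata that is known to be empty.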

graemerocher on 16 Dec 2019


handleRouteMatch no longer appears as a hotspot, at least not in YourKit, so I'm closing this for the moment. There are probably other areas we can improve, but those are separate issues.

graemerocher on 17 Dec 2019

@graemerocher What code did you use for your benchmark? Does the performance gain still hold when the endpoint requires or accepts query parameters, headers, or bodies of various types?

fabienrenaud on 17 Dec 2019

https://github.com/micronaut-projects/micronaut-core/blob/master/benchmarks/src/jmh/java/io/micronaut/http/server/binding/RequestArgumentSatisfierBenchmark.java

graemerocher on 17 Dec 2019

I upgraded to 1.3.0.M2 and ran the same benchmark. This is the new call stack profile for handleRouteMatch:

[Screenshot, 24 Dec 2019, 9:41:40 AM]

It seems slightly better, but of the 33% of CPU time used in this call stack:

- 10% is still from fulfillArgumentRequirements
- 6% is from filterPublisher + subscribeToResponsePublisher
- 0.8% is from getProduces
- 0.5% is from getFirstTypeVariable
- 0.3% is from stream
- 0.2% is from map
- 0.3% is from switchMap
- etc.
Besides fulfillArgumentRequirements, there seems to be a little overhead everywhere...