Micronaut-core: Performance: handleRouteMatch

Created on 25 Nov 2019 · 12 Comments · Source: micronaut-projects/micronaut-core

Profiling:

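import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.runtime.Micronaut;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class App {

  @Controller("/service")
  public static class Control {

    // GET /service?str=... returns the query value, or "nop" when it is absent.
    @Get
    public CompletionStage<String> handle(Optional<String> str) {
      return CompletableFuture.completedFuture(str.orElse("nop"));
    }
  }

  public static void main(String[] args) {
    Micronaut.run(App.class, args);
  }
}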

With:

ab -c 10 -n 10000000 -k http://localhost:8080/service?str=asd

JProfiler with full instrumentation shows:

  • 36% of CPU time is spent in channelRead0 with:
    • 26% spent in handleRouteMatch
      • 14% in RequestArgumentSatisfier.fulfillArgumentRequirements
      • 10% in RouteMatch.execute
      • 2% in RoutingInBoundHandler.prepareRouteForExecution
    • 3% spent in AbstractNettyHttpRequest.getPath (2% from getContentType)
    • 2% spent in findAllClosest
  • 23% of CPU time is spent in the ExecutorScheduler reactive call stack (see: https://github.com/micronaut-projects/micronaut-core/issues/2338)

This is just an example of course.
The real use case is that I am trying to upgrade a pure Netty app to Micronaut. The service is very lightweight: it only has a few GET endpoints and returns data cached in memory. I obviously expect some overhead from Micronaut when "upgrading" from pure Netty, but CPU usage more than doubling (with a CPU distribution very similar to what is presented here) isn't on my list of acceptable criteria.

Using Micronaut 1.3.0.M1 & Oracle JDK 11.

improvement

All 12 comments

[Image: screenshot, 2019-11-24 5:21:14 PM]

Any code change suggestions are welcome

I wouldn't mind but this seems like a rather big change (no single hot function) and I don't have that kind of time on my hands to deep-dive into the project (unfortunately)... Detailed architecture docs would help understand how it is set up and why.

I also question the heavy utilization of streams for core functionalities in this project. In my experience, using streams over tiny collections does not perform well at scale.

I agree on the fundamental point about streams. They are not meant to replace simple iteration, especially over small collections and when the data is already available.

If there are any places where you think the usage of a stream can be replaced with iteration and it would not impact the readability or maintainability of the code and it can be determined that it is in part responsible for performance issues, please let me know so we can change it.
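For illustration, here is a minimal sketch of the kind of replacement being discussed: a stream lookup over a tiny list next to an equivalent plain loop. The class name, method names, and sample values are hypothetical and are not taken from micronaut-core; whether such a change actually helps in a given spot would still need to be confirmed with a profiler.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical example; not code from micronaut-core.
final class StreamVsLoop {

    // Stream version: concise, but builds a stream pipeline on every call.
    static Optional<String> firstWithPrefixStream(List<String> values, String prefix) {
        return values.stream()
                .filter(v -> v.startsWith(prefix))
                .findFirst();
    }

    // Loop version: same result, without creating stream pipeline objects.
    static Optional<String> firstWithPrefixLoop(List<String> values, String prefix) {
        for (String v : values) {
            if (v.startsWith(prefix)) {
                return Optional.of(v);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> types = List.of("text/plain", "application/json", "application/xml");
        System.out.println(firstWithPrefixStream(types, "application/"));
        System.out.println(firstWithPrefixLoop(types, "application/"));
    }
}
```

Both methods return the same value for the same input, so the loop can be swapped in without changing behavior at the call site.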

I will look at this when I get back next week. I have some ideas and it was an area on my todo list to optimize anyway

Pretty small optimization, but just overriding the methods in EmptyAnnotationMetadata results in a 50% performance improvement for RequestArgumentSatisfier.fulfillArgumentRequirements
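A rough sketch of the general idea, assuming the gain comes from an "empty" implementation answering with constants instead of falling through to generic default methods. The Metadata interface and EmptyMetadata class below are hypothetical stand-ins and do not reproduce Micronaut's AnnotationMetadata API or the actual commit.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

// Hypothetical example; not Micronaut's AnnotationMetadata API.
final class EmptyMetadataSketch {

    interface Metadata {
        Map<String, String> values();

        // Default methods derive their answers from values() on every call.
        default boolean has(String name) {
            return values().containsKey(name);
        }

        default Optional<String> get(String name) {
            return Optional.ofNullable(values().get(name));
        }
    }

    static final class EmptyMetadata implements Metadata {
        static final EmptyMetadata INSTANCE = new EmptyMetadata();

        @Override
        public Map<String, String> values() {
            return Collections.emptyMap();
        }

        // Overrides return constants directly and skip the generic lookup path.
        @Override
        public boolean has(String name) {
            return false;
        }

        @Override
        public Optional<String> get(String name) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Metadata metadata = EmptyMetadata.INSTANCE;
        System.out.println(metadata.has("produces"));
        System.out.println(metadata.get("produces").isPresent());
    }
}
```

The overrides are behaviorally identical to the defaults for an empty map; they only shorten the call path, which matters when the method is invoked for every request argument.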

With https://github.com/micronaut-projects/micronaut-core/commit/4c6ffe3a4d3bd30638067018d1cd14f17760a5e4 we're currently looking at a 125% performance improvement in RequestArgumentSatisfier.fulfillArgumentRequirements

handleRouteMatch is no longer appearing as a hotspot, at least not in YourKit, so closing this for the moment. There are probably other areas we can improve, but those are separate issues

@graemerocher What code did you use for your benchmark? Is the performance gain still there when the endpoint requires/accepts query parameters, headers or body of various types?

I upgraded to 1.3.0.M2 and ran the same benchmark. This is the new call stack profile for handleRouteMatch:

[Image: screenshot, 2019-12-24 9:41:40 AM]

It seems slightly better but out of the 33% of cpu used in this call stack:

  • 10% is still from fulfillArgumentRequirements
  • 6% is from filterPublisher + subscribeToResponsePublisher
  • 0.8% is from getProduces
  • 0.5% is from getFirstTypeVariable
  • 0.3% is from stream
  • 0.2% is from map
  • 0.3% is from switchMap
  • etc.

Besides fulfillArgumentRequirements, it seems there is a little bit of overhead everywhere...
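To put the "overhead everywhere" remark in numbers, the sketch below simply adds up the shares listed above and compares them with the 33% total. It uses only the figures quoted in this comment, stored in tenths of a percent to avoid floating-point rounding; the class and method names are made up for the example.

```java
// Hypothetical helper; the input figures are the ones quoted in the list above.
final class ProfileShares {

    // Listed shares in tenths of a percent: 10, 6, 0.8, 0.5, 0.3, 0.2, 0.3.
    static final int[] LISTED_TENTHS = {100, 60, 8, 5, 3, 2, 3};

    // Total share of the call stack in tenths of a percent: 33%.
    static final int TOTAL_TENTHS = 330;

    static int listedTenths() {
        int sum = 0;
        for (int share : LISTED_TENTHS) {
            sum += share;
        }
        return sum;
    }

    static int unlistedTenths() {
        return TOTAL_TENTHS - listedTenths();
    }

    public static void main(String[] args) {
        System.out.println("listed:   " + listedTenths() / 10.0 + "%");   // 18.1%
        System.out.println("unlisted: " + unlistedTenths() / 10.0 + "%"); // 14.9%
    }
}
```

The itemized entries account for 18.1 of the 33 percentage points, so roughly 14.9 points are spread over the methods covered by "etc.".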