Spring-boot: Performance regression from 1.5.x

Created on 30 Nov 2017  Â·  12Comments  Â·  Source: spring-projects/spring-boot

2.0.0 seems to be slower to start and I'd like to know why!

task

Most helpful comment

Finally had a chunk of time to dedicate to this, breaking out a number of new issues:

  • [ ] Reduce number of exceptions thrown on startup (no benefit in the end)

    • [ ] Implement ClassUtils.getMethodIfAvailable() to remove NoSuchMethodException (SPR-16320)

    • [ ] Refactor AnnotatedElementUtils to remove NoSuchMethodException(SPR-16321)

  • [x] Reduce amount of garbage generated when loading fat jars (#11405)
  • [x] Use fast exceptions then enumerating found fat jar resources (#11406)
  • [ ] Use alternative BeanInfo mechanism (#9873 & SPR-16322) (doesn't seem to help)
  • [x] Cache OnEnableEndpointCondition default enabled result (#11407)
  • [x] Use shared static converter with Binder (#11352)
  • [x] Make methods inline friendly (#11409)
  • [x] Change PropertyMapper to use arrays rather than lists (#11411)
  • [x] Create JSR 303 binder validator in a background thread (#11412)
  • [ ] Improve Spring Security Startup Time (https://github.com/spring-projects/spring-security/issues/4915 & #11573)
  • [x] Don't cause early initialization from WebMvcMetricsFilter #11571
  • [x] Start BackgroundPreinitializer earlier #11570

All 12 comments

Some timing notes:

GC

2.0.0

java -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -jar target/*.jar | grep GC                                                                                                                                        
2017-12-18T13:55:08.899+0800: 0.656: [GC (Allocation Failure)  65536K->4743K(251392K), 0.0040314 secs]
2017-12-18T13:55:09.181+0800: 0.938: [GC (Allocation Failure)  70279K->7921K(251392K), 0.0060079 secs]
2017-12-18T13:55:09.430+0800: 1.187: [GC (Allocation Failure)  73457K->8195K(251392K), 0.0061011 secs]
2017-12-18T13:55:09.436+0800: 1.194: [GC (Metadata GC Threshold)  9643K->8065K(316928K), 0.0054321 secs]
2017-12-18T13:55:09.442+0800: 1.199: [Full GC (Metadata GC Threshold)  8065K->7701K(220672K), 0.0215755 secs]
2017-12-18T13:55:10.129+0800: 1.887: [GC (Allocation Failure)  138773K->14315K(220672K), 0.0055671 secs]
2017-12-18T13:55:10.783+0800: 2.540: [GC (Allocation Failure)  145387K->21461K(335360K), 0.0098723 secs]
2017-12-18T13:55:11.693+0800: 3.450: [GC (Metadata GC Threshold)  247050K->29365K(338944K), 0.0139084 secs]
2017-12-18T13:55:11.707+0800: 3.464: [Full GC (Metadata GC Threshold)  29365K->23158K(371712K), 0.0749999 secs]

1.5.x

java -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -jar target/*.jar | grep GC                                                                                                                                       
2017-12-18T13:55:01.892+0800: 0.518: [GC (Allocation Failure)  65536K->4312K(251392K), 0.0036483 secs]
2017-12-18T13:55:02.135+0800: 0.761: [GC (Allocation Failure)  69848K->8099K(251392K), 0.0061107 secs]
2017-12-18T13:55:02.380+0800: 1.006: [GC (Allocation Failure)  73635K->7604K(251392K), 0.0052713 secs]
2017-12-18T13:55:02.625+0800: 1.251: [GC (Metadata GC Threshold)  66324K->9032K(316928K), 0.0078808 secs]
2017-12-18T13:55:02.633+0800: 1.259: [Full GC (Metadata GC Threshold)  9032K->8723K(250368K), 0.0269571 secs]
2017-12-18T13:55:03.387+0800: 2.013: [GC (Allocation Failure)  139795K->20439K(250368K), 0.0090122 secs]
2017-12-18T13:55:03.842+0800: 2.468: [GC (Allocation Failure)  151511K->22637K(311296K), 0.0090661 secs]
2017-12-18T13:55:04.448+0800: 3.074: [GC (Allocation Failure)  212077K->27815K(313344K), 0.0122303 secs]

Spring Security's BCrypt class appears as hot method. Add a NoOpPasswordEncoder bean yields the following results:

NoOp:

Started SampleActuatorUiApplication in 3.632 seconds (JVM running for 4.032)
Started SampleActuatorUiApplication in 3.536 seconds (JVM running for 3.968)
Started SampleActuatorUiApplication in 3.437 seconds (JVM running for 3.846)

BCrypt:

Started SampleActuatorUiApplication in 3.826 seconds (JVM running for 4.237)
Started SampleActuatorUiApplication in 3.777 seconds (JVM running for 4.193)
Started SampleActuatorUiApplication in 3.654 seconds (JVM running for 4.067)

I've raise https://github.com/spring-projects/spring-security/issues/4915

Finally had a chunk of time to dedicate to this, breaking out a number of new issues:

  • [ ] Reduce number of exceptions thrown on startup (no benefit in the end)

    • [ ] Implement ClassUtils.getMethodIfAvailable() to remove NoSuchMethodException (SPR-16320)

    • [ ] Refactor AnnotatedElementUtils to remove NoSuchMethodException(SPR-16321)

  • [x] Reduce amount of garbage generated when loading fat jars (#11405)
  • [x] Use fast exceptions then enumerating found fat jar resources (#11406)
  • [ ] Use alternative BeanInfo mechanism (#9873 & SPR-16322) (doesn't seem to help)
  • [x] Cache OnEnableEndpointCondition default enabled result (#11407)
  • [x] Use shared static converter with Binder (#11352)
  • [x] Make methods inline friendly (#11409)
  • [x] Change PropertyMapper to use arrays rather than lists (#11411)
  • [x] Create JSR 303 binder validator in a background thread (#11412)
  • [ ] Improve Spring Security Startup Time (https://github.com/spring-projects/spring-security/issues/4915 & #11573)
  • [x] Don't cause early initialization from WebMvcMetricsFilter #11571
  • [x] Start BackgroundPreinitializer earlier #11570

Benchmark timings with Framework and Security patches and all Boot changes except the BeanInfo update:

SpringBoot15xBenchmark.explodedJarLauncher       avgt   10  2.910 ± 0.079   s/op
SpringBoot15xBenchmark.explodedJarMain           avgt   10  2.007 ± 0.032   s/op
SpringBoot15xBenchmark.fatJar                    avgt   10  2.738 ± 0.025   s/op

SpringBoot20M7Benchmark.explodedJarLauncher       avgt   10  3.610 ± 0.125   s/op
SpringBoot20M7Benchmark.explodedJarMain           avgt   10  2.390 ± 0.052   s/op
SpringBoot20M7Benchmark.fatJar                    avgt   10  3.383 ± 0.035   s/op

SpringBoot20xBenchmark.explodedJarLauncher       avgt   10  2.952 ± 0.048   s/op
SpringBoot20xBenchmark.explodedJarMain           avgt   10  2.288 ± 0.034   s/op
SpringBoot20xBenchmark.fatJar                    avgt   10  2.815 ± 0.040   s/op

And petclinic:

--- 1.5.x
PetclinicLatestBenchmark.devtoolsRestart         avgt   10  1.347 ± 0.066   s/op
PetclinicLatestBenchmark.explodedJarMain         avgt   10  4.400 ± 0.077   s/op
PetclinicLatestBenchmark.fatJar                  avgt   10  5.868 ± 0.093   s/op
PetclinicLatestBenchmark.noverify                avgt   10  5.184 ± 0.071   s/op

--- 2.0.0.M7
PetclinicLatestBenchmark.devtoolsRestart  avgt   10  0.001 ±  0.001   s/op
PetclinicLatestBenchmark.explodedJarMain  avgt   10  5.561 ±  0.159   s/op
PetclinicLatestBenchmark.fatJar           avgt   10  7.616 ±  0.083   s/op
PetclinicLatestBenchmark.noverify         avgt   10  6.936 ±  0.171   s/op

--- 2.0.0.BUILD-SNAPSHOT
PetclinicLatestBenchmark.devtoolsRestart  avgt   10  0.001 ±  0.001   s/op
PetclinicLatestBenchmark.explodedJarMain  avgt   10  5.166 ±  0.080   s/op
PetclinicLatestBenchmark.fatJar           avgt   10  6.766 ±  0.087   s/op
PetclinicLatestBenchmark.noverify         avgt   10  6.001 ±  0.060   s/op

Relevant info from @dsyer


Here's some interesting performance data for 2.0.0.M7

Boot     Benchmark           (sample)  Mode  Cnt  Score   Error  Units
2.0.0.M7 MainBenchmark.main      actr    ss   10  1.623 ± 0.036   s/op
2.0.0.M7 MainBenchmark.main      actr(*) ss   10  1.469 ± 0.028   s/op
1.5.9    OldBenchmark.old        actr    ss   10  1.386 ± 0.083   s/op

The (*) sample has -Dspring.autoconfigure.exclude=org.springframework.boot.actuate.autoconfigure.metrics.MetricsAutoConfiguration -Dmanagement.endpoints.jmx.enabled=false
I spotted those as hotspots using an Aspect that prints timings for AbstractAutowireCapableBeanFactory.initializeBean()

That takes us about half way back to 1.5.9. Only considering Actuator features. Their initialization probably could be deferred - e.g. WebMvcMetricsFilter eagerly instantiates a bean in Spring MVC that would otherwise be @Lazy (HandlerMappingIntrospector)

Excluding JMX also speeds up 1.5.9, but I don't think there was an option to exclude only Actuator JMX in 1.5.9

TomcatServletWebServerFactory also shows up on that hostspot list

Can't switch that one off, but maybe it shows us a place to look

Using https://github.com/dsyer/spring-boot-startup-bench with the following:

cd static; mvn install; java -jar target/benchmarks.jar main snap old

Currently gives me:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.snap      empt  avgt   10  0.823 ± 0.014   s/op
SnapBenchmark.snap      demo  avgt   10  1.681 ± 0.071   s/op
SnapBenchmark.snap      actr  avgt   10  2.222 ± 0.067   s/op
SnapBenchmark.snap      jdbc  avgt   10  1.926 ± 0.061   s/op
SnapBenchmark.snap      actj  avgt   10  2.526 ± 0.035   s/op
SnapBenchmark.snap      jpae  avgt   10  3.002 ± 0.078   s/op
SnapBenchmark.snap      conf  avgt   10  3.026 ± 0.362   s/op

MainBenchmark.main      empt    ss   10  0.958 ± 0.037   s/op
MainBenchmark.main      demo    ss   10  1.869 ± 0.095   s/op
MainBenchmark.main      actr    ss   10  2.514 ± 0.063   s/op
MainBenchmark.main      jdbc    ss   10  2.107 ± 0.082   s/op
MainBenchmark.main      actj    ss   10  2.797 ± 0.205   s/op
MainBenchmark.main      jpae    ss   10  3.168 ± 0.219   s/op
MainBenchmark.main      conf    ss   10  2.884 ± 0.111   s/op
MainBenchmark.main      erka    ss   10  4.098 ± 0.232   s/op
MainBenchmark.main      busr    ss   10  4.374 ± 0.642   s/op
MainBenchmark.main      zuul    ss   10  5.800 ± 0.460   s/op
MainBenchmark.main      erkb    ss   10  5.872 ± 0.447   s/op
MainBenchmark.main      slth    ss   10  6.006 ± 0.611   s/op

OldBenchmark.old        empt    ss   10  0.747 ± 0.094   s/op
OldBenchmark.old        demo    ss   10  1.761 ± 0.261   s/op
OldBenchmark.old        actr    ss   10  2.190 ± 0.190   s/op
OldBenchmark.old        jdbc    ss   10  1.984 ± 0.093   s/op
OldBenchmark.old        actj    ss   10  2.470 ± 0.140   s/op
OldBenchmark.old        jpae    ss   10  2.887 ± 0.178   s/op
OldBenchmark.old        conf    ss   10  3.185 ± 0.276   s/op
OldBenchmark.old        erka    ss   10  4.083 ± 0.846   s/op
OldBenchmark.old        buna    ss   10  3.076 ± 0.313   s/op
OldBenchmark.old        busr    ss   10  3.944 ± 0.330   s/op
OldBenchmark.old        zuul    ss   10  5.926 ± 0.504   s/op
OldBenchmark.old        erkb    ss    5  5.713 ± 0.387   s/op
OldBenchmark.old        slth    ss   10  5.915 ± 0.671   s/op

(this is with the Spring Security patch applied)

| Project | 1.5.x | Milestone | Snapshot |
| --- | --- | --- | --- |
| empt | 0.747 ± 0.094 | 0.958 ± 0.037 | 0.823 ± 0.014 |
| demo | 1.761 ± 0.261 | 1.869 ± 0.095 | 1.681 ± 0.071 |
| actr | 2.190 ± 0.190 | 2.514 ± 0.063 | 2.222 ± 0.067 |
| jdbc | 1.984 ± 0.093 | 2.107 ± 0.082 | 1.926 ± 0.061 |
| actj | 2.470 ± 0.140 | 2.797 ± 0.205 | 2.526 ± 0.035 |
| jpae | 2.887 ± 0.178 | 3.168 ± 0.219 | 3.002 ± 0.078 |
| conf | 3.185 ± 0.276 | 2.884 ± 0.111 | 3.026 ± 0.362 |

Those results look a bit different to mine so it's a surprise. Also, JPA isn't particularly expensive in Boot 2.0 for me. One observation: your error bars are quite large. This often indicates that you ran the tests on a noisy system (not enough free resources, memory, CPU etc.). If you strip down to just running this process and no IDE/browser etc, does it stabilize?

Also (although it doesn't matter for the "snap" comparison), I forgot to say, the "erk*" samples need a local Eureka. You can run it with Spring Boot CLI (with the Cloud extension), or just a vanilla @EnableEurekaServer running on port 8761.

Here's my run from this morning on different hardware (still Ubuntu 16.04):

$ java -jar target/benchmarks.jar main snap old -p sample=empt,demo,actr,jdbc,actj,jpae,conf
Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.snap      empt  avgt   10  0.825 ± 0.060   s/op
SnapBenchmark.snap      demo  avgt   10  1.713 ± 0.055   s/op
SnapBenchmark.snap      actr  avgt   10  2.312 ± 0.129   s/op
SnapBenchmark.snap      jdbc  avgt   10  1.983 ± 0.074   s/op
SnapBenchmark.snap      actj  avgt   10  2.682 ± 0.087   s/op
SnapBenchmark.snap      jpae  avgt   10  2.929 ± 0.073   s/op
SnapBenchmark.snap      conf  avgt   10  2.759 ± 0.108   s/op
MainBenchmark.main      empt    ss   10  0.824 ± 0.061   s/op
MainBenchmark.main      demo    ss   10  1.762 ± 0.054   s/op
MainBenchmark.main      actr    ss   10  2.352 ± 0.087   s/op
MainBenchmark.main      jdbc    ss   10  2.046 ± 0.088   s/op
MainBenchmark.main      actj    ss   10  2.668 ± 0.089   s/op
MainBenchmark.main      jpae    ss   10  3.072 ± 0.087   s/op
MainBenchmark.main      conf    ss   10  2.908 ± 0.118   s/op
OldBenchmark.old        empt    ss   10  0.716 ± 0.077   s/op
OldBenchmark.old        demo    ss   10  1.538 ± 0.060   s/op
OldBenchmark.old        actr    ss   10  1.983 ± 0.082   s/op
OldBenchmark.old        jdbc    ss   10  1.805 ± 0.111   s/op
OldBenchmark.old        actj    ss   10  2.326 ± 0.121   s/op
OldBenchmark.old        jpae    ss   10  2.760 ± 0.137   s/op
OldBenchmark.old        conf    ss   10  2.534 ± 0.096   s/op

Slower overall than my "fast" desktop. Less memory available (might explain the error bars). No difference in the conclusion that 2.0.0 snapshots are marginally better than 2.0.0.M7 (and not on a par with 1.5.9).

Maybe someone else could run that on a Mac and post back some results?

Here's my run on latest MBP with Spring Boot 2.0.0 SNAP b96f862c9eff8

$ java -jar target/benchmarks.jar main snap old -p sample=empt,demo,actr,jdbc,actj,jpae,conf
Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.snap      empt    ss   10  0.962 ± 0.092   s/op
SnapBenchmark.snap      demo    ss   10  1.868 ± 0.084   s/op
SnapBenchmark.snap      actr    ss   10  2.412 ± 0.105   s/op
SnapBenchmark.snap      jdbc    ss   10  2.131 ± 0.110   s/op
SnapBenchmark.snap      actj    ss   10  2.649 ± 0.133   s/op
SnapBenchmark.snap      jpae    ss   10  3.178 ± 0.206   s/op
SnapBenchmark.snap      conf    ss   10  2.646 ± 0.079   s/op
MainBenchmark.main      empt    ss   10  0.878 ± 0.033   s/op
MainBenchmark.main      demo    ss   10  1.914 ± 0.101   s/op
MainBenchmark.main      actr    ss   10  2.489 ± 0.135   s/op
MainBenchmark.main      jdbc    ss   10  2.136 ± 0.201   s/op
MainBenchmark.main      actj    ss   10  2.825 ± 0.231   s/op
MainBenchmark.main      jpae    ss   10  3.317 ± 0.279   s/op
MainBenchmark.main      conf    ss   10  3.240 ± 0.757   s/op
OldBenchmark.old        empt    ss   10  0.786 ± 0.089   s/op
OldBenchmark.old        demo    ss   10  1.615 ± 0.098   s/op
OldBenchmark.old        actr    ss   10  2.136 ± 0.234   s/op
OldBenchmark.old        jdbc    ss   10  1.992 ± 0.112   s/op
OldBenchmark.old        actj    ss   10  2.517 ± 0.285   s/op
OldBenchmark.old        jpae    ss   10  2.764 ± 0.198   s/op
OldBenchmark.old        conf    ss   10  2.610 ± 0.254   s/op
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

Second run without any background apps:

Benchmark           (sample)  Mode  Cnt  Score   Error  Units
SnapBenchmark.snap      empt  avgt   10  0.854 ± 0.014   s/op
SnapBenchmark.snap      demo  avgt   10  1.716 ± 0.044   s/op
SnapBenchmark.snap      actr  avgt   10  2.300 ± 0.042   s/op
SnapBenchmark.snap      jdbc  avgt   10  1.960 ± 0.031   s/op
SnapBenchmark.snap      actj  avgt   10  2.651 ± 0.072   s/op
SnapBenchmark.snap      jpae  avgt   10  3.070 ± 0.061   s/op
SnapBenchmark.snap      conf  avgt   10  2.787 ± 0.088   s/op
MainBenchmark.main      empt    ss   10  0.893 ± 0.030   s/op
MainBenchmark.main      demo    ss   10  1.886 ± 0.069   s/op
MainBenchmark.main      actr    ss   10  2.498 ± 0.084   s/op
MainBenchmark.main      jdbc    ss   10  2.176 ± 0.095   s/op
MainBenchmark.main      actj    ss   10  2.795 ± 0.121   s/op
MainBenchmark.main      jpae    ss   10  3.068 ± 0.122   s/op
MainBenchmark.main      conf    ss   10  2.773 ± 0.101   s/op
MainBenchmark.main      erka    ss   10  3.678 ± 0.109   s/op
MainBenchmark.main      busr    ss   10  3.986 ± 0.068   s/op
MainBenchmark.main      zuul    ss   10  5.302 ± 0.047   s/op
MainBenchmark.main      erkb    ss   10  5.322 ± 0.077   s/op
MainBenchmark.main      slth    ss   10  5.410 ± 0.050   s/op
OldBenchmark.old        empt    ss   10  0.725 ± 0.008   s/op
OldBenchmark.old        demo    ss   10  1.514 ± 0.023   s/op
OldBenchmark.old        actr    ss   10  1.917 ± 0.017   s/op
OldBenchmark.old        jdbc    ss   10  1.797 ± 0.066   s/op
OldBenchmark.old        actj    ss   10  2.252 ± 0.074   s/op
OldBenchmark.old        jpae    ss   10  2.630 ± 0.140   s/op
OldBenchmark.old        conf    ss   10  2.436 ± 0.066   s/op
OldBenchmark.old        erka    ss   10  3.407 ± 0.084   s/op
OldBenchmark.old        buna    ss   10  2.797 ± 0.101   s/op
OldBenchmark.old        busr    ss   10  3.411 ± 0.061   s/op
OldBenchmark.old        zuul    ss   10  4.891 ± 0.093   s/op
OldBenchmark.old        slth    ss   10  4.794 ± 0.065   s/op

Pushing this one back to RC2 for now. I think performance is good enough (especially in the IDE) for RC1. I still can't work out why the benchmark shows a 100ms difference.

1.5.10 SNAPSHOT in IDE

2018-01-08 17:35:56.682  INFO 16454 --- [           main] s.a.ui.SampleActuatorUiApplication       : Started SampleActuatorUiApplication in 2.647 seconds (JVM running for 2.927)
2018-01-08 17:36:24.634  INFO 16464 --- [           main] s.a.ui.SampleActuatorUiApplication       : Started SampleActuatorUiApplication in 2.694 seconds (JVM running for 2.959)
2018-01-08 17:36:35.993  INFO 16469 --- [           main] s.a.ui.SampleActuatorUiApplication       : Started SampleActuatorUiApplication in 2.658 seconds (JVM running for 2.926)

2.0.0 in IDE with Spring Security patch

2018-01-08 17:26:40.757  INFO 16102 --- [           main] s.a.ui.SampleActuatorUiApplication       : Started SampleActuatorUiApplication in 2.737 seconds (JVM running for 3.066)
2018-01-08 17:27:18.883  INFO 16119 --- [           main] s.a.ui.SampleActuatorUiApplication       : Started SampleActuatorUiApplication in 2.696 seconds (JVM running for 3.01)
2018-01-08 17:27:26.456  INFO 16123 --- [           main] s.a.ui.SampleActuatorUiApplication       : Started SampleActuatorUiApplication in 2.732 seconds (JVM running for 3.093)

Snapshots looking much better now ("main" is 2.0.0.M7, "old" is 1.5.9):

Benchmark           (sample)  Mode  Cnt  Score   Error  Units
MainBenchmark.main      empt    ss   10  0.586 ± 0.025   s/op
MainBenchmark.main      demo    ss   10  1.276 ± 0.044   s/op
MainBenchmark.main      actr    ss   10  1.653 ± 0.056   s/op
MainBenchmark.main      jdbc    ss   10  1.437 ± 0.069   s/op
MainBenchmark.main      actj    ss   10  1.910 ± 0.086   s/op
MainBenchmark.main      jpae    ss   10  2.307 ± 0.184   s/op
MainBenchmark.main      conf    ss   10  1.972 ± 0.058   s/op
OldBenchmark.old        empt    ss   10  0.491 ± 0.016   s/op
OldBenchmark.old        demo    ss   10  1.064 ± 0.027   s/op
OldBenchmark.old        actr    ss   10  1.399 ± 0.090   s/op
OldBenchmark.old        jdbc    ss   10  1.295 ± 0.080   s/op
OldBenchmark.old        actj    ss   10  1.637 ± 0.078   s/op
OldBenchmark.old        jpae    ss   10  1.904 ± 0.053   s/op
OldBenchmark.old        conf    ss   10  1.786 ± 0.040   s/op
SnapBenchmark.snap      empt    ss   10  0.552 ± 0.012   s/op
SnapBenchmark.snap      demo    ss   10  1.166 ± 0.086   s/op
SnapBenchmark.snap      actr    ss   10  1.489 ± 0.044   s/op
SnapBenchmark.snap      jdbc    ss   10  1.334 ± 0.052   s/op
SnapBenchmark.snap      actj    ss   10  1.748 ± 0.079   s/op
SnapBenchmark.snap      jpae    ss   10  2.066 ± 0.059   s/op
SnapBenchmark.snap      conf    ss   10  1.794 ± 0.038   s/op

Marginally slower than 1.5.9, but not so noticeable now.

Was this page helpful?
0 / 5 - 0 ratings