We are working on enabling the APM agent in the prod build: https://github.com/elastic/kibana/issues/70497. Before making this happen, we want to understand what performance overhead it adds to the Kibana server. We might be able to reuse the setup introduced in https://github.com/elastic/kibana/issues/73189 to measure the average response time and the number of requests Kibana can handle with and without the APM agent enabled.
Pinging @elastic/kibana-platform (Team:Platform)
API performance testing is based on the setup from https://github.com/dmlemeshko/kibana-load-testing. I adjusted the number of requests so as not to overwhelm the APM server.
```scala
setUp(
  scn.inject(
    constantConcurrentUsers(15) during (2 minute),
    rampConcurrentUsers(15) to (20) during (2 minute)
  ).protocols(httpProtocol)
).maxDuration(15 minutes)
```
Tests are run against 7.10.0-SNAPSHOT.
The APM agent seems to add a significant overhead (see the 95th percentile).
Download results in HTML: 7.10.0-without-apm.zip
Download results in HTML: 7.10.0-with-apm.zip
The tested Kibana image doesn't contain the changes introduced in https://github.com/elastic/kibana/pull/78697.
So I added `breakdownMetrics: false` to the dist APM config manually. It slightly improves the situation:
https://www.elastic.co/guide/en/apm/agent/nodejs/master/performance-tuning.html provides some details on how to squeeze out a bit more perf improvements.
The simplest way is to reduce the sample rate: https://www.elastic.co/guide/en/apm/agent/nodejs/master/performance-tuning.html#performance-sampling
It seems we can use 0.2-0.3 as a default value and adjust it via the config file.
Numbers:
Other config values don't seem to affect CPU as much as `transactionSampleRate` does, so I decided not to use them. @vigneshshanmugam do you have anything to add?
As you have already figured out, `transactionSampleRate` is the go-to setting we recommend for both the Node.js and RUM agents to tune for performance, as it drops transactions based on this rate.
Perf tuning the RUM agent: https://www.elastic.co/guide/en/apm/agent/rum-js/current/performance-tuning.html
`breakdownMetrics` - Disabling it certainly helps a lot in the RUM agent for custom transactions vs `page-load` transactions. I don't know how the above test schedules the load and which browsers it runs, so I can't say for sure if it's going to have a huge impact. But my recommendation would be to keep it at `false` if it helps.
`centralConfig` - Disable this one, as it introduces one additional request to the APM server. Defaults to `true` in Node.js and `false` in RUM.
`metricsInterval` - Can you try increasing this interval or disabling metrics reporting and check if it helps? This controls metrics capturing in the Node agent. https://www.elastic.co/guide/en/apm/agent/nodejs/master/configuration.html#metrics-interval
I can't seem to find any other config that would help.
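As an aside, the effect of `transactionSampleRate` can be pictured as a per-transaction keep/drop coin flip. This is a minimal sketch of the idea, not the agent's actual implementation; the injectable `rng` parameter is mine, added only so the behaviour can be tested deterministically:

```javascript
// Illustrative sketch of rate-based head sampling: each transaction is kept
// with probability equal to transactionSampleRate. The real agent differs in
// detail (e.g. sampled-ness propagates through distributed traces).
function shouldSample(transactionSampleRate, rng = Math.random) {
  return rng() < transactionSampleRate;
}

// At transactionSampleRate: 0.3, roughly 70% of transactions are dropped,
// which is where the CPU and memory savings come from.
```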
> I don't know how the above test schedules the load and which browsers it runs, so I can't say for sure if it's going to have a huge impact. But my recommendation would be to keep it at `false` if it helps.
It tests the server-side API perf only.
> `centralConfig` - Disable this one, as it introduces one additional request to the APM server. Defaults to `true` in Node.js and `false` in RUM.
already disabled in my tests
> `metricsInterval` - Can you try increasing this interval or disabling metrics reporting and check if it helps? This controls metrics capturing in the Node agent.
A test with `metricsInterval: '120s'` and `transactionSampleRate: 0.3` slightly improves the situation (in comparison to `transactionSampleRate: 0.3` alone):
So in summary, even with 'best' compromise configuration, 95th percentile is doubled, and 50th percentile tripled, right? This is... significant.
> So in summary, even with 'best' compromise configuration, 95th percentile is doubled, and 50th percentile tripled, right? This is... significant
The best configuration was sample rate `0.1` with `breakdownMetrics: false`, `centralConfig: false`, `metricsInterval: '120s'`. The 50th percentile is doubled, from 118 ms to 225 ms; the 95th percentile is almost doubled, from 574 ms to 950 ms.
It mostly affects `query` functionality when requesting the `/api/saved_objects/*` and `/api/metrics/vis/data` endpoints. The test case `query timeseries data` is almost tripled.
with APM enabled:
@TinaHeiligers you asked how to perform testing:
How to run Kibana with the APM agent locally:

Start Elasticsearch and the APM server via apm-integration-testing:

```shell
./scripts/compose.py start master --no-kibana
```

Add to `kibana.yml`:

```yaml
elastic.apm.active: true
elastic.apm.serverUrl: 'http://127.0.0.1:8200'
# elastic.apm.secretToken: ... <-- might be required in prod/cloud
# optional settings to adjust performance
# see https://www.elastic.co/guide/en/apm/agent/nodejs/master/configuration.html
elastic.apm.centralConfig: false
elastic.apm.breakdownMetrics: false
elastic.apm.transactionSampleRate: 0.1
elastic.apm.metricsInterval: '120s'
```

Start Kibana:

```shell
ELASTIC_APM_ACTIVE=true yarn start
```

To stop:

```shell
cd apm-integration-testing; ./scripts/compose.py stop
```
How to run load testing against Kibana:
How to test Kibana on Cloud:
- Run kibana-load-testing against v7.10 on Cloud to get numbers without the APM agent enabled.
- Update the `kibana.yml` file to enable the APM agent (ask the Cloud team for assistance, since `elastic.apm.*` settings aren't listed in the allow list) and point it to the APM server in Cloud.

@restrry I've followed your instructions above and, with a little tweaking, was able to run the load tests against a local Kibana instance with and without APM running (through Docker).
My setup thus far:
`version=8.0.0`
I left the DemoJourney simulation as is regarding requests:
```scala
setUp(
  scn
    .inject(
      constantConcurrentUsers(20) during (3 minute), // 1
      rampConcurrentUsers(20) to (50) during (3 minute) // 2
    )
    .protocols(httpProtocol)
).maxDuration(15 minutes)
```
In the screen shots below, I've highlighted the same queries in both cases, for ease of comparison.
Without APM:
Full Results:
local_Kibana_without_APM.zip
With APM, using the Kibana apm settings suggested in the instructions:
Full Results:
local_Kibana_with_APM.zip
Summary:
We are indeed seeing an impact of APM on Kibana performance, with an increase in the 95th percentile response times.
I'll redo everything from `v7.10-SNAPSHOT`, after which I'll move on to Cloud unless I hear otherwise 😉.
Looks good overall. The only outlier is the `query dashboard list` case, which in the 95th percentile is faster with the APM agent enabled.
> I'll redo everything from v7.10-SNAPSHOT, after which I'll move on to Cloud unless I hear otherwise 😉.
🚀
Progress was slow today, I really struggled to get Kibana 7.10 running and resorted to running Kibana off the distributable.
Load tests without APM:
Elasticsearch: snapshot v7.10
Kibana: 7.10 (distributable)
Note: Nothing really useful from this setup as roughly half of the queries threw errors.
_Full results:_
demojourney-20201111231041086.zip
Load tests with APM:
Elasticsearch and APM run from Docker (v7.10)
Kibana: 7.10 (distributable) with apm configured
_Full results:_
demojourney-20201111234910265.zip
Summary:
There's a huge discrepancy in the results from the queries that were successful. I don't trust these results and am moving on to Cloud testing instead. Hopefully that will be more reliable 😉
> Note: Nothing really useful from this setup as roughly half of the queries threw errors.
@dmlemeshko I experienced a similar problem when only the `login` scenario succeeded. What could be the reason for this?
@TinaHeiligers What Cloud settings did you use? There are recommended ones in https://github.com/elastic/kibana-load-testing
```
elasticsearch {
  deployment_template = "gcp-io-optimized"
  memory = 8192
}
kibana {
  memory = 1024
}
```
I fixed a login issue for 7.10 when running load testing with a new deployment; the canvas endpoints also needed to be updated.
Here is my test run:
```shell
export API_KEY=<Key generated on Staging Cloud>
export deployConfig=config/deploy/7.10.0.conf
mvn clean -Dmaven.test.failure.ignore=true compile
mvn gatling:test -Dgatling.simulationClass=org.kibanaLoadTest.simulation.DemoJourney
```
demojourney-20201112111618663.zip
The 7.10.0.conf deploy config has the same memory values @restrry posted above.
Another run with _rampConcurrentUsers_ changed to 20..150
@restrry
> What Cloud settings did you use?
I haven't tested on Cloud yet, I'll do that today with the recommended settings.
@dmlemeshko Thanks for fixing that issue! I reran the load test on a local Kibana 7.10 distributable and am no longer getting the errors seen previously.
Test setup for both runs:
```scala
setUp(
  scn
    .inject(
      constantConcurrentUsers(20) during (3 minute), // 1
      rampConcurrentUsers(20) to (50) during (3 minute) // 2
    )
    .protocols(httpProtocol)
).maxDuration(15 minutes)
```
Load tests without APM:
Elasticsearch: snapshot v7.10
Kibana: 7.10 (distributable)
Full result
demojourney-20201112153808121.zip
Load tests with APM:
Elasticsearch and APM run from Docker (v7.10)
Kibana: 7.10 (distributable) with apm configured
Full result
demojourney-20201112161648302.zip
Summary:
With the exception of the requests to `discover` and `discover query 2`, all the response times increase when APM is enabled.
Of the response times already starting at over 500 ms, the increase ranged between 12% and 40%, taking the login response time to over 1000 ms, with `query gauge data` approaching the 1000 ms mark.
Load test results without APM
Full Result
demojourney-20201114173519337.zip
I'm reaching out to the Cloud folks to add the apm* config to the Cloud deployment and will post the results when I have them.
Test run:
```shell
mvn install
export env=config/cloud-tina-7.10.0.conf  # contains the details of the cloud staging env
mvn gatling:test -Dgatling.simulationClass=org.kibanaLoadTest.simulation.DemoJourney
```
Load test results with APM
Full results
demojourney-20201117164239422.zip
```shell
export API_KEY=<Key generated on Staging Cloud>
export deployConfig=config/deploy/7.10.0.conf
mvn clean -Dmaven.test.failure.ignore=true compile
mvn gatling:test -Dgatling.simulationClass=org.kibanaLoadTest.simulation.DemoJourney
```
Script-created deployment
deploy-config:

```
version = 7.10.0
elasticsearch {
  deployment_template = "gcp-io-optimized"
  memory = 8192
}
kibana {
  memory = 1024
}
```
Load test results
Full Result
demojourney-20201112213550120.zip
@restrry I've added the results from the Kibana load testing on the cloud (staging) test run where APM is enabled in Kibana.
The results are similar to what we've seen on local instances of Kibana with and without APM: An overall increase in the 95th percentile response times by ~16%.
For both test runs, the number of concurrent users was set to 20 during 3 min and the number of users was ramped up from 20 to 50 during a 3 minute interval.
Please let me know if I should repeat the tests with fewer/more concurrent users and/or change any of the APM settings.
I will document the steps to take to add configurations not exposed by default on Cloud. Please let me know where the best place is to add these (I don't think making it public in this issue is appropriate 😉).
cc @joshdover
@TinaHeiligers @restrry
If you want to have "cleaner" test results, I suggest spinning up a VM in the same region where you create the stack deployment.
I can help with it, but if you are familiar with adding a VM, the follow-up steps are:
```shell
# e.g. I run tests and create the VM in Frankfurt (europe-west3-a)

# zip the project and upload it to the VM
zip -r KibanaLoadTesting.zip .
gcloud compute scp ~/github/KibanaLoadTesting.zip root@<vm-name>:/home/<user-name>/test --zone=europe-west3-a

# start a docker image with JDK/maven in another terminal
sudo docker run -it -v "$(pwd)"/test:/local/git --name java-maven --rm jamesdbloom/docker-java8-maven

# run the tests with the same command you used locally

# download the test results
sudo tar -czvf my_results.tar.gz /home/<user-name>/test/KibanaLoadTesting/target/gatling/demojourney-<report-folder>
gcloud compute scp root@<vm-name>:/home/<user-name>/test/KibanaLoadTesting/target/gatling/my_results.tar.gz </local-machine-path-to-save-at> --zone=europe-west3-a
```
@dmlemeshko I'm not familiar with adding a VM and would greatly appreciate your help! I'm happy to watch you go through the process on Zoom. In the meantime, I'll work through the guide.
Why do we have such a significant difference between _On Cloud staging, using an existing deployment with APM_ and _On Cloud staging, creating a deployment as part of the test run_?
I think it makes sense to spin up a new deployment for both Kibana & kibana-load-testing as @dmlemeshko suggested https://github.com/elastic/kibana/issues/78792#issuecomment-729123745
I scheduled a call to discuss the testing strategy.
Here are the steps to spin up a Google Cloud VM and run tests on it:
Login to https://console.cloud.google.com/ with corp account
Create a CPU-optimized VM (4 CPUs, 16 GB memory is enough) with a Container Optimized OS boot disk, e.g. _load-testing-vm_.
Note: use the us-central1 region, same as for the stack deployment.
Zip https://github.com/elastic/kibana-load-testing and copy to VM
Connect to the VM and create a _test_ folder:

```shell
gcloud beta compute ssh --zone "us-central1-a" "load-testing-vm" --project "elastic-kibana-184716"
mkdir test
chmod 777 test
```
In another terminal, upload the archive to the VM:

```shell
sudo gcloud compute scp KibanaLoadTesting.tar.gz <user>@load-testing-vm:/home/<user>/test --zone "us-central1-a" --project "elastic-kibana-184716"
```
In the first terminal (on the VM), unzip the project and start the docker container, mapping the local/container path so you can later exit the container and keep the results on the VM:

```shell
cd test
tar -xzf KibanaLoadTesting.tar.gz
sudo docker run -it -v "$(pwd)":/local/git --name java-maven --rm jamesdbloom/docker-java8-maven
```
Now you are in the container and should be able to see the _test_ folder that contains the unzipped project. Run the tests as you would locally:

```shell
export API_KEY=<Your API Key>
export deployConfig=config/deploy/7.10.0.conf
mvn clean -Dmaven.test.failure.ignore=true compile
mvn gatling:test -Dgatling.simulationClass=org.kibanaLoadTest.simulation.DemoJourney
```
When the tests are done, type `exit`. Check `target/gatling` for your test results. Zip and download to your local machine:

```shell
sudo tar -czvf results.tar.gz demojourney-20201118160915491/
```
From your local machine, run:

```shell
sudo gcloud compute scp <user>@load-testing-vm:/home/<user>/test/target/gatling/results.tar.gz . --zone=us-central1-a
```
Results should be available in the current path
I think it'd also be worth understanding the difference between 7.11 w/ APM vs 7.10 and 7.9 w/o APM. Due to the many performance tweaks that were made to support Fleet, there may not be a large regression in 7.11 w/ APM enabled. If the difference is smaller, enabling this in 7.11 clusters may be an easier pill to swallow.
Next, I'd also like to experiment with tweaking some other settings to see if we get any performance improvements:
- `elastic.apm.asyncHooks: false`
- `elastic.apm.disableInstrumentations`: `bluebird`, `graphql`

If none of these result in improved performance, we may need to work directly with the APM team to look at some flamegraphs/profiles and see where most of the time is being spent in the APM agent code.
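For reference, the experiment described above could be sketched as an agent config like the following. The option names come from the elastic-apm-node configuration docs; the values are simply the ones proposed in this thread, not recommendations, and starting the agent itself is left as a comment since it needs a reachable APM server:

```javascript
// Sketch of the tuning experiment discussed above (option names per the
// elastic-apm-node configuration docs; values are the ones proposed here).
const tunedApmConfig = {
  active: true,
  asyncHooks: false,                                // fall back to patch-based async tracking
  disableInstrumentations: ['bluebird', 'graphql'], // skip these instrumentations
  transactionSampleRate: 0.1,
  breakdownMetrics: false,
  centralConfig: false,
  metricsInterval: '120s',
};

// In a real run: require('elastic-apm-node').start(tunedApmConfig);
module.exports = tunedApmConfig;
```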
@dmlemeshko I'm stuck on the step:
> In first terminal (VM) unzip project and start docker container with mapping local/container path, so later you can exit container and keep results on VM
When I run the following (in the VM):

```shell
christianeheiligers@heiligers-loadtest-kibana:~/test$ sudo docker run -it -v "$(pwd)":/local/git --name java-maven --rm jamesbloom/docker-java8-maven
```
I'm getting:

```
sudo: docker: command not found
```
I don't know if I created the VM correctly (apparently the instance doesn't know what 'docker' is, so it must not have it pre-installed?), and the link to the recording from your walkthrough hasn't been added to the meeting invite yet. I've been following your guide.
The VM I created has:

```
"deviceName": "heiligers-loadtest-kibana",
```
Any and all help will be greatly appreciated!
@joshdover
> understanding the difference between 7.11 w/ APM vs 7.10 and 7.9 w/o APM
I'm struggling with the VM setup but I could tackle this in Cloud Staging if you don't mind the 'noise' generated when I run the tests locally (pointing to cloud instances).
@TinaHeiligers you are getting this error (`sudo: docker: command not found`) since you don't have docker pre-installed on the Ubuntu image. To fix it you need to recreate your VM and change the boot disk: currently you have Ubuntu, but it should be one of the Container Optimized OS images.
You can still install docker on Ubuntu, but it will be faster to simply create a new VM.
VM run instructions are now available in the repo.
> I think it'd also be worth understanding the difference between 7.11 w/ APM vs 7.10 and 7.9 w/o APM. Due to the many performance tweaks that were made to support Fleet, there may not be a large regression in 7.11 w/ APM enabled.
I thought that work was done in v7.10, but it doesn't hurt to test with 7.11-SNAPSHOT as well.
> 7.9 w/o APM
I believe we added support for APM on Cloud in 7.10 only https://github.com/elastic/kibana/pull/77855
> I thought that work was done in v7.10, but it doesn't hurt to test with 7.11-SNAPSHOT as well.
You're right, I got my versions mixed up. We should be comparing 7.10 w/ APM vs. 7.9 w/o APM (which is the only option on 7.9).
@dmlemeshko thank you so much for all your help! I've successfully created a VM and have initial results from a 7.10 deployment created during the test run.
Update:
@dmlemeshko When I try to run the tests against an existing deployment, I get `BUILD FAILURE` errors:
Errors in VM container
```
00:04:50.769 [ERROR] i.g.a.Gatling$ - Run crashed
java.lang.NullPointerException: null
	at java.io.Reader.<init>(Reader.java:78)
	at java.io.InputStreamReader.<init>(InputStreamReader.java:129)
	at scala.io.BufferedSource.reader(BufferedSource.scala:26)
	at scala.io.BufferedSource.bufferedReader(BufferedSource.scala:27)
	at scala.io.BufferedSource.charReader$lzycompute(BufferedSource.scala:37)
	at scala.io.BufferedSource.charReader(BufferedSource.scala:35)
	at scala.io.BufferedSource.scala$io$BufferedSource$$decachedReader(BufferedSource.scala:64)
	at scala.io.BufferedSource.mkString(BufferedSource.scala:93)
	at org.kibanaLoadTest.helpers.Helper$.readResourceConfigFile(Helper.scala:38)
	at org.kibanaLoadTest.simulation.BaseSimulation.<init>(BaseSimulation.scala:25)
	at org.kibanaLoadTest.simulation.DemoJourney.<init>(DemoJourney.scala:8)
	... 16 common frames omitted
Wrapped by: java.lang.reflect.InvocationTargetException: null
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at io.gatling.app.Runner.run0(Runner.scala:74)
	at io.gatling.app.Runner.run(Runner.scala:60)
	at io.gatling.app.Gatling$.start(Gatling.scala:80)
	at io.gatling.app.Gatling$.fromArgs(Gatling.scala:46)
	at io.gatling.app.Gatling$.main(Gatling.scala:38)
	at io.gatling.app.Gatling.main(Gatling.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at io.gatling.mojo.MainWithArgsInFile.runMain(MainWithArgsInFile.java:50)
	at io.gatling.mojo.MainWithArgsInFile.main(MainWithArgsInFile.java:33)
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at io.gatling.mojo.MainWithArgsInFile.runMain(MainWithArgsInFile.java:50)
	at io.gatling.mojo.MainWithArgsInFile.main(MainWithArgsInFile.java:33)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at io.gatling.app.Runner.run0(Runner.scala:74)
	at io.gatling.app.Runner.run(Runner.scala:60)
	at io.gatling.app.Gatling$.start(Gatling.scala:80)
	at io.gatling.app.Gatling$.fromArgs(Gatling.scala:46)
	at io.gatling.app.Gatling$.main(Gatling.scala:38)
	at io.gatling.app.Gatling.main(Gatling.scala)
	... 6 more
Caused by: java.lang.NullPointerException
	at java.io.Reader.<init>(Reader.java:78)
	at java.io.InputStreamReader.<init>(InputStreamReader.java:129)
	at scala.io.BufferedSource.reader(BufferedSource.scala:26)
	at scala.io.BufferedSource.bufferedReader(BufferedSource.scala:27)
	at scala.io.BufferedSource.charReader$lzycompute(BufferedSource.scala:37)
	at scala.io.BufferedSource.charReader(BufferedSource.scala:35)
	at scala.io.BufferedSource.scala$io$BufferedSource$$decachedReader(BufferedSource.scala:64)
	at scala.io.BufferedSource.mkString(BufferedSource.scala:93)
	at org.kibanaLoadTest.helpers.Helper$.readResourceConfigFile(Helper.scala:38)
	at org.kibanaLoadTest.simulation.BaseSimulation.<init>(BaseSimulation.scala:25)
	at org.kibanaLoadTest.simulation.DemoJourney.<init>(DemoJourney.scala:8)
	... 16 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.466s
[INFO] Finished at: Fri Nov 20 00:04:50 UTC 2020
[INFO] Final Memory: 17M/430M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal io.gatling:gatling-maven-plugin:3.0.5:test (default-cli) on project kibana-load-test: Gatling failed. Process exited with an error: 255 (Exit value: 255) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
```
Any idea why running a test and creating a deployment at the same time works but not against an existing deployment?
The existing deployment is in the same region as the one that's created during the test and I can run the tests against existing deployments locally without an issue.
@restrry _If_ we can get the tests to run against an existing deployment, I'll recommend the following strategy for running tests from the VM for the following cases:
I'm not sure if we can create the instances on Staging first and only run the tests from the VM. It might take a little time to figure out how to inject the apm* config settings in the deployment template that creates an instance as part of the test.
cc @joshdover JFYI
Latest update:
I've managed to get the tests to run against an existing deployment on a VM in the same region and am repeating the tests for the following cases:
@dmlemeshko I'll need your help with running the load tests against a `7.11.0-SNAPSHOT` deployment (config has `version = "7.11.0-SNAPSHOT"`).
The run crashes with:
```
16:31:42.078 [ERROR] i.g.a.Gatling$ - Run crashed
java.lang.IllegalArgumentException: Invalid version format
	at org.kibanaLoadTest.helpers.Version.<init>(Version.scala:7)
	at org.kibanaLoadTest.KibanaConfiguration.<init>(KibanaConfiguration.scala:50)
	at org.kibanaLoadTest.simulation.BaseSimulation.<init>(BaseSimulation.scala:25)
	at org.kibanaLoadTest.simulation.DemoJourney.<init>(DemoJourney.scala:8)
	... 17 common frames omitted
```
The way the version is being parsed doesn't allow for `-SNAPSHOT` suffixes.
If I remove `-SNAPSHOT` from the version and _force_ a 7.11.0 version (config has `version = "7.11.0"`), the tests run, but the only request that doesn't have a 100% failure rate is `login`. 🤷‍♀️
7.11.0-SNAPSHOT deployment (run as a 7.11.0 version in the tests)
I've run these several times (locally) with different deployments and get the same result.
Have you seen this before and, if so, how do we fix it? We need to get 7.11.0-SNAPSHOT stats to compare with the 7.9.3 and 7.10.0 versions.
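To illustrate the kind of parsing change needed: a hypothetical parser that tolerates the suffix. The real code lives in kibana-load-testing's Scala `Version` class; `parseVersion` below is my invention for illustration only:

```javascript
// Hypothetical version parser that accepts an optional -SNAPSHOT suffix.
// The actual fix belongs in kibana-load-testing (Version.scala); this only
// sketches the idea.
function parseVersion(raw) {
  const match = /^(\d+)\.(\d+)\.(\d+)(-SNAPSHOT)?$/.exec(raw.trim());
  if (!match) {
    throw new Error(`Invalid version format: ${raw}`);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
    isSnapshot: match[4] !== undefined,
  };
}
```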
@TinaHeiligers I fixed an issue with snapshot builds, tested both with new and existing deployments. Please pull the latest master
Comments from a new Node Agent engineer here -- @sqren and @restrry asked us to drop by and lend our two cents on configuration scenarios.
Also -- this is mostly echoing things that @joshdover has already said.
As far as configuration goes, I'd definitely be curious to see if toggling the asyncHooks configuration helps or hurts. The Agent (like other APM agents) uses node's async_hooks
module to track asynchronous context across transactions (i.e. "this callback/promise goes with with this http request"). When this is disabled we fallback to using the patch-async module to track this async context. Under some workloads the later is more performant. If Kibana is still using bluebird promises then disabling the bluebird instrumentation might yield positive perf. results -- but at the possible cost of some lost transaction state.
If we go this route we'd want to investigate what a trace that involves bluebird transactions looks like with this both on and off.
Other than that (also as previously mentioned), `transactionSampleRate` is the main knob we have to turn when it comes to improving agent performance. Produce/record less data, improve performance.
Finally, if you're comfortable veering into the realm of superstition, installing a no-op logger might produce interesting results. This isn't based on any particular known problem with the elastic agent's logger -- just things I've seen elsewhere in the past.