Kibana version: 7.9
Elasticsearch version: 7.9
Server OS version: Debian 10
Browser version: Chrome 84
Browser OS version: Linux
Original install method (e.g. download page, yum, from source, etc.):
Describe the bug:
While testing the fix for #76227 I noticed that running visualizations on indices with many fields (10k in this case) results in high CPU utilization and a slow render (even with the fix from #76208). The observed problem is similar to, but distinct from, #76227: deep recursion in which `foldKey` and `foldValue` call one another, taking ~1.2s.
From my limited analysis the culprit appears to be `calculateObjectHash` being called with a ~5MB object from `esaggs.ts` / `handleCourierRequest`; calculating the hash itself takes about 1.2s on an i7-7567U. Specifically:
```ts
// We only need to reexecute tabify, if either we did a new request or some input params to tabify changed
const tabifyCacheHash = calculateObjectHash({
  tabifyAggs: aggs,
  ...tabifyParams
});
```
With a breakpoint set inside `calculateObjectHash`, timing the call from the browser console:

```js
console.time('hash'); calculateObjectHash(o); console.timeEnd('hash');
// VM1483:1 hash: 1207.406982421875ms
```
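For illustration, here is a minimal self-contained sketch (not Kibana's actual `calculateObjectHash` implementation; the fold-style hash and the synthetic 10k-field payload are assumptions) showing how this kind of recursive key/value folding grows with the size of the input object:

```ts
// Illustrative sketch only: a simple recursive "fold" hash, similar in spirit
// to the foldKey/foldValue recursion seen in the profile, applied to a
// synthetic object with many fields. Absolute timings will vary by machine.
function fold(hash: number, text: string): number {
  for (let i = 0; i < text.length; i++) {
    hash = (hash << 5) - hash + text.charCodeAt(i);
    hash |= 0; // keep the value in 32-bit integer range
  }
  return hash;
}

function foldValue(hash: number, value: unknown, key: string): number {
  hash = fold(hash, key);
  if (value !== null && typeof value === 'object') {
    // Nested objects trigger the deep recursion seen in the profile.
    const obj = value as Record<string, unknown>;
    return Object.keys(obj)
      .sort()
      .reduce((h, k) => foldValue(h, obj[k], k), hash);
  }
  return fold(hash, String(value));
}

// Synthetic payload with ~10k fields, loosely mimicking a large tabify input.
const bigObject: Record<string, unknown> = {};
for (let i = 0; i < 10000; i++) {
  bigObject[`field_${i}`] = { type: 'keyword', aggregatable: true, searchable: true };
}

console.time('hash');
foldValue(0, bigObject, 'root');
console.timeEnd('hash');
```

The exact numbers are machine-dependent; the point is that the cost grows with the number of keys and the depth of nesting, which is why hashing a ~5MB tabify input ends up costing over a second.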
Steps to reproduce:
Expected behavior: Visualization is fast to render
Screenshots (if relevant):
Errors in browser console (if relevant):
Provide logs and/or server output (if relevant):
Any additional context:
Pinging @elastic/kibana-app-arch (Team:AppArch)
Pinging @elastic/kibana-platform (Team:Platform)
AFAICT, Platform team does not own any of the problematic code here so I'm going to remove our team's assignment.
At the risk of stating what may be obvious, here's my 2¢ on how we need to prevent these types of issues going forward:
Kibana is now at a size where a change in one part of the system can easily affect others in ways that were not expected or anticipated, or in ways that are not obviously important to the UX. In critical paths like this one, we are going to need to develop performance benchmarks that run on CI so that we can track performance regressions like this one and correlate them to a specific code change. This will allow us to be much more proactive about fixing these issues in PRs rather than after the fact in a release.
We simply can't expect every person contributing to the Kibana repository to understand the performance requirements of every feature and how a given change may impact them. Upgrading lodash is a great example: it clearly could have major performance impacts on different parts of the system, but without any benchmarks we simply cannot be sure how the change affects anything that matters. While I don't expect we'll do major lodash upgrades often, we do typically do one major Node.js upgrade a year. These are highly likely to impact Kibana's performance, and while the impact is usually positive, it isn't always. These are just the easy examples; other types of dependencies will have performance impacts too (e.g. D3, charting libraries, courier changes, React upgrades, etc.).
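To sketch what I mean, a CI benchmark for this particular path could be as small as the following hypothetical Jest test (the import path, payload shape, and time budget are assumptions for illustration, not an existing Kibana test or an agreed threshold):

```ts
// Hypothetical performance-budget test: fail CI if hashing a large synthetic
// payload regresses past an agreed threshold.
import { performance } from 'perf_hooks';
import { calculateObjectHash } from './calculate_object_hash'; // assumed local path

const HASH_BUDGET_MS = 200; // illustrative budget, not an agreed number

function buildLargePayload(fieldCount: number): Record<string, unknown> {
  const payload: Record<string, unknown> = {};
  for (let i = 0; i < fieldCount; i++) {
    payload[`field_${i}`] = { type: 'keyword', aggregatable: true };
  }
  return payload;
}

test('calculateObjectHash stays within its performance budget', () => {
  const payload = buildLargePayload(10000);

  const start = performance.now();
  calculateObjectHash(payload);
  const elapsed = performance.now() - start;

  expect(elapsed).toBeLessThan(HASH_BUDGET_MS);
});
```

A test like this would have flagged the regression in the PR that introduced it, rather than in a release.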
> From my limited analysis the culprit appears to be `calculateObjectHash` being called with a ~5MB object from `esaggs.ts` / `handleCourierRequest`; calculating the hash itself takes about 1.2s on an i7-7567U
Just one note on this -- `calculateObjectHash` was removed starting in 7.10 with https://github.com/elastic/kibana/pull/77646.
Thanks for the update! Looking forward to 7.10