Currently we don't have a performance regression test in CI, so merging a PR cannot guarantee that performance is as good as before. Regressions have already slipped in many times:
https://discuss.tvm.ai/t/solved-relay-x86-target-performance-regression/2266
https://github.com/dmlc/tvm/issues/3088
I suggest adding a performance regression test to CI; if a PR changes performance significantly, it should be blocked.
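A minimal sketch of how such a gate could decide that performance has "significantly changed". It compares the median of repeated timings to dampen noise. The function name `is_regression` and the 10% threshold are hypothetical choices for illustration, not anything TVM defines:

```python
import statistics


def is_regression(baseline_ms, candidate_ms, threshold=0.10):
    """Return True if the candidate's median latency exceeds the
    baseline's median by more than `threshold` (a fraction, 0.10 = 10%).

    Both arguments are lists of repeated latency measurements in ms.
    The threshold value is an assumption; a real gate would tune it
    per workload and per device.
    """
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    return (cand - base) / base > threshold
```

With this check, a PR whose median latency goes from 10 ms to 12 ms would be flagged, while one that moves from 10 ms to 10.3 ms would pass.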
+1
However, running performance regression tests on every PR might take too long, especially end-to-end model tests on every target device. We need to find a balanced solution.
+1. Some nightly benchmark might be good enough for us to track any regression/improvements. Apache Lucene has a pretty good example.
+1
I don't think we need to test all networks. ResNet-18, MobileNet, and maybe a simple RNN should expose most problems.
Or we can do nightly benchmarking over a series of commits. Authors of commits that cause a significant performance regression would be notified so they can identify the issue.
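A rough sketch of the nightly idea above: given one benchmark result per commit, in commit order, flag the commits where latency jumps relative to the previous commit so their authors can be notified. The tuple layout, the function name `find_regressions`, and the 10% threshold are all assumptions made for this example:

```python
def find_regressions(results, threshold=0.10):
    """Flag commits whose latency rose by more than `threshold`
    compared with the immediately preceding commit.

    results: list of (commit, author, latency_ms), oldest first.
    Returns a list of (commit, author, slowdown_fraction).
    """
    flagged = []
    # Walk over consecutive pairs of commits.
    for prev, curr in zip(results, results[1:]):
        prev_ms = prev[2]
        commit, author, curr_ms = curr
        slowdown = (curr_ms - prev_ms) / prev_ms
        if slowdown > threshold:
            flagged.append((commit, author, slowdown))
    return flagged
```

Comparing against the previous commit rather than a fixed baseline means only the commit that introduced the slowdown is flagged, not every commit after it.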
I think the Rust project has a decent approach to this: https://perf.rust-lang.org/
They visualize the performance of the compiler, the generated code, etc., on a nightly basis.
One worry I have is CI is already very slow and often becomes congested when lots of people are working on open PRs.
Some nightly infra is a better option so it won't block the current CI
+1 to doing performance testing nightly (separate from CI). Moreover, we can do more important/shorter perf tests nightly and a more comprehensive set of perf tests weekly.
Some thoughts as I read through this thread. A big +1 for doing joined-up nightly performance testing on what we care about and figuring out ways of doing this.
Another alternative for visualizing performance, and something I've used in teams I've worked in on other projects, is LNT. 2 interesting links below for performance monitoring. LNT is also interesting in that it can collect perf profiles, https://llvm.org/docs/lnt/profiles.html, and can help visualize differences in performance between perf profiles in terms of the actual assembly. I've found this pretty useful in terms of productivity.
TBH, the Rust web interface looks good - what would be good to check is
a. What are the dependencies on the target side?
b. Whether the visualization can be separated from the data collection and the databases in which we store the performance data.
In terms of process,
Performance testing separate from CI is possibly a good way to bootstrap, but having a route to do performance testing after basic correctness checks sounds like a useful step, as it would help with detecting performance issues pre-commit. The other question is which TVM scenarios we want to monitor regularly, because multiple use cases are likely to emerge here.
My 2 cents on a Friday evening.
regards
Ramana
@ZihengJiang has volunteered to explore a nightly pipeline infra that lets us use Jenkins to commit the new logs, and we can figure out viz using JavaScript (perhaps something like vega-lite) and GitHub Pages.
I think we should separate the concerns of logging and viz, i.e. make it easy for anyone to create logs of interest to them in a unified format, and then have modular viz to visualize that data. Also, @slyubomirsky has already set up something for Relay nightly that perhaps can be shared.
The least uncertain part of the configuration was viz and the html side actually.
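To make the "unified format" idea concrete, here is one possible sketch: each benchmark run is written as a single JSON line, which a separate visualization step can read without knowing how the data was collected. The record fields and the names `BenchRecord`, `to_log_line`, and `parse_log_line` are hypothetical; the thread does not specify a schema:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class BenchRecord:
    # All field names here are assumptions, not an agreed schema.
    commit: str      # commit hash that was benchmarked
    target: str      # e.g. "llvm" or "cuda"
    workload: str    # e.g. "resnet-18"
    mean_ms: float   # mean latency in milliseconds
    timestamp: str   # ISO 8601 time of the run


def to_log_line(record):
    """Serialize one record as a single JSON line (keys sorted for stable diffs)."""
    return json.dumps(asdict(record), sort_keys=True)


def parse_log_line(line):
    """Parse a JSON line written by to_log_line back into a BenchRecord."""
    return BenchRecord(**json.loads(line))
```

Because each line is self-describing, the logging job and the viz front end only need to agree on the field names, which keeps the two concerns separate.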
Sounds good - thanks a lot for that update @tqchen - Agreed that we should separate logging from visualization, as I expect we will want to experiment with visualization techniques before settling on one that we like.
Here is a possible pipeline that needs Jenkins and GitHub
Just chiming in to see if there is any progress on the perf regression test.