Concisely describe the proposed feature
I think it will be great if we can have a CI pipeline to run some benchmarks as regression tests. This way we can easily detect problems like https://github.com/taichi-dev/taichi/pull/937#issuecomment-626282263.
https://www.cnblogs.com/younggun/articles/1814989.html
I thought that's exactly what we do in tests/python?
By "regression" I mean to detect performance regression (e.g. a new change caused the performance, as measured by our benchmark tests, to drop by 50%).
In contrast, what we currently have in the CI are just unit tests. They are used to verify that the system is not fundamentally broken.
Thanks for clarifying this. So we want to verify that functionality is not broken, and also that performance is not broken? I'm not sure how Travis CI could do this; currently we can only do it by switching back and forth with git and then running the benchmarks by hand.
and also that performance is not broken?
Yep
I'm not sure how Travis CI could do this;
I'm no expert on this either. But this kind of regression test is actually quite common, so I guess Travis must have a way to run some command and then produce a few timing numbers.
I suggest we don't worry too much about this issue. We may prioritize this when Taichi is more mature. For now I'm simply creating an issue so that we don't forget :)
I'm no expert on this either. But this kind of regression test is actually quite common, so I guess Travis must have a way to run some command and then produce a few timing numbers.
I searched the web and found no info about the relation between Travis and CPRT...
A straightforward attempt could be:
Add a file called last_benchmark.txt that contains the numbers generated for each commit.
Then let the CI or a human check whether the values in last_benchmark.txt increased or decreased, and report the change.
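A minimal sketch of the check described above, assuming last_benchmark.txt holds one `<name> <time in ms>` pair per line. That file layout, the function names, and the 50% tolerance are all assumptions made for illustration; nothing like this exists in the repository yet.

```python
def parse_benchmark_file(text):
    """Parse lines of the assumed form '<benchmark name> <time in ms>'."""
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, value = line.rsplit(maxsplit=1)
        results[name] = float(value)
    return results


def find_regressions(baseline, current, tolerance=0.5):
    """Return {name: (old_ms, new_ms)} for benchmarks slower by more than `tolerance`.

    tolerance=0.5 flags anything that takes over 50% longer than the baseline.
    """
    regressions = {}
    for name, old_ms in baseline.items():
        new_ms = current.get(name)
        if new_ms is None:
            continue  # benchmark was removed or renamed; not a regression
        if new_ms > old_ms * (1.0 + tolerance):
            regressions[name] = (old_ms, new_ms)
    return regressions


if __name__ == "__main__":
    old = parse_benchmark_file("mpm2d_range 0.793\nfill_scalar 0.002\n")
    new = parse_benchmark_file("mpm2d_range 1.600\nfill_scalar 0.002\n")
    for name, (old_ms, new_ms) in find_regressions(old, new).items():
        print(f"{name}: {old_ms} ms -> {new_ms} ms")
```

A CI job could exit with a non-zero status when `find_regressions` returns anything, though on noisy hardware a generous tolerance would be needed.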
I searched the web and found no info about the relation between Travis and CPRT...
Ah, the naming could be performance tests, benchmark (BM) tests... I think the terms are pretty confusing here.
Yeah, I think having a file to store the historical BM data is a good way to get things going. (Usually this would be stored in a database for ease of querying, but obviously we'd then have to pay for that...) I think this can be even simpler: configure the bot so that it posts the BM data on each PR. For example: https://github.com/pingcap/tidb/pull/17101#issuecomment-626607152
Cool! But I guess we would have to pay for that. Not sure if @yuanming-hu would like this...
It occurs to me that we could upgrade our format server to support a [Click to update benchmark] link:
https://github.com/taichi-dev/taichi/blob/471392bcda9ad204337591559355920cfc7736a4/.github/pull_request_template.md#L7
When clicked, it runs ti benchmark and updates misc/benchmark.txt in a commit titled [skip ci] update benchmark, just like the existing [skip ci] enforce code format commit.
Or we can trigger this when a user pushes a commit such as [benchmark] do benchmark for me, like [format] currently does.
Then the reviewers can check the Files changed page to see if the performance increased or decreased.
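The commit-title trigger could be sketched roughly as follows. The helper names are hypothetical and this is not how the existing format server works internally; it only assumes that tags appear as bracketed groups at the start of the commit title, as in the examples above.

```python
import re

# One bracketed group such as "[benchmark]" or "[skip ci]".
_TAG = re.compile(r"\[([^\[\]]+)\]")


def leading_tags(commit_title):
    """Return the lowercase tags found in the run of [tag] groups at the start of a title."""
    tags = []
    pos = 0
    while True:
        match = _TAG.match(commit_title, pos)
        if match is None:
            break
        tags.append(match.group(1).strip().lower())
        pos = match.end()
        # Allow whitespace between consecutive tags.
        while pos < len(commit_title) and commit_title[pos].isspace():
            pos += 1
    return tags


def should_run_benchmark(commit_title):
    """Run on [benchmark] commits, but never on the bot's own [skip ci] commits."""
    tags = leading_tags(commit_title)
    return "benchmark" in tags and "skip ci" not in tags


if __name__ == "__main__":
    print(should_run_benchmark("[benchmark] do benchmark for me"))
    print(should_run_benchmark("[skip ci] update benchmark"))
```

Excluding [skip ci] keeps the bot's own benchmark-update commit from triggering another run.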
But I guess we would have to pay for that. Not sure if @yuanming-hu would like this...
I think it's better to be funded, rather than paying out of our own pockets, even if we are very enthusiastic about this...
When clicked, it runs ti benchmark and updates misc/benchmark.txt in a commit titled [skip ci] update benchmark, just like the existing [skip ci] enforce code format commit.
Yeah, I think this can be a good start. The good thing about having a report on the PR is that people can actively look into it, though. But again, this is all fancy stuff that we don't need urgently.
Anyway, before all this fancy stuff, we must set up ti benchmark first. @xumingkuan do you have any idea on how to implement this? Many thanks :)
Great idea. Thanks for proposing this.
I think it's better to be funded, rather than paying out of our own pockets, even if we are very enthusiastic about this...
We can find a computer in our lab for benchmark purposes. We need a machine with consistent hardware, otherwise the performance comparisons won't make much sense. I guess Travis will just randomly pick an available VM slot, whose hardware capability fluctuates. Our group also has some free Google Cloud accounts. I'll think about this.
Anyway, before all this fancy stuff, we must set up ti benchmark first. @xumingkuan do you have any idea on how to implement this? Many thanks :)
We actually already have some basic benchmarks: https://github.com/taichi-dev/taichi/tree/master/benchmarks. https://github.com/taichi-dev/taichi/blob/master/benchmarks/run.py can trigger these benchmarks:
fill_dense:
* flat_range x64 8.852 ms cuda 0.402 ms
* flat_struct x64 5.688 ms cuda 0.398 ms
* nested_range x64 8.549 ms cuda 0.826 ms
* nested_range_blocked x64 4.323 ms cuda 6.724 ms
* nested_struct x64 5.694 ms cuda 0.324 ms
* nested_struct_listgen_16x16 x64 5.693 ms cuda 0.317 ms
* nested_struct_listgen_8x8 x64 5.763 ms cuda 0.316 ms
* root_listgen x64 5.685 ms cuda 0.402 ms
fill_sparse:
* nested_struct x64 11.053 ms cuda 0.674 ms
* nested_struct_fill_and_clear x64 43.212 ms cuda 22.951 ms
memory_bound:
* memcpy x64 78.917 ms cuda 8.072 ms
* memset x64 92.055 ms cuda 5.042 ms
* saxpy x64 97.547 ms cuda 11.460 ms
* sscal x64 98.836 ms cuda 7.809 ms
minimal:
* fill_scalar x64 0.002 ms cuda 0.007 ms
mpm2d:
* range x64 0.793 ms cuda 0.027 ms
* struct x64 0.773 ms cuda 0.028 ms
These can be reused. For example, https://github.com/taichi-dev/taichi/blob/master/benchmarks/mpm2d.py should be able to detect the performance issue introduced in #937. How to automatically summarize the benchmark results and display them on GitHub is worth discussing.
I haven't had a chance to systematically work on performance issues though.
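One possible way to summarize the results is to parse the listing above and render it as a table that a bot could post on the PR. The sketch below assumes the output keeps exactly the shape shown (a `suite:` header line followed by `* name arch time ms ...` entries); the function names are hypothetical and this is not part of benchmarks/run.py.

```python
import re

# "* flat_range x64 8.852 ms cuda 0.402 ms" -> name, then the rest of the line.
_ENTRY = re.compile(r"^\*\s+(\S+)\s+(.*)$")
# "x64 8.852 ms" -> (arch, milliseconds)
_TIMING = re.compile(r"(\S+)\s+([0-9.]+)\s+ms")


def parse_summary(text):
    """Return {(suite, benchmark): {arch: milliseconds}} from the printed summary."""
    results = {}
    suite = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":") and not line.startswith("*"):
            suite = line[:-1]  # e.g. "fill_dense:" starts a new suite
            continue
        match = _ENTRY.match(line)
        if match and suite is not None:
            timings = {arch: float(ms) for arch, ms in _TIMING.findall(match.group(2))}
            results[(suite, match.group(1))] = timings
    return results


def to_markdown(results):
    """Render the parsed results as a Markdown table, one row per benchmark."""
    archs = sorted({arch for timings in results.values() for arch in timings})
    rows = [
        "| benchmark | " + " | ".join(f"{arch} (ms)" for arch in archs) + " |",
        "|---" * (len(archs) + 1) + "|",
    ]
    for (suite, name), timings in sorted(results.items()):
        cells = " | ".join(
            f"{timings[arch]:.3f}" if arch in timings else "n/a" for arch in archs
        )
        rows.append(f"| {suite}.{name} | {cells} |")
    return "\n".join(rows)


if __name__ == "__main__":
    sample = "mpm2d:\n* range x64 0.793 ms cuda 0.027 ms\n* struct x64 0.773 ms cuda 0.028 ms\n"
    print(to_markdown(parse_summary(sample)))
```

The parsed dictionary could also be written to a file and diffed between commits, which would fit the misc/benchmark.txt idea discussed earlier.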
Anyway, before all this fancy stuff, we must set up ti benchmark first. @xumingkuan do you have any idea on how to implement this? Many thanks :)
Currently, I'm just setting TI_PRINT_BENCHMARK_STAT=1, which generates a log file when running each unit test, for my benchmark charts.
it runs ti benchmark and updates misc/benchmark.txt
If you want something like this, we can just let this command set print_benchmark_stat = true, run tests, and then read all log files and collect the data (the number of statements).
If you want something like this, we can just let this command set print_benchmark_stat = true, run tests, and then read all log files and collect the data (the number of statements).
So currently print_benchmark_stat only shows the number of statements? If so, it's not enough, for example:
$1 = const [8]
$2 = pow $0, $1
versus:
$1 = mul $0, $0
$2 = mul $1, $1
$3 = mul $2, $2
Although the second has more statements, it is actually more efficient than the first.
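To make the equivalence of the two forms concrete, here is the three-multiplication version written out in plain Python, with each line mirroring one of the statements above. This only illustrates that repeated squaring computes the same value as the power; it is not Taichi IR and says nothing about actual timings. The function name is made up for the example.

```python
def pow8_by_squaring(x):
    """Compute x**8 with three multiplications instead of a general pow call."""
    x2 = x * x      # $1 = mul $0, $0
    x4 = x2 * x2    # $2 = mul $1, $1
    return x4 * x4  # $3 = mul $2, $2


if __name__ == "__main__":
    print(pow8_by_squaring(3), 3 ** 8)
```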
Also consider vector division:
v.x /= k;
v.y /= k;
v.z /= k;
versus:
tmp = 1 / k;
v.x *= tmp;
v.y *= tmp;
v.z *= tmp;
Not to mention loop unrolling.
So what we want is time performance (TP) rather than size performance (SP). I think it's good to add SP, but TP is more important for regression tests, since sometimes we want to sacrifice SP for TP, like in #944.
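For the time side, a measurement helper could look roughly like this. It is a generic sketch using only the Python standard library; `time_ms` and its parameters are hypothetical names, not an existing Taichi API. It runs a few warm-up calls first (so one-off costs such as compilation are not counted) and reports the fastest of several repeats to reduce noise.

```python
import time


def time_ms(fn, repeats=5, warmup=1):
    """Return the best wall-clock time of fn() in milliseconds over `repeats` runs."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best * 1000.0


if __name__ == "__main__":
    print(f"{time_ms(lambda: sum(range(100_000))):.3f} ms")
```

Taking the minimum rather than the mean is a design choice: background load can only make a run slower, so the fastest run is usually the least disturbed one. It does not remove differences between machines, so comparisons still need consistent hardware.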
I think it's good to add SP, but TP is more important for Regression Test, since sometimes we want to sacrifice SP for TP like #944.
Yes, but we may need to solve this issue first before adding the regression test of time performance:
We need a machine with consistent hardware, otherwise the performance comparisons won't make much sense. I guess Travis will just randomly pick an available VM slot, whose hardware capability fluctuates.
solve this issue
Oh, I see, so we can first set up SP CPRT as a warm-up exercise for TP CPRT before this issue is solved?