I noticed that the image-classifier returns different results if invoked with one file at a time and all the same file at once and was wondering if this is expected.
To reproduce:
$ ./bin/image-classifier tests/images/imagenet/* -image_mode=0to1 -m=densenet121/model.onnx
$ ./bin/image-classifier tests/images/imagenet/cat_285.png -image_mode=0to1 -m=densenet121/model.onnx
Result:
With all the files at once:
Model: densenet121/model.onnx
File: tests/images/imagenet/cat_285.png Result:281
File: tests/images/imagenet/dog_207.png Result:207
File: tests/images/imagenet/zebra_340.png Result:340
With one at a time:
Model: densenet121/model.onnx
File: tests/images/imagenet/cat_285.png Result:674
Model: densenet121/model.onnx
File: tests/images/imagenet/dog_207.png Result:674
Model: densenet121/model.onnx
File: tests/images/imagenet/zebra_340.png Result:674
Also when there is only one file given, the process takes much longer to produce a result.
Is this expected?
Good catch. I'm pretty sure this is a regression.
I bisected this to "Optimization on ConcatNode, Concat(Reshape(x) * N) -> Reshape(Concat(x * N))"
https://github.com/pytorch/glow/commit/710cc3b0db91854a36c44080c360ca751e37a393
using
git bisect start 45abbe8 3a9f9da --
git bisect run regress.sh
With regress.sh:
cd `dirname $0`;
ninja image-classifier
./bin/image-classifier tests/images/imagenet/cat_285.png -image_mode=0to1 -m=densenet121/model.onnx | grep -q 'Result:28[15]'
exit $?
@ZchiPitt can you please take a look and fix forward?
I think @beicy also looked at this issue. @beicy, did you make any progress on this?
@qcolombet Not yet. Haven't spent much time on this one yet :)
Thanks for the update @beicy. I just wanted to be sure @ZchiPitt and you were not looking at the same thing.
Sorry just saw it. Will look into it later today.
Shouldn't we try to check the correctness of the output for the stable image-classifier tests during our CI checks?
@opti-mix. Maybe, but can we download the models/weights from AWS during a Travis run? Seems like kind of a lot of data to schlep around.
@bertmaher Oh, I forgot that we do not run them on CI because of these huge models/weights sizes.
Hmm. At least we could make a script to compare the outputs of run.sh with expected outputs, so that you can locally run the tests and easily see if your changes break something. It is not as good as CI, but at least something.
We could also add a cron job to run them once a night.
+1 to @opti-mix and @jfix71
Running validation script async to PR should be doable: https://docs.travis-ci.com/user/cron-jobs/#Detecting-Builds-Triggered-by-Cron
And it's a trivial amount of work to enable this on CI as long as we have actual validation script.
@rdzhabarov I think we want/need to run this in Release + no ASAN so it ends in a sane amount of time, right? Because we do not have a job with this configuration, I think we would need to add 4th job with this config that essentially returns immediately on PRs. This seems annoying -- especially if it will build the dependencies before returning. Do you know if there's some smarter/better way, or if there's a way to disable building all the dependencies?
Yeah, we'd need to run that in release mode for sure.
We do not necessarily need to run run.sh during the PR push, we could just run it daily. One of the problems as @opti-mix mentioned is the phase of downloading weights/models etc. But we could avoid this problem if we run this job async from the PR travis jobs.
And In the travis config, we'd enable that job when it's triggered by travis cron job.
"This seems annoying -- especially if it will build the dependencies before returning." I'll check on this, probably should be a way to disable these steps.
Edit: there is conditional build support in travis, so the job will not run unless it's triggered by cron.
We do not necessarily need to run run.sh during the PR push, we could just run it daily.
If we run it daily and catch something, we still need to bisect to find out which specific PR resulted in a bug. But it is certainly better than what we have today!
But even better would be to catch it as soon as it occurs. Which means testing locally or on every PR.
IMHO doing a more thorough local testing when you work on new optimizations and the like is almost a "must" as the chances to break something are very high.
@rdzhabarov Ah ok didn't know about conditional build support, cool!
I checkout to the version before my merge and run the desnet with the commands provided by @qcolombet. It shows that when running "with all the files at once", the results are correct.
Will look more into it and try to locate the bug.
Which means testing locally or on every PR.
Running locally is a good requirement but not enforceable, I would rather have this kind of things runs on every PR. Hardware is supposed to be cheap :).
Conceptually I agree that it's better to run on each and every PR, but the question is whether we could tolerate longer Travis runs for a PR (and I honestly do not know how much time that would take). Which could be pretty annoying. I think it'd be good to actually check the total time for the weights downloads etc and run.sh run before making a call here. On the other hand, once we have a validation script it'd be very easy to switch from once daily to every PR.
Tracking this support in https://github.com/pytorch/glow/issues/1257.
What鈥檚 the status of this bug? Can we revert the commit that broke the test?
@nadavrot Ive found the bug and fixed it. Working on submitting a new PR soon.
@ZchiPitt thanks for fixing!
Most helpful comment
@nadavrot Ive found the bug and fixed it. Working on submitting a new PR soon.