Glow: image-classifier behaves differently with one or several images for densenet121

Created on 4 Jul 2018 · 21Comments · Source: pytorch/glow

I noticed that the image-classifier returns different results if invoked with one file at a time and all the same file at once and was wondering if this is expected.

To reproduce:
$ ./bin/image-classifier tests/images/imagenet/* -image_mode=0to1 -m=densenet121/model.onnx
$ ./bin/image-classifier tests/images/imagenet/cat_285.png -image_mode=0to1 -m=densenet121/model.onnx

Result:
With all the files at once:
Model: densenet121/model.onnx
File: tests/images/imagenet/cat_285.png Result:281
File: tests/images/imagenet/dog_207.png Result:207
File: tests/images/imagenet/zebra_340.png Result:340

With one at a time:
Model: densenet121/model.onnx
File: tests/images/imagenet/cat_285.png Result:674
Model: densenet121/model.onnx
File: tests/images/imagenet/dog_207.png Result:674
Model: densenet121/model.onnx
File: tests/images/imagenet/zebra_340.png Result:674

Also when there is only one file given, the process takes much longer to produce a result.

Is this expected?

Source

qcolombet

Most helpful comment

@nadavrot Ive found the bug and fixed it. Working on submitting a new PR soon.

ZchiPitt on 11 Jul 2018

🎉1 👍1

All 21 comments

Good catch. I'm pretty sure this is a regression.

bertmaher on 4 Jul 2018

👍2

I bisected this to "Optimization on ConcatNode, Concat(Reshape(x) * N) -> Reshape(Concat(x * N))"
https://github.com/pytorch/glow/commit/710cc3b0db91854a36c44080c360ca751e37a393
using

git bisect start 45abbe8 3a9f9da --
git bisect run regress.sh

With regress.sh:

cd `dirname $0`;
ninja image-classifier
./bin/image-classifier tests/images/imagenet/cat_285.png -image_mode=0to1 -m=densenet121/model.onnx | grep -q 'Result:28[15]'
exit $?

@ZchiPitt can you please take a look and fix forward?

bertmaher on 9 Jul 2018

👍1

I think @beicy also looked at this issue. @beicy, did you make any progress on this?

qcolombet on 9 Jul 2018

@qcolombet Not yet. Haven't spent much time on this one yet :)

beicy on 9 Jul 2018

Thanks for the update @beicy. I just wanted to be sure @ZchiPitt and you were not looking at the same thing.

qcolombet on 9 Jul 2018

Sorry just saw it. Will look into it later today.

ZchiPitt on 9 Jul 2018

Shouldn't we try to check the correctness of the output for the stable image-classifier tests during our CI checks?

opti-mix on 9 Jul 2018

@opti-mix. Maybe, but can we download the models/weights from AWS during a Travis run? Seems like kind of a lot of data to schlep around.

bertmaher on 10 Jul 2018

@bertmaher Oh, I forgot that we do not run them on CI because of these huge models/weights sizes.

Hmm. At least we could make a script to compare the outputs of run.sh with expected outputs, so that you can locally run the tests and easily see if your changes break something. It is not as good as CI, but at least something.

opti-mix on 10 Jul 2018

👍1

We could also add a cron job to run them once a night.

jfix71 on 10 Jul 2018

👍2

+1 to @opti-mix and @jfix71
Running validation script async to PR should be doable: https://docs.travis-ci.com/user/cron-jobs/#Detecting-Builds-Triggered-by-Cron
And it's a trivial amount of work to enable this on CI as long as we have actual validation script.

rdzhabarov on 10 Jul 2018

@rdzhabarov I think we want/need to run this in Release + no ASAN so it ends in a sane amount of time, right? Because we do not have a job with this configuration, I think we would need to add 4th job with this config that essentially returns immediately on PRs. This seems annoying -- especially if it will build the dependencies before returning. Do you know if there's some smarter/better way, or if there's a way to disable building all the dependencies?

jfix71 on 10 Jul 2018

Yeah, we'd need to run that in release mode for sure.
We do not necessarily need to run run.sh during the PR push, we could just run it daily. One of the problems as @opti-mix mentioned is the phase of downloading weights/models etc. But we could avoid this problem if we run this job async from the PR travis jobs.
And In the travis config, we'd enable that job when it's triggered by travis cron job.

"This seems annoying -- especially if it will build the dependencies before returning." I'll check on this, probably should be a way to disable these steps.

Edit: there is conditional build support in travis, so the job will not run unless it's triggered by cron.

rdzhabarov on 10 Jul 2018

We do not necessarily need to run run.sh during the PR push, we could just run it daily.

If we run it daily and catch something, we still need to bisect to find out which specific PR resulted in a bug. But it is certainly better than what we have today!

But even better would be to catch it as soon as it occurs. Which means testing locally or on every PR.

IMHO doing a more thorough local testing when you work on new optimizations and the like is almost a "must" as the chances to break something are very high.

opti-mix on 10 Jul 2018

👍1

@rdzhabarov Ah ok didn't know about conditional build support, cool!

jfix71 on 10 Jul 2018

I checkout to the version before my merge and run the desnet with the commands provided by @qcolombet. It shows that when running "with all the files at once", the results are correct.
Will look more into it and try to locate the bug.

ZchiPitt on 10 Jul 2018

Which means testing locally or on every PR.

Running locally is a good requirement but not enforceable, I would rather have this kind of things runs on every PR. Hardware is supposed to be cheap :).

qcolombet on 10 Jul 2018

👍1

Conceptually I agree that it's better to run on each and every PR, but the question is whether we could tolerate longer Travis runs for a PR (and I honestly do not know how much time that would take). Which could be pretty annoying. I think it'd be good to actually check the total time for the weights downloads etc and run.sh run before making a call here. On the other hand, once we have a validation script it'd be very easy to switch from once daily to every PR.

Tracking this support in https://github.com/pytorch/glow/issues/1257.

rdzhabarov on 10 Jul 2018

What’s the status of this bug? Can we revert the commit that broke the test?