Glow: Enable LLVM and Glow assertions in Release mode

Created on 17 Apr 2019  路  15Comments  路  Source: pytorch/glow

It would be great if LLVM and Glow assertions were enabled by default in Release mode for continuous integration builds. We enable these internally and frequently hit them when merging in upstream changes.

Most helpful comment

@rdzhabarov do you know how feasible it would be to modify the llvm build we have in CI to do this?

@jackm321
This is all very doable. Might be similar to how we added llvm8 docker image.
Let's sync up on this in person.

The only concern would be compilation + execution time. If that could be reasonable we should be able to make this as a CI job.

All 15 comments

I like this idea. Do you have a patch that enables assertions already? Also, have you measured tthe impact on the test suite runtime? cc: @rdzhabarov

Can't provide a patch at the moment (waiting on corporate to sign CLA), but one can be easily derived from the LLVM project. The impact on the test suite would be entirely compile-time; we've not measured it, but happy to do so.

Glow assertions were enabled by default in Release mode for continuous integration builds

That's already part of what we run in our "debug" circleci build.

As for LLVM it would be an interesting idea.

would be entirely compile-time;

That would require building LLVM during the ci runs (unless it's prebuild in the docker image) which could be time consuming.
I'd also like to see how much overhead would that bring. Obviously, I'd like to keep our CI time within a sane limits.

I understand not wanting to do an LLVM build in your ci runs. We use a prebuilt LLVM release build with assertions enabled, and have workarounds for a handful of LLVM asserts in our Glow internal codebase currently.

I think it would make sense to either install that into docker container or download from S3.

@geoffberry how do you run that on internal CI?

@rdzhabarov I'm not sure what you're asking. Do you mean how do we link with the prebuilt version of LLVM that we are using, or what tests are we running in CI, or something else?

I'm curious about your CI setup if any

Our CI is configured with asserts enabled in release mode for both LLVM and Glow. We use a prebuilt version of LLVM and protobuf and don't perform any type of containerization. With each commit we run all of the unit tests along with the image-classifier application for all Caffe2 inputs and for all quantization schemes.

Attaching results from running the Glow integrated tests with asserts on and off for Glow/LLVM.
LLVM version: 7.0.1
Glow: 71f2ec208b1306f422da7fb1562254eb225d32ae

int_tests_asserts_off.txt
int_tests_asserts_on.txt

Also, here are a few examples of places we've identified that need fixes due to unchecked LLVM errors:
https://github.com/pytorch/glow/blob/dedd60d80efe467b9efda89fb59f4c4e0268a26a/lib/ExecutionEngine/ExecutionEngine.cpp#L123

https://github.com/pytorch/glow/blob/dedd60d80efe467b9efda89fb59f4c4e0268a26a/lib/ExecutionEngine/ExecutionEngine.cpp#L191

https://github.com/pytorch/glow/blob/dedd60d80efe467b9efda89fb59f4c4e0268a26a/lib/Runtime/Provisioner/Provisioner.cpp#L147

https://github.com/pytorch/glow/blob/dedd60d80efe467b9efda89fb59f4c4e0268a26a/lib/Runtime/HostManager/HostManager.cpp#L139

https://github.com/pytorch/glow/blob/dedd60d80efe467b9efda89fb59f4c4e0268a26a/lib/Runtime/HostManager/HostManager.cpp#L195

https://github.com/pytorch/glow/blob/dedd60d80efe467b9efda89fb59f4c4e0268a26a/tests/unittests/ExecutorTest.cpp#L645

https://github.com/pytorch/glow/blob/dedd60d80efe467b9efda89fb59f4c4e0268a26a/tests/unittests/HostManagerTest.cpp#L99

https://github.com/pytorch/glow/blob/dedd60d80efe467b9efda89fb59f4c4e0268a26a/tests/unittests/HostManagerTest.cpp#L118

IIUC, these llvm::Error values need to be checked (i.e., converted to a boolean) for the checked bit to be set in order to avoid an LLVM assertion.

We're running into other types of asserts, but the above provide good examples of what we're seeing.

Interesting, are there Glow assertions that fail when running our "RELEASE" CI test suite?

My main hesitation in turning on Glow assertions for release runs is that they make the large tests super-duper slow (last time I looked it was about a 100x slowdown to run ResNet-50 on the interpreter, because we check every tensor dim for every FLOP in every convolution, but maybe it's better than that now).

I don't recall running into any Glow asserts off the top of my head (and a quick grep didn't find anything). Just having asserts enabled for LLVM in release mode would likely solve most of our problems. Assuming the Glow asserts are enabled in the debug build, I don't think it needs to be enabled in the Release build.

@rosierm thanks for raising this.
I have previously built llvm with assertions on and eliminated these problems in the past but they will of course continue to reappear if we don't have testing for it.
@rdzhabarov do you know how feasible it would be to modify the llvm build we have in CI to do this?

@rdzhabarov do you know how feasible it would be to modify the llvm build we have in CI to do this?

@jackm321
This is all very doable. Might be similar to how we added llvm8 docker image.
Let's sync up on this in person.

The only concern would be compilation + execution time. If that could be reasonable we should be able to make this as a CI job.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

stoklund picture stoklund  路  5Comments

speryt picture speryt  路  3Comments

georgeokelly picture georgeokelly  路  4Comments

gcatron picture gcatron  路  4Comments

mciprian13 picture mciprian13  路  3Comments