Or-tools: EXCEPTION_ACCESS_VIOLATION still occurs on master

Created on 11 Mar 2020 · 21 Comments · Source: google/or-tools

Copy of closed #1864 (https://github.com/google/or-tools/issues/1864). All details are stated there.

Versions: from 7.5 to master on 2020_03_10
Language: Java, Python
Solver: CP-SAT
Operating system: Windows 10

On current master, crashes are even rarer. Nevertheless, I am still able to reproduce one in a matter of seconds.

Bug C++ Java Python Windows CP / CP-SAT Solver

All 21 comments

Please send a model.
Laurent Perron | Operations Research | [email protected] | (33) 1 42 68 53
00

Attached below is the same model as in the original issue:
problem_2020_01_31.zip (https://github.com/google/or-tools/files/4317832/problem_2020_01_31.zip)

I just ran it 200 times with parallelism.
No crash.

The behavior is strange: usually, if it does not fail in the first iteration of the loop, it does not crash in the following iterations either. In such a case, I restart the loop manually a few times, and the error soon occurs.
Sorry for the poor description. I am not able to deduce the cause of the problem myself, but I suspect it comes from the C++ code and is not Java or Python related.
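
The manual "restart the loop a few times" procedure can be automated. Below is a minimal sketch, assuming the reproducer can be launched as a command line; the helper name `count_crashes` and the stand-in command are hypothetical and not part of or-tools. Each attempt runs in a fresh process, because a native access violation terminates the whole process rather than raising a catchable exception.

```python
import subprocess
import sys

# Stand-in command that exits cleanly; replace it with the real reproducer
# command line (assumption: the reproducer is launchable as a program).
STAND_IN_COMMAND = [sys.executable, "-c", "pass"]


def count_crashes(command, runs):
    """Run `command` in a fresh process `runs` times.

    Returns the list of non-zero exit codes that were observed.
    """
    failures = []
    for _ in range(runs):
        # A fresh process per attempt: a crash in native code kills the
        # process, so it cannot be caught inside a single long-running loop.
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if completed.returncode != 0:
            failures.append(completed.returncode)
    return failures
```

Calling `count_crashes(STAND_IN_COMMAND, 20)` returns an empty list for a command that always exits cleanly; with the real reproducer, each crashing run would contribute its exit code to the list.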

The solver is purely stateless.

I reproduced it very easily, although it seems to happen more frequently on AMD than on Intel CPUs; the number of cores may also matter.

Compiling with the debug option gave me more precise information:
C [jniortools.dll+0xb8ac62] operations_research::sat::LinearConstraintManager::~LinearConstraintManager+0x1d2

I also reproduced it in C++: a loop of 100 iterations never runs to completion. I tested on two computers with similar results.

The C++ source code is attached.

execute_proto.zip (https://github.com/google/or-tools/files/4326817/execute_proto.zip)

successsuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccesssuccessfinished

Both on mac and linux.

I am starting to suspect a compiler bug in Visual Studio.

I've just experienced the error on Debian 10 from C++ (successes but no finish).

I compiled and tested the C++ example on Debian, then recompiled and retested it on a clean copy of the or-tools sources on Windows 10 to be sure. Both builds were on current master. The result: no errors on Linux, and many unfinished loops on Windows 10. I also tried to compile or-tools with clang on Windows, but I wasn't successful.

The statistics differ notably between Linux and Windows: on Linux the number of conflicts is mostly between 0 and 1, while on Windows it is in the thousands.

linux

CpSolverResponse:
status: FEASIBLE
objective: NA
best_bound: NA
booleans: 2250
conflicts: 1
branches: 5094
propagations: 67652
integer_propagations: 72746
walltime: 0.260825
usertime: 0.260825
deterministic_time: 0.636647
primal_integral: 0

windows

CpSolverResponse:
status: FEASIBLE
objective: NA
best_bound: NA
booleans: 2250
conflicts: 21243
branches: 34068
propagations: 1603960
integer_propagations: 81616
walltime: 0.99312
usertime: 0.99312
deterministic_time: 5.99405
primal_integral: 0
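
To compare two such statistics dumps side by side, the `key: value` lines can be parsed into dictionaries. This is a minimal sketch that assumes the plain-text layout shown above; `parse_response_stats` is a hypothetical helper, not an or-tools function, and the sample strings are abridged from the two dumps in this thread.

```python
def parse_response_stats(text):
    """Parse 'key: value' lines from a CpSolverResponse text dump."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            # Skips the 'CpSolverResponse:' header and blank lines.
            continue
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            # Non-numeric fields such as 'FEASIBLE' or 'NA' stay as strings.
            stats[key.strip()] = value
    return stats


# Abridged from the two dumps quoted in this thread.
LINUX = "CpSolverResponse:\nstatus: FEASIBLE\nconflicts: 1\nbranches: 5094"
WINDOWS = "CpSolverResponse:\nstatus: FEASIBLE\nconflicts: 21243\nbranches: 34068"

linux_stats = parse_response_stats(LINUX)
windows_stats = parse_response_stats(WINDOWS)

# Ratio of Windows to Linux conflicts for this pair of runs.
conflict_ratio = windows_stats["conflicts"] / linux_stats["conflicts"]
```

For this pair of runs the ratio equals the Windows conflict count, since the Linux run reported a single conflict.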

@gregy4 On Windows, which version of VS did you use? Did you use msbuild or MinGW? Did you use the Makefile-based build, the CMake one, or the Bazel one?

On Windows I use VS 2019 Community Edition, compiling in the x64 Native Tools Command Prompt with the Makefile-based build (make third_party, make cc).

I retested my C++ example on Windows with or-tools 7.6 compiled by clang (LLVM 10), with VS2017 installed. The problem is still there, so it seems that it doesn't depend only on the VC compiler.

Same situation on v7.7 [Java, Win 10].

However, there seems to be an improvement: v7.7 only crashes with num_search_workers >= 5, while on v7.6 the problem can be reproduced even with num_search_workers = 3 or 4. I suspect this change comes from the commit that fixed #2005.

On v7.8 [Win 10, MS VS, Release build by CMake], with num_search_workers = 8, I get a _read access violation_ exception thrown on https://github.com/google/or-tools/blob/stable/ortools/sat/linear_constraint_manager.cc#L58 every time.
https://github.com/google/or-tools/blob/a0a56698ba8fd07b7f84aee4fc45d891a8cd9828/ortools/sat/linear_constraint_manager.cc#L58

If that line is commented out, the solver works flawlessly (it only logs cuts anyway).

I don't know if it is related, but examples/cpp/golomb_sat.cpp is also consistently failing in the GitHub CI (and AppVeyor?) on Windows only (CMake-based build)...

Yes, I got the exception with golomb_sat as well. Strangely enough, I had to set a higher number of search workers to reproduce it there. Moreover, this does not seem to happen with a debug build. Again, commenting out the if-statement solves the problem.

I believe it is solved on master.

Thank you for your response. The error still persists even on today's master, both on golomb_sat and on my original problem.

N = 1, optimal length = 0 (conflicts:0, time=0.005046 s)
N = 2, optimal length = 1 (conflicts:0, time=0.015931 s)
N = 3, optimal length = 3 (conflicts:0, time=0.016050 s)
N = 4, optimal length = 6 (conflicts:0, time=0.015925 s)
N = 5, optimal length = 11 (conflicts:1, time=0.016201 s)
N = 6, optimal length = 17 (conflicts:3, time=0.015895 s)
N = 7, optimal length = 25 (conflicts:29, time=0.032087 s)
N = 8, optimal length = 34 (conflicts:137, time=0.048254 s)

...\repos\OR-Tools\out\buildx64-Release\bin\golomb_sat.exe (process 28760) exited with code -1073741819.
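
The exit code above is the signed 32-bit form of a Windows NTSTATUS value. A minimal sketch of the conversion follows; `to_ntstatus` is a hypothetical helper name, and 0xC0000005 is the standard STATUS_ACCESS_VIOLATION code, which matches the EXCEPTION_ACCESS_VIOLATION in the issue title.

```python
# Standard Windows NTSTATUS code for an access violation.
STATUS_ACCESS_VIOLATION = 0xC0000005


def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit process exit code as an unsigned NTSTATUS."""
    return exit_code & 0xFFFFFFFF


code = to_ntstatus(-1073741819)
print(hex(code))  # 0xc0000005
print(code == STATUS_ACCESS_VIOLATION)  # True
```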

However, I had to downgrade protobuf to v3.12.2 to be able to build. Could it be related?

Many apologies! My bad, I was a few commits behind. Current master works. Sorry again!
