Is it expected to get a segmentation fault when disabling sequential execution on a CUDA provider? I am trying this on a UNet architecture, and when disabling sequential execution on CPU I get a small speed up in prediction. On the other hand, I get a segfault when I try the same on CUDA.
This is on the master branch a few days after v0.3.0 was released.
I have the following log (log level set to VERBOSE)
2019-03-27 12:07:52.630958147 [I:onnxruntime:, parallel_executor.cc:109 RunNodeAsyncInternal] Begin execution
2019-03-27 12:07:52.630973948 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 256
2019-03-27 12:07:52.630979200 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 512
2019-03-27 12:07:52.630983203 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 1024
2019-03-27 12:07:52.630986816 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 2048
2019-03-27 12:07:52.630990610 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 4096
2019-03-27 12:07:52.630994136 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 8192
2019-03-27 12:07:52.630997661 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 16384
2019-03-27 12:07:52.631001382 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 32768
2019-03-27 12:07:52.631004899 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 65536
2019-03-27 12:07:52.631008419 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 131072
2019-03-27 12:07:52.631012444 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 262144
2019-03-27 12:07:52.631017133 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 524288
2019-03-27 12:07:52.631021192 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 1048576
2019-03-27 12:07:52.631025220 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 2097152
2019-03-27 12:07:52.631029218 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 4194304
2019-03-27 12:07:52.631033158 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 8388608
2019-03-27 12:07:52.631037221 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 16777216
2019-03-27 12:07:52.631041131 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 33554432
2019-03-27 12:07:52.631045129 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 67108864
2019-03-27 12:07:52.631049114 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 134217728
2019-03-27 12:07:52.631053134 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:25 BFCArena] Creating bin of max chunk size 268435456
2019-03-27 12:07:52.631263617 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:102 Extend] Extending allocation by 2097152 bytes.
2019-03-27 12:07:52.631272417 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:106 Extend] Total allocated bytes: 2097152
2019-03-27 12:07:52.631277591 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:109 Extend] Allocated memory at 0x7fd5e6e00000 to 0x7fd5e7000000
2019-03-27 12:07:52.631398365 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:102 Extend] Extending allocation by 2097152 bytes.
2019-03-27 12:07:52.631407488 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:106 Extend] Total allocated bytes: 4194304
2019-03-27 12:07:52.631411546 [I:onnxruntime:dipsauce_onnx_environment, bfc_arena.cc:109 Extend] Allocated memory at 0x7fd5e7000000 to 0x7fd5e7200000
Segmentation fault (core dumped)
EDIT: the segfault occurs at onnxruntime::CUDAExecutionProvider::AddDeferredReleaseCPUPtr(void*)
I just verified that it's the same situation with the current master branch commit.
We don't support parallel_executor for CUDA case. Do you have multiple GPU card? Why do you want to use parallel_executor?
Thanks @HectorSVC . I was not sure what 'parallel' meant in each case. I understand what it means on the CPU, and on the GPU I thought that it could enable a higher degree of parallelism, which is why I was trying it. I would expect that if it is not supported then it would throw an exception or return an error, instead of segfaulting. Thanks for the clarification.
Closing as the appropriate error message will be relayed after #988
Most helpful comment
We don't support parallel_executor for CUDA case. Do you have multiple GPU card? Why do you want to use parallel_executor?