I found that GPUDeviceStorage is not used, whereas CPUDeviceStorage is. Why is that?
Are you sure you compiled with CUDA support enabled? What is it that you are trying to do?
I didn't compile with CUDA; I'm reading the code. I see that when CUDA is used, allocation actually goes through GPUPooledStorageManager, while GPUDeviceStorage is used nowhere.
I'm just curious why GPUDeviceStorage is there and not in use.
I'm on the master branch.
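To make it concrete, here is a minimal, made-up sketch (not the MXNet code; the class and function names are invented for illustration) of the kind of per-device dispatch I'm describing, where the GPU side would be a naive cudaMalloc/cudaFree wrapper, which is the role GPUDeviceStorage looks like it was written for:

```cpp
// naive_dispatch_sketch.cu -- hypothetical illustration only, not MXNet code.
// One storage-manager implementation per device type; the GPU one here is a
// "naive" wrapper that hits the CUDA driver on every request.
#include <cstdlib>
#include <memory>
#include <cuda_runtime.h>

struct StorageManager {                        // common allocator interface
  virtual void* Alloc(std::size_t size) = 0;
  virtual void Free(void* ptr) = 0;
  virtual ~StorageManager() = default;
};

// CPU: naive manager, forwards straight to malloc/free.
struct NaiveCPUManager : StorageManager {
  void* Alloc(std::size_t size) override { return std::malloc(size); }
  void Free(void* ptr) override { std::free(ptr); }
};

// GPU: a *naive* manager that calls cudaMalloc/cudaFree on every request.
struct NaiveGPUManager : StorageManager {
  void* Alloc(std::size_t size) override {
    void* p = nullptr;
    cudaMalloc(&p, size);                      // synchronous, relatively slow
    return p;
  }
  void Free(void* ptr) override { cudaFree(ptr); }
};

enum class DeviceType { kCPU, kGPU };

// Per-device dispatch, roughly analogous to the selection in src/storage/storage.cc.
std::unique_ptr<StorageManager> MakeManager(DeviceType dev) {
  if (dev == DeviceType::kCPU)
    return std::unique_ptr<StorageManager>(new NaiveCPUManager());
  return std::unique_ptr<StorageManager>(new NaiveGPUManager());
}
```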
Oh, ok. This is because CUDA allocations and deallocations are much more expensive (since they deal with pinned memory), especially with multiple GPUs per node. They also introduce stalls in the GPU pipeline, which is not something we want for maximum performance. That is why MXNet uses a caching allocator (GPUPooledStorageManager) instead of a naive one.
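To illustrate the idea, here is a minimal, hypothetical sketch of a caching GPU allocator. It is not GPUPooledStorageManager itself (the real one also rounds sizes and tracks used memory); it only shows the core trick: freed blocks are parked in a per-size free list so that most Alloc/Free calls never touch cudaMalloc or cudaFree at all.

```cpp
// pooled_gpu_allocator_sketch.cu -- simplified caching allocator, illustration only.
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <vector>
#include <cuda_runtime.h>

class PooledGPUAllocator {
 public:
  // Allocate: reuse a cached block of the same size if one exists,
  // otherwise fall back to a real (expensive) cudaMalloc.
  void* Alloc(std::size_t size) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = free_blocks_.find(size);
    if (it != free_blocks_.end() && !it->second.empty()) {
      void* p = it->second.back();             // pool hit: no CUDA call at all
      it->second.pop_back();
      return p;
    }
    void* p = nullptr;
    cudaMalloc(&p, size);                      // pool miss: pay the full cost once
    sizes_[p] = size;
    return p;
  }

  // Free: do NOT call cudaFree; park the block in the pool for reuse.
  void Free(void* ptr) {
    std::lock_guard<std::mutex> lock(mu_);
    free_blocks_[sizes_[ptr]].push_back(ptr);
  }

  // Release cached blocks back to the driver (e.g. at shutdown);
  // this is the only place cudaFree is actually called.
  ~PooledGPUAllocator() {
    for (auto& kv : free_blocks_)
      for (void* p : kv.second) cudaFree(p);
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;  // size -> cached blocks
  std::unordered_map<void*, std::size_t> sizes_;                     // ptr -> its size
};
```

The trade-off is that the pool keeps holding device memory even when the framework is idle, which is why such allocators usually expose a way to release cached blocks back to the driver.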
@szha please close