nvidia-smi returns
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVS 315             Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   45C    P0    N/A /  N/A |      3MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 0000:04:00.0     Off |                    0 |
| N/A   55C    P0    67W / 235W |     55MiB / 11519MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+
When setting the GPU device ID in the Makefile and running Caffe, do the IDs match those in the nvidia-smi output? That is, does TEST_GPUID=1 select the Tesla K40?
Operating system: Ubuntu 16.04
Compiler: cxx
CUDA version (if applicable): 7.5
CUDNN version (if applicable):
BLAS: ATLAS
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 2.7
The GPU ID from Caffe's perspective is the reverse of the nvidia-smi ID. If you have 4 GPUs, then Caffe's GPU 0 is GPU 3 in nvidia-smi.
Thanks. Where does it come from? I didn't find any reference to this setting
I don't know why this happens; it's just what I observed.
Setting this environment variable solved this issue for me:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
It's documented here
Also, the CUDA order is not always the reverse of the nvidia-smi order; sometimes the two orders are the same. It depends.
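A minimal sketch of applying the same setting from Python instead of the shell. It assumes the variable has to be in the environment before the CUDA runtime initializes, so it is set on the environment handed to the child process. The helper name with_pci_bus_order and the Caffe command line in the comment are illustrative and not taken from this thread.

```python
import os


def with_pci_bus_order(base_env=None):
    """Return a copy of the environment with CUDA_DEVICE_ORDER=PCI_BUS_ID.

    Hypothetical helper: the name is illustrative, not part of Caffe or CUDA.
    The input mapping is copied, so the caller's environment is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    return env


if __name__ == "__main__":
    env = with_pci_bus_order()
    print(env["CUDA_DEVICE_ORDER"])
    # The environment can then be passed to a child process, for example:
    #   subprocess.call(["caffe", "train", "--solver=solver.prototxt", "--gpu=1"], env=env)
    # (solver.prototxt is a placeholder path; check the flags against your Caffe build.)
```

With this ordering in effect, the device IDs seen by CUDA applications should follow ascending PCI bus ID, which is how nvidia-smi lists the cards above.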
Thanks for the explanation.
This is a link to nvidia, but my issue is with Caffe. When I set gpu to 0 or 1, how do I know which is which beforehand?
Caffe uses CUDA, so it follows CUDA's device-ordering rules, which are described in the link above.
@flx42 and @azgo14 are correct, this is working as expected following the CUDA configuration.
OK, when I run CUDA's deviceQuery, the order is:
Device 0: "Tesla K40m"
Device 1: "NVS 315"
Is this the order in which caffe accesses the cards?
Also, lspci -nn | grep 'NVIDIA' returns
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 315] [10de:107c] (rev a1)
03:00.1 Audio device [0403]: NVIDIA Corporation GF119 HDMI Audio Controller 10de:0e08
04:00.0 3D controller [0302]: NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)
That's really confusing
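The two orderings can be reconciled as follows. By default CUDA uses CUDA_DEVICE_ORDER=FASTEST_FIRST, which makes the device it guesses to be fastest device 0; that is consistent with deviceQuery listing the Tesla K40m first. With CUDA_DEVICE_ORDER=PCI_BUS_ID, devices are numbered by ascending PCI bus ID, which matches nvidia-smi and the lspci addresses. A small sketch of that second numbering, using the bus IDs from the outputs above (the function name is hypothetical):

```python
def order_by_pci_bus_id(devices):
    """Return device names sorted by ascending PCI bus ID.

    `devices` is a list of (bus_id, name) pairs. Bus IDs may be written as
    "0000:03:00.0" (domain:bus:device.function) or "03:00.0" (no domain).
    Hypothetical helper for illustration; not part of CUDA or Caffe.
    """
    def sort_key(device):
        bus_id = device[0]
        # Split into hexadecimal fields and compare them numerically.
        fields = [int(part, 16) for part in bus_id.replace(".", ":").split(":")]
        # Pad a missing domain with 0 so both formats compare consistently.
        while len(fields) < 4:
            fields.insert(0, 0)
        return tuple(fields)

    return [name for _, name in sorted(devices, key=sort_key)]


if __name__ == "__main__":
    # Devices in the order deviceQuery reported them in this thread.
    cuda_default_order = [
        ("0000:04:00.0", "Tesla K40m"),
        ("0000:03:00.0", "NVS 315"),
    ]
    for index, name in enumerate(order_by_pci_bus_id(cuda_default_order)):
        print("PCI_BUS_ID order: device %d is %s" % (index, name))
```

On this machine the sketch numbers the NVS 315 as 0 and the Tesla K40m as 1, the same as nvidia-smi. Without the environment variable, the deviceQuery listing is the more reliable guide to what Caffe will see, since Caffe enumerates devices through CUDA.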