Caffe: Order of GPUs in nvidia-smi vs. caffe

Created on 15 Nov 2016  ·  11 Comments  ·  Source: BVLC/caffe

nvidia-smi returns

+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVS 315             Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   45C    P0    N/A /  N/A |      3MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 0000:04:00.0     Off |                    0 |
| N/A   55C    P0    67W / 235W |     55MiB / 11519MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

When setting a GPU device ID in the Makefile and running Caffe, do the IDs match those in the nvidia-smi output (i.e. does TEST_GPUID=1 use the Tesla K40)?

Your system configuration

Operating system: Ubuntu 16.04
Compiler: cxx
CUDA version (if applicable): 7.5
CUDNN version (if applicable):
BLAS: ATLAS
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 2.7

All 11 comments

The GPU ID from Caffe's perspective is the reverse of the one in nvidia-smi. If you have 4 GPUs, then Caffe GPU 0 is GPU 3 in nvidia-smi.

Thanks. Where does this come from? I didn't find any reference to this setting.

I don't know why this happens; it's just what I observed.

Setting this environment variable solved this issue for me:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
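
For pycaffe, the same setting can also be applied from inside the script, provided it happens before CUDA is initialized. The sketch below is only an illustration: `pin_cuda_device_order` is a hypothetical helper name, and the optional `CUDA_VISIBLE_DEVICES` restriction is an extra that the comment above does not mention. By default CUDA orders devices fastest first, which is why its numbering can differ from the PCI bus order.

```python
import os


def pin_cuda_device_order(visible_devices=None, env=os.environ):
    """Ask CUDA to enumerate GPUs by PCI bus ID instead of fastest first.

    visible_devices: optional list of device indices to expose to CUDA
    (hypothetical convenience argument, not part of the original comment).
    """
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    if visible_devices is not None:
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in visible_devices)
    return env


# Must run before the first CUDA call, so before importing caffe.
pin_cuda_device_order()

# Assumption: pycaffe is installed; left commented out so the sketch runs without it.
# import caffe
# caffe.set_device(1)   # with PCI_BUS_ID ordering, index 1 would be the Tesla K40m here
# caffe.set_mode_gpu()
```

Setting the variable after the CUDA runtime has already been initialized in the process has no effect, which is why the call sits above the import.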

It's documented here
And the CUDA order is not always the reverse of nvidia-smi; sometimes the order is the same. It depends.

Thanks for the explanation.

This is a link to NVIDIA, but my issue is with Caffe. When I set the GPU to 0 or 1, how do I know beforehand which is which?

Caffe uses CUDA, so it follows CUDA's rules, which are described in the link above.

@flx42 and @azgo14 are correct, this is working as expected following the CUDA configuration.

OK, when I run CUDA's deviceQuery, the order is
Device 0: "Tesla K40m"
Device 1: "NVS 315"

Is this the order in which Caffe accesses the cards?

Also, lspci -nn | grep 'NVIDIA' returns

03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 315] [10de:107c] (rev a1)
03:00.1 Audio device [0403]: NVIDIA Corporation GF119 HDMI Audio Controller [10de:0e08]
04:00.0 3D controller [0302]: NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)
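
One way to predict the numbering in advance is to sort the GPUs by PCI address, which is the order CUDA uses when CUDA_DEVICE_ORDER=PCI_BUS_ID is set. The sketch below parses the lspci lines above; `gpus_in_pci_order` is a hypothetical helper, and it assumes the input has already been filtered to NVIDIA devices and that all cards sit in the same PCI domain. For this machine it yields NVS 315 as device 0 and Tesla K40m as device 1, the same as the nvidia-smi listing. The deviceQuery order above (K40m first) is consistent with CUDA's default fastest-first ordering.

```python
import re

# lspci -nn output from this thread, already filtered to NVIDIA devices.
LSPCI_OUTPUT = """\
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 315] [10de:107c] (rev a1)
03:00.1 Audio device [0403]: NVIDIA Corporation GF119 HDMI Audio Controller [10de:0e08]
04:00.0 3D controller [0302]: NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)
"""

# PCI class codes: 0300 = VGA controller, 0302 = 3D controller.
GPU_CLASSES = {"0300", "0302"}
LINE_RE = re.compile(
    r"^([0-9a-f]{2}):([0-9a-f]{2})\.([0-7]) .*?\[([0-9a-f]{4})\]: (.*)$"
)


def gpus_in_pci_order(lspci_text):
    """Return GPU names sorted by (bus, device, function)."""
    gpus = []
    for line in lspci_text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group(4) not in GPU_CLASSES:
            continue  # skips the HDMI audio function and unparsable lines
        address = tuple(int(m.group(i), 16) for i in (1, 2, 3))
        desc = m.group(5)
        name = re.search(r"\[([^\]]+)\]", desc)  # first bracketed model name
        gpus.append((address, name.group(1) if name else desc))
    gpus.sort()
    return [name for _, name in gpus]


for index, name in enumerate(gpus_in_pci_order(LSPCI_OUTPUT)):
    print("CUDA device %d (PCI_BUS_ID order): %s" % (index, name))
```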

That's really confusing
