Follow-up: #5798
Hi again,
It seems that for Dask-cuDF, CUDA_VISIBLE_DEVICES specifies which (multiple) GPUs are used for compute, but only one GPU's memory gets used. I confirmed on my machine that CUDA_VISIBLE_DEVICES works with the multi-GPU Dask-cuDF combination: it worked, but it seems to use only one GPU's memory even when multiple GPUs are visible.
My GPUs are GeForce RTX 2080 Ti; is this a supported GPU for distributing the memory? Thank you!
I doubt you are only using one GPU's memory. What symptoms are you seeing that make you think the other GPU's memory is not being used? It would probably be very slow if that were true.
GeForce does not support NVLink. Do you mean SLI?
I doubt you are only using one GPU's memory. What symptoms are you seeing that make you think the other GPU's memory is not being used? It would probably be very slow if that were true.
When one GPU's memory is full, a MemoryError is raised even though the other GPU's memory is still empty.
I tested with the following demo code:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import dask_cudf

ddf_ratings = {}
ddf_ratings_computed = {}
iter = 0
memory_full = False
while not memory_full:
    try:
        ddf_ratings[iter] = dask_cudf.read_csv("ml-25m/ratings.csv")
        ddf_ratings_computed[iter] = ddf_ratings[iter].compute()
        iter += 1
    except MemoryError:
        # Stop the creation when memory is full.
        memory_full = True

print(iter, "dataframes created.")
Full demo code: https://colab.research.google.com/drive/1h4NE3whF6bYHUFb1H_C0Uw4NjpR_dWT5?usp=sharing
The result looks like this:
$ nvidia-smi
Mon Aug 3 16:10:36 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... On | 00000000:02:00.0 Off | N/A |
| 43% 42C P8 11W / 250W | 10329MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... On | 00000000:81:00.0 Off | N/A |
| 37% 35C P8 12W / 250W | 12MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 52668 C ...dblab/anaconda3/envs/jupyter/bin/python 10317MiB |
+-----------------------------------------------------------------------------+
GeForce does not support NVLink. Do you mean SLI?
I am not sure whether the following results indicate that these GPUs are connected via NVLink or SLI...
$ nvidia-smi topo -m
GPU0 GPU1 CPU Affinity
GPU0 X NV2 0-7,16-23
GPU1 NV2 X 8-15,24-31
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
$ nvidia-smi nvlink -s
GPU 0: GeForce RTX 2080 Ti (UUID: GPU-<snipped>)
Link 0: 25.781 GB/s
Link 1: 25.781 GB/s
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-<snipped>)
Link 0: 25.781 GB/s
Link 1: 25.781 GB/s
Thank you!
Thanks for sharing this example. Dask will only use a single GPU by default, so this is expected behavior.
If you want to use multiple GPUs, you'll want to either use the LocalCUDACluster API or launch two dask workers from the command line (one per GPU). Note that, for your example use case, you may want to use LocalCUDACluster.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
# Create a Dask Cluster with one worker per GPU
cluster = LocalCUDACluster()
client = Client(cluster)
With that setup, Dask will distribute work over all GPUs in your machine. For more information on how to use Dask with GPUs, please see the Dask-CUDA docs.
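For completeness, device selection can also be done through LocalCUDACluster itself rather than the environment variable. This is only a rough sketch; the CUDA_VISIBLE_DEVICES keyword argument is based on my reading of the Dask-CUDA docs, so please check the docs for your installed version:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Restrict the cluster to GPUs 0 and 1; one worker is started per listed device.
# Passing CUDA_VISIBLE_DEVICES as a keyword here is an assumption based on the Dask-CUDA docs.
cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="0,1")
client = Client(cluster)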
Also note that when you call compute you bring all of the result data for that task to the client process on a single GPU, which can cause memory problems. You may want to persist your data into distributed memory. Please see the Dask docs for more information.
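To make that concrete, here is a minimal sketch of the persist pattern applied to your example. The column name "rating" assumes the standard MovieLens ratings.csv layout, and the mean aggregation is purely illustrative:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

cluster = LocalCUDACluster()  # one worker per visible GPU
client = Client(cluster)

ddf = dask_cudf.read_csv("ml-25m/ratings.csv")

# persist() materializes the partitions in the workers' GPU memory,
# spread across all GPUs, instead of pulling everything back to the client.
ddf = ddf.persist()

# compute() is then only needed for small, reduced results.
print(ddf["rating"].mean().compute())  # 'rating' column assumed from the MovieLens data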