cuDF: [DOC] Replace OutOfMemory exception with UnsupportedGPU exception

Created on 20 Mar 2020 · 10 comments · Source: rapidsai/cudf

Describe the bug

We end up deployed in scenarios where cuDF, on initialization, throws an RMM out-of-memory exception during load, when in reality it is rejecting the available hardware.

This typically impacts our new-to-GPU users, and even advanced ones after a config mistake. E.g., on Azure, the default-available GPUs are K80s (until users jump through quota hoops), so the typical Azure first-use experience is to spin up and hit this misleading error. It's quite a tough and confusing experience for most people until they've been burnt enough times.

We end up doing all sorts of things to try to get users to pick the right env etc. beforehand, but invariably, mistakes will happen, even by advanced users (misconfig, ...).

Not sure if this is better in cudf or rmm.

Steps/Code to reproduce bug

Run import cudf; cudf.DataFrame({'x': [1]}) on an old but popular GPU like the K80

And/or on other common init steps like set_alloc

Expected behavior

Fail with UnsupportedDeviceError or something similarly indicative
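For illustration, a minimal sketch of what such a failure could look like. The class name UnsupportedDeviceError and the Pascal-era minimum of compute capability 6.0 are assumptions for the sketch, not cuDF's actual API:

```python
class UnsupportedDeviceError(RuntimeError):
    """Hypothetical error for GPUs below the supported architecture."""

# Assumption: Pascal (6.0) or newer is required; the K80 is 3.7.
MIN_CC = (6, 0)

def validate_devices(capabilities):
    """Check each device's (major, minor) compute capability tuple."""
    for idx, cc in enumerate(capabilities):
        if cc < MIN_CC:
            raise UnsupportedDeviceError(
                f"GPU {idx} has compute capability {cc[0]}.{cc[1]}; "
                f"{MIN_CC[0]}.{MIN_CC[1]} or newer is required"
            )
```

On a K80, validate_devices([(3, 7)]) would then fail immediately with a hardware-specific message instead of a later out-of-memory error.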

Environment overview (please complete the following information)

Everywhere. We happen to hit it in Docker.

Labels: bug, cuDF (Python)

All 10 comments

We have similarly poor error messages for old CUDA versions, old driver versions, etc. that we should handle in one swoop.

We should also strive to keep cudf importable on a machine with no GPU, for things like API enumeration and whatnot.

@lmeyerov what version are you on?

We were getting reports from 0.7 through 0.11. We're shipping 0.12 next week and then switching internally to 0.13

(0.13/0.14 should be faster b/c the 0.11 & 0.12 upgrades involved 100-200 unit tests & further automation around how we use it)

Also, we always ship as Docker, and in cloud cases (but not on-prem) we get to control the host: Ubuntu 18 plus whatever the AWS/Azure NVIDIA drivers are at the time

If overhead on the check is a concern, another option that is fine for us as software devs is an explicit opt-in call.

E.g., something like a healthcheck() or validity(). In OpenCL, you get back a set of valid devices and their specs, and can even pick which one you're using (=> cooperatively schedulable).

That wouldn't help direct cudf users like data scientists, though.
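An opt-in healthcheck along those lines could be sketched like this (the function name, return shape, and 6.0 minimum are made up for illustration; cuDF has no such API):

```python
MIN_CC = (6, 0)  # assumed minimum supported compute capability

def healthcheck(devices):
    """Partition (name, (major, minor)) device specs into supported and
    unsupported lists, roughly like OpenCL's device enumeration lets a
    caller see valid devices and pick one."""
    report = {"supported": [], "unsupported": []}
    for name, cc in devices:
        key = "supported" if cc >= MIN_CC else "unsupported"
        report[key].append((name, cc))
    return report
```

A caller could then schedule work only onto report["supported"] devices, or raise early with a clear message when that list is empty.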

This is pretty straightforward.

At import cudf: query each visible device's compute capability and fail fast if it is below the supported minimum.

Does numba or cupy already wrap the appropriate APIs? Or would we need to do so in cuDF cython?

To support @kkraus14's comment of being able to do import cudf on machines without GPUs, you can first do cudaGetDeviceCount() and only run the above if the number of devices is greater than zero.
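The gating described above could look roughly like this ctypes sketch (the soname and error handling are assumptions, and cuDF would go through its own bindings rather than ctypes):

```python
import ctypes

def cuda_device_count():
    """Return the number of visible CUDA devices, or 0 when no runtime is present."""
    try:
        # Assumption: the bare soname; real CUDA installs may only ship
        # versioned names like libcudart.so.11.0.
        libcudart = ctypes.CDLL("libcudart.so")
    except OSError:
        return 0  # no CUDA runtime: treat as a GPU-less machine
    count = ctypes.c_int(0)
    # cudaGetDeviceCount returns cudaSuccess (0) on success
    if libcudart.cudaGetDeviceCount(ctypes.byref(count)) != 0:
        return 0
    return count.value

# At import time: only run the hardware check when devices actually exist,
# so `import cudf` keeps working on GPU-less machines.
if cuda_device_count() > 0:
    pass  # run the compute-capability validation here
```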

If overhead on the check is a concern, another option that is fine for us as software devs is an explicit opt-in call.

Overhead on the check should be pretty low, so I'm not too concerned.

We were getting reports from 0.7 through 0.11. We're shipping 0.12 next week and then switching internally to 0.13

Note 0.12 has a bunch of non-trivial memory overhead for strings; you may want to go straight from 0.11 to 0.13, which removes that overhead completely and adds further memory-usage improvements.

We're stuck near-term on 0.12 b/c blazing isn't blessed for 0.13 afaict: https://anaconda.org/blazingsql/blazingsql

But yeah, the 0.12/0.13/0.14 upgrades seem to be battling overhead & memory issues we're seeing, so def excited!

@jrhemstad Sanity check re: not importing: will doing submodule imports (from cudf.io.parquet ...) still trigger running cudf/__init__.py? I'm not up on Python module semantics, so I'm not sure if putting the check at module import time will still allow GPU-less module reflection.


Yes it still will: https://docs.python.org/3/reference/import.html#regular-packages
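To see this concretely, here's a small self-contained demonstration with a throwaway package (fakepkg is a made-up name standing in for cudf): importing a submodule always executes the parent package's __init__.py first, so an import-time check there still guards from cudf.io.parquet import ... style imports.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: fakepkg/__init__.py sets a flag when it runs.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "fakepkg", "io"))
with open(os.path.join(root, "fakepkg", "__init__.py"), "w") as f:
    f.write("import os; os.environ['FAKEPKG_INIT_RAN'] = '1'\n")
with open(os.path.join(root, "fakepkg", "io", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(root, "fakepkg", "io", "parquet.py"), "w") as f:
    f.write("def read():\n    return 'ok'\n")

sys.path.insert(0, root)
parquet = importlib.import_module("fakepkg.io.parquet")

# The top-level __init__.py ran even though we only imported the submodule.
print(os.environ.get("FAKEPKG_INIT_RAN"))  # prints 1
```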
