Cudf: [BUG] Memory error when cudf is used in python multiprocessing Pool

Created on 19 Jun 2020 · 8 comments · Source: rapidsai/cudf

cuDF 0.14 gives an error when used in a Python multiprocessing Pool; it works in version 0.12. Here is the code to reproduce:

import cudf
import pandas as pd
from multiprocessing import Pool

def get_df(idx):
    pdf = pd.DataFrame({
        'a':[1,2],
        'b':[3,4]
    })
    return cudf.from_pandas(pdf)

# Parallelize the method calls
with Pool(2) as pool:
    pool.map(get_df, [1,2])

The error is:

MemoryError                               Traceback (most recent call last)
<ipython-input-1-757e5676c563> in <module>
     11 
     12 with Pool(2) as pool:
---> 13     pool.map(get_df, [1,2])

~/miniconda3/envs/gpu/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    264         in a list that is returned.
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 
    268     def starmap(self, func, iterable, chunksize=None):

~/miniconda3/envs/gpu/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp66: cudaErrorInitializationError initialization error

cuDF was installed using Anaconda on bare metal. I am attaching the outputs of cudf/print_env.sh:
print_env_12.txt
print_env_14.txt

bug cuDF (Python)

All 8 comments

Looks related to the new memory resource bindings; will investigate.

Thanks for reporting. This is likely due to the call to fork(), which will attempt to share the CUDA context created in the parent process. One fix is to use the "spawn" start method instead:

import cudf
import pandas as pd
from multiprocessing import get_context

def get_df(idx):
    pdf = pd.DataFrame({
        'a':[1,2],
        'b':[3,4]
    })
    return cudf.from_pandas(pdf)

if __name__ == "__main__":
    ctx = get_context("spawn")

    # Parallelize the method calls
    with ctx.Pool(2) as pool:
        print(pool.map(get_df, [1,2]))

Does that help with your problem?

No. The error message is

AttributeError: Can't get attribute 'get_df' on <module '__main__' (built-in)>

Hmm, how are you running this test? Interactively with IPython/Jupyter or invoking it as a script?

MemoryError                               Traceback (most recent call last)
<ipython-input-1-757e5676c563> in <module>

Looks like it's IPython.

No. The error message is

AttributeError: Can't get attribute 'get_df' on <module '__main__' (built-in)>

Looks like the known limitation (the 2nd gray box in the link) of Python's multiprocessing when used interactively with IPython.

When launched as a script, @shwina's suggestion shouldn't hit the AttributeError, should it?

But I'm not sure about the original error message below. Is it related to the use of fork vs. spawn, or something else?

MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp66: cudaErrorInitializationError initialization error
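(For context on the AttributeError: Pool doesn't send the function's code to its workers; it pickles the function by reference, i.e. by module and qualified name. A function defined interactively lives in an in-memory `__main__` that spawned children cannot import, which is exactly the limitation mentioned above. A minimal illustration of reference-based pickling, with no multiprocessing or cudf involved; `operator.neg` stands in for any importable module-level function:)

```python
import pickle
from operator import neg  # a module-level function, importable anywhere

# pickle stores only "operator.neg", not the bytecode. A spawned worker
# unpickles it by importing the operator module, so any function handed
# to Pool under "spawn" must live in an importable module, not in an
# interactive __main__ session.
payload = pickle.dumps(neg)
restored = pickle.loads(payload)
print(restored(5))  # -5
```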

... This is likely due to the call to fork(), which will attempt to share the CUDA context created in the parent process. One fix is to use spawn() instead:

import cudf
import pandas as pd
from multiprocessing import get_context

def get_df(idx):
    pdf = pd.DataFrame({
        'a':[1,2],
        'b':[3,4]
    })
    return cudf.from_pandas(pdf)

if __name__ == "__main__":
    ctx = get_context("spawn")

    # Parallelize the method calls
    with ctx.Pool(2) as pool:
        print(pool.map(get_df, [1,2]))

Verified that this works when run as a script. With the default fork() start method, it hits the initialization error.
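For reference, an alternative to building a context with get_context is setting the start method globally, once, at program start; all subsequently created Pools then use it. A minimal sketch (no cudf needed; set_start_method/get_start_method are standard multiprocessing APIs):

```python
import multiprocessing as mp

def configure_spawn():
    """Force the 'spawn' start method so each worker process starts
    fresh and creates its own CUDA context, instead of inheriting the
    parent's context via fork()."""
    # force=True overrides any previously selected method, which a
    # plain set_start_method() call would refuse to do.
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()
```

This must run before any Pool is created (typically under the `if __name__ == "__main__":` guard).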

@shwina your suggestion works fine when run as a script. Thanks.

Thanks for letting us know!
