Cudf: [BUG] Memory error when cudf is used in python multiprocessing Pool

Created on 19 Jun 2020 · 8 comments · Source: rapidsai/cudf

cuDF 0.14 gives an error when used in a Python multiprocessing Pool; it works in version 0.12. Here is the code to reproduce:

import cudf
import pandas as pd
from multiprocessing import Pool

def get_df(idx):
    pdf = pd.DataFrame({
        'a':[1,2],
        'b':[3,4]
    })
    return cudf.from_pandas(pdf)

# Parallelize the method calls
with Pool(2) as pool:
    pool.map(get_df, [1,2])

The error is:

MemoryError                               Traceback (most recent call last)
<ipython-input-1-757e5676c563> in <module>
     11 
     12 with Pool(2) as pool:
---> 13     pool.map(get_df, [1,2])

~/miniconda3/envs/gpu/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    264         in a list that is returned.
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 
    268     def starmap(self, func, iterable, chunksize=None):

~/miniconda3/envs/gpu/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp66: cudaErrorInitializationError initialization error

cuDF was installed using Anaconda on bare metal. I am attaching the outputs of cudf/print_env.sh:
print_env_12.txt
print_env_14.txt

bug cuDF (Python)

All 8 comments

Looks related to the new memory resource bindings; will investigate.

Thanks for reporting. This is likely due to the call to fork(), which will attempt to share the CUDA context created in the parent process. One fix is to use the "spawn" start method instead:

import cudf
import pandas as pd
from multiprocessing import get_context

def get_df(idx):
    pdf = pd.DataFrame({
        'a':[1,2],
        'b':[3,4]
    })
    return cudf.from_pandas(pdf)

if __name__ == "__main__":
    ctx = get_context("spawn")

    # Parallelize the method calls
    with ctx.Pool(2) as pool:
        print(pool.map(get_df, [1,2]))

Does that help with your problem?

No. The error message is

AttributeError: Can't get attribute 'get_df' on <module '__main__' (built-in)>

Hmm, how are you running this test? Interactively with IPython/Jupyter or invoking it as a script?

MemoryError                               Traceback (most recent call last)
<ipython-input-1-757e5676c563> in <module>

Looks like it's IPython.

No. The error message is

AttributeError: Can't get attribute 'get_df' on <module '__main__' (built-in)>

Looks like the known limitation (the 2nd gray box in the link) of Python's multiprocessing when used interactively with IPython.

When launched as a script, @shwina's suggestion shouldn't hit the AttributeError, should it?

But I'm not sure about the original error message below. Is it related to the use of fork vs. spawn, or something else?

MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp66: cudaErrorInitializationError initialization error
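(For context on the AttributeError: Pool doesn't send the function's code to its workers; it pickles the function by reference, i.e. by module and qualified name. A function defined interactively lives in an in-memory `__main__` that spawned children cannot import, which is exactly the limitation mentioned above. A minimal illustration of reference-based pickling, with no multiprocessing or cudf involved; `operator.neg` stands in for any importable module-level function:)

```python
import pickle
from operator import neg  # a module-level function, importable anywhere

# pickle stores only "operator.neg", not the bytecode. A spawned worker
# unpickles it by importing the operator module, so any function handed
# to Pool under "spawn" must live in an importable module, not in an
# interactive __main__ session.
payload = pickle.dumps(neg)
restored = pickle.loads(payload)
print(restored(5))  # -5
```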

... This is likely due to the call to fork(), which will attempt to share the CUDA context created in the parent process. One fix is to use spawn() instead:

import cudf
import pandas as pd
from multiprocessing import get_context

def get_df(idx):
    pdf = pd.DataFrame({
        'a':[1,2],
        'b':[3,4]
    })
    return cudf.from_pandas(pdf)

if __name__ == "__main__":
    ctx = get_context("spawn")

    # Parallelize the method calls
    with ctx.Pool(2) as pool:
        print(pool.map(get_df, [1,2]))

Verified that this works when run as a script. With the default fork() start method, it hits the initialization error.
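For reference, an alternative to building a context with get_context is setting the start method globally, once, at program start; all subsequently created Pools then use it. A minimal sketch (no cudf needed; set_start_method/get_start_method are standard multiprocessing APIs):

```python
import multiprocessing as mp

def configure_spawn():
    """Force the 'spawn' start method so each worker process starts
    fresh and creates its own CUDA context, instead of inheriting the
    parent's context via fork()."""
    # force=True overrides any previously selected method, which a
    # plain set_start_method() call would refuse to do.
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()
```

This must run before any Pool is created (typically under the `if __name__ == "__main__":` guard).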

@shwina your suggestion works fine when run as a script. Thanks.

Thanks for letting us know!
