Taichi: [CUDA] detected to be supported and crash on card without unified memory

Created on 13 Apr 2020 · 15 Comments · Source: taichi-dev/taichi

Describe the bug
CUDA is detected as SUPPORTED on a machine without CUDA.
This is because is_cuda_api_avaliable returns true even though I don't have CUDA.

Log/Screenshots

(yuanming-hu/glfw) [bate@archit taichi]$ python examples/mpm128.py  
[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-7haz507t
[Taichi] sandbox prepared
[Taichi] <dev mode>, supported archs: [cpu, cuda, opengl], commit 4e2e5605, python 3.8.2
[Hint] Use WSAD/arrow keys to control gravity. Use left/right mouse bottons to attract/repel. Press R to reset.
[W 04/13/20 09:29:21.266] [cuda_driver.h:call_with_warning@60] CUDA Error CUDA_ERROR_INVALID_DEVICE: invalid device ordinal while calling mem_advise (cuMemAdvise)
[E 04/13/20 09:29:21.860] Received signal 7 (Bus error)


***********************************
* Taichi Compiler Stack Traceback *                                                          
***********************************                                                          
/tmp/taichi-7haz507t/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)                                  
/tmp/taichi-7haz507t/taichi_core.so: taichi::signal_handler(int)                             
/usr/lib/libc.so.6(+0x3bd70) [0x7f359062bd70]                                                
/tmp/taichi-7haz507t/taichi_core.so: taichi::lang::MemoryPool::daemon()
/usr/lib/libstdc++.so.6(+0xcfb24) [0x7f357ff41b24]
/usr/lib/libpthread.so.0(+0x946f) [0x7f359021746f]
/usr/lib/libc.so.6: clone
GNU gdb (GDB) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 8383
[New LWP 8388]
[New LWP 8389]
[New LWP 8390]
[New LWP 8391]
[New LWP 8396]
[New LWP 8397]
[New LWP 8398]
[New LWP 8399]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007f3580fc910c in llvm::Twine::toVector(llvm::SmallVectorImpl<char>&) const ()
   from /tmp/taichi-7haz507t/taichi_core.so
(gdb)

To Reproduce
Just run examples/mpm128.py.

If you have local commits (e.g. compile fixes before you reproduce the bug), please make sure you first make a PR to fix the build errors and then report the bug.

dependency potential bug

archibate

All 15 comments

Introduced in #756

archibate on 13 Apr 2020

Does setting envvar TI_USE_UNIFIED_MEMORY=0 fix your problem?

yuanming-hu on 13 Apr 2020

👍1

Yes, it did.

(gdbtrig) [bate@archit taichi]$ TI_USE_UNIFIED_MEMORY=0 p examples/fractal.py 
[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-mxhjexut
[Taichi] sandbox prepared
[I 04/13/20 09:42:03.300] [cuda_driver.cpp:CUDADriver@30] CUDA DETECTED
[Taichi] <dev mode>, supported archs: [cpu, cuda, opengl], commit 4e2e5605, python 3.8.2
X connection to :0 broken (explicit kill or server shutdown).

However, with_cuda still returns true, according to the TI_INFO("CUDA_DETECTED"); I added.

archibate on 13 Apr 2020

👍2
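For reference, the workaround confirmed above can also be applied from a small launcher script instead of typing the variable on every command line. This is only a sketch: the helper names `env_without_unified_memory` and `run_example` are hypothetical, and the only thing taken from the thread is the variable name `TI_USE_UNIFIED_MEMORY` and the value `0`.

```python
import os
import subprocess
import sys


def env_without_unified_memory(base_env=None):
    """Return a copy of the environment with TI_USE_UNIFIED_MEMORY=0.

    The input mapping is not modified. The helper name is hypothetical;
    only the variable name and value come from the thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TI_USE_UNIFIED_MEMORY"] = "0"
    return env


def run_example(script_path):
    """Run a script in a child process with unified memory disabled."""
    completed = subprocess.run(
        [sys.executable, script_path],
        env=env_without_unified_memory(),
        check=False,  # let the caller decide how to treat a non-zero exit
    )
    return completed.returncode
```

Calling `run_example("examples/fractal.py")` would then be equivalent to the shell invocation shown in the log above.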

I never tried the CUDA backend without unified memory on Linux - glad that it works now. Maybe we should simply disable CUDA backend unified memory on NVIDIA GTX 9 series on Linux? It's better than not having a CUDA backend at all, for these users.

yuanming-hu on 13 Apr 2020

How important is unified memory? If Taichi fails to run without it, maybe we should make with_cuda return false when unified memory is not supported?

archibate on 13 Apr 2020

Or should we detect whether a card supports unified memory (UM) and set TI_USE_UNIFIED_MEMORY to 0 or 1 automatically? Or should we default to 0?

archibate on 13 Apr 2020
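The auto-detection idea above could be prototyped outside Taichi roughly as follows. This is a sketch under several assumptions, not what Taichi does: it asks the CUDA driver API (through `ctypes`) for `CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY`, and the thread does not establish that this is the attribute that matters for the crash. The function names are hypothetical, and the sketch assumes that `TI_USE_UNIFIED_MEMORY` is read when Taichi starts, so it has to be set before `import taichi`.

```python
import ctypes
import os

# Numeric value of CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY in the CUDA driver API.
CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY = 83


def probe_managed_memory(ordinal=0):
    """Ask the CUDA driver whether device `ordinal` reports managed memory.

    Returns True or False when the driver answers, and None when the driver
    library is missing or any driver call fails.
    """
    lib = None
    for name in ("libcuda.so.1", "libcuda.so", "nvcuda.dll"):
        try:
            lib = ctypes.CDLL(name)
            break
        except OSError:
            continue
    if lib is None:
        return None  # no CUDA driver library found on this machine
    if lib.cuInit(0) != 0:
        return None
    device = ctypes.c_int()
    if lib.cuDeviceGet(ctypes.byref(device), ordinal) != 0:
        return None
    value = ctypes.c_int()
    status = lib.cuDeviceGetAttribute(
        ctypes.byref(value), CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY, device
    )
    if status != 0:
        return None
    return bool(value.value)


def choose_unified_memory_setting(supported):
    """Map a probe result to a TI_USE_UNIFIED_MEMORY value ("0" or "1").

    An unknown result (None) falls back to "0", the conservative default
    floated in the thread.
    """
    return "1" if supported else "0"


if __name__ == "__main__":
    # setdefault keeps any value the user already exported.
    os.environ.setdefault(
        "TI_USE_UNIFIED_MEMORY",
        choose_unified_memory_setting(probe_managed_memory()),
    )
    print("TI_USE_UNIFIED_MEMORY =", os.environ["TI_USE_UNIFIED_MEMORY"])
```

Using `setdefault` rather than a plain assignment means an explicit user choice still wins over the probe.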

With TI_USE_UNIFIED_MEMORY=0, I got an out-of-memory (OOM) error when testing numpy:

______________________________________ test_numpy_2d _______________________________________
[gw2] linux -- Python 3.8.2 /usr/bin/python3

    @ti.all_archs
    def test_numpy_2d():
        val = ti.var(ti.i32)

        n = 4
        m = 7

        @ti.layout
        def values():
            ti.root.dense(ti.i, n).dense(ti.j, m).place(val)

        @ti.kernel
        def test_numpy(arr: ti.ext_arr()):
            for i in range(n):
                for j in range(m):
                    arr[i, j] += i + j

        a = np.empty(shape=(n, m), dtype=np.int32)

        for i in range(n):
            for j in range(m):
                a[i, j] = i * j

>       test_numpy(a)

tests/python/test_numpy.py:75: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
python/taichi/lang/kernel.py:484: in wrapped
    primal(*args, **kwargs)
python/taichi/lang/kernel.py:414: in __call__
    self.materialize(key=key, args=args, arg_features=arg_features)
python/taichi/lang/kernel.py:220: in materialize
    self.runtime.materialize()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <taichi.lang.impl.PyTaichi object at 0x7ff1406bf970>

    def materialize(self):
        if self.materialized:
            return
        self.create_program()
        Expr.layout_materialized = True

        def layout():
            for func in self.layout_functions:
                func()

        import taichi as ti
        ti.trace("Materializing layout...".format())
>       taichi_lang_core.layout(layout)
E       RuntimeError: [cuda_driver.h:operator()@66] CUDA Error CUDA_ERROR_OUT_OF_MEMORY: out of memory while calling malloc (cuMemAlloc_v2)

python/taichi/lang/impl.py:179: RuntimeError
----------------------------------- Captured stdout call ------------------------------------
Running test on arch=Arch.x64
Running test on arch=Arch.cuda
[E 04/13/20 15:14:17.564] [cuda_driver.h:operator()@66] CUDA Error CUDA_ERROR_OUT_OF_MEMORY: out of memory while calling malloc (cuMemAlloc_v2)

archibate on 13 Apr 2020

I think I found the cause of the OOM:

MULTI-THREADING!!!!

No error with -t1.

archibate on 13 Apr 2020

Do you have 2 NVIDIA cards? That would explain why your 2 cores run fine. I have 1 NVIDIA card but 4 cores, so there are not enough resources to go around, hence the OOM.

archibate on 13 Apr 2020

It seems the first process to run obtains the resource, and the second one hits OOM:

(glew) [bate@archit taichi]$ p multirestart.my.py & p multirestart.my.py  
[1] 5805
[Taichi] mode=development
[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-vs1ag8qy
[Taichi] preparing sandbox at /tmp/taichi-iukg5cuq
[Taichi] sandbox prepared
[Taichi] sandbox prepared
[Taichi] <dev mode>, supported archs: [cpu, cuda, opengl], commit 2ffc1e55, python 3.8.2
[Taichi] <dev mode>, supported archs: [cpu, cuda, opengl], commit 2ffc1e55, python 3.8.2
[E 04/13/20 22:57:47.637] [cuda_driver.h:operator()@66] CUDA Error CUDA_ERROR_OUT_OF_MEMORY: out of memory while calling malloc (cuMemAlloc_v2)


***********************************
* Taichi Compiler Stack Traceback *                                                          
***********************************                                                          
/tmp/taichi-iukg5cuq/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)                                  
/tmp/taichi-iukg5cuq/taichi_core.so: taichi::lang::CUDADriverFunction<void*, unsigned long>::operator()(void*, unsigned long)                                                             
/tmp/taichi-iukg5cuq/taichi_core.so: taichi::lang::Program::initialize_runtime_system(taichi::lang::StructCompiler*)                                                                      
/tmp/taichi-iukg5cuq/taichi_core.so: taichi::lang::Program::materialize_layout()             
/tmp/taichi-iukg5cuq/taichi_core.so: taichi::lang::layout(std::function<void ()> const&)     
/tmp/taichi-iukg5cuq/taichi_core.so(+0x7dea29) [0x7f5e046a3a29]                              
/tmp/taichi-iukg5cuq/taichi_core.so(+0x68f566) [0x7f5e04554566]                              
/usr/lib/libpython3.8.so.1.0: PyCFunction_Call                                               
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall                                           
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0(+0x1e0902) [0x7f5e1413a902]                                     
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall                                         
/usr/lib/libpython3.8.so.1.0(+0x17e0ad) [0x7f5e140d80ad]                                     
/usr/lib/libpython3.8.so.1.0(+0x17eec8) [0x7f5e140d8ec8]                                     
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0: PyEval_EvalCode                                                
/usr/lib/libpython3.8.so.1.0(+0x2668c8) [0x7f5e141c08c8]                                     
/usr/lib/libpython3.8.so.1.0(+0x26aba3) [0x7f5e141c4ba3]                                     
/usr/lib/libpython3.8.so.1.0: PyRun_FileExFlags                                              
/usr/lib/libpython3.8.so.1.0: PyRun_SimpleFileExFlags                                        
/usr/lib/libpython3.8.so.1.0: Py_RunMain
/usr/lib/libpython3.8.so.1.0: Py_BytesMain
/usr/lib/libc.so.6: __libc_start_main
python(_start+0x2e) [0x563f74cfc05e]
GNU gdb (GDB) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 5806
[New LWP 5819]
[New LWP 5820]
[New LWP 5821]
[New LWP 5822]
[New LWP 5829]
[New LWP 5830]
[New LWP 5832]
[New LWP 5833]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
3
0x00007f5e143e69af in wait4 () from /usr/lib/libc.so.6
(gdb)

archibate on 13 Apr 2020

I found: Allocating device memory 1.00 GB

Why allocate so much???

archibate on 13 Apr 2020

I only have 2 GB of GPU memory in total... so of course it crashes in the multi-threaded test.

archibate on 13 Apr 2020

Solution 1: set device_memory_fraction = 1 / (threads + 1) in the tests.
Solution 2: spin-wait in the tests until enough memory is available.

archibate on 13 Apr 2020

👍1
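Both proposals can be sketched in a few lines of Python. The helper names below are hypothetical and nothing here is taken from Taichi's test runner. The formula is the one from Solution 1, and the retry loop matches on the `CUDA_ERROR_OUT_OF_MEMORY` text that appears in the `RuntimeError` shown earlier in the thread.

```python
import time


def per_worker_memory_fraction(threads):
    """Solution 1: fraction of GPU memory each test worker may preallocate."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (threads + 1)


def retry_until_memory_available(allocate, attempts=50, delay_s=0.1):
    """Solution 2: call allocate() again while it reports out-of-memory.

    `allocate` is any zero-argument callable. Errors that are not OOM, and
    an OOM on the last attempt, are re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return allocate()
        except RuntimeError as err:
            is_oom = "CUDA_ERROR_OUT_OF_MEMORY" in str(err)
            if not is_oom or attempt == attempts - 1:
                raise
            time.sleep(delay_s)  # give other workers time to release memory
```

With 4 test threads the fraction is 1 / 5 = 0.2, so on the 2 GB card mentioned above each worker would preallocate about 0.4 GB and four workers about 1.6 GB in total, leaving one share spare. Assuming `device_memory_fraction` is passed as a keyword to `ti.init`, the value would be supplied there. Solution 2 needs no tuning but serializes workers whenever memory is tight.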

I guess solution 1 is probably easier. Or we can just ask people not to use too many threads when GPU memory is scarce. (Sorry about my delayed reply - workday starts on my end so I have meetings in the morning...)

yuanming-hu on 13 Apr 2020

👍1

...Sorry about my delayed reply...

No rush at all!

archibate on 13 Apr 2020
