Tvm: TVM for ROCm 2.x is currently not working

Created on 20 Apr 2019 · 48 Comments · Source: apache/tvm

Environment: Ubuntu 18.04 + ROCm 2.2 + TVM (built from current master with ROCM = ON)

I confirmed that the TVM library detects and links against ROCm, and the tuning procedure runs. However, executing tvm.build(s, arg_bufs, 'rocm', name='matmul') fails with the following error:

WARNING:autotvm:Too many errors happen in the tuning. Now is in debug mode
Finish loading 500 records
DEBUG:autotvm:Finish loading 500 records
Cannot find config for target=rocm, workload=('tvm_matmul_tune_op', 4, 256, 256). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=rocm, workload=('tvm_matmul_tune_op', 4, 256, 256). A fallback configuration is used, which may bring great performance regression.

Best config:
,None,None
[14:47:54] /host/docker/matmul_tvm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
[14:47:54] /host/docker/matmul_tvm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
Traceback (most recent call last):
  File "matmul_autotvm.py", line 260, in <module>
    search_matmul_config(4, 256, 256, 500) # m, k, n, num_trials
  File "matmul_autotvm.py", line 165, in search_matmul_config
    func = tvm.build(s, arg_bufs, 'rocm', name='matmul')
  File "/host/docker/matmul_tvm/tvm/python/tvm/build_module.py", line 617, in build
    fhost, mdev = _build_for_device(flist, tar, target_host)
  File "/host/docker/matmul_tvm/tvm/python/tvm/build_module.py", line 484, in _build_for_device
    mdev = codegen.build_module(fdevice, str(target)) if fdevice else None
  File "/host/docker/matmul_tvm/tvm/python/tvm/codegen.py", line 36, in build_module
    return _Build(lowered_func, target)
  File "/host/docker/matmul_tvm/tvm/python/tvm/_ffi/_ctypes/function.py", line 206, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (2) /host/docker/matmul_tvm/tvm/build_rocm/libtvm.so(TVMFuncCall+0x61) [0x7f9598de3f01]
  [bt] (1) /host/docker/matmul_tvm/tvm/build_rocm/libtvm.so(+0x14b2e9) [0x7f95986992e9]
  [bt] (0) /host/docker/matmul_tvm/tvm/build_rocm/libtvm.so(+0x231aaa) [0x7f959877faaa]
  File "/host/docker/matmul_tvm/tvm/src/codegen/codegen.cc", line 46
TVMError: Check failed: bf != nullptr: Target rocm is not enabled

ghostplant

All 48 comments

All target TVM libraries link to ROCm successfully:

/host/docker/matmul_tvm/tvm/build# ldd libtvm.so
        linux-vdso.so.1 (0x00007fff995ed000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc8f0e65000)
        libhip_hcc.so => /opt/rocm/lib/libhip_hcc.so (0x00007fc8f0560000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc8f01d7000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc8efe39000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc8efc21000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc8ef830000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc8f1d7f000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc8ef611000)
        libhc_am.so => /opt/rocm/lib/libhc_am.so (0x00007fc8ef3b1000)
        libmcwamp.so => /opt/rocm/lib/libmcwamp.so (0x00007fc8ef197000)
        libhsa-runtime64.so.1 => /opt/rocm/hsa/lib/libhsa-runtime64.so.1 (0x00007fc8eeee0000)
        libhsakmt.so.1 => /opt/rocm/lib/libhsakmt.so.1 (0x00007fc8eecbc000)
        libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007fc8eeaa2000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc8ee89a000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fc8ee68f000)
        libpci.so.3 => /lib/x86_64-linux-gnu/libpci.so.3 (0x00007fc8ee482000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc8ee265000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007fc8ee04a000)
        libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007fc8ede2c000)

ghostplant on 21 Apr 2019

I printed all the registered providers:
For TVM built with CUDA on, the TVM runtime detects the following providers:
[codegen.build_aocl] [codegen.build_opengl] [codegen.build_cuda] [codegen.build_metal] [codegen.build_opencl] [codegen.build_aocl_sw_emu] [codegen.build_sdaccel] [codegen.build_c] [codegen.build_stackvm]
and no errors for the whole tuning procedure.

For TVM built with OpenCL on, the TVM runtime detects the following providers:
[codegen.build_aocl] [codegen.build_opengl] [codegen.build_metal] [codegen.build_opencl] [codegen.build_aocl_sw_emu] [codegen.build_sdaccel] [codegen.build_c] [codegen.build_stackvm]
and no errors for the whole tuning procedure.

For TVM built with ROCm on, the TVM runtime detects the following providers:
[codegen.build_aocl] [codegen.build_opengl] [codegen.build_metal] [codegen.build_opencl] [codegen.build_aocl_sw_emu] [codegen.build_sdaccel] [codegen.build_c] [codegen.build_stackvm]
So it's clear that codegen.build_rocm is not detected.
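
One way to produce such a provider list is to filter TVM's registered global function names by the codegen.build_ prefix. The sketch below keeps the filtering in a plain function so it can be checked without TVM installed. The helper name codegen_targets is hypothetical, and the commented call to tvm._ffi.function.list_global_func_names() is an assumption about the Python API of TVM builds from this period.

```python
PREFIX = "codegen.build_"


def codegen_targets(global_func_names):
    """Return the sorted target names that have a registered codegen builder."""
    return sorted(
        name[len(PREFIX):]
        for name in global_func_names
        if name.startswith(PREFIX)
    )


# Names copied from the ROCm-enabled build listed above (no build_rocm entry).
rocm_build = [
    "codegen.build_aocl", "codegen.build_opengl", "codegen.build_metal",
    "codegen.build_opencl", "codegen.build_aocl_sw_emu",
    "codegen.build_sdaccel", "codegen.build_c", "codegen.build_stackvm",
]
targets = codegen_targets(rocm_build)
print("rocm" in targets)  # False for this list

# With TVM installed, the names could be fetched like this (assumed API):
#   import tvm
#   names = tvm._ffi.function.list_global_func_names()
```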

ghostplant on 21 Apr 2019

Have you enabled LLVM? The rocm target won't be enabled if USE_LLVM is off (even if USE_ROCM is on)
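
A quick way to check this before rebuilding is to read both switches out of config.cmake. This is a minimal sketch, assuming the options appear as set(USE_ROCM ...) and set(USE_LLVM ...) lines, which is the form the sed commands later in this thread rely on. The helper names are hypothetical, and any value other than OFF (for example a path to llvm-config) is treated as enabled.

```python
import re

SET_LINE = re.compile(r"^\s*set\(\s*(\w+)\s+([^)\s]+)\s*\)", re.MULTILINE)


def cmake_options(config_text):
    """Parse uncommented `set(NAME VALUE)` lines into a dict."""
    return {name: value for name, value in SET_LINE.findall(config_text)}


def rocm_codegen_expected(config_text):
    """True only when both USE_ROCM and USE_LLVM are switched on."""
    opts = cmake_options(config_text)

    def is_on(key):
        return opts.get(key, "OFF").upper() != "OFF"

    return is_on("USE_ROCM") and is_on("USE_LLVM")


sample = """
set(USE_ROCM ON)
set(USE_LLVM OFF)
"""
print(rocm_codegen_expected(sample))  # False: LLVM is still off
```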

masahi on 21 Apr 2019

@masahi After enabling LLVM, both codegen.build_rocm and codegen.build_llvm are detected in the providers list. However, the failure changed from the "Target rocm is not enabled" exception to a segmentation fault:

(tuning..)
(tuning..)
(tuning..)
(tuning..)
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
No: 97  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1555835059.0408077)   [('tile_k', [2, 4, 512]), ('tile_y', [32, 16, 2, 4]), ('tile_x', [4, 16, 1, 64]), ('auto_unroll_max_step', 1500)],,None,34382977
No: 98  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1555835059.1273813)   [('tile_k', [64, 64, 1]), ('tile_y', [64, 4, 4, 4]), ('tile_x', [16, 1, 1, 256]), ('auto_unroll_max_step', 512)],,None,17407578
No: 99  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1555835059.1275487)   [('tile_k', [32, 1, 128]), ('tile_y', [1, 8, 8, 64]), ('tile_x', [1, 16, 4, 64]), ('auto_unroll_max_step', 1500)],,None,34940157
No: 100 GFLOPS: 0.00/0.00       result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1555835059.12769)     [('tile_k', [64, 8, 8]), ('tile_y', [1, 2048, 2, 1]), ('tile_x', [4, 16, 1, 64]), ('auto_unroll_max_step', 512)],,None,15529098
Finish loading 668 records
Cannot find config for target=rocm, workload=('tvm_matmul_tune_op', 4096, 4096, 4096). A fallback configuration is used, which may bring great performance regression.

Best config:
,None,None
[08:24:19] /host/docker/matmul_tvm/tune_rocm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
[08:24:19] /host/docker/matmul_tvm/tune_rocm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
[codegen.build_aocl] [codegen.build_opengl] [codegen.build_metal] [codegen.build_opencl] [codegen.build_aocl_sw_emu] [codegen.build_sdaccel] [codegen.build_c] [codegen.build_stackvm] [codegen.build_rocm] [codegen.build_nvptx] [codegen.build_llvm] ..
'gfx906' is not a recognized processor for this target (ignoring processor)
'gfx906' is not a recognized processor for this target (ignoring processor)
Segmentation fault (core dumped)

ghostplant on 21 Apr 2019

Is your card really gfx906? Then I think you need a fairly recent LLVM.

masahi on 21 Apr 2019

@masahi Yes, all other rocm apps run well. I am using LLVM 6.0. Can you suggest a specific version that you have tested with ROCm 2.x? I tried some other LLVM versions, but they did not work well with TVM.

ghostplant on 21 Apr 2019

I have only used LLVM 6.0 with my gfx803 card. If other rocm apps are working, you can try the same LLVM version that comes with your rocm installation.

masahi on 21 Apr 2019

@masahi I upgraded to llvm-8.0 this time, but TVM throws another kind of exception:

[13:11:17] /host/docker/matmul_tvm/tune_rocm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
[13:11:17] /host/docker/matmul_tvm/tune_rocm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...

Traceback (most recent call last):
  File "matmul_autotvm.py", line 259, in <module>
    search_matmul_config(64, 1024, 1024, 100) # m, k, n, num_trials
  File "matmul_autotvm.py", line 178, in search_matmul_config
    func(a, b, c)
  File "/host/docker/matmul_tvm/tune_rocm/tvm/python/tvm/_ffi/function.py", line 144, in __call__
    return f(*args)
  File "/host/docker/matmul_tvm/tune_rocm/tvm/python/tvm/_ffi/_ctypes/function.py", line 206, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (3) /host/docker/matmul_tvm/tune_rocm/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7f16cf74cd51]
  [bt] (2) /host/docker/matmul_tvm/tune_rocm/tvm/build/libtvm.so(+0x95b917) [0x7f16cf7bc917]
  [bt] (1) /host/docker/matmul_tvm/tune_rocm/tvm/build/libtvm.so(+0x95b64f) [0x7f16cf7bc64f]
  [bt] (0) /host/docker/matmul_tvm/tune_rocm/tvm/build/libtvm.so(+0x1470a3) [0x7f16cefa80a3]
  File "/host/docker/matmul_tvm/tune_rocm/tvm/src/runtime/rocm/rocm_module.cc", line 98
  File "/host/docker/matmul_tvm/tune_rocm/tvm/src/runtime/module_util.cc", line 73
ROCMError: Check failed: ret == 0 (-1 vs. 0) : hipModuleGetFunction matmul_kernel0 failed with error: hipErrorNotFound

ghostplant on 21 Apr 2019

@masahi Seems like TVM is not working well with ROCm 2.2

ghostplant on 21 Apr 2019

hmm I don't know what's happening. It's been a while since I used rocm backend. Does a very simple example like vector add work?

masahi on 21 Apr 2019

@masahi Can you provide such a sample? Maybe the code I am using is not correct either.
However, the same code works for the CUDA and OpenCL backends; it only fails for the ROCm backend.

ghostplant on 21 Apr 2019

https://github.com/dmlc/tvm/blob/master/tutorials/tensor_expr_get_started.py
You can replace cuda with rocm.

masahi on 21 Apr 2019

@masahi Also failed with the same error:

vecadd# python3 example.py
<class 'tvm.tensor.Tensor'>
Traceback (most recent call last):
  File "example.py", line 155, in <module>
    fadd(a, b, c)
  File "/host/docker/matmul_tvm/tune_rocm/tvm/python/tvm/_ffi/function.py", line 144, in __call__
    return f(*args)
  File "/host/docker/matmul_tvm/tune_rocm/tvm/python/tvm/_ffi/_ctypes/function.py", line 206, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (3) /host/docker/matmul_tvm/tune_rocm/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7fb25ce7ed51]
  [bt] (2) /host/docker/matmul_tvm/tune_rocm/tvm/build/libtvm.so(+0x95b917) [0x7fb25ceee917]
  [bt] (1) /host/docker/matmul_tvm/tune_rocm/tvm/build/libtvm.so(+0x95b64f) [0x7fb25ceee64f]
  [bt] (0) /host/docker/matmul_tvm/tune_rocm/tvm/build/libtvm.so(+0x1470a3) [0x7fb25c6da0a3]
  File "/host/docker/matmul_tvm/tune_rocm/tvm/src/runtime/rocm/rocm_module.cc", line 98
  File "/host/docker/matmul_tvm/tune_rocm/tvm/src/runtime/module_util.cc", line 73
ROCMError: Check failed: ret == 0 (-1 vs. 0) : hipModuleGetFunction myadd_kernel0 failed with error: hipErrorNotFound

ghostplant on 21 Apr 2019

Ok thanks. I will take a look

masahi on 21 Apr 2019

hi, @masahi @ghostplant

I had the same problem and found a solution that works for me, so I would like to share my experience. The hipErrorNotFound error occurs when your environment is ROCm 2.3 (not compatible with LLVM 6 or 7) or when you use LLVM 8+.

The reason is that LLVM uses -mattr=+code-object-v3 by default for HSACO generation, but the ROCm hipModuleLoad function only accepts an HSACO with code object v2.

This means you need to pass -mattr=-code-object-v3 explicitly. Note the value given to -mattr: the plus sign (+) is replaced by a minus sign (-).

fundamat on 24 Apr 2019

@masahi My error occurs with LLVM 8+. I am curious why TVM for ROCm has to use a separate LLVM, considering that hcc from ROCm is already based on a modified LLVM (clang-9).

ghostplant on 24 Apr 2019

thanks @fundamat for the info, this is very helpful.

@ghostplant From rocm we only use their HIP runtime. For codegen all we need is LLVM's AMDGPU backend. The hcc compiler is for compiling HIP code (a CUDA-like language supported by AMD). We don't generate HIP; we generate LLVM IR directly, so we don't need hcc.

masahi on 24 Apr 2019

@masahi Thanks, hope to get it fixed soon for ROCm 2.3

ghostplant on 24 Apr 2019

hi @fundamat can you point me to the relevant documentation, so that I can understand your finding?

masahi on 24 Apr 2019

@masahi

You can find more information here: https://llvm.org/docs/AMDGPUUsage.html#code-object-metadata

I simply added -mattr=-code-object-v3 in BuildAMDGPU at codegen_amdgpu.cc:182:

config << "-mtriple=amdgcn-amd-amdhsa-hcc -mcpu=gfx"
       << DetectROCMComputeVersion(target) << " -mattr=-code-object-v3 "
       << target.substr(4, target.length() - 4);

fundamat on 25 Apr 2019

Changing the target string also works; there is no need to change codegen.
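
The target-string route can be sketched as follows, assuming the unpatched BuildAMDGPU appends everything after the leading "rocm" of the target string to the LLVM options, as the target.substr(4, ...) call in the quoted snippet suggests. The helper names and the gfx_version parameter are hypothetical, and llvm_options only imitates the C++ string concatenation; it does not call TVM.

```python
CODE_OBJECT_V2_ATTR = "-mattr=-code-object-v3"


def rocm_target(*extra_options):
    """Build a target string such as 'rocm -mattr=-code-object-v3'."""
    return " ".join(("rocm",) + extra_options)


def llvm_options(target, gfx_version):
    """Imitate the concatenation quoted from BuildAMDGPU, without the patch."""
    if not target.startswith("rocm"):
        raise ValueError("expected a target string starting with 'rocm'")
    # target[4:] mirrors target.substr(4, target.length() - 4) in the C++ code.
    return "-mtriple=amdgcn-amd-amdhsa-hcc -mcpu=gfx{}{}".format(
        gfx_version, target[4:]
    )


target = rocm_target(CODE_OBJECT_V2_ATTR)
print(target)  # rocm -mattr=-code-object-v3
print(llvm_options(target, 906))
# -mtriple=amdgcn-amd-amdhsa-hcc -mcpu=gfx906 -mattr=-code-object-v3

# The target string would then replace the bare 'rocm' in the build call:
#   func = tvm.build(s, arg_bufs, target, name='matmul')
```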

antinucleon on 25 Apr 2019

@fundamat After applying your patch, I tried the tuning again. The hipErrorNotFound error is solved, but it is followed by the segmentation fault I described earlier. (The official tutorial example also segfaults.)

[08:32:40] /host/docker/matmul_tvm/tune_rocm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
[08:32:40] /host/docker/matmul_tvm/tune_rocm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
Segmentation fault (core dumped)
/root $

I am using LLVM-8 (the current latest stable version) and ROCm-2.3. I didn't use LLVM-9 because it is not stable yet.
Can you share the specific LLVM and ROCm versions you use?

ghostplant on 25 Apr 2019

@ghostplant
I have no idea about the segmentation fault, but I'm happy to share the versions I use:
LLVM-9 (manually built)
ROCm 2.3

fundamat on 26 Apr 2019

@fundamat I tried to build the latest LLVM-9 on Ubuntu 18.04, but the build fails halfway with gcc-7.

@msakai Is there a Dockerfile that provides a reproducible TVM for ROCm environment? I only see Dockerfiles for cpu/cuda/opencl; none is available for rocm.

ghostplant on 28 Apr 2019

No sorry, our ROCm support is limited (not many people are interested) and I'm not familiar with docker. You can add one if you like.

masahi on 28 Apr 2019

@masahi If I could build TVM for ROCm without problems, I could write one. However, my TVM + ROCm environment is not working, so the reason I asked for a ROCm Dockerfile is to get an environment where TVM + ROCm runs without failures.

ghostplant on 28 Apr 2019

I've tested ROCm 2.3 + LLVM 6.0 on my gfx803 card. The vector add test and VGG inference work, but for some workloads TVM crashes with

### HCC STATUS_CHECK Error: HSA_STATUS_ERROR_INVALID_ISA (0x100f) at file:mcwamp_hsa.cpp line:1195

In particular, resnet workload doesn't work. I'll try upgrading LLVM.

@fundamat What is your ld.lld version? Is tutorial/relay_quick_start.py working for you?

masahi on 29 Apr 2019

@ghostplant Does segfault still occur on vector add? Please try the simplest test first.

masahi on 29 Apr 2019

Regarding the HSA_STATUS_ERROR_INVALID_ISA crash that @masahi reported above with ROCm 2.3 + LLVM 6.0 on gfx803: I have the same issue. VGG and standard ResNet work, but AutoTVM cannot find a good schedule because this error blocks most of the candidates.

I am using LLVM 9 + ROCM 2.3

antinucleon on 29 Apr 2019

@masahi Yes, what I tested is the simplest one. You can try this Dockerfile on your host; it segfaults on mine:

FROM rocm/dev-ubuntu-18.04

ENV HIP_PLATFORM hcc
ENV PATH $PATH:/opt/rocm/bin:/usr/local/rocm/bin

RUN apt-get update && apt install -y --no-install-recommends git ca-certificates \
    python3-pip python3-wheel python3-setuptools python3-pytest python3-dev \
    vim less netcat-openbsd inetutils-ping curl patch iproute2 \
    g++ libpci3 libnuma-dev make file libelf-dev libboost-system-dev libboost-filesystem-dev \
    miopen-hip miopengemm hipblas rocrand hipsparse hip-thrust cxlactivitylogger rocfft rocprofiler-dev llvm-7-dev cmake \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /root

RUN git clone https://github.com/dmlc/tvm --branch master --depth 1 --single-branch --recursive && \
    cd tvm && git submodule init && git submodule update

RUN mkdir -p tvm/build && cd tvm/build && cp ../cmake/config.cmake . && \
    sed -i 's/ROCM OFF/ROCM ON/g' config.cmake && \
    sed -i 's/LLVM OFF/LLVM ON/g' config.cmake && \
    cmake .. && make -j16

RUN pip3 install numpy tornado psutil xgboost decorator attrs

ENV TVM_HOME=/root/tvm
ENV PYTHONPATH=$TVM_HOME/python:$TVM_HOME/topi/python:$TVM_HOME/nnvm/python

RUN curl -Ls https://github.com/dmlc/tvm/raw/master/tutorials/tensor_expr_get_started.py | sed 's/cuda/rocm/g' > tensor_expr_get_started.py

To build and run:

# [DOCKER BUILD]
docker build -t tvm-rocm --network=host .
# [RUN TEST]
docker run -it --privileged --network=host tvm-rocm python3 tensor_expr_get_started.py

And this is my output:

/root/docker$ docker run -it --privileged --network=host tvm-rocm python3 tensor_expr_get_started.py
<class 'tvm.tensor.Tensor'>
Segmentation fault (core dumped)

ghostplant on 29 Apr 2019

It seems you need a newer LLVM. That error comes from LLVM, not TVM.

masahi on 29 Apr 2019

@masahi OK, I updated LLVM to 7, 8 and 9 in turn (updated in my scripts). The gfx906 warning no longer appears, but all three versions still output Segmentation fault (core dumped).

ghostplant on 29 Apr 2019

@masahi I rebuilt llvm-9 from source and the segfault is gone, so the segfault appears to be caused by installing llvm-9 from binary packages.
After that, I tried the tuning applications again. A very small group of them run successfully, while the majority fail with various Python-level TVM exceptions. The same cases all run well with cuda / opencl.

ghostplant on 29 Apr 2019

@masahi @anijain2305 I have another question. Source code generation after tuning does not seem to work for the ROCm platform: TVM codegen for other platforms such as CUDA and OpenCL outputs the respective kernel source code, but TVM codegen for ROCm outputs nothing.

ghostplant on 30 Apr 2019

you can use get_source("llvm") or get_source("asm").
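
A minimal sketch of collecting both formats from a built function, assuming the device module is reachable through func.imported_modules (the tensor_expr_get_started tutorial inspects device code the same way). The helper name device_sources is hypothetical, and it only relies on the two attributes shown, so it works with any object that provides them.

```python
def device_sources(func, formats=("llvm", "asm")):
    """Collect get_source(fmt) output for every module imported by `func`."""
    sources = {}
    for index, module in enumerate(func.imported_modules):
        for fmt in formats:
            sources[(index, fmt)] = module.get_source(fmt)
    return sources


# Usage with a built function (requires a working TVM + ROCm setup):
#   func = tvm.build(s, arg_bufs, 'rocm', name='matmul')
#   for (index, fmt), text in device_sources(func).items():
#       print("module", index, "format", fmt)
#       print(text)
```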

masahi on 30 Apr 2019

@masahi OK, but how can I compile them after getting the llvm/asm-level source code?

ghostplant on 30 Apr 2019

@masahi For asm, I think I can follow https://gpuopen.com/amdgcn-assembly/, but what about the code in LLVM format?

ghostplant on 30 Apr 2019

The output of get_source is not meant to be compiled by other means. What do you want to do?

masahi on 30 Apr 2019

@masahi I want to tune an op with specific config, get the best-working kernel source code, and freeze it as a hipModule.

ghostplant on 30 Apr 2019

given that this question has evolved to more general discussions, I would recommend starting a conversation on https://discuss.tvm.ai/ :)

tqchen on 1 May 2019

@masahi Do you know why CUDA/OpenCL platform can get C source, but ROCm platform can't?

ghostplant on 3 May 2019

Because the CUDA and OpenCL backends generate kernel source as C strings, while the ROCm backend generates LLVM IR.

masahi on 3 May 2019

@antinucleon Actually, most of my tuning trials end in the following failure, which does not happen with the CUDA backend:

error: local memory limit exceeded (73728) in default_function_kernel0

Do you know the solution to this error?

ghostplant on 4 May 2019

This is not clean, but you can modify this block for rocm target.

if 'cuda' in self.task.target.keys or 'opencl' in self.task.target.keys:
    remote = request_remote(self.key, self.host, self.port)
    ctx = remote.context(str(self.task.target), 0)
    max_dims = ctx.max_thread_dimensions
    kwargs['check_gpu'] = {
        'max_shared_memory_per_block': ctx.max_shared_memory_per_block,
        'max_threads_per_block': ctx.max_threads_per_block,
        'max_thread_x': max_dims[0],
        'max_thread_y': max_dims[1],
        'max_thread_z': max_dims[2],
    }

For rocm, max_shared_memory_per_block should be 48KB, and max threads per block should be 256. Don't forget to add "if 'rocm' in self.task.target.keys".
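
The suggested change might look like the sketch below, written as a standalone function rather than a patch to AutoTVM. The function name check_gpu_kwargs is hypothetical, the two ROCm limits are hard-coded from the values given above instead of being queried from the device, and whether the verifier accepts a dict without the max_thread_x/y/z entries is an assumption this thread does not confirm.

```python
ROCM_LIMITS = {
    "max_shared_memory_per_block": 48 * 1024,  # 48 KB, as suggested above
    "max_threads_per_block": 256,              # as suggested above
}


def check_gpu_kwargs(target_keys, ctx=None):
    """Return the 'check_gpu' dict for a target, or None if no check applies."""
    if "cuda" in target_keys or "opencl" in target_keys:
        # Same fields as the quoted block; ctx is a remote TVM context.
        max_dims = ctx.max_thread_dimensions
        return {
            "max_shared_memory_per_block": ctx.max_shared_memory_per_block,
            "max_threads_per_block": ctx.max_threads_per_block,
            "max_thread_x": max_dims[0],
            "max_thread_y": max_dims[1],
            "max_thread_z": max_dims[2],
        }
    if "rocm" in target_keys:
        # Hard-coded limits (assumption of this sketch).
        return dict(ROCM_LIMITS)
    return None


limits = check_gpu_kwargs(["rocm", "gpu"])
print(limits["max_shared_memory_per_block"])           # 49152
# 73728 is the size reported in the "local memory limit exceeded" error.
print(73728 <= limits["max_shared_memory_per_block"])  # False
```

With these limits, a configuration that needs 73728 bytes would be rejected before compilation instead of failing later in the build.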

masahi on 5 May 2019

@masahi Do you know whether this warning matters for tuning?

DEBUG:autotvm:No: 1000  GFLOPS: 0.00/0.00       result: MeasureResult(costs=(InstantiationError('Traceback (most recent call last):\n  [bt] (1) /opt/tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7fc0db08f915]\n  [bt] (0) /opt/tvm/build/libtvm
.so(+0x8f2e8b) [0x7fc0db08be8b]\n  File "/opt/tvm/python/tvm/_ffi/_ctypes/function.py", line 71, in cfun\n    rv = local_pyfunc(*pyargs)\n  File "/opt/tvm/python/tvm/autotvm/measure/measure_methods.py", line 595, in verify_pass\n    rais
e InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel',),), error_no=1, all_cost=0.02058720588684082, timestamp=1557040253.856357)    [('tile_k', [
16, 2, 64]), ('tile_y', [1, 2, 32, 32]), ('tile_x', [1, 1024, 1, 2]), ('auto_unroll_max_step', 1500)],,None,12857110
Too many errors happen in the tuning. Now is in debug mode
WARNING:autotvm:Too many errors happen in the tuning. Now is in debug mode

ghostplant on 5 May 2019

If you specify constraints on shared memory size, number of threads, etc., verify_pass rejects kernel configurations that do not satisfy those constraints (invalid configurations are still visited because of the random nature of autotvm). I think the warning you got is the result of such rejections.

As long as autotvm visits configurations that satisfy your constraints, it should be able to make progress.
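
To see how many trials are rejected versus timing out, the error_no values in the debug output can be tallied. This is a rough sketch that works on the printed log text only; the helper name tally_errors is hypothetical, the sample lines are abridged from the output quoted in this thread, and only the two codes that appear here (1 with InstantiationError, 6 with TimeoutError) are given names.

```python
import re
from collections import Counter

# Only the two codes that appear in this thread are named here.
KNOWN_ERRORS = {1: "InstantiationError", 6: "TimeoutError"}


def tally_errors(log_text):
    """Count occurrences of each error_no value in AutoTVM debug output."""
    return Counter(int(n) for n in re.findall(r"error_no=(\d+)", log_text))


# Abridged sample lines based on the output quoted in this thread.
log = """
No: 97  GFLOPS: 0.00/0.00  result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10)
No: 98  GFLOPS: 0.00/0.00  result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10)
No: 1000  GFLOPS: 0.00/0.00  result: MeasureResult(costs=(InstantiationError(),), error_no=1)
"""
for code, count in sorted(tally_errors(log).items()):
    print(code, KNOWN_ERRORS.get(code, "unknown"), count)
```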

I also recommend making 'auto_unroll_max_step' smaller. 1500 is too big and puts pressure on register usage. 125-256 should be good.

masahi on 5 May 2019

@masahi Thanks, I updated this parameter, though the tuning results did not get better.

ghostplant on 5 May 2019

@tqchen I cannot sign up for TVM Discuss. My email provider is qq.com and I never receive the activation email; nothing arrives in the junk folder either.

ghostplant on 21 May 2019
