Addons: Enable custom-ops for tensorflow-cpu

Created on 31 Jan 2020 · 6 Comments · Source: tensorflow/addons

Currently, tensorflow-cpu fails when trying to load custom ops because of an undefined symbol, __cudaPushCallConfiguration:

from tensorflow_addons.activations.gelu import gelu
File "/usr/local/lib/python3.7/site-packages/tensorflow_addons/activations/gelu.py", line 24, in <module>
get_path_to_datafile("custom_ops/activations/_activation_ops.so"))
File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 57, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.7/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so: undefined symbol: __cudaPushCallConfiguration

I haven't done a deep dive, so I'm not sure what is causing this, but I'm linking this possibly related PR since it was a departure from standard TF linking:
https://github.com/tensorflow/addons/pull/539

bug build help wanted

Most helpful comment

The problem is that libtensorflow_framework.so.2 exports the CUDA stubs that TensorFlow uses to dynamically load the CUDA runtime. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/cuda/cudart_stub.cc
However, tensorflow-cpu doesn't have these stubs!
A simple reordering of the TFA linking, so that the CUDA libraries come first, seems to solve the problem.
Let me explain.
Here is the import table of _activation_ops.so:

root@cff0ec50c2b5:~/addons# objdump -T bazel-bin/tensorflow_addons/custom_ops/activations/_activation_ops.so | grep cuda
0000000000000000      DF *UND*  0000000000000000              __cudaPushCallConfiguration
0000000000000000      DF *UND*  0000000000000000              __cudaUnregisterFatBinary
0000000000000000      DF *UND*  0000000000000000              __cudaRegisterFatBinary
0000000000000000      DF *UND*  0000000000000000              __cudaRegisterFatBinaryEnd
0000000000000000      DF *UND*  0000000000000000              __cudaPopCallConfiguration
0000000000000000      DF *UND*  0000000000000000              cudaLaunchKernel
0000000000000000      DF *UND*  0000000000000000              __cudaRegisterFunction

Exports of libtensorflow_framework.so.2:

root@cff0ec50c2b5:~# objdump -T /tensorflow-2.1.0/python3.6/tensorflow_core/libtensorflow_framework.so.2 | grep __cuda
00000000014e79e0 g    DF .text  0000000000000143  Base        __cudaRegisterFunction
00000000014e78a0 g    DF .text  000000000000013b  Base        __cudaRegisterVar
00000000014e7390 g    DF .text  0000000000000091  Base        __cudaUnregisterFatBinary
00000000014e7610 g    DF .text  000000000000013f  Base        __cudaPopCallConfiguration
00000000014e7430 g    DF .text  0000000000000091  Base        __cudaRegisterFatBinaryEnd
00000000014e7750 g    DF .text  000000000000014f  Base        __cudaRegisterFatBinary
00000000014e74d0 g    DF .text  0000000000000134  Base        __cudaPushCallConfiguration

After a simple modification of https://github.com/tensorflow/addons/blob/master/tensorflow_addons/tensorflow_addons.bzl

root@cff0ec50c2b5:~/addons# objdump -T bazel-bin/tensorflow_addons/custom_ops/activations/_activation_ops.so | grep cuda

now returns nothing and _activation_ops.so grows in size.

Looks great but I've not tested how it works yet. :laughing:
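
The objdump check above can be scripted to compare builds. The following is a minimal sketch, not part of the original thread: the helper names `split_dynamic_symbols` and `dynamic_symbols` are hypothetical, and it assumes `objdump` is on the PATH and prints rows in the layout shown above.

```python
import re
import subprocess

# A symbol row starts with a hex address; headers and shell prompts do not.
_HEX_ADDRESS = re.compile(r"^[0-9a-fA-F]{8,16}$")


def split_dynamic_symbols(objdump_text):
    """Split `objdump -T` output into (undefined, defined) symbol-name sets."""
    undefined, defined = set(), set()
    for line in objdump_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not _HEX_ADDRESS.match(fields[0]):
            continue  # not a symbol row
        name = fields[-1]
        if "*UND*" in fields:
            undefined.add(name)  # must be provided by another library at load time
        else:
            defined.add(name)  # exported by this library
    return undefined, defined


def dynamic_symbols(library_path):
    """Run `objdump -T` on a shared library (hypothetical convenience wrapper)."""
    result = subprocess.run(
        ["objdump", "-T", library_path],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return split_dynamic_symbols(result.stdout)
```

Calling `dynamic_symbols` on `_activation_ops.so` and filtering the undefined set for names containing `cuda` should reproduce the first listing; an empty result would match the output reported after the linking change.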

All 6 comments

Custom ops should be split into separate CPU and GPU shared object files.
Which file gets loaded would then depend on tf.test.is_built_with_gpu_support()
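
A rough sketch of that suggestion, under stated assumptions: the `_cpu`/`_gpu` file-name suffixes and both helper names are hypothetical and not how TFA actually ships its ops; only `tf.test.is_built_with_gpu_support()` comes from the comment above.

```python
import os


def select_op_library(op_dir, base_name, gpu_build):
    """Return the path of the CPU or GPU variant of a custom-op library.

    The "<base>_cpu.so" / "<base>_gpu.so" naming is an assumption made
    for this sketch only.
    """
    suffix = "gpu" if gpu_build else "cpu"
    return os.path.join(op_dir, "{}_{}.so".format(base_name, suffix))


def tf_built_with_gpu():
    """Ask the installed TensorFlow whether it was built with GPU support."""
    try:
        import tensorflow as tf
    except ImportError:
        return False  # no TensorFlow available: treat as a CPU-only setup
    return tf.test.is_built_with_gpu_support()
```

The selected path would then be passed to `tf.load_op_library`, for example `select_op_library("custom_ops/activations", "_activation_ops", tf_built_with_gpu())`.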

Given the dlopen() dynamic kernel strategy that TF uses this shouldn't be required:
https://github.com/tensorflow/community/blob/master/rfcs/20180604-dynamic-kernels.md

However, I agree that your suggestion is a possible solution if for some reason we're unable to get this fixed.

Awesome, @failure-to-thrive you are the expert on magical bugs. 😂

Hi, I'm running into the same issue: I'm trying to use tensorflow-cpu to reduce the size of my Docker image, and now the import yields the same error:
```
activations/_activation_ops.so: undefined symbol: __cudaPushCallConfiguration
```

My only use of Addons is AdamW during training and the gelu activation. I know gelu has been moved to TF core: I'll try the nightly version and stop using the custom ops from Addons.

Thanks,
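
For reference while moving off the custom op, GELU itself is simple to state. This is a plain-Python sketch of the standard definition and its common tanh approximation, not the TFA or TF core implementation; the function names are made up for the example, and it works on scalars only.

```python
import math


def gelu_exact(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```

Because both forms use only elementwise math, they need no compiled kernel and so would not touch `_activation_ops.so`.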

Building the addons from sources fixed it for me on TF 2.2 (installed with pip).
I followed the CPU custom ops instructions at:
https://github.com/tensorflow/addons/tree/master#cpu-custom-ops
