I'm testing JIT operator registration in C++ (libtorch).
My test case:
#include <iostream>
#include <torch/script.h>  // for torch::jit::getAllOperators

int main(int argc, const char* argv[]) {
  auto& ops = torch::jit::getAllOperators();
  std::cout << "torch jit operators\n";
  for (auto& op : ops) {
    auto& name = op->schema().name();
    if (name.find("torchvision") != std::string::npos)
      std::cout << "op : " << name << "\n";
  }
  std::cout << "\n";
  return 0;
}
>>> torch jit operators
Expected output (after I changed some code in vision):
>>> torch jit operators
op : torchvision::_cuda_version
op : torchvision::deform_conv2d
op : torchvision::ps_roi_pool
op : torchvision::ps_roi_align
op : torchvision::_new_empty_tensor_op
op : torchvision::roi_pool
op : torchvision::roi_align
op : torchvision::nms
In the first case, it seems no JIT operators are registered in libtorch.
Does this only occur in my environment, or is something wrong with the code?
I ask because I do get the expected output once I modify the torchvision sources.
The install commands follow the README.
torch 1.5.0 and torchvision 0.6.0
@vincentqb
This is not a PyTorch problem.
It's about calling the registration API when using C++ torchvision.
The problem is that the registration code is never run when a custom project includes torchvision.
So I moved the registration code into a header; that way it gets compiled into the project and actually runs.
Could you check the jit operators in C++?
If I understand correctly, you have one version that works and another that doesn't? Can you provide a complete minimal example, along with how you compile it, so we can reproduce?
cc @fmassa
@vincentqb
Basically I just install with CMake and run hello_world.
Now I'm trying 0.6, but 0.5 has the same problem.
From README.rst:
cmake -DCMAKE_PREFIX_PATH="/home/user/torch_build/libtorch150" -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_CUDA=on ..
make -j16
make install
user(t3) in ~/torch_build via env150
❯ python3 collect_env.py
Collecting environment information...
PyTorch version: 1.5.0+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 8.3.0-16ubuntu3~16.04) 8.3.0
CMake version: version 3.17.0
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 418.87.01
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.4
Versions of relevant libraries:
[pip3] numpy==1.18.3
[pip3] torch==1.5.0+cu101
[pip3] torchvision==0.6.0+cu101
[conda] Could not collect
#include <iostream>
#include <torch/script.h>        // for torch::jit::getAllOperators
#include <torchvision/vision.h>

int main()
{
  auto& ops = torch::jit::getAllOperators();
  std::cout << "torch jit operators\n";
  for (auto& op : ops) {
    auto& name = op->schema().name();
    if (name.find("torchvision") != std::string::npos)
      std::cout << "op : " << name << "\n";
  }
  std::cout << "\n";
  return 0;
}
I run the main function above.
I expect torch::jit::getAllOperators() in main to find the torchvision-related operators.
But in my environment, it shows nothing.
So I moved the torch::RegisterOperators() calls into a header (.h) instead of a .cpp.
If I do this, the registration code is compiled directly into my translation unit.
Then the torchvision operators show up as expected.
In my experience, I had the same problem with my own custom kernels when using the JIT.
Once I changed the registration to be invoked directly, everything ran fine and I could find my kernels in getAllOperators().
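For reference, this is roughly the pattern involved (a minimal sketch with a hypothetical my_kernel; the real torchvision registration lives in its own source files):

#include <torch/script.h>

// Hypothetical kernel, used only for illustration.
at::Tensor my_kernel(const at::Tensor& x) { return x; }

// Variant 1: registration as a file-scope static in the library's .cpp.
// The constructor of `registry` registers the op, but if the executable never
// references a symbol from that object file, the linker can drop it and the
// op never shows up in getAllOperators().
static auto registry = torch::RegisterOperators()
    .op("myops::my_kernel", &my_kernel);

// Variant 2: put the same statement in a header that the application includes.
// The static then lives in the application's own translation unit, so it
// cannot be pruned away and the op is always registered.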
Thanks for the report @zsef123 !
ccing @suo, as I believe this is currently an issue with the TorchScript RegisterOperators (or our use of it), which makes the operators not readily available to users, who instead need to re-register the ops in their own code.
@fmassa
So the vision library intends for each user to manually register the ops in their project, rather than having them registered automatically by including the torchvision header.
Is that right? I think I misunderstood that design.
That's currently what's needed, but I don't think this is a desirable setup.
Maybe we just need to move the RegisterOperator calls from the C++ file into the headers in torchvision, so that they get registered automatically once the user includes it.
Let's keep this issue open until we figure out a better way of handling this
I think what's happening is that the linker is throwing out the static registration of the operators when linking in the torchvision lib, since the RegisterOperator object is never referenced from the main executable. Can you try linking in torchvision with -Wl,--whole-archive?
@suo Thanks for the comment!
I tried the following:
CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(hello-world)
find_package(TorchVision REQUIRED)
add_executable(hello-world main.cpp)
target_link_libraries(hello-world
  -Wl,--whole-archive
  TorchVision::TorchVision
)
But this error occurs:
[ 50%] Linking CXX executable hello-world
/usr/lib/gcc/x86_64-linux-gnu/8/libgcc.a(_muldi3.o): In function `__multi3':
(.text+0x0): multiple definition of `__multi3'
/usr/lib/gcc/x86_64-linux-gnu/8/libgcc.a(_muldi3.o):(.text+0x0): first defined here
/usr/lib/gcc/x86_64-linux-gnu/8/libgcc.a(_negdi2.o): In function `__negti2':
(.text+0x0): multiple definition of `__negti2'
...
Is this a problem with my setup?
user(t3) in vision/examples/cpp/hello_world/build on master [!?] via env150 took 2s
❯ cmake --version
cmake version 3.17.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).
user(t3) in vision/examples/cpp/hello_world/build on master [!?] via env150
❯ gcc --version
gcc (Ubuntu 8.3.0-16ubuntu3~16.04) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
whole-archive is not the right flag. The real problem is that because your executable isn't using any symbols from torchvision, the linker is pruning it away. You can verify this by running ldd on the resulting executable; you will see a dependency on libtorch.so but not on torchvision. If you specify
target_link_libraries(hello-world
  -Wl,--no-as-needed
  TorchVision::TorchVision
)
This will force the linker to include it.
@fmassa we should consider updating the TorchVision cmake target to automatically specify --no-as-needed, this will help prevent this common failure mode.
@bmanga could you look into adding --no-as-needed in the torchvision CMakeLists?
@fmassa May I suggest using an inline variable or a function to register the operators instead?
I can give that a try first, so we can avoid hacks in the cmake scripts.
@ezyang what do you think?
I think it might be preferable to avoid having to ask users to register_torchvision_ops() in their C++ files.
@fmassa Inline variables should work just like the current approach. Officially they are C++17, but we can get essentially identical functionality using __attribute__((weak)).
I personally don't mind a RegisterOps function. I think there might be users of torchvision that don't need them, and for those who do it's just a function call in their main.
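For concreteness, that call would look roughly like this from the user's side (using the register_torchvision_ops() name mentioned above; such a function does not exist today, this is only a sketch):

#include <torchvision/vision.h>

int main() {
  // Hypothetical explicit registration; currently torchvision relies on
  // static initializers instead of exposing a function like this.
  register_torchvision_ops();

  // ... from here on the torchvision:: operators would be visible to the JIT.
  return 0;
}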
Either approach is IMO better than using linker hacks, otherwise we end up with a mess like pytorch's cmake scripts. As a bonus, we can even remove the hardcoded requirement that the lib must be built as a shared library.
EDIT: I think __attribute__((constructor)) is exactly what we need here. I will try to create a PR tomorrow.
@bmanga I don't think __attribute__((constructor)) will work? It will put things in the ctors section of your library, which is the same thing that static initializers do, and the linker will prune it away if you literally don't depend on any symbols.
There's nothing wrong with having a register operators function. It doesn't do any harm, and it will save you from having to do --no-as-needed. But, for better or worse, PyTorch as a library relies a LOT on static initializers, and I think it's better to fall in line with the prevailing convention as a downstream library (torchvision). So I am advocating that for the default behavior, we should pass in the right linker flags. It is not too much cmake.
cc'ing @malfet too
@ezyang true, late night thoughts should be left for the morning.
I think it is a bit unreasonable to expect the operators to be registered just by linking the library without even including a single header. Is that really a common case?
What about the following?
Either register the ops from vision.h itself, or add set_target_properties(exe PROPERTIES LINK_WHAT_YOU_USE TRUE) to the torchvision consumer target. If you think that is still not acceptable, I will copy the code from the caffe2_interface_library macro.
Hi @bmanga
Thanks for the discussion. After seeing it unfold here, I think it might be preferable to follow PyTorch's convention.
@bmanga can you clarify why we would need to copy the code from caffe2_interface_library, instead of just passing --no-as-needed?
@fmassa Because if we really want this behavior, what is actually needed is something like "-Wl,--no-as-needed,libtorchvision.so,-Wl,--as-needed". caffe2_interface_library mostly solves these problems, although it is not completely correct either, because it assumes that the user target has --no-as-needed enabled by default, which may not be the case and may be a source of other issues.
I agree that writing the relevant cmake code portably is not so easy. I am a bit curious how the #include trick would work; since I feel like you still have to use the symbols somehow, just externing this isn't enough. (Maybe if the header sets some weak symbol)?
@ezyang yes, that's what I had in mind when I mentioned inline variables.
In the vision.h header, we could have something like:
#if (defined __cpp_inline_variables) || __cplusplus >= 201703L
  #define VISION_INLINE_VARIABLE inline
#else
  #ifdef _MSC_VER
    #define VISION_INLINE_VARIABLE __declspec(selectany)
  #else
    #define VISION_INLINE_VARIABLE __attribute__((weak))
  #endif
#endif

namespace vision {
namespace impl {

// Defined in a .cpp file inside libtorchvision; performs the op registration.
int RegisterOps() noexcept;

// Referencing RegisterOps() from every including TU keeps the torchvision
// objects alive at link time; the inline/weak linkage collapses all copies of
// `dummy` into one, so registration still runs only once.
VISION_INLINE_VARIABLE int dummy = RegisterOps();

} // namespace impl
} // namespace vision
And that should be enough to ensure that the library is loaded (RegisterOps is defined in a cpp file), and that the code is run only once.
It feels a bit dirty doing this, but I'm willing to give it a try.
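The matching .cpp side would then look roughly as follows (a sketch only, with a placeholder example_op; the real definition would register the actual torchvision kernels rather than this stub):

// vision.cpp -- sketch of the out-of-line definition that the header-side
// inline/weak variable forces every consumer to link against.
#include <torch/script.h>

namespace vision {
namespace impl {

namespace {
// Stub kernel for illustration; the real code would register nms, roi_align, etc.
at::Tensor example_op(const at::Tensor& x) { return x; }
} // namespace

int RegisterOps() noexcept {
  // Function-local static: registration runs exactly once, no matter how many
  // translation units end up evaluating the header-side dummy variable.
  static auto registry =
      torch::RegisterOperators().op("torchvision::_example_op", &example_op);
  return 0;
}

} // namespace impl
} // namespace vision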