Maskrcnn-benchmark: TypeError: function takes exactly 5 arguments (1 given)

Created on 8 Aug 2019 · 6 Comments · Source: facebookresearch/maskrcnn-benchmark

When I train on my dataset, I get this error:

Traceback (most recent call last):
  File "train_net.py", line 192, in <module>
    main()
  File "train_net.py", line 185, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "train_net.py", line 86, in train
    arguments,
  File "/root/maskrcnn-benchmark-sync/tools/../maskrcnn_benchmark/engine/trainer.py", line 68, in do_train
    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 623, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: function takes exactly 5 arguments (1 given)

I think there is an error in my dataset. I have checked it (for example, that the labels and boxes are valid), but that did not help.
So I traced the code to find the cause and got as far as line 137 of dataloader.py:

try:
    samples = collate_fn([dataset[i] for i in batch_indices])
except Exception:
    # It is important that we don't store exc_info in a variable,
    # see NOTE [ Python Traceback Reference Cycle Problem ]
    data_queue.put((idx, ExceptionWrapper(sys.exc_info())))
else:
    data_queue.put((idx, samples))
    del samples

I think the problem is in BatchCollator, but I cannot find any function that needs 5 arguments.
One more thing: training with batch size 1 on one GPU looks fine.

Source

kakaluote
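
One plausible reading of the confusing message, offered as a sketch rather than a confirmed diagnosis: the last frame of the traceback re-raises the worker's exception as batch.exc_type(batch.exc_msg), that is, with a single string argument. Built-in exception classes such as UnicodeDecodeError and UnicodeEncodeError require exactly five constructor arguments, so rebuilding one from a message alone fails with a TypeError and hides the original error. The snippet below reproduces that behaviour with the standard library only; the helper names reraise_like_worker and describe are hypothetical.

```python
def reraise_like_worker(exc_type, exc_msg):
    """Mimic `raise batch.exc_type(batch.exc_msg)` from the traceback above."""
    raise exc_type(exc_msg)


def describe(exc_type, exc_msg):
    """Return the class name and message of whatever actually gets raised."""
    try:
        reraise_like_worker(exc_type, exc_msg)
    except Exception as err:
        return type(err).__name__, str(err)


# A ValueError survives the round trip unchanged.
print(describe(ValueError, "bad box"))

# UnicodeDecodeError needs (encoding, object, start, end, reason), so calling
# it with one string raises TypeError and the real error is lost.
print(describe(UnicodeDecodeError, "'ascii' codec can't decode byte"))
```

If this is what happened here, the "5 arguments" refer to the exception constructor, not to any function in the dataset or collator code.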

Most helpful comment

Solved. After setting num_workers to 0, useful information appeared: the PIL image read crashed on a non-ASCII character. Some image filenames have a bad string encoding.

kakaluote on 12 Aug 2019

👍9 🚀2 👀1 🎉1
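
With num_workers set to 0, samples are loaded in the main process, so the original exception and its traceback are shown directly instead of being rebuilt from a worker. The same idea can be applied to the dataset alone, without the DataLoader: index every sample in a plain loop and record which ones fail. This is a minimal sketch; find_bad_samples and the toy FakeDataset are hypothetical names, and the only assumption is that the dataset supports len() and integer indexing.

```python
import traceback


def find_bad_samples(dataset, verbose=False):
    """Index every sample in the main process and collect the failures.

    Returns a list of (index, exception) pairs for samples that raised.
    """
    failures = []
    for i in range(len(dataset)):
        try:
            dataset[i]
        except Exception as err:  # keep going so all bad samples are reported
            failures.append((i, err))
            if verbose:
                traceback.print_exc()
    return failures


class FakeDataset:
    """Toy stand-in for a real dataset; item 2 simulates a broken file name."""

    def __init__(self, names):
        self.names = names

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        # Raises UnicodeEncodeError for names that are not pure ASCII.
        return self.names[i].encode("ascii")


bad = find_bad_samples(FakeDataset(["a.jpg", "b.jpg", "caf\u00e9.jpg"]))
for index, err in bad:
    print(index, type(err).__name__)
```

In maskrcnn-benchmark the worker count is normally taken from the DATALOADER.NUM_WORKERS config entry, so setting that to 0 should have the same effect for a full training run.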

All 6 comments

I was not able to find these lines in PyTorch v1.1+.

Please provide your environment setup using collect_env_info from https://github.com/facebookresearch/maskrcnn-benchmark/blob/24c8c90efdb7cc51381af5ce0205b23567c3cd21/maskrcnn_benchmark/utils/collect_env.py#L11

Note this dependency from the installation guide:
PyTorch 1.0 from a nightly release. It will not work with 1.0 nor 1.0.1.

Dorozhko-Anton on 9 Aug 2019

I'm using PyTorch 1.0.1. I don't know how to install the PyTorch 1.0 nightly, so I used pip to install it:

pip install torch==1.0.1 -f https://download.pytorch.org/whl/cu92/stable

The code and environment work fine on my other datasets. I only changed the dataset, so the problem should be in the dataset, but I cannot check all the images one by one.

PyTorch version: 1.0.1
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: 
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti

Nvidia driver version: 384.111
cuDNN version: Probably one of the following:
/usr/local/cuda-8.0/lib64/libcudnn.so.5.1.10
/usr/local/cuda-8.0/lib64/libcudnn_static.a

Versions of relevant libraries:
[pip] Could not collect
[conda] torch                     1.0.1                    pypi_0    pypi
[conda] torchvision               0.2.2                      py_3    pytorch

kakaluote on 10 Aug 2019
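
Checking every image by hand should not be necessary if the culprit is filename encoding, as reported elsewhere in this thread. A short script can walk the image directory and list the names that are not pure ASCII. This is a sketch under that assumption only; the function names and the directory path are placeholders, and it does not open or validate the image data itself.

```python
import os


def is_ascii_name(name):
    """Return True if `name` contains only ASCII characters."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def non_ascii_names(root):
    """Return paths under `root` whose file name is not pure ASCII."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not is_ascii_name(name):
                found.append(os.path.join(dirpath, name))
    return found


if __name__ == "__main__":
    # Placeholder path: point this at your own image directory.
    for path in non_ascii_names("datasets/my_dataset/images"):
        print(repr(path))
```

Printing with repr() keeps the output readable even when a name contains characters the terminal cannot display.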

Solved. After setting num_workers to 0, useful information appeared: the PIL image read crashed on a non-ASCII character. Some image filenames have a bad string encoding.

Thank you, it helps!

NarcissusInMirror on 10 May 2020

I'm still getting this error. I renamed all of my images to 1, 2, 3, and so on, and now, with n_cpu=0, this error appears:

RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPUTensorId, CUDATensorId, QuantizedCPUTensorId, VariableTensorId]

So the names no longer cause a crash. However, when I remove the n_cpu parameter, this still appears.