Detectron2: The checkpoint contains parameters not used by the model

Created on 8 Feb 2020 · 11 Comments · Source: facebookresearch/detectron2

I am trying to train TridentNet on a custom dataset. This is the config I used:

```python
import os

from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer
from detectron2.modeling import build_model
from projects.TridentNet.tridentnet import add_tridentnet_config

# project_root is the path to the detectron2 checkout (defined elsewhere in the script).
cfg = get_cfg()
add_tridentnet_config(cfg)
cfg.merge_from_file(project_root + "/projects/TridentNet/configs/tridentnet_fast_R_50_C4_3x.yaml")
cfg.DATASETS.TRAIN = ("train",)
# Note: OUTPUT_DIR is a directory, so this creates a folder named "model_final.pth".
cfg.OUTPUT_DIR = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.DATASETS.TEST = ("val",)
cfg.DATALOADER.NUM_WORKERS = 2
# ImageNet pre-trained ResNet-50 backbone weights.
cfg.MODEL.WEIGHTS = "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.MAX_ITER = 200000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.TEST.EVAL_PERIOD = 200
cfg.SOLVER.CHECKPOINT_PERIOD = 600
cfg.SOLVER.MOMENTUM = 0.87

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```

I got the following error:

[Screenshot: tridenNet_error]

Samjith888

Most helpful comment

> Your issue is answered in https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md#common-installation-issues already.
> If you need help, please also include environment information following the issue template.

I have built detectron2 for CUDA 10.1 using the following command:

for CUDA 10.1:

pip install detectron2 -f \
https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/index.html

Samjith888 on 10 Feb 2020

All 11 comments

You're loading an ImageNet pre-trained model (because that's what's written in the config file), and the ImageNet pre-trained model contains classification layers that are not used by the detection model. So the message is expected.

ppwwyyxx on 8 Feb 2020
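
The point above can be illustrated with a minimal sketch. It is a simplified illustration and not detectron2's actual loading code: the function `diff_checkpoint_keys` and all parameter names below are hypothetical, and the real loader also renames Caffe2-style keys such as `fc1000_w` before matching. It only shows why a classifier layer in a backbone checkpoint ends up reported as "not used by the model".

```python
# Simplified illustration with hypothetical parameter names; this is not
# detectron2's loader, which also renames Caffe2-style keys before matching.
def diff_checkpoint_keys(model_keys, checkpoint_keys):
    """Return keys the checkpoint lacks and keys the model never uses."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    return {
        "missing_in_checkpoint": sorted(model_keys - checkpoint_keys),
        "unused_by_model": sorted(checkpoint_keys - model_keys),
    }


# A detector reuses the backbone but replaces the classifier with its own heads.
detector_keys = ["backbone.conv1.weight", "roi_heads.box_predictor.cls_score.weight"]
imagenet_keys = ["backbone.conv1.weight", "fc1000.weight", "fc1000.bias"]

report = diff_checkpoint_keys(detector_keys, imagenet_keys)
print("unused by model:", report["unused_by_model"])
print("missing in checkpoint:", report["missing_in_checkpoint"])
```

Both lists are informational here: the unused classifier weights are dropped, and the missing detection heads are trained from their initial values.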

> You're loading an ImageNet pre-trained model (because that's what's written in the config file) and ImageNet pre-trained model contains classification layers that are not used by detection model. So it's expected.

Which is the ImageNet pre-trained model for detection?

Samjith888 on 9 Feb 2020

That's what's written in the config file.

ppwwyyxx on 9 Feb 2020

> You're loading an ImageNet pre-trained model (because that's what's written in the config file) and ImageNet pre-trained model contains classification layers that are not used by detection model. So it's expected.

Did you mean that `cfg.MODEL.WEIGHTS = "detectron2://ImageNetPretrained/MSRA/R-50.pkl"` contains a classification layer? How do I solve this error? Can you show which part of the config file mentions the classification layer?

Samjith888 on 10 Feb 2020

Yes.
It's expected, which means it's not an error.

ppwwyyxx on 10 Feb 2020

> Yes.
> It's expected, which means it's not an error.

But training failed at iteration 0. You can see it in the question.

Samjith888 on 10 Feb 2020

Please provide full logs. I can't see what the error is in the screenshot.

ppwwyyxx on 10 Feb 2020

> Please provide full logs. I can't see what is the error in the screenshot

  proposal_generator.anchor_generator.cell_anchors.0
  proposal_generator.rpn_head.anchor_deltas.{bias, weight}
  proposal_generator.rpn_head.conv.{bias, weight}
  proposal_generator.rpn_head.objectness_logits.{bias, weight}
  roi_heads.box_predictor.bbox_pred.{bias, weight}
  roi_heads.box_predictor.cls_score.{bias, weight}
[02/10 15:36:04 d2.checkpoint.c2_model_loading]: The checkpoint contains parameters not used by the model:
  fc1000_b
  fc1000_w
  conv1_b
[02/10 15:36:04 d2.engine.train_loop]: Starting training from iteration 0
[02/10 15:36:04 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
Registering val image 
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9958/9958 [01:20<00:00, 124.41it/s]
9958 Images registered successfully.
[02/10 15:37:25 d2.data.build]: Distribution of instances among all 1 categories:
|  category  | #instances   |
|:----------:|:-------------|
|   person   | 57016        |
|            |              |
WARNING [02/10 15:37:25 d2.engine.defaults]: No evaluator found. Use `DefaultTrainer.test(evaluators=)`, or implement its `build_evaluator` method.
Traceback (most recent call last):
  File "tridentnet_custom_train.py", line 96, in <module>
    trainer.train()
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/engine/defaults.py", line 373, in train
    super().train(self.start_iter, self.max_iter)
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/engine/train_loop.py", line 212, in run_step
    loss_dict = self.model(data)
  File "/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 129, in forward
    _, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
  File "/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/projects/TridentNet/tridentnet/trident_rcnn.py", line 66, in forward
    pred_instances, losses = super().forward(images, features, proposals, all_targets)
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/modeling/roi_heads/roi_heads.py", line 392, in forward
    box_features = self._shared_roi_transform(
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/modeling/roi_heads/roi_heads.py", line 378, in _shared_roi_transform
    x = self.pooler(features, boxes)
  File "/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/modeling/poolers.py", line 215, in forward
    return self.level_poolers[0](x[0], pooler_fmt_boxes)
  File "/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/layers/roi_align.py", line 94, in forward
    return roi_align(
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/layers/roi_align.py", line 19, in forward
    output = _C.roi_align_forward(
RuntimeError: CUDA error: invalid device function (ROIAlign_forward_cuda at /mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:361)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f4ac3239627 in /opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: detectron2::ROIAlign_forward_cuda(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xa24 (0x7f4aa8c8c770 in /mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/_C.cpython-38-x86_64-linux-gnu.so)
frame #2: detectron2::ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xb6 (0x7f4aa8c09fc6 in /mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/_C.cpython-38-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x543a9 (0x7f4aa8c1a3a9 in /mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/_C.cpython-38-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x5039e (0x7f4aa8c1639e in /mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/_C.cpython-38-x86_64-linux-gnu.so)
<omitting python frames> frame #10: THPFunction_apply(_object*, _object*) + 0xb2f (0x7f4af51c0d1f in /opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
[1]    11845 segmentation fault (core dumped)  python tridentnet_custom_train.py
(d2_train)

Samjith888 on 10 Feb 2020

Your issue is answered in https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md#common-installation-issues already.
If you need help, please also include environment information following the issue template.

ppwwyyxx on 10 Feb 2020

> Your issue is probably answered in https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md already.
> If you need help, please also include environment information following the issue template.

I have already trained a RetinaNet model using detectron2. I got the above error when I tried other models.
The output of `python -m detectron2.utils.collect_env`:

$ python -m detectron2.utils.collect_env

sys.platform linux
Python 3.8.1 (default, Jan 8 2020, 22:29:32) [GCC 7.3.0]
numpy 1.18.1
detectron2 0.1 @/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2
detectron2 compiler GCC 7.4
detectron2 CUDA compiler 10.0
detectron2 arch flags sm_61
DETECTRON2_ENV_MODULE
PyTorch 1.4.0 @/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/torch
PyTorch debug build False
CUDA available True
GPU 0,1 GeForce GTX 1080
CUDA_HOME /usr/local/cuda
NVCC Cuda compilation tools, release 10.0, V10.0.130
Pillow 6.2.2
torchvision 0.5.0 @/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
cv2 4.2.0

PyTorch built with:

GCC 7.3
Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CUDA Runtime 10.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.3
Magma 2.5.1
Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

I have found that:

detectron2 CUDA compiler 10.0
CUDA_HOME /usr/local/cuda
PyTorch built with:
- CUDA Runtime 10.1

The detectron2 CUDA compiler is 10.0, but PyTorch was built with CUDA 10.1. Should I rebuild detectron2, or should I install CUDA 10.0 and rebuild PyTorch with CUDA 10.0?

Samjith888 on 10 Feb 2020
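
The comparison described above can be automated with a small sketch. It assumes the `collect_env` output has the same line layout as the excerpt quoted in this thread; the helper name `cuda_versions` is hypothetical and is not part of detectron2.

```python
import re


def cuda_versions(env_text):
    """Extract (detectron2 build CUDA, PyTorch runtime CUDA) from collect_env text."""
    built = re.search(r"detectron2 CUDA compiler\s+(\d+\.\d+)", env_text)
    runtime = re.search(r"CUDA Runtime\s+(\d+\.\d+)", env_text)
    return (
        built.group(1) if built else None,
        runtime.group(1) if runtime else None,
    )


# Excerpt copied from the collect_env output posted in this thread.
env_text = """
detectron2 CUDA compiler 10.0
CUDA_HOME /usr/local/cuda
PyTorch built with:
- CUDA Runtime 10.1
"""

built, runtime = cuda_versions(env_text)
mismatch = built is not None and runtime is not None and built != runtime
print(f"detectron2 built with CUDA {built}, PyTorch uses CUDA {runtime}, mismatch={mismatch}")
```

A reported mismatch only indicates that the two were compiled against different CUDA versions; which side to rebuild or reinstall is the decision discussed in the comment above.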

> Your issue is answered in https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md#common-installation-issues already.
> If you need help, please also include environment information following the issue template.

I have built detectron2 for CUDA 10.1 using the following command: