register_coco_instances("mis", {}, "./Set1/missouri_camera_traps_set1.json", "./")
It is in the COCO Camera Traps dataset format, so I loaded it that way. I got the following metadata from MetadataCatalog.get("mis"):
Metadata(evaluator_type='coco', image_root='./', json_file='./Set1/missouri_camera_traps_set1.json', name='mis', thing_classes=['empty', 'agouti', 'collared_peccary', 'paca', 'red_brocket_deer', 'white-nosed_coati', 'spiny_rat', 'ocelot', 'red_squirrel', 'common_opossum', 'bird_spec', 'great_tinamou', 'white_tailed_deer', 'mouflon', 'red_deer', 'roe_deer', 'wild_boar', 'red_fox', 'european_hare', 'wood_mouse', 'coiban_agouti'], thing_dataset_id_to_contiguous_id={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20})
import os
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("mis",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl" # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 100 # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64 # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 21
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
WARNING [12/16 08:47:28 d2.config.compat]: Config './detectron2_repo/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-332-6eeee973ba83> in <module>()
22
23 os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
---> 24 trainer = DefaultTrainer(cfg)
25 trainer.resume_or_load(resume=False)
26 trainer.train()
3 frames
/content/detectron2_repo/detectron2/modeling/meta_arch/rcnn.py in __init__(self, cfg)
40 assert len(cfg.MODEL.PIXEL_MEAN) == len(cfg.MODEL.PIXEL_STD)
41 num_channels = len(cfg.MODEL.PIXEL_MEAN)
---> 42 pixel_mean = torch.Tensor(cfg.MODEL.PIXEL_MEAN).to(self.device).view(num_channels, 1, 1)
43 pixel_std = torch.Tensor(cfg.MODEL.PIXEL_STD).to(self.device).view(num_channels, 1, 1)
44 self.normalizer = lambda x: (x - pixel_mean) / pixel_std
RuntimeError: CUDA error: device-side assert triggered
I am running on Google Colab:
------------------------ --------------------------------------------------
sys.platform linux
Python 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
Numpy 1.17.4
Detectron2 Compiler GCC 7.4
Detectron2 CUDA Compiler 10.0
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.3.1
PyTorch Debug Build False
torchvision 0.4.2
CUDA available True
GPU 0 Tesla K80
CUDA_HOME /usr/local/cuda
NVCC Cuda compilation tools, release 10.0, V10.0.130
Pillow 6.2.1
cv2 4.1.2
------------------------ --------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
Thank you for your time.
Your detectron2 is built with CUDA 10.0:
Detectron2 CUDA Compiler 10.0
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.3.1
PyTorch Debug Build False
torchvision 0.4.2
CUDA available True
GPU 0 Tesla K80
CUDA_HOME /usr/local/cuda
NVCC Cuda compilation tools, release 10.0, V10.0.130
But your PyTorch is running with CUDA 10.1:
- CUDA Runtime 10.1
You need to make them consistent.
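The comparison above can also be done mechanically. The following is only a minimal sketch: the helper names cuda_versions and is_consistent are hypothetical, and it assumes the environment report has been pasted into a string with labels like the ones shown in this thread.

```python
import re


def cuda_versions(env_report: str) -> dict:
    """Pull the two CUDA versions of interest out of an environment report.

    Hypothetical helper: the label patterns below are taken from the
    reports pasted in this thread and may differ in other versions.
    """
    patterns = {
        "detectron2_compiler": r"detectron2 CUDA compiler\s+(\d+\.\d+)",
        "pytorch_runtime": r"CUDA Runtime\s+(\d+\.\d+)",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, env_report, flags=re.IGNORECASE)
        if match:
            found[key] = match.group(1)
    return found


def is_consistent(env_report: str) -> bool:
    """True only if both versions were found and they are identical."""
    versions = cuda_versions(env_report)
    return len(versions) == 2 and len(set(versions.values())) == 1
```

For the first report in this thread, is_consistent returns False because the two extracted versions are 10.0 and 10.1.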
Thank you for the clarification. Do you have any tips on how I can make that change?
Hi, I'm having the same problem on Colab when training with a custom dataset, though my CUDA versions are consistent.
------------------------ ---------------------------------------------------------------
sys.platform linux
Python 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
numpy 1.17.5
detectron2 0.1 @/content/detectron2_repo/detectron2
detectron2 compiler GCC 7.4
detectron2 CUDA compiler 10.0
detectron2 arch flags sm_75
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.4.0+cu100 @/usr/local/lib/python3.6/dist-packages/torch
PyTorch debug build False
CUDA available True
GPU 0 Tesla T4
CUDA_HOME /usr/local/cuda
NVCC Cuda compilation tools, release 10.0, V10.0.130
Pillow 6.2.2
torchvision 0.5.0+cu100 @/usr/local/lib/python3.6/dist-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
cv2 4.1.2
------------------------ ---------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
Apparently I did something wrong in the dataset registration process, as it works after I restarted Colab.
For me, it was because of the wrong NUM_CLASSES in my config for a new dataset.
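One way to catch this before training is to compare the config value against the annotation file. This is a rough sketch, not something from the detectron2 API: check_num_classes is a hypothetical helper, and it assumes a standard COCO-style JSON with "categories" and "annotations" lists. A class index outside the range the model expects is one plausible trigger for a device-side assert, though not the only one.

```python
def check_num_classes(coco: dict, num_classes: int) -> list:
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical helper. `coco` is the parsed annotation JSON and
    `num_classes` is the value set in cfg.MODEL.ROI_HEADS.NUM_CLASSES.
    """
    problems = []
    category_ids = {c["id"] for c in coco.get("categories", [])}
    if len(category_ids) != num_classes:
        problems.append(
            f"NUM_CLASSES is {num_classes} but the file defines "
            f"{len(category_ids)} categories"
        )
    # Annotations that point at a category the file never declares.
    used_ids = {a["category_id"] for a in coco.get("annotations", [])}
    unknown = sorted(used_ids - category_ids)
    if unknown:
        problems.append(f"annotations use undeclared category ids: {unknown}")
    return problems


# Usage sketch (the path is whatever was passed to register_coco_instances):
#   import json
#   with open("./Set1/missouri_camera_traps_set1.json") as f:
#       print(check_num_classes(json.load(f), cfg.MODEL.ROI_HEADS.NUM_CLASSES))
```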