Detectron2: Problem with register_coco_instances while registering a COCO dataset

Created on 6 Nov 2019 · 15 Comments · Source: facebookresearch/detectron2

Hi, I am following this getting started Colab notebook. I am trying to train a custom model using the TACO dataset which comes as a COCO-formatted dataset.

I prepared this Colab notebook for experimenting with the dataset. After registering the dataset with register_coco_instances, I cannot start training. The error I get is:

KeyError                                  Traceback (most recent call last)
/content/detectron2_repo/detectron2/data/catalog.py in get(name)
     51         try:
---> 52             f = DatasetCatalog._REGISTERED[name]
     53         except KeyError:

KeyError: 'd'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
6 frames
/content/detectron2_repo/detectron2/data/catalog.py in get(name)
     54             raise KeyError(
     55                 "Dataset '{}' is not registered! Available datasets are: {}".format(
---> 56                     name, ", ".join(DatasetCatalog._REGISTERED.keys())
     57                 )
     58             )

KeyError: "Dataset 'd' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_minival_100, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2014_minival_100, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val, my_dataset, taco_dataset"

The above-mentioned notebook can be used to reproduce the issue.

All 15 comments

cfg.DATASETS.TRAIN should be a tuple of strings; yours is a single string.

I am facing the same error, and according to your tutorial it should be a string (a path).
Here are the lines from your code:

cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("balloon/train",)
cfg.DATASETS.TEST = ()   # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"  # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon)

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

No. If you run it with python, ("balloon/train",) is not a string. It's a tuple.
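The difference can be checked in plain Python. The sketch below is only an illustration: `dataset_names` is a hypothetical stand-in for the loop over cfg.DATASETS.TRAIN seen in the traceback (`for dataset_name in dataset_names`), and "data/" is an assumed example value. Looping over a bare string yields single characters, which would explain a lookup for a dataset called 'd'.

```python
def dataset_names(train_setting):
    """Hypothetical stand-in for the loader's loop over cfg.DATASETS.TRAIN."""
    return [name for name in train_setting]


as_string = "data/"            # assumed example of a bare string
as_tuple = ("taco_dataset",)   # one-element tuple: note the trailing comma

# Iterating over a string yields its characters, so the first "name" is 'd'.
print(dataset_names(as_string))   # ['d', 'a', 't', 'a', '/']

# Iterating over the tuple yields the full dataset name.
print(dataset_names(as_tuple))    # ['taco_dataset']

# Parentheses alone do not make a tuple; the comma does.
print(type(("balloon/train")).__name__)    # str
print(type(("balloon/train",)).__name__)   # tuple
```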

Oh, sorry, I missed the comma after the string!

@ppwwyyxx I changed it to a tuple, but that still does not help:

KeyError                                  Traceback (most recent call last)
/content/detectron2_repo/detectron2/data/catalog.py in get(name)
     51         try:
---> 52             f = DatasetCatalog._REGISTERED[name]
     53         except KeyError:

KeyError: 'data/'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
6 frames
/content/detectron2_repo/detectron2/data/catalog.py in get(name)
     54             raise KeyError(
     55                 "Dataset '{}' is not registered! Available datasets are: {}".format(
---> 56                     name, ", ".join(DatasetCatalog._REGISTERED.keys())
     57                 )
     58             )

KeyError: "Dataset 'data/' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_minival_100, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_test, coco_2017_test-dev, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2014_minival_100, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val, taco_dataset"

The Colab notebook's been updated.

cfg.DATASETS.TRAIN should contain the names under which you registered your datasets, not the directory.
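A minimal sketch of that advice, assuming the annotations live at data/annotations.json with the images under data/ (both paths are hypothetical; `taco_dataset` is the name that appears in the list of available datasets above). The detectron2 imports sit inside the function only so that the constants can be inspected without detectron2 installed.

```python
import os

# Hypothetical layout; adjust to where the COCO-format files actually live.
DATA_DIR = "data"
ANNOTATIONS = os.path.join(DATA_DIR, "annotations.json")
DATASET_NAME = "taco_dataset"  # the name used at registration time


def register_and_configure():
    """Register the dataset, then refer to it by name in the config."""
    # Requires detectron2; imported here so this file loads without it.
    from detectron2.config import get_cfg
    from detectron2.data.datasets import register_coco_instances

    # Arguments: name, metadata dict, annotation JSON, image root directory.
    register_coco_instances(DATASET_NAME, {}, ANNOTATIONS, DATA_DIR)

    cfg = get_cfg()
    cfg.DATASETS.TRAIN = (DATASET_NAME,)  # registered name, not DATA_DIR
    cfg.DATASETS.TEST = ()
    return cfg
```

With this, the lookup that failed for 'data/' should instead be made for 'taco_dataset', which is already in the registered list.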

Thanks for your help throughout @ppwwyyxx. I was able to get the model to train:

[11/10 09:26:37 d2.engine.train_loop]: Starting training from iteration 0
[11/10 09:27:05 d2.utils.events]: eta: 0:06:23  iter: 19  total_loss: 5.656  loss_cls: 4.131  loss_box_reg: 0.188  loss_mask: 0.693  loss_rpn_cls: 0.531  loss_rpn_loc: 0.048  time: 1.3740  data_time: 0.0053  lr: 0.000005  max_mem: 2446M
[11/10 09:27:32 d2.utils.events]: eta: 0:05:45  iter: 39  total_loss: 5.314  loss_cls: 3.944  loss_box_reg: 0.308  loss_mask: 0.692  loss_rpn_cls: 0.208  loss_rpn_loc: 0.032  time: 1.3528  data_time: 0.0047  lr: 0.000010  max_mem: 2446M
[11/10 09:27:59 d2.utils.events]: eta: 0:05:21  iter: 59  total_loss: 5.189  loss_cls: 3.548  loss_box_reg: 0.281  loss_mask: 0.690  loss_rpn_cls: 0.457  loss_rpn_loc: 0.047  time: 1.3535  data_time: 0.0049  lr: 0.000015  max_mem: 2446M
[11/10 09:28:25 d2.utils.events]: eta: 0:04:56  iter: 79  total_loss: 4.186  loss_cls: 2.773  loss_box_reg: 0.151  loss_mask: 0.687  loss_rpn_cls: 0.318  loss_rpn_loc: 0.028  time: 1.3474  data_time: 0.0047  lr: 0.000020  max_mem: 2446M
[11/10 09:28:52 d2.utils.events]: eta: 0:04:30  iter: 99  total_loss: 3.981  loss_cls: 2.038  loss_box_reg: 0.327  loss_mask: 0.686  loss_rpn_cls: 0.427  loss_rpn_loc: 0.053  time: 1.3479  data_time: 0.0043  lr: 0.000025  max_mem: 2471M
[11/10 09:29:21 d2.utils.events]: eta: 0:04:04  iter: 119  total_loss: 2.759  loss_cls: 1.108  loss_box_reg: 0.179  loss_mask: 0.684  loss_rpn_cls: 0.427  loss_rpn_loc: 0.050  time: 1.3643  data_time: 0.0051  lr: 0.000030  max_mem: 2552M
[11/10 09:29:48 d2.utils.events]: eta: 0:03:37  iter: 139  total_loss: 2.177  loss_cls: 0.762  loss_box_reg: 0.128  loss_mask: 0.678  loss_rpn_cls: 0.422  loss_rpn_loc: 0.057  time: 1.3598  data_time: 0.0047  lr: 0.000035  max_mem: 2552M
[11/10 09:30:14 d2.utils.events]: eta: 0:03:09  iter: 159  total_loss: 2.534  loss_cls: 0.803  loss_box_reg: 0.077  loss_mask: 0.670  loss_rpn_cls: 0.375  loss_rpn_loc: 0.076  time: 1.3544  data_time: 0.0045  lr: 0.000040  max_mem: 2552M
[11/10 09:30:42 d2.utils.events]: eta: 0:02:42  iter: 179  total_loss: 1.567  loss_cls: 0.507  loss_box_reg: 0.053  loss_mask: 0.656  loss_rpn_cls: 0.212  loss_rpn_loc: 0.043  time: 1.3572  data_time: 0.0047  lr: 0.000045  max_mem: 2552M
[11/10 09:31:09 d2.utils.events]: eta: 0:02:15  iter: 199  total_loss: 1.537  loss_cls: 0.516  loss_box_reg: 0.068  loss_mask: 0.658  loss_rpn_cls: 0.229  loss_rpn_loc: 0.043  time: 1.3580  data_time: 0.0045  lr: 0.000050  max_mem: 2552M
[11/10 09:31:37 d2.utils.events]: eta: 0:01:49  iter: 219  total_loss: 1.717  loss_cls: 0.639  loss_box_reg: 0.008  loss_mask: 0.653  loss_rpn_cls: 0.169  loss_rpn_loc: 0.022  time: 1.3586  data_time: 0.0048  lr: 0.000055  max_mem: 2552M
[11/10 09:32:04 d2.utils.events]: eta: 0:01:22  iter: 239  total_loss: 1.438  loss_cls: 0.479  loss_box_reg: 0.024  loss_mask: 0.632  loss_rpn_cls: 0.168  loss_rpn_loc: 0.043  time: 1.3592  data_time: 0.0044  lr: 0.000060  max_mem: 2552M
[11/10 09:32:31 d2.utils.events]: eta: 0:00:55  iter: 259  total_loss: 2.169  loss_cls: 0.794  loss_box_reg: 0.052  loss_mask: 0.626  loss_rpn_cls: 0.350  loss_rpn_loc: 0.093  time: 1.3583  data_time: 0.0043  lr: 0.000065  max_mem: 2552M
[11/10 09:32:59 d2.utils.events]: eta: 0:00:28  iter: 279  total_loss: 1.572  loss_cls: 0.559  loss_box_reg: 0.047  loss_mask: 0.605  loss_rpn_cls: 0.213  loss_rpn_loc: 0.037  time: 1.3609  data_time: 0.0043  lr: 0.000070  max_mem: 2552M
[11/10 09:33:26 d2.utils.events]: eta: 0:00:01  iter: 299  total_loss: 1.832  loss_cls: 0.683  loss_box_reg: 0.170  loss_mask: 0.570  loss_rpn_cls: 0.196  loss_rpn_loc: 0.041  time: 1.3593  data_time: 0.0043  lr: 0.000075  max_mem: 2552M
[11/10 09:33:27 d2.engine.hooks]: Overall training speed: 297 iterations in 0:06:45 (1.3639 s / it)
[11/10 09:33:27 d2.engine.hooks]: Total training time: 0:06:46 (0:00:01 on hooks)
OrderedDict()

But I am still confused about why the model does not infer anything. I have updated the Colab notebook with minimal code to reproduce the issue. I have also updated the notebook with TensorBoard.

You most likely need to train longer.
As the issue template says, we do not answer questions about how to train a better model.

Cool. Thanks.

Hi @ppwwyyxx,
When I train on 1 GPU, it runs fine, for example:
python tools/train_net.py --config-file configs/COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025

But when I train on 4 GPUs, for example:
python tools/train_net.py --num-gpus 4 --config-file configs/COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml
I get the following error, which is really strange!

Traceback (most recent call last):
File "tools/train_net.py", line 166, in <module>
args=(args,),
File "/home/zsc/pythoncode/detectron2/detectron2/engine/launch.py", line 49, in launch
daemon=False,
File "/home/zsc/anaconda3/envs/car/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/zsc/anaconda3/envs/car/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/zsc/pythoncode/detectron2/detectron2/data/catalog.py", line 52, in get
f = DatasetCatalog._REGISTERED[name]
KeyError: 'car'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/zsc/anaconda3/envs/car/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/zsc/pythoncode/detectron2/detectron2/engine/launch.py", line 84, in _distributed_worker
main_func(*args)
File "/home/zsc/pythoncode/detectron2/tools/train_net.py", line 144, in main
trainer = Trainer(cfg)
File "/home/zsc/pythoncode/detectron2/detectron2/engine/defaults.py", line 223, in __init__
data_loader = self.build_train_loader(cfg)
File "/home/zsc/pythoncode/detectron2/detectron2/engine/defaults.py", line 397, in build_train_loader
return build_detection_train_loader(cfg)
File "/home/zsc/pythoncode/detectron2/detectron2/data/build.py", line 327, in build_detection_train_loader
proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
File "/home/zsc/pythoncode/detectron2/detectron2/data/build.py", line 256, in get_detection_dataset_dicts
dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
File "/home/zsc/pythoncode/detectron2/detectron2/data/build.py", line 256, in <listcomp>
dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
File "/home/zsc/pythoncode/detectron2/detectron2/data/catalog.py", line 56, in get
name, ", ".join(DatasetCatalog._REGISTERED.keys())
KeyError: "Dataset 'car' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_minival_100, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2014_minival_100, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val"

@zsc1220 it seems the dataset is not registered when you use multiple GPUs. Where do you register the dataset? In train_net you might need to register it in the main() function.
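A rough sketch of what registering inside main() could look like, assuming a COCO-format dataset named "car" (the name from the traceback above). The paths are hypothetical, and this main() is a simplified stand-in, not the actual tools/train_net.py. The likely reason it helps is that each GPU worker is a separate process with its own DatasetCatalog, so the registration has to run in every worker.

```python
# Hypothetical paths; replace with the real annotation file and image folder.
CAR_JSON = "path/to/car/annotations.json"
CAR_IMAGES = "path/to/car/images"


def register_car_dataset():
    """Register the "car" dataset in the current process."""
    # Requires detectron2; imported here so this file loads without it.
    from detectron2.data.datasets import register_coco_instances

    register_coco_instances("car", {}, CAR_JSON, CAR_IMAGES)


def main(args):
    """Entry point run by every worker, so every worker registers the dataset."""
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultTrainer

    register_car_dataset()  # runs once per worker process

    cfg = get_cfg()
    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    cfg.DATASETS.TRAIN = ("car",)

    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=args.resume)
    return trainer.train()
```

The traceback shows main being started through launch(...) with args=(args,), so anything that has to exist in every worker belongs in main() or in code that main() calls.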

Wow, well done, it works. Thanks a lot!!! @ppwwyyxx

cfg.DATASETS.TRAIN should contain names of your dataset as you register them. Not the directory.

Sorry, what did you mean by this?

If you registered your dataset under the name "cool_dataset" and it is located in "${HOME}/cool_dataset_is_here":

  • Good
cfg.DATASETS.TRAIN = ("cool_dataset", )
  • Bad
home = os.environ.get("HOME")
cfg.DATASETS.TRAIN = (os.path.join(home, "cool_dataset_is_here"), )

Thanks a lot, that helped.
