Is there any recommendation for training Faster R-CNN starting from just the pretrained backbone? I'm using the VOC 2007 dataset and I'm able to do transfer learning starting from:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes=21)
Using the COCO-pretrained 'fasterrcnn_resnet50_fpn' I'm able to obtain an mAP of 79% on the VOC 2007 test set. Problems arise when I try to train from scratch using only the pretrained backbone:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes=21)
I have been trying to train this model for weeks but the highest mAP I got was 63% (again on the test set).
Now, I know that training from scratch is harder, but I really would like to know how to set the training parameters to obtain a decent accuracy. In the future I may want to change the backbone, and chances are that I will not be able to find a pretrained Faster R-CNN on which I can do transfer learning.
I haven't tried training on Pascal from scratch.
But here is a tip: in Detectron, we generally train for a fixed number of iterations (in this case, 90000).
This corresponds to roughly 13 epochs for COCO, but for Pascal it represents many more epochs given that the training set size is smaller.
You might want to take that into account when setting the number of iterations / lr scheduler steps (i.e., train for 15x more epochs or so)
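To make that conversion concrete, here is a rough back-of-the-envelope sketch; the batch size and dataset counts are assumptions based on the usual Detectron defaults, not numbers from this thread:

# Convert a fixed iteration budget into an equivalent number of epochs.
def iterations_to_epochs(iterations, images_per_batch, dataset_size):
    return iterations * images_per_batch / dataset_size

# Detectron's 90k-iteration schedule, batch of 16, ~118k COCO training images:
print(iterations_to_epochs(90000, 16, 118287))  # ~12-13 epochs on COCO
# The same budget on the much smaller VOC 2007 trainval set (5011 images)
# corresponds to far more epochs, so epoch counts and lr-scheduler steps
# need to be scaled up accordingly when training on Pascal:
print(iterations_to_epochs(90000, 16, 5011))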
Given that this doesn't seem to be an issue with the current implementation, I'm closing the issue, but feel free to comment on this if you have further questions
I tried to train for over 200 epochs; the loss keeps decreasing (down to 0.01) but the mAP on the test and validation sets decreases over time, from 63% at the 20th epoch to 56% at the 200th. It just overfits the training set. I'm out of ideas, it is like it is missing something really important (e.g. augmentation).
@lpuglia have you tried using maskrcnn-benchmark, and if yes, how much accuracy did you get?
A word of caution: the evaluation code for Pascal 2007 in maskrcnn-benchmark is not necessarily 100% accurate, and might give higher numbers than the original VOCdevkit.
@fmassa I have been using the python eval from a clone of Girshick repository:
https://github.com/jwyang/faster-rcnn.pytorch/blob/master/lib/datasets/voc_eval.py
which apparently is the same as:
https://github.com/facebookresearch/Detectron/blob/master/detectron/datasets/voc_eval.py
I know that it is not zero-diff accurate compared to the original MATLAB implementation, but 63% is much lower than 79%. Also, it is quite improbable that I'm doing something wrong there, since it works great when I do transfer learning and not so much when I train from scratch.
The problem extends to other backbones as well: if I train from scratch with VGG16 as backbone, I never get better than 59%. This is weird since I can easily get 69% in 10 epochs using this code base.
As per my understanding, the main difference between pytorch and this code base is that they have a different starting point for the backbone; in the README you can read:
NOTE. We compare the pretrained models from Pytorch and Caffe, and surprisingly find Caffe pretrained models have slightly better performance than Pytorch pretrained. We would suggest to use Caffe pretrained models from the above link to reproduce our results.
If you want to use pytorch pre-trained models, please remember to transpose images from BGR to RGB, and also use the same data transformer (minus mean and normalize) as used in pretrained model.
This may be an explanation for the low accuracy in pytorch. Now I'm wondering what the exact procedure used to obtain the weights of the pytorch backbones is: are they trained on ImageNet? What augmentation and normalization do they use?
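(For reference, the torchvision classification backbones are trained on ImageNet and expect the standard torchvision preprocessing rather than the Caffe-style one mentioned above; a minimal snippet of that normalization:)

# Standard ImageNet normalization used by torchvision's pretrained models
# (RGB inputs scaled to [0, 1]); the Caffe models instead expect BGR inputs
# with per-channel mean subtraction and no division by std.
from torchvision import transforms
imagenet_normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                          std=[0.229, 0.224, 0.225])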
@lpuglia some more questions:
maskrcnn-benchmark has exactly the same implementation as Detectron, including the backbone from Caffe2 (apart from the evaluation code for Pascal). It matches Detectron very closely on several experiments, while the implementation in torchvision was simplified (though it basically matches maskrcnn-benchmark on COCO). If the results using maskrcnn-benchmark for Pascal are better than with the implementation in torchvision, it would be great to let me know, so that I can understand which factor is the main one for the difference.
@fmassa
backbone = torchvision.models.vgg16(pretrained=True).features
backbone.out_channels = 512
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0], output_size=7,
sampling_ratio=2)
model = torchvision.models.detection.faster_rcnn.FasterRCNN(backbone,
num_classes, rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler)
This model trained from scratch using PASCAL 07 never reaches 60% accuracy.
Can you tell me which one in particular?
I'm gonna train maskrcnn-benchmark and see what accuracy I get
Oh, if you are not using FPN and are training on Pascal, then I might know (one of) the issues.
In the RPN, we used to discard anchors that go out of the boundaries of the image.
This was apparently important for the Pascal dataset when the model doesn't have FPN. But for COCO with models having FPN, this was totally unnecessary, and for the sake of simplifying things, I just removed it.
Can you try adding the box_is_inside_image function from https://github.com/fmassa/maskrcnn-benchmark/commit/071f1e793e98e4abc69de933cac910e95bec8196#diff-b99b4d8eb481e525b2f9900f40def679L135
and add the following lines
# discard anchors that go out of the boundaries of the image
inds_inside = box_is_inside_image(anchors_per_image, image_size)
labels_per_image[~inds_inside] = -1
just before https://github.com/pytorch/vision/blob/bbd363ca2713fb68e1e190206578e600a87baf90/torchvision/models/detection/rpn.py#L289-L291
and let me know the results? I might want to add back those lines in the current version of the code
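For anyone following along, a minimal sketch of what that helper might look like (the exact function is in the linked commit; here image_size is assumed to be the (height, width) pair from the transformed images):

# Mark anchors that lie (almost) entirely inside the image; with
# straddle_thresh=0 only anchors fully inside the boundaries are kept.
def box_is_inside_image(boxes, image_size, straddle_thresh=0):
    height, width = image_size
    inds_inside = (
        (boxes[..., 0] >= -straddle_thresh)
        & (boxes[..., 1] >= -straddle_thresh)
        & (boxes[..., 2] < width + straddle_thresh)
        & (boxes[..., 3] < height + straddle_thresh)
    )
    return inds_inside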
@fmassa how do I get the image_size there? I was checking the history of the file but that function doesn't ever seem to be used (except in the boxlist_is_inside_image version).
@lpuglia the image sizes can be obtained from images, as it carries this information https://github.com/pytorch/vision/blob/bbd363ca2713fb68e1e190206578e600a87baf90/torchvision/models/detection/rpn.py#L411
images.image_sizes
@fmassa Just to be clear, I passed that argument to the function assign_targets_to_anchors and then I used it in the loop like:
for anchors_per_image, targets_per_image, image_size in zip(anchors, targets, image_sizes):
But the accuracy doesn't increase, no better than 60%.
@lpuglia so you added the following lines in the code and it didn't help?
# discard anchors that go out of the boundaries of the image
inds_inside = box_is_inside_image(anchors_per_image, image_size)
labels_per_image[~inds_inside] = -1
@fmassa indeed, I can't see any difference. Since I'm already writing, I can add that I tried with other backbones (like mobilenet_v2) with no success.
@lpuglia can you share your code on github? I won't have time to have a closer look before August, but having the code you are using might help me identify the issue
@fmassa Thank you, I will. Do you think that with maskrcnn-benchmark I will get the correct mAP?
@lpuglia maskrcnn-benchmark follows exactly all the implementation details of Detectron, so it should reproduce whatever they have in Detectron.
Hey~ @lpuglia @fmassa
I encountered the same problem. What I want to do is use this implementation to reproduce the result reported in the Faster R-CNN paper, using a VGG16 backbone trained on VOC2007.
I tried several times and got results no better than 60% either.
Then I found out this implementation is different from the original implementation.
So I thought maybe that's the problem. Next I changed the code a little bit:
vgg16 = torchvision.models.vgg16(pretrained=False)
state_dict = torch.load('vgg16_caffe.pth')
vgg16.load_state_dict({k: v for k, v in state_dict.items() if k in vgg16.state_dict()})
backbone = vgg16.features[:-1] # not using last maxpooling layer
# freeze top4 conv
for layer in backbone[:10]:
    for p in layer.parameters():
        p.requires_grad = False
class BoxHead(nn.Module):
    """
    Box head for the VGG16 Faster R-CNN backbone.
    Weights are loaded from vgg16_caffe.pth.
    Replaces TwoMLPHead with the VGG classifier layers.
    """
    def __init__(self, vgg16):
        super(BoxHead, self).__init__()
        classifier = vgg16.classifier
        classifier = list(classifier)
        del classifier[6]  # drop the final 1000-way ImageNet classifier
        del classifier[5]  # drop dropout
        del classifier[2]  # drop dropout
        self.classifier = nn.Sequential(*classifier)

    def forward(self, x):
        x = x.flatten(start_dim=1)
        x = self.classifier(x)
        return x
box_predictor = FastRCNNPredictor(4096, 21)  # in_features changed from 1024 to 4096 for the VGG head
# initial params, following
# https://github.com/chenyuntc/simple-faster-rcnn-pytorch/blob/master/model/faster_rcnn_vgg16.py#L109
nn.init.normal_(box_predictor.cls_score.weight, std=0.01)
nn.init.constant_(box_predictor.cls_score.bias, 0)
nn.init.normal_(box_predictor.bbox_pred.weight, std=0.001)
nn.init.constant_(box_predictor.bbox_pred.bias, 0)
anchor_generator = AnchorGenerator(sizes=((128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
output_size=7,
sampling_ratio=-1)
I made these changes and thought they would give me a competitive result before training...
Truth is, I was too naive.
The best result I got is 61.1%, lower than the reported 69.9%.
My code is here: https://github.com/hktxt/Faster-RCNN
Most of it is copied from https://github.com/pytorch/vision/blob/master/torchvision/models/detection.
Set up the dataset correctly and just run trainX.py.
Oh, wait a bit.
I've just realized something.
If you are not using FPN, the resnet and the heads should be completely different (the heads are much bigger than what the FPN-based models use).
There is most probably an issue with your backbone / head model definitions that is the root cause of the problem.
I have not added the C4 backbone because it is generally much larger and slower than the FPN-based version, while working worse.
If you want to reproduce results using the C4 backbone, for now it might just be simpler to use the implementation in maskrcnn-benchmark.
@fmassa I'm not sure what you mean, but this does not explain why transfer learning works much better than training from scratch, even if I leave it to train for 200 epochs (which was the original question of the topic).
@lpuglia yes, it explains it.
In fact, the head for the standard Faster R-CNN is the whole layer4 from resnet, or all of the classifier from VGG16, which are already pre-trained for classification.
The heads from FPN-based models, in contrast, are initialized from scratch and consist of only two MLP layers.
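For context, the FPN box head in torchvision is roughly the following (TwoMLPHead in faster_rcnn.py): two fully-connected layers initialized from scratch, in contrast to the large pretrained resnet layer4 / VGG classifier heads.

import torch.nn as nn
import torch.nn.functional as F

class TwoMLPHead(nn.Module):
    # Standard box head for FPN-based models: two randomly-initialized
    # fully-connected layers on top of the pooled RoI features.
    def __init__(self, in_channels, representation_size):
        super(TwoMLPHead, self).__init__()
        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)
        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))
        return x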
@fmassa maybe you are missing a thing:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes=21)
this model HAS FPN, the only thing missing is the weight initialization given by pretrained=True. How is it possible that training from scratch on PASCAL gives worse results compared to transfer learning?
We are mixing 2 topics in this issue:
1. box_head doesn't support non-FPN backbones (VGG16 accuracy lower than 60%).
2. My initial question was about training from scratch vs. transfer learning.
@lpuglia ok, got it.
Training from scratch giving worse performance is expected.
Indeed, the pre-trained models were trained on COCO, which has many more images than Pascal VOC, and the classes in Pascal are a subset of the classes in COCO.
BTW, most of the top performing methods on Pascal first pre-train on COCO and then fine-tune on Pascal.
@fmassa I found out what my main problem was: I was using the val set for validation only. However, to get good results on PASCAL VOC 2007 you are supposed to train on the whole trainval set. Also, thanks to @hktxt's comment I got 66% accuracy training from scratch (just 3% less than expected). If anyone is interested, here are the highlights:
vgg = torchvision.models.vgg16(pretrained=True)
backbone = vgg.features[:-1]
for layer in backbone[:10]:
    for p in layer.parameters():
        p.requires_grad = False
backbone.out_channels = 512
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

class BoxHead(nn.Module):
    def __init__(self, vgg):
        super(BoxHead, self).__init__()
        self.classifier = nn.Sequential(*list(vgg.classifier._modules.values())[:-1])

    def forward(self, x):
        x = x.flatten(start_dim=1)
        x = self.classifier(x)
        return x

box_head = BoxHead(vgg)
model = torchvision.models.detection.faster_rcnn.FasterRCNN(
    backbone, #num_classes,
    rpn_anchor_generator = anchor_generator,
    box_roi_pool = roi_pooler,
    box_head = box_head,
    box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(4096, num_classes=21))
dataset = VOCDetection(img_folder=root, year='2007', image_set='trainval', transforms=transforms)
The only augmentation I used was RandomHorizontalFlip.
--epochs 40
--lr-steps 30
--momentum 0.9
--lr-gamma 0.1
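Those flags map, roughly, onto the optimizer and lr scheduler of the torchvision detection reference scripts; a sketch with assumed values for what was not listed above (the base lr of 0.005 and weight_decay of 0.0005 are assumptions, and model is the FasterRCNN built above):

import torch

params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
# lr is divided by 10 (gamma) at epoch 30 out of 40:
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30], gamma=0.1)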
@lpuglia awesome, thanks for letting me know!
Also, did you add the visibility checks for the anchors in the RPN, as I mentioned in https://github.com/pytorch/vision/issues/1116#issuecomment-512731349 ?
This could maybe give the few remaining points left
@fmassa It was enabled the whole time, I don't know how much it influenced the training. I'm gonna repeat the test commenting it out and let you know (my guess is that it doesn't change much).
Oh, that's good news.
I'll try your training strategy. Seems like most parts are the same, except the AnchorGenerator and the dropout in the VGG backbone.
@hktxt what dropout are you referring to in particular? (By the way, I'm using a batch size of 4.)
@fmassa removing the visibility check decreases the accuracy from 66 to 64%
@lpuglia thanks for the info! Very helpful! I might include the visibility check again, as this gives a few more points on Pascal without FPN
The BoxHead: if you print the net you'll see the dropout layer; however, it was dropped in mine.
bs=4? I use bs=1 for training, got ~60%... no improvements...
@hktxt did you try adding the visibility checks? How many images do you feed during the training?
@fmassa I'm still working on it and I can see that using the Caffe pretrained model gives another 1% on the final accuracy. I will try to close the gap more. Did you remove anything else besides the visibility check?
@lpuglia there are some other minor changes. Here is what I remember now.
We are using l1_loss in the RPN https://github.com/pytorch/vision/blob/2287c8f2dc9dcad955318cc022cabe4d53051f65/torchvision/models/detection/rpn.py#L368-L372
while in Detectron we use smooth_l1_loss with a beta parameter of 1 / 9.
This can affect the results negatively for AP@50, but improves results for higher thresholds.
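For reference, a beta-parameterized smooth L1 loss along the lines of the maskrcnn-benchmark one looks roughly like this (treat it as an illustration rather than the exact code):

import torch

def smooth_l1_loss(input, target, beta=1.0 / 9, size_average=True):
    # Quadratic below beta, linear above it (Huber-style loss).
    n = torch.abs(input - target)
    cond = n < beta
    loss = torch.where(cond, 0.5 * n ** 2 / beta, n - 0.5 * beta)
    if size_average:
        return loss.mean()
    return loss.sum()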
I removed some custom inits in the heads in https://github.com/fmassa/maskrcnn-benchmark/commit/3e0e12a652331eeff87777e9a3be81a939817141
This didn't change performance at all in my experiments on COCO, but maybe this could change something for Pascal? Not sure
I use the trainval set, which contains 5011 images, for training, and the test set, which contains 4952 images.
I also used RandomHorizontalFlip(0.5) for data augmentation.
@fmassa I tried them both; the first actually decreases the accuracy for some reason, the second makes no difference. I will train from scratch on COCO and then use transfer learning to see if I can get 70% on Pascal. Thanks for the help!
@hktxt my advice is to make sure to have the visibility checks enabled and use the following class for conversion:
class ConvertVOCtoCOCO(object):
    CLASSES = (
        "__background__", "aeroplane", "bicycle",
        "bird", "boat", "bottle", "bus", "car",
        "cat", "chair", "cow", "diningtable", "dog",
        "horse", "motorbike", "person", "pottedplant",
        "sheep", "sofa", "train", "tvmonitor",
    )

    def __call__(self, image, target):
        # return image, target
        anno = target['annotations']
        filename = anno["filename"].split('.')[0]
        h, w = anno['size']['height'], anno['size']['width']
        boxes = []
        classes = []
        objects = anno['object']
        if not isinstance(objects, list):
            objects = [objects]
        for obj in objects:
            bbox = obj['bndbox']
            bbox = [int(bbox[n]) - 1 for n in ['xmin', 'ymin', 'xmax', 'ymax']]
            boxes.append(bbox)
            classes.append(self.CLASSES.index(obj['name']))
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        classes = torch.as_tensor(classes)
        image_id = anno['filename'][:-4]
        image_id = torch.as_tensor([int(image_id)])
        target = {}
        target["boxes"] = boxes
        target["labels"] = classes
        target['name'] = image_id  # filename converted to an int tensor
        return image, target
Also (I don't know if this is useful yet), make sure to have a 10022-image dataset by flipping all the images. This is different from random flipping because you make sure that every image is shown to the network twice, in different orientations, per epoch. If you use this strategy you will need just 15 epochs to train the network. Here is my code:
class VOCDetection_flip(torchvision.datasets.VOCDetection):
    def __init__(self, img_folder, year, image_set, transforms):
        super(VOCDetection_flip, self).__init__(img_folder, year, image_set)
        self._transforms = transforms

    def __getitem__(self, idx):
        real_idx = idx//2
        img, target = super(VOCDetection_flip, self).__getitem__(real_idx)
        target = dict(image_id=real_idx, annotations=target['annotation'])
        if self._transforms is not None:
            img, target = self._transforms(img, target)
        # img = img[[2, 1, 0],:]
        if (idx % 2) == 0:
            height, width = img.shape[-2:]
            img = img.flip(-1)
            bbox = target["boxes"]
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
            target["boxes"] = bbox
        return img, target

    def __len__(self):
        return 2*len(self.images)
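A quick usage note on the class above (root and transforms are placeholders here): consecutive indices 2k and 2k+1 map to the same underlying image, shown once flipped and once as-is, so the effective dataset doubles.

dataset = VOCDetection_flip(img_folder=root, year='2007', image_set='trainval',
                            transforms=transforms)
print(len(dataset))  # 2 * 5011 = 10022 for VOC 2007 trainval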
@hktxt FYI I can easily get 72% mAP using the example provided in the FasterRCNN source code, with mobilenet_v2 as the backbone:
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
output_size=7,
sampling_ratio=2)
model = torchvision.models.detection.faster_rcnn.FasterRCNN(backbone,
num_classes=21,
rpn_anchor_generator=anchor_generator,
box_roi_pool=roi_pooler)
No need to modify the BoxHead.
Hey guys @fmassa @lpuglia, thanks for the great discussion. I followed and modified some of your code to train for object detection on Pascal VOC 2007. I also followed and borrowed code from the PyTorch tutorial on transfer learning for object detection on the Penn-Fudan dataset, especially to evaluate the model. However, I didn't get any good mAP results with VGG as the backbone; in fact it was showing 0 mAP. I will attach the code below. Could you comment if anything is missing there? Thanks.
import os
import numpy as np
import torch
from PIL import Image
import torch.nn as nn
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor, AnchorGenerator
from engine import train_one_epoch, evaluate
import utils
import transforms as T
from torchvision.datasets import VOCDetection
from tqdm import tqdm
from torch.utils.tensorboard import SummaryWriter
#%%
class PrepareInstance(object):
    CLASSES = (
        "__background__ ",
        "aeroplane",
        "bicycle",
        "bird",
        "boat",
        "bottle",
        "bus",
        "car",
        "cat",
        "chair",
        "cow",
        "diningtable",
        "dog",
        "horse",
        "motorbike",
        "person",
        "pottedplant",
        "sheep",
        "sofa",
        "train",
        "tvmonitor",
    )

    def __call__(self, image, target):
        anno = target['annotation']
        h, w = anno['size']['height'], anno['size']['width']
        boxes = []
        classes = []
        area = []
        iscrowd = []
        objects = anno['object']
        if not isinstance(objects, list):
            objects = [objects]
        for obj in objects:
            bbox = obj['bndbox']
            bbox = [int(bbox[n]) - 1 for n in ['xmin', 'ymin', 'xmax', 'ymax']]
            boxes.append(bbox)
            classes.append(self.CLASSES.index(obj['name']))
            iscrowd.append(int(obj['difficult']))
            area.append((bbox[2] - bbox[0]) * (bbox[3] - bbox[1]))
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        classes = torch.as_tensor(classes)
        area = torch.as_tensor(area)
        iscrowd = torch.as_tensor(iscrowd)
        image_id = anno['filename'][0:6]
        image_id = torch.as_tensor([int(image_id)])
        target = {}
        target["boxes"] = boxes
        target["labels"] = classes
        target["image_id"] = image_id
        # for conversion to coco api
        target["area"] = area
        target["iscrowd"] = iscrowd
        return image, target
class VOCDetection_flip(VOCDetection):
    def __init__(self, img_folder, year, image_set, transforms):
        super().__init__(img_folder, year, image_set)
        self._transforms = transforms

    def __getitem__(self, idx):
        real_idx = idx//2
        img, target = super(VOCDetection_flip, self).__getitem__(real_idx)
        target = dict(image_id=real_idx, annotations=target['annotation'])
        if self._transforms is not None:
            img, target = self._transforms(img, target)
        # img = img[[2, 1, 0],:]
        if (idx % 2) == 0:
            height, width = img.shape[-2:]
            img = img.flip(-1)
            bbox = target["boxes"]
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
            target["boxes"] = bbox
        return img, target

    def __len__(self):
        return 2*len(self.images)
def get_voc(root, image_set, transforms):
    t = [PrepareInstance()]
    if transforms is not None:
        t.append(transforms)
    transforms = T.Compose(t)
    dataset = VOCDetection(root, '2007', image_set, transforms=transforms, download=False)
    return dataset

def get_transform(istrain=False):
    transforms = []
    transforms.append(T.ToTensor())
    if istrain:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
class BoxHead(nn.Module):
    def __init__(self, vgg):
        super(BoxHead, self).__init__()
        self.classifier = nn.Sequential(*list(vgg.classifier._modules.values())[:-1])
        self.in_features = 4096  # feature size out of the mlp

    def forward(self, x):
        x = x.flatten(start_dim=1)
        x = self.classifier(x)
        return x
#def get_model_FRCNN(num_classes):
#
# # modified from this issue page https://github.com/pytorch/vision/issues/1116
# vgg = torchvision.models.vgg16(pretrained=True)
# backbone = vgg.features[:-1]
# for layer in backbone[:10]:
# for p in layer.parameters():
# p.requires_grad = False
# backbone.out_channels = 512
# anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
# aspect_ratios=((0.5, 1.0, 2.0),))
# roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
# output_size=7,
# sampling_ratio=2)
#
# box_head = BoxHead(vgg)
# in_features = box_head.in_features
#
# model = torchvision.models.detection.faster_rcnn.FasterRCNN(
# backbone, #num_classes,
# rpn_anchor_generator = anchor_generator,
# box_roi_pool = roi_pooler,
# box_head = box_head,
# box_predictor = FastRCNNPredictor(in_features, num_classes))
#
# return model
def get_model_FRCNN(num_classes):
    backbone = torchvision.models.mobilenet_v2(pretrained=True).features
    backbone.out_channels = 1280
    anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                       aspect_ratios=((0.5, 1.0, 2.0),))
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                    output_size=7,
                                                    sampling_ratio=2)
    model = torchvision.models.detection.faster_rcnn.FasterRCNN(backbone,
                                                                num_classes,
                                                                rpn_anchor_generator=anchor_generator,
                                                                box_roi_pool=roi_pooler)
    return model
#%%
if __name__ == "__main__":
    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    num_classes = 21  # 20 classes + background for VOC
    dataset = get_voc('.', 'trainval', transforms=get_transform(istrain=False))
    dataset_test = get_voc('.', 'test', transforms=get_transform(istrain=False))

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=4, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)
    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=8, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)
    print('data prepared, train data: {}'.format(len(dataset)))
    print('data prepared, test data: {}'.format(len(dataset_test)))

    #%%
    # get the model using our helper function
    model = get_model_FRCNN(num_classes)
    # move model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=30,
                                                   gamma=0.1)
    # let's train it for 40 epochs
    num_epochs = 40

    # setup log data writer
    if not os.path.exists('log'):
        os.makedirs('log')
    writer = SummaryWriter(log_dir='log')

    #%%
    iters_per_epoch = int(len(data_loader) / data_loader.batch_size)
    for epoch in range(num_epochs):
        loss_epoch = {}
        loss_name = ['loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg']
        for ii, (images, targets) in tqdm(enumerate(data_loader), total=len(data_loader)):
            model.train()
            optimizer.zero_grad()
            images = list(image.to(device) for image in images)
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            # training
            loss_dict = model(images, targets)
            losses = sum(loss for loss in loss_dict.values())
            losses.backward()
            optimizer.step()
            lr_scheduler.step()
            info = {}
            for name in loss_dict:
                info[name] = loss_dict[name].item()
            writer.add_scalars("losses", info, epoch * iters_per_epoch + ii)
        if (epoch + 1) % 1 == 0:
            # evaluate on the test dataset
            evaluate(model, data_loader_test, device=device)
    writer.close()
@lpuglia I think we should add one example with Pascal VOC somewhere. If you could send an initial PR, I could look into improving it and merging it in torchvision.
Great, I think that would be good. Also, I noticed the default dataset downloader for VOC doesn't have the test split. But the test split (images and labels) has been released by the community already. I don't know why it was not included in the downloads, though.
@fmassa this is a good idea; very often beginners do not have the compute power to crunch all the data in COCO, and VOC is much easier to start with.
@fmassa here is the pull request:
https://github.com/pytorch/vision/pull/1216
it should work out of the box.
Thanks @lpuglia for the PR!
I'll have a closer look to the PR (and get it merged) once I'm back from holidays
@lpuglia
Hello friends, I have spent two weeks on torchvision.fasterrcnn_resnet; unfortunately, I still have not been able to complete the training. Can you provide some training code for me? Thank you very much! My email is [email protected].
@AFutureD Here is the pull request code:
https://github.com/lpuglia/torchvision_voc
It uses ResNet as the default backbone.
I have a new annotated dataset and have used TensorFlow for Faster R-CNN transfer learning, and it works well, but I want to migrate to PyTorch. This thread has me worried it isn't quite out of the box yet? Am I wrong, and if so, is there some tutorial/treatment specifically on Faster R-CNN transfer learning?
Sorry to add noise to this thread; I am very new to PyTorch and really want to use it for my application, as I'm tired of having to explain to people what a session is, and I am ready to move to something Pythonic. :) I have followed the basic tutorials on transfer learning/torchvision at the web site and really love it.
@lpuglia Thanks for your code. :)