Perhaps this is my ignorance, but I am attempting to avoid cropping or downsampling of my 4K images as they go into Faster R-CNN with a ResNet-101 backbone. Details can be found at https://discuss.pytorch.org/t/how-do-i-avoid-downsampling-with-faster-rcnn-resnet-backbone/79740
I found the max_size= parameter, but it only takes an integer and I could not quite understand what it was for. Perhaps I need to use min_size= instead?
I'm happy to help update the docs to point out where to go in order to turn off downsampling. However, I have a feeling this sits quite low in the API and is not exposed at the level where I am using torchvision?
My GPU has RAM to spare, and I would like to minimize downsampling and set batch_size to 1, devoting all available memory to avoiding loss of pixel information as the image enters the training phase.
Please add either an explanation of how to implement this, or documentation stating that it cannot be done with the torchvision library on its current roadmap. I think my 6GB card is just the start; many people are beginning to train on images larger than 4K.
I was able to run at maximum VRAM utilization with batch_size = 1 using this project:
https://github.com/jwyang/faster-rcnn.pytorch/tree/pytorch-1.0
I am hoping I can port my work to torchvision and get more people to do likewise, as it's much cleaner code! Great work so far.
@fmassa Do you think this is worth a PR?
Hi,
Improving the docs would be great!
It is currently possible to control the size of the image fed to Faster R-CNN, as you said, via max_size and min_size.
They are used as follows: the resizing inside Faster R-CNN keeps the aspect ratio of the image and scales its shorter side to min_size. So if your image is 4000x8000 and min_size is 200, the image will be rescaled to 200x400.
max_size adds an upper bound on the longer side of the image. For example, with the same 4000x8000 image, if min_size=200 and max_size=300, then Faster R-CNN will rescale your input to 150x300, because scaling the shorter side to 200 would push the longer side to 400, past the 300 limit.
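For illustration, here is a minimal sketch of passing those arguments when building the model (the exact values are placeholders for a 4K workflow, not a recommendation):

```
import torchvision

# min_size / max_size are forwarded to the resizing transform inside
# the detection model; setting them to the native resolution of your
# input (here, the sides of a 4K frame) avoids any downsampling, at
# the cost of a much larger memory footprint.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=True,
    min_size=2160,  # target for the shorter side of the image
    max_size=3840,  # upper bound for the longer side
)
```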
Let me know if this is clear. If you could send a PR clarifying those points, that would be great.
Sounds like a plan. First, though, I need to verify the model I built using the min/max parameters, because when I override the backbone with ResNet, min_size / max_size do not have any effect on the memory footprint. Could it be because I have tuned down the number of anchors in the RPN so that it fits more easily within the measly 6GB?
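For reference, this is roughly how the anchors can be reduced; a sketch based on the torchvision object detection tutorial, where the backbone and the sizes/ratios are illustrative rather than my exact configuration:

```
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# Any backbone that exposes an out_channels attribute works here.
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280

# With a single-feature-map backbone, AnchorGenerator takes one tuple
# of sizes and one of aspect ratios; fewer entries mean fewer anchors
# per location and smaller RPN output tensors.
anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256),),
    aspect_ratios=((0.5, 1.0, 2.0),),
)

model = FasterRCNN(
    backbone,
    num_classes=6,  # 5 categories + background, in my case
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=MultiScaleRoIAlign(
        featmap_names=['0'], output_size=7, sampling_ratio=2),
)
```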
@EMCP Without further information it's hard to know what's going on, but the image resolution for Faster R-CNN can be tuned with the arguments I mentioned.
Once I get past the entire tutorial I will verify the work and attempt a PR. Meanwhile I am a bit stuck on an AssertionError in the CocoEvaluator area; I will peek over at pycocotools to see what's up:
https://discuss.pytorch.org/t/torchvision-cocoevaluator-data-set-assertionerror/79882
I figured out what I was doing wrong... I had chosen to freeze the gradients within the backbone. That mistake is what lets the network run on the 6GB card, but it is not what I want; I want what is now being called "fine-tuning":
https://discuss.pytorch.org/t/how-do-i-avoid-downsampling-with-faster-rcnn-resnet-backbone/79740/2
When I tried to follow another post regarding fine-tuning, I got shape errors, likely because that example creates a "criterion" object with nn.CrossEntropyLoss(). Since that is a classification problem, not detection, I need some time to work with the code.
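As far as I can tell from the reference scripts, the torchvision detection models compute their losses internally: in training mode the forward pass returns a dict of losses, so no separate criterion object is needed. A sketch of a training step, assuming model, images, targets, and optimizer are set up as in the detection tutorial:

```
# The model builds its own classification and box-regression losses
# from the targets; there is no external nn.CrossEntropyLoss().
model.train()
loss_dict = model(images, targets)  # loss_classifier, loss_box_reg, ...
losses = sum(loss for loss in loss_dict.values())
optimizer.zero_grad()
losses.backward()
optimizer.step()
```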
```
if model_conf["hyperParameters"]["freeze_pretrained_gradients"]:
    print("Using backbone as fixed feature extractor")
    modules = list(backbone_nn.children())[:-1]  # delete the last fc layer
    backbone_nn = nn.Sequential(*modules)
    # FasterRCNN needs to know the number of output channels
    # in a backbone. For resnet101, it's 2048.
    for param in backbone_nn.parameters():
        param.requires_grad = False
    backbone_nn.out_channels = 2048
else:
    print("Using fine-tuning of the model")
    modules = list(backbone_nn.children())[:-1]  # delete the last fc layer
    backbone_nn = nn.Sequential(*modules)
    # FasterRCNN needs to know the number of output channels
    # in a backbone. For resnet101, it's 2048.
    for param in backbone_nn.parameters():
        param.requires_grad = True
    backbone_nn.out_channels = 2048
```
Now I need to sort out my scales, apparently:
```
Using hyperParameters:
{'hyperParameters': {'anchor_ratios': [0.5, 1, 2],
'anchor_scales': [4, 8, 16, 32],
'batch_size': 1,
'display_interval': 100,
'epoch_max': 400,
'epoch_start': 0,
'freeze_pretrained_gradients': False,
'learning_decay_gamma': 0.01,
'learning_decay_milestones': [5, 10, 45],
'learning_decay_step': 15,
'learning_rate': 0.005,
'learning_weight_decay': 0.0005,
'max_size_image': 1080,
'min_size_image': 800,
'momentum': 0.9,
'net': 'wide_resnet101_2',
'normalization_mean': [0.485, 0.456, 0.406],
'normalization_std': [0.229, 0.224, 0.225],
'optimizer': 'sgd',
'pooling_size': 7,
'rpn_nms_thresh': 0.7,
'rpn_post_nms_top_n_train': 5,
'rpn_pre_nms_top_n_train': 10,
'testing': {'check_epoch': 20,
'check_session': 1,
'enable_visualization': True}},
'pytorch_engine': {'enable_cuda': True,
'enable_multiple_gpus': False,
'enable_tfb': True,
'num_workers': 1,
'resume_checkpoint': False,
'resume_checkpoint_epoch': 1,
'resume_checkpoint_num': 0,
'resume_checkpoint_session': 1,
'session': 1},
'pytorch_engine_scoring': {'enable_cuda': False,
'enable_multiple_gpus': False,
'enable_tfb': True,
'num_workers': 1,
'resume_checkpoint': False,
'resume_checkpoint_epoch': 1,
'resume_checkpoint_num': 0,
'resume_checkpoint_session': 1,
'session': 1}}
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
found 5 categories in data
Creating model backbone with wide_resnet101_2
Using fine-tuning of the model
/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
/opt/conda/conda-bld/pytorch_1587428266983/work/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated:
    nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
    nonzero(Tensor input, *, bool as_tuple)
Traceback (most recent call last):
  File "/home/emcp/Dev/git/EMCP/faster-rcnn-torchvision/model_components/training.py", line 131, in <module>
    train(data_conf=config_json, model_conf=model_conf)
  File "/home/emcp/Dev/git/EMCP/faster-rcnn-torchvision/model_components/training.py", line 95, in train
    print_freq=model_conf["hyperParameters"]["display_interval"])
  File "/home/emcp/Dev/git/EMCP/faster-rcnn-torchvision/model_components/references/detection/engine.py", line 33, in train_one_epoch
    loss_dict = model(images, targets)
  File "/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 71, in forward
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
  File "/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 754, in forward
    box_features = self.box_roi_pool(features, proposals, image_shapes)
  File "/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 186, in forward
    self.setup_scales(x_filtered, image_shapes)
  File "/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 156, in setup_scales
    scales = [self.infer_scale(feat, original_input_shape) for feat in features]
  File "/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 156, in <listcomp>
    scales = [self.infer_scale(feat, original_input_shape) for feat in features]
  File "/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 143, in infer_scale
    assert possible_scales[0] == possible_scales[1]
AssertionError
Process finished with exit code 1
```
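Following up for anyone who hits the same AssertionError: my suspicion (not yet confirmed) is that slicing off only the final fc layer with [:-1] leaves ResNet's AdaptiveAvgPool2d in the backbone, so it emits a 1x1 feature map and infer_scale in poolers.py derives different scales for height and width on non-square inputs. A minimal sketch of the fix, dropping the last two children instead:

```
import torch.nn as nn
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

resnet = torchvision.models.wide_resnet101_2(pretrained=True)

# Drop both avgpool and fc ([:-2] rather than [:-1]) so the backbone
# keeps a spatial feature map with a consistent stride in both
# dimensions.
backbone_nn = nn.Sequential(*list(resnet.children())[:-2])
backbone_nn.out_channels = 2048

model = FasterRCNN(
    backbone_nn,
    num_classes=6,  # 5 categories + background
    rpn_anchor_generator=AnchorGenerator(
        sizes=((32, 64, 128, 256),),
        aspect_ratios=((0.5, 1.0, 2.0),)),
    box_roi_pool=MultiScaleRoIAlign(
        featmap_names=['0'], output_size=7, sampling_ratio=2),
)
```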