Vision: Failed in fine-tuning inception_v3

Created on 18 Oct 2017 · 26 comments · Source: pytorch/vision

Fine-tuning inception_v3 on my own dataset fails (Ubuntu 14.04, CUDA 8.0, Python 3.6.2).

It emits a warning when the model is loaded:

/home/ljy/anaconda3/lib/python3.6/site-packages/torchvision-0.1.9-py3.6.egg/torchvision/models/inception.py:65: UserWarning: src is not broadcastable to dst, but they have the same number of elements.  Falling back to deprecated pointwise behavior.

It then fails during training:

Traceback (most recent call last):
  File "/home/ljy/pytorch-examples-master/cub_pytorch/main.py", line 382, in <module>
    main()
  File "/home/ljy/pytorch-examples-master/cub_pytorch/main.py", line 213, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "/home/ljy/pytorch-examples-master/cub_pytorch/main.py", line 251, in train
    loss = criterion(output, target_var)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 482, in forward
    self.ignore_index)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 746, in cross_entropy
    return nll_loss(log_softmax(input), target, weight, size_average, ignore_index)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 537, in log_softmax
    return _functions.thnn.LogSoftmax.apply(input)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/thnn/auto.py", line 126, in forward
    ctx._backend = type2backend[type(input)]
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/_thnn/__init__.py", line 15, in __getitem__
    return self.backends[name].load()
KeyError: <class 'tuple'>


JingyunLiang

Most helpful comment

@jamiechoi1995 @MichaelLiang12, @TiRune is correct: inception_v3 has an aux branch, and unless it is disabled, the forward function returns a tuple in training mode (see here). Passing that tuple to the criterion throws this error.

So you have two choices:
1) Disable aux_logits when the model is created here, by also passing aux_logits=False to the inception_v3 function.

2) Edit your train function to unpack the returned tuple here, with something like:

output, aux = model(input_var)

alykhantejani on 1 Nov 2017


All 26 comments

Hi @MichaelLiang12,

Which PyTorch version are you using (found by torch.__version__)? Also, can you provide a minimal working example that reproduces this?

Thanks

alykhantejani on 18 Oct 2017

Also, the UserWarning you get when loading the model is fixed in master (via #231).

alykhantejani on 18 Oct 2017


Same issue:

(tensorflow) wcai@tdtd-desktop ~/tensorflow/AI_competition/pytorch $ python main.py -a inception_v3 . --pretrained
=> using pre-trained model 'inception_v3'
/home/wcai/tensorflow/lib/python3.5/site-packages/torchvision/models/inception.py:65: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
  m.weight.data.copy_(values)
Traceback (most recent call last):
  File "main.py", line 353, in <module>
    main()
  File "main.py", line 176, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 214, in train
    loss = criterion(output, target_var)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/modules/loss.py", line 482, in forward
    self.ignore_index)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/functional.py", line 746, in cross_entropy
    return nll_loss(log_softmax(input), target, weight, size_average, ignore_index)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/functional.py", line 537, in log_softmax
    return _functions.thnn.LogSoftmax.apply(input)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/_functions/thnn/auto.py", line 126, in forward
    ctx._backend = type2backend[type(input)]
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/_thnn/__init__.py", line 15, in __getitem__
    return self.backends[name].load()
KeyError: <class 'tuple'>

Python: 3.5.2

print(torch.__version__)
0.2.0_3

jamiechoi1995 on 27 Oct 2017

Hi @jamiechoi1995,

Can you provide a minimal working example of this failure (i.e. an input that triggers it when passed to the model)?

From the stack trace, it looks like the input to the loss is a tuple rather than a Variable.

alykhantejani on 27 Oct 2017

Hi @alykhantejani

You can reproduce this problem using the code in https://github.com/pytorch/examples/tree/master/imagenet

I changed the rescale and crop sizes to 299 for Inception v3. My training and validation data are JPEG files with corresponding JSON files.

The same code works with a ResNet model at size 224, but when I switch to Inception v3 I get this problem.

Thanks.

jamiechoi1995 on 30 Oct 2017

Isn't this problem caused by the auxiliary (Aux) branch in the network? If you remove it, it should work :)

TiRune on 1 Nov 2017

So you have two choices:
1) Disable aux_logits when the model is created here, by also passing aux_logits=False to the inception_v3 function.

2) Edit your train function to unpack the returned tuple here, with something like:

output, aux = model(input_var)

alykhantejani on 1 Nov 2017


@alykhantejani Hi, why do we have to disable aux_logits? What are these aux_logits, and do they affect training/validation?

I'm trying to reproduce the accuracy of a model trained with bvlc_googlenet (without pretrained weights). When I turn the aux branch off in PyTorch (googlenet), training works but reports a val_acc of 50%, which is very low compared to Caffe. Are there other ways to reproduce the same accuracy in PyTorch?
Thanks.

@jamiechoi1995 @MichaelLiang12, @TiRune is correct: inception_v3 has an aux branch, and unless it is disabled, the forward function returns a tuple in training mode (see here). Passing that tuple to the criterion throws this error.

So you have two choices:
1. Disable `aux_logits` when the model is created [here](https://github.com/pytorch/examples/blob/master/imagenet/main.py#L75) by also passing `aux_logits=False` to the `inception_v3` function.

2. Edit your `train` function to unpack the returned tuple [here](https://github.com/pytorch/examples/blob/master/imagenet/main.py#L194), with something like:
`output, aux = model(input_var)`

rajasekharponakala on 23 Mar 2019

@rajasekharponakala aux_logits is a separate classifier added to help during training; it is not used during inference.


Both googlenet and inception_v3 use pre-trained weights from TensorFlow, and as far as I know we didn't manage to reproduce accuracies from the paper when training from scratch.

fmassa on 24 Mar 2019

Hi @fmassa, thanks. Following a thread on the PyTorch Discourse forum, I added the lines below to train() in the ImageNet example:

output, aux = model(input_var)
loss1 = criterion(output, target)
loss2 = criterion(aux, target)
loss = loss1 + 0.4*loss2

but it ended with this error:

Traceback (most recent call last):
  File "imagenet.py", line 407, in <module>
    main()
  File "imagenet.py", line 114, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "imagenet.py", line 240, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "imagenet.py", line 281, in train
    output, aux = model(input)
ValueError: too many values to unpack (expected 2)

Any idea?

rajasekharponakala on 25 Mar 2019

You need to set your model to train() mode; it's probably in eval mode.

fmassa on 25 Mar 2019

Thanks. Yes, I'm following the examples/imagenet/main.py script:

def main():
    ...

def main_worker():
    ...

def train():
    ...
    model.train()
    ...
    outputs, aux_outputs = model(inputs)
    loss1 = criterion(outputs, target)
    loss2 = criterion(aux_outputs, target)
    loss = loss1 + 0.4 * loss2

def validate():
    ...
    model.eval()
    ...
    outputs = model(inputs)
    loss = criterion(outputs, target)
    ...

def adjust_learning_rate():
    ...

def accuracy():
    ...

I found another approach on the PyTorch Discourse forum:

output = model(input)
loss = None
# for nets that have multiple outputs such as inception
if isinstance(output, tuple):
    loss = sum((criterion(o, target) for o in output))
else:
    loss = criterion(output, target)

This time it throws a different error:

Traceback (most recent call last):
  File "imagenet.py", line 417, in <module>
    main()
  File "imagenet.py", line 114, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "imagenet.py", line 240, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "imagenet.py", line 298, in train
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
  File "imagenet.py", line 405, in accuracy
    _, pred = output.topk(maxk, 1, True, True)
AttributeError: 'tuple' object has no attribute 'topk'

rajasekharponakala on 25 Mar 2019

The issue is that both googlenet and inception return the outputs of their auxiliary classifiers in training mode.
Your code is not taking that into account, or the aux classifiers are not configured the way you expect. Double-check that and you'll be able to find the issue.

fmassa on 25 Mar 2019

Yes. In main_worker() I have:

if args.pretrained:
    print("=> using pre-trained model '{}'".format(args.arch))
    model = models.__dict__[args.arch](pretrained=True)
else:
    print("=> creating model '{}'".format(args.arch))
    model = models.__dict__[args.arch](aux_logits=True)

and vision/models/googlenet.py has:

class GoogLeNet(nn.Module):

    def __init__(self, num_classes=1000, aux_logits=True, transform_input=False, init_weights=True):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits
        self.transform_input = transform_input
        ...

    def forward(self, x):
        ...  # uses self.aux_logits

rajasekharponakala on 25 Mar 2019

@rajasekharponakala one thing to note here is that GoogLeNet has two aux branches, whereas Inception v3 only has one.

So for GoogLeNet you have to use:
aux1, aux2, output = model(inputs)

TheCodez on 25 Mar 2019


@TheCodez Thanks, it's working now! The format:
aux1, aux2, output = model(inputs)
loss1 = criterion(output, target)
loss2 = criterion(aux1, target)
loss3 = criterion(aux2, target)
loss = loss1 + 0.4 * (loss2 + loss3)

rajasekharponakala on 25 Mar 2019

@rajasekharponakala the correct weighting for GoogLeNet's auxiliary losses is 0.3:

aux1, aux2, output = model(inputs)
loss1 = criterion(output, target)
loss2 = criterion(aux1, target)
loss3 = criterion(aux2, target)
loss = loss1 + 0.3 * (loss2 + loss3)

TheCodez on 25 Mar 2019

Yeah, thanks.

rajasekharponakala on 25 Mar 2019


@TheCodez @fmassa @alykhantejani @rajasekharponakala Do we have to enable the auxiliary classifiers in test mode? I get very poor test accuracy when I load my trained model (the auxiliary classifiers are enabled here). I'm using the Inception v3 model for my task.

tejasri19 on 10 Jul 2019

@tejasri19 for inference, don't forget to set your model to eval() mode.

You don't need the aux classifiers for inference, only for training.

fmassa on 10 Jul 2019


Hi, I have a question. In https://github.com/pytorch/vision/blob/master/torchvision/models/googlenet.py
the code is:

        if self.training and self.aux_logits:
            return _GoogLeNetOutputs(x, aux2, aux1)
        return x

_GoogLeNetOutputs = namedtuple('GoogLeNetOutputs', ['logits', 'aux_logits2', 'aux_logits1'])

So, should it be
output, aux2, aux1 = model(inputs)
rather than
aux1, aux2, output = model(inputs)

Is that right? Thanks.

Holmeyoung on 16 Jul 2019

It should be output, aux2, aux1.

fmassa on 16 Jul 2019

Thanks for this thread, it really helped me, but now I'm getting this error when unpacking the model output:
output, aux1= model(data)
ValueError: too many values to unpack (expected 2)

and even when I added an extra output to unpack:
output, aux2, aux1 = model(data)
I still get the following error:
not enough values to unpack (expected 3, got 2)

gamesMum on 16 May 2020

I solved it by accessing the outputs separately:
output = model(data).logits
aux1 = model(data).aux_logits
It seems that there are extra outputs, such as counts, that I don't believe we need for training.

gamesMum on 16 May 2020

@gamesMum I would advise against that, since you are essentially running your model twice.
Instead, just call it once:
output = model(data)

and then access using: