Pytorch-cyclegan-and-pix2pix: Error(s) in loading state_dict for ResnetGenerator

Created on 19 Jun 2018 · 20 comments · Source: junyanz/pytorch-CycleGAN-and-pix2pix

Today I wanted to test my trained model, but I got errors that did not occur before.

Traceback (most recent call last):
  File "test.py", line 19, in <module>
    model.setup(opt)
  File "/home/t-fayan/vision/pytorch-CycleGAN-and-pix2pix/models/base_model.py", line 43, in setup
    self.load_networks(opt.which_epoch)
  File "/home/t-fayan/vision/pytorch-CycleGAN-and-pix2pix/models/base_model.py", line 130, in load_networks
    net.load_state_dict(state_dict)
  File "/home/t-fayan/anaconda2/envs/py27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResnetGenerator:
Missing key(s) in state_dict: "model.10.conv_block.6.bias", "model.10.conv_block.6.weight", "model.10.conv_block.7.running_var", "model.10.conv_block.7.running_mean", "model.11.conv_block.6.bias", "model.11.conv_block.6.weight", "model.11.conv_block.7.running_var", "model.11.conv_block.7.running_mean", "model.12.conv_block.6.bias", "model.12.conv_block.6.weight", "model.12.conv_block.7.running_var", "model.12.conv_block.7.running_mean", "model.13.conv_block.6.bias", "model.13.conv_block.6.weight", "model.13.conv_block.7.running_var", "model.13.conv_block.7.running_mean", "model.14.conv_block.6.bias", "model.14.conv_block.6.weight", "model.14.conv_block.7.running_var", "model.14.conv_block.7.running_mean", "model.15.conv_block.6.bias", "model.15.conv_block.6.weight", "model.15.conv_block.7.running_var", "model.15.conv_block.7.running_mean", "model.16.conv_block.6.bias", "model.16.conv_block.6.weight", "model.16.conv_block.7.running_var", "model.16.conv_block.7.running_mean", "model.17.conv_block.6.bias", "model.17.conv_block.6.weight", "model.17.conv_block.7.running_var", "model.17.conv_block.7.running_mean", "model.18.conv_block.6.bias", "model.18.conv_block.6.weight", "model.18.conv_block.7.running_var", "model.18.conv_block.7.running_mean".
Unexpected key(s) in state_dict: "model.10.conv_block.5.weight", "model.10.conv_block.5.bias", "model.10.conv_block.6.running_mean", "model.10.conv_block.6.running_var", "model.11.conv_block.5.weight", "model.11.conv_block.5.bias", "model.11.conv_block.6.running_mean", "model.11.conv_block.6.running_var", "model.12.conv_block.5.weight", "model.12.conv_block.5.bias", "model.12.conv_block.6.running_mean", "model.12.conv_block.6.running_var", "model.13.conv_block.5.weight", "model.13.conv_block.5.bias", "model.13.conv_block.6.running_mean", "model.13.conv_block.6.running_var", "model.14.conv_block.5.weight", "model.14.conv_block.5.bias", "model.14.conv_block.6.running_mean", "model.14.conv_block.6.running_var", "model.15.conv_block.5.weight", "model.15.conv_block.5.bias", "model.15.conv_block.6.running_mean", "model.15.conv_block.6.running_var", "model.16.conv_block.5.weight", "model.16.conv_block.5.bias", "model.16.conv_block.6.running_mean", "model.16.conv_block.6.running_var", "model.17.conv_block.5.weight", "model.17.conv_block.5.bias", "model.17.conv_block.6.running_mean", "model.17.conv_block.6.running_var", "model.18.conv_block.5.weight", "model.18.conv_block.5.bias", "model.18.conv_block.6.running_mean", "model.18.conv_block.6.running_var".

What does this mean? The same errors also occur when I test a pretrained model such as horse2zebra, although I never encountered them before.
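One way to read the message is to compare the two key lists: every "missing" key appears to have an "unexpected" counterpart with the same parameter name but a layer index that is lower by one. The sketch below checks this on a few keys copied from the error above. It is not part of the repository; `split_key` and `index_offsets` are hypothetical helper names, and the key pattern (prefix, numeric index, parameter name) is an assumption based on the keys shown.

```python
import re

# Assumed key layout: "<prefix>.<layer index>.<parameter name>"
KEY_RE = re.compile(r"^(.*)\.(\d+)\.([A-Za-z_]+)$")


def split_key(key):
    """Split 'model.10.conv_block.6.bias' into (prefix, index, name)."""
    match = KEY_RE.match(key)
    if match is None:
        raise ValueError("unrecognized key format: %r" % key)
    prefix, index, name = match.groups()
    return prefix, int(index), name


def index_offsets(missing, unexpected):
    """Return the set of index differences between paired keys.

    A missing key is paired with the unexpected key that shares its
    prefix and parameter name.
    """
    found = {}
    for key in unexpected:
        prefix, index, name = split_key(key)
        found[(prefix, name)] = index
    offsets = set()
    for key in missing:
        prefix, index, name = split_key(key)
        if (prefix, name) in found:
            offsets.add(index - found[(prefix, name)])
    return offsets


# A few keys copied from the error message above.
missing = [
    "model.10.conv_block.6.bias",
    "model.10.conv_block.6.weight",
    "model.10.conv_block.7.running_var",
    "model.10.conv_block.7.running_mean",
]
unexpected = [
    "model.10.conv_block.5.weight",
    "model.10.conv_block.5.bias",
    "model.10.conv_block.6.running_mean",
    "model.10.conv_block.6.running_var",
]

offsets = index_offsets(missing, unexpected)
print(offsets)
```

A single constant offset suggests that the network being built has one more layer inside each block than the network that produced the checkpoint, rather than that the checkpoint is corrupted.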

All 20 comments

Facing same issue

I deleted the root project directory and cloned this repository again. Then it worked. I don't know the reason; this was just a quick way to solve the issue for me.

Yes, please check out the latest commit.

I face the same problem.
I'd like to run a CycleGAN pre-trained model.
However, "RuntimeError: Unexpected key(s) in state_dict" occurs. Please help me.

I face the same issue. I trained a pix2pix model with the previous version of the code, tried to test it using both the older and the latest commit, and got the same "missing keys in state_dict" error with both.

Could you check if you have used the same normalization (batchnorm, instancenorm) during training and test?

Faced the same error.

I used the default normalization (instancenorm) during training and test.
I was able to solve the problem by downloading the newest version of the code and training the model again.

I faced the same error when applying a pretrained model (CycleGAN) with the newest version.
Is normalization (instancenorm) also related to this case?

Sorry, I solved this problem by correcting my Docker setup.
When I used pytorch-nightly instead of pytorch, I got good results.
This is my Dockerfile (please forgive how messy it is).

FROM nvidia/cuda:9.0-cudnn7-devel

RUN apt-get update && apt-get install -y \
   build-essential curl wget git cmake vim pkg-config unzip libgtk2.0-dev python3 python3-pip \
   imagemagick graphviz > /dev/null

# Miniconda3
ENV PATH /opt/conda/bin:$PATH
ENV LD_LIBRARY_PATH /opt/conda/lib:$LD_LIBRARY_PATH
RUN curl -Ls https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/install-miniconda.sh && \
   /bin/bash /tmp/install-miniconda.sh -b -p /opt/conda && \
   conda update -n base conda && \
   conda update --all -y

# Basic dependencies
RUN conda config --add channels conda-forge
RUN conda install -y readline mkl openblas numpy scipy hdf5 \
   pillow matplotlib cython pandas gensim protobuf \
   lmdb leveldb boost jupyterlab
RUN pip install pydot_ng nnpack h5py scikit-learn scikit-image hyperdash backports.ssl_match_hostname

# OpenCV
RUN conda install opencv3 -c menpo -y
RUN conda install dominate bz2file visdom

# PyTorch
RUN conda install pytorch-nightly torchvision cuda90 -c pytorch -y

# For CycleGAN and pix2pix
ADD ./vision /vision
WORKDIR /vision
RUN python3 setup.py install
WORKDIR /

@taesung89

The issue with unexpected key: num_batches_tracked should be fixed by the latest commit.
Regarding the first error on this thread, I believe it's because PyTorch's default setting has changed. Could you get the latest pytorch and try again?

Thank you very much. I downloaded the new version of the code and fixed the problem.

Could you check if you have used the same normalization (batchnorm, instancenorm) during training and test?

Thank you, @junyanz. This resolved it for me.

I had a similar issue while trying to test CycleGAN (--model test) on my own dataset.
The default --norm instance was used for both training and testing.
Deleting the root project directory and cloning this repository again did not help.

I noticed that the problem is mismatched keys in the state_dict dictionary.
E.g. the model tries to load the missing key "model.10.conv_block.6.weight", while the checkpoint contains the unexpected key "model.10.conv_block.5.weight". So I decided to fix these keys.

Based on my error message, I created two lists (missing_list and expected_list). I then replaced the wrong keys with the corresponding correct ones.

Example of my snippet is here. I've inserted it after line 135 here.
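The linked snippet is not reproduced on this page. As a rough illustration of the idea only, the sketch below renames checkpoint keys before they are handed to load_state_dict. The helper names `build_rename_map` and `rename_state_dict` are hypothetical, and the pairing rule (same prefix and parameter name) is an assumption; the two lists cannot simply be zipped, because the error message reports them in different orders. Plain strings stand in for tensors so the example runs without PyTorch.

```python
import re
from collections import OrderedDict

# Assumed key layout: "<prefix>.<layer index>.<parameter name>"
KEY_RE = re.compile(r"^(.*)\.(\d+)\.([A-Za-z_]+)$")


def build_rename_map(unexpected, missing):
    """Map each unexpected key to the missing key that shares its
    prefix and parameter name.

    This only works when each (prefix, name) pair occurs once per
    list, as in the error message quoted in this thread.
    """
    wanted = {}
    for key in missing:
        prefix, _, name = KEY_RE.match(key).groups()
        wanted[(prefix, name)] = key
    mapping = {}
    for key in unexpected:
        prefix, _, name = KEY_RE.match(key).groups()
        if (prefix, name) in wanted:
            mapping[key] = wanted[(prefix, name)]
    return mapping


def rename_state_dict(state_dict, mapping):
    """Return a copy of state_dict with keys renamed via mapping."""
    return OrderedDict((mapping.get(k, k), v) for k, v in state_dict.items())


# Stand-in checkpoint: values would be tensors in a real state_dict.
state_dict = OrderedDict([
    ("model.10.conv_block.1.weight", "w1"),
    ("model.10.conv_block.5.weight", "w5"),
    ("model.10.conv_block.6.running_mean", "m6"),
])
unexpected = ["model.10.conv_block.5.weight",
              "model.10.conv_block.6.running_mean"]
missing = ["model.10.conv_block.7.running_mean",
           "model.10.conv_block.6.weight"]

mapping = build_rename_map(unexpected, missing)
fixed = rename_state_dict(state_dict, mapping)
print(list(fixed.keys()))
```

Renaming only makes the keys line up. Whether the loaded network then behaves like the trained one depends on why the keys differed in the first place.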

Another solution is to change the call to net.load_state_dict(state_dict, strict=False), i.e. add the strict=False option. This lets the network load without raising an error even if the key names don't match exactly. Note that PyTorch still matches weights by key name, so any parameter whose key is missing from the checkpoint is not loaded and keeps its initial value.
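To see what strict=False does with mismatched keys, here is a small sketch on two toy networks (not the repository's ResnetGenerator). It assumes a reasonably recent PyTorch in which load_state_dict returns an object with missing_keys and unexpected_keys; the toy layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "trained" has no dropout; "target" has one, which shifts the index
# of the second convolution from 2 to 3.
trained = nn.Sequential(nn.Conv2d(3, 3, 3), nn.ReLU(), nn.Conv2d(3, 3, 3))
target = nn.Sequential(
    nn.Conv2d(3, 3, 3), nn.ReLU(), nn.Dropout(0.5), nn.Conv2d(3, 3, 3)
)

# Remember the randomly initialized weights of the second convolution.
before = target[3].weight.detach().clone()

# With strict=False the mismatch is reported instead of raised.
result = target.load_state_dict(trained.state_dict(), strict=False)

# Keys that matched by name were copied over...
first_loaded = torch.equal(target[0].weight, trained[0].weight)
# ...but the layer with the shifted index kept its initial weights.
second_untouched = torch.equal(target[3].weight, before)

print(result.missing_keys, result.unexpected_keys)
print(first_loaded, second_untouched)
```

In this toy case the second convolution is silently left at its random initialization, so it is worth inspecting the reported key lists before trusting the output of a model loaded this way.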

I faced the same issue, but when I added "--no_dropout" for testing, the issue was gone:
python test.py --no_dropout
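A plausible explanation for why --no_dropout helps is that a dropout layer in the middle of each residual block shifts the index of every layer after it. The sketch below simulates index-based naming as used by nn.Sequential, in plain Python. The layer order is an assumption about how the repository builds a residual block, the normalization layer is assumed to keep running statistics (as the keys in the error suggest), and `block_keys` is a hypothetical helper.

```python
def block_keys(use_dropout, prefix="model.10.conv_block"):
    """List the state_dict keys of one simulated residual block.

    Each layer is (type name, parameter/buffer names); layers without
    parameters still occupy an index, as in nn.Sequential.
    """
    conv = ("Conv2d", ["weight", "bias"])
    norm = ("Norm", ["running_mean", "running_var"])
    pad = ("ReflectionPad2d", [])

    layers = [pad, conv, norm, ("ReLU", [])]
    if use_dropout:
        layers.append(("Dropout", []))  # no parameters, but takes index 4
    layers += [pad, conv, norm]

    return [
        "%s.%d.%s" % (prefix, index, name)
        for index, (_, names) in enumerate(layers)
        for name in names
    ]


checkpoint_keys = set(block_keys(use_dropout=False))  # trained without dropout
model_keys = set(block_keys(use_dropout=True))        # built with dropout

missing = sorted(model_keys - checkpoint_keys)
unexpected = sorted(checkpoint_keys - model_keys)
print("Missing:", missing)
print("Unexpected:", unexpected)
```

Under these assumptions the simulated lists for block 10 contain the same keys as the error at the top of the thread, which is consistent with the test network being built with dropout while the checkpoint was trained without it.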

Thank you @SunLeL, it works for me!

Amazing! I found that everything is fine when I use unet_256; the error happens when I use resnet_6blocks.
By the way, using --no_dropout works for me!

Another solution is to change the call to net.load_state_dict(state_dict, strict=False), i.e. add the strict=False option. This lets the network load without raising an error even if the key names don't match exactly. Note that PyTorch still matches weights by key name, so any parameter whose key is missing from the checkpoint is not loaded and keeps its initial value.

Yes, this method works. The change has to be made in the models/base_model.py file.
Btw, --no_dropout also works, but the results are different.
