I am still a beginner with pytorch, but it seems to me that the classifier part of the VGG model should have one extra line:
```python
self.classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096),
    nn.ReLU(True),
    nn.Dropout(),
    nn.Linear(4096, 4096),
    nn.ReLU(True),
    nn.Dropout(),
    nn.Linear(4096, num_classes),
    nn.Softmax()  # <-- the proposed extra line
)
```
From the paper:
"A stack of convolutional layers (which has a different depth in different architectures) is followed by
three Fully-Connected (FC) layers: the first two have 4096 channels each, the third performs 1000-
way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is
the soft-max layer. The configuration of the fully connected layers is the same in all networks."
The softmax is not necessary in the main model, because during training we use the `cross_entropy` loss, which combines log-softmax with the negative log-likelihood (numerically more stable than applying softmax directly).
For inference you can add an `F.softmax` to the output of your model.
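A minimal sketch of this point, using dummy logits in place of a real VGG forward pass: `F.cross_entropy` on raw logits gives the same loss as explicitly chaining `log_softmax` and `nll_loss`, so adding `nn.Softmax` inside the model would be redundant (and harmful) at training time, while `F.softmax` can still be applied afterwards for inference:

```python
import torch
import torch.nn.functional as F

# Dummy raw model outputs (logits) for a batch of 4 samples, 10 classes.
logits = torch.randn(4, 10)
targets = torch.tensor([1, 0, 3, 7])

# Training: cross_entropy applies log-softmax + NLL internally,
# so the model should emit raw logits with no softmax layer.
loss = F.cross_entropy(logits, targets)
same = F.nll_loss(F.log_softmax(logits, dim=1), targets)
assert torch.allclose(loss, same)

# Inference: apply softmax explicitly to turn logits into probabilities.
probs = F.softmax(logits, dim=1)
assert torch.allclose(probs.sum(dim=1), torch.ones(4))
```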