Hi everyone,
I am trying to do the activity maximization described in Karen Simonyan's paper (http://arxiv.org/pdf/1312.6034.pdf) using the pre-trained ImageNet model in Caffe. I want to find the input image that maximizes the score of each class in the softmax layer. In order to do so, the derivatives of the scores with respect to the input image need to be computed. Do you know how to use backpropagation to calculate those derivatives in Caffe?
Thank you so much!!!
Sincerely,
Tan Nguyen
It is a little tricky, since that is not often needed and is more of a visualization trick :) Anyway, here is a brief outline of how to do it:
(1) Set force_backward to true in the network definition, so that the gradient is backpropagated through the whole network to the input.
(2) Either in C++ or in Python, get the gradients of the input blob. This gives you Figure 2 of their paper.
(3) Write your own optimizer - this is a little tricky and depends on what optimization tool you use. Think this way: the network basically is a function that gives you f(input) and f'(input), so you can plug that into any standard optimizer like sklearn's optimization toolbox. Eventually Figure 1 will sort-of pop up.
I wrote one during NIPS 2013 with ipython notebook following the above idea, so it's definitely doable - I am too embarrassed to share that notebook though, it was a hack in a casino room when it was storming outside.
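Step (3) of the outline can be sketched without Caffe. The following is only a minimal illustration, assuming a toy linear score in place of the network: `score_and_grad` is a hypothetical stand-in for one forward/backward pass, and `lam`, `lr` and `n_iter` are arbitrary values. The L2 penalty mirrors the regularised objective in the paper (class score minus a multiple of the squared image norm).

```python
import numpy as np

def score_and_grad(x, w):
    """Hypothetical stand-in for a network: returns f(x) and df/dx.

    Here f(x) = w . x, so the gradient is simply w. With a real net,
    this would be one forward pass plus one backward pass to the input.
    """
    return float(w @ x), w.copy()

def maximize_score(w, lam=0.5, lr=0.1, n_iter=500):
    """Gradient ascent on f(x) - lam * ||x||^2, starting from a zero input."""
    x = np.zeros_like(w)
    for _ in range(n_iter):
        _, g = score_and_grad(x, w)
        # Ascent step on the regularised objective.
        x += lr * (g - 2.0 * lam * x)
    return x

rng = np.random.default_rng(0)
w = rng.standard_normal(8)
x_opt = maximize_score(w)
```

For this toy objective the maximiser is known in closed form, `w / (2 * lam)`, which makes it easy to check that the loop does what it should before swapping in a real network.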
Hi Yangqing,
Thank you so much for your great help!!! Following your instructions, we came up with a way to get the gradients of the score (output of layer "fc8") with respect to the input data as follows:
1) Delete the softmax layer in imagenet_deploy.prototxt
2) Use net.blobs['data'].diff to get the desired gradients
Is this the right way to get the gradients we want? Also, how can we initialize the input to a zero image in Caffe?
Hi Yangqing,
I tried to compute the gradients of the scores (outputs of layer "fc8") with respect to the input image using net.backward(), but got a 1 x 3 x 227 x 227 array. Since there are 1000 units in "fc8", shouldn't the result be a 5-D array instead? I deleted the "prob" layer in imagenet_deploy.prototxt so that the final output is the output of layer "fc8". I also deleted the dropout layers in imagenet_deploy.prototxt because they are not needed once the network is trained. My code is as follows:
input_image = caffe.io.load_image(caffe_root+IMAGE_FILE)
input_resized = np.reshape(caffe.io.resize_image(input_image,net.image_dims),[1,227,227,3])
caffe_input = np.asarray([net.preprocess('data',in_) for in_ in input_resized])
out=net.forward({net.inputs[0]:caffe_input})
bottom_diff=net.backward({net.outputs[0]:out['fc8']})
I am not sure whether I used net.backward correctly. Could you please give me a hint on how to use that function? You mentioned in your last response that you wrote a program for the activity maximization described in Karen's paper during NIPS 2013 in an IPython notebook. Would you mind sharing that notebook with me? It would be a great help!
Thank you so much!!!
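On the shape question above: a backward pass returns a single array of the input's shape because it multiplies the supplied top diff by the Jacobian instead of materialising the full Jacobian. The sketch below illustrates this on a toy linear layer standing in for the network; `forward`, `backward` and `class_gradient` are hypothetical helpers, not Caffe functions. Passing the scores themselves as the top diff mixes all classes together, while a one-hot top diff picks out the gradient of a single class score.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_inputs = 5, 12  # toy sizes; the real net has 1000 classes
W = rng.standard_normal((n_classes, n_inputs))
x = rng.standard_normal(n_inputs)

def forward(x):
    """Toy 'network': scores = W x."""
    return W @ x

def backward(top_diff):
    """Backprop through the linear layer: returns W^T top_diff."""
    return W.T @ top_diff

scores = forward(x)

# Using the scores as the top diff gives a score-weighted mix of all
# per-class gradients, still with the shape of the input.
mixed = backward(scores)

def class_gradient(c):
    """Gradient of scores[c] w.r.t. the input, via a one-hot top diff."""
    one_hot = np.zeros(n_classes)
    one_hot[c] = 1.0
    return backward(one_hot)

# One backward pass per class recovers the full Jacobian row by row.
jacobian = np.stack([class_gradient(c) for c in range(n_classes)])
```

With the real network this would mean one backward pass per class of interest, each with a one-hot diff on "fc8"; whether that matches the exact pycaffe call signature in a given version is an assumption to check.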
I am highly interested in a solution for this as well!
I am able to reproduce Figure 2 from the paper, but still failing at Figure 1
I've been trying to reproduce Figure 1 for a couple of days now. Here is what I've done so far in Python:
deploy.prototxt
input: "data"
input_dim: 1
input_dim: 3
input_dim: 227
input_dim: 227
input: "label"
input_dim: 1
input_dim: 1
input_dim: 1
input_dim: 1
force_backward: true
...
visualization.py
n_iterations = 1000
learning_rate = 1
label_index = 287
data = np.random.random((1,3,227,227))
label = np.array([label_index]).reshape((1,1,1,1))
for i in range(n_iterations):
    fw = net.forward(data=data, label=label)
    bw = net.backward()
    diff = bw['data']
    data -= learning_rate * data * diff
This uses the normal softmax / softmax_loss layer. Basically, I compute the derivative of the loss function with respect to the image (as @Yangqing suggested).
I get fairly decent results, but the reconstructed image is not as clear as in Figure 1.
Should I use a loss function other than softmax (e.g. Euclidean loss on the last fc layer)? With softmax I get a probability of 0.99 for my class after some iterations, but as I said, the image is still not as meaningful as in Figure 1.
@resterhall the authors of the paper mention that using softmax is not a good idea, because the optimization may focus on the denominator of the loss. As a result, your image is adapted to "not look like something else" instead of looking like the class you want. Try replacing the SOFTMAX layer with SIGMOID.
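The point about the denominator can be seen directly in the gradients with respect to the class scores. This is a small numerical sketch with made-up scores; `scores`, `target` and the helper `softmax` are illustrative names only. The paper itself optimises the unnormalised class score instead of the softmax posterior.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.5, -1.0])  # made-up fc8-style scores
target = 0

one_hot = np.zeros_like(scores)
one_hot[target] = 1.0

# Gradient of the raw target score w.r.t. all scores: only the target moves.
grad_raw = one_hot

# Gradient of log softmax(scores)[target] w.r.t. all scores: every other
# class gets a negative entry, so the objective can also be raised by
# pushing the other classes down.
grad_logp = one_hot - softmax(scores)
```

The negative entries in `grad_logp` are exactly the "not look like something else" pressure described above; `grad_raw` has none.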
Please ask on the caffe-users mailing list. Thanks!
Hello. Could I ask for an update, @minhtannguyen and @resterhall?
I have the following files:
deploy.prototxt
name: "CaffeNet"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 227
input_dim: 227
force_backward: true
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "norm1"
  type: LRN
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
deconvnet.py
# IMPORT LIBRARY
import numpy as np
import matplotlib.pyplot as plt
import sys, caffe, os, operator
import caffe.io
from caffe.proto import caffe_pb2
from PIL import Image
# FORWARDVISUALIZE FUNCTION
# Visualize class vs probability
def classVisualize(forwardOutput, blobName):
    reshapedArray = np.reshape(forwardOutput[blobName][0], (len(forwardOutput[blobName][0]), 1))
    plt.plot(reshapedArray)
    plt.show()
# VISSQUARE
# take an array of shape (n, height, width) or (n, height, width, channels)
# and visualize each (height, width) thing in a grid of size approx. sqrt(n) by sqrt(n)
def visSquare(data, padsize=1, padval=0):
    data -= data.min()
    data /= data.max()
    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    plt.imshow(data)
    plt.show()
# MAIN FUNCTION
# Define variables and path
modelFile = '/home/caffe-master/examples/deconvnet/deploy.prototxt'
trainedModel = '/home/caffe-master/examples/imagenet/caffe_reference_imagenet_model'
meanFile = '/home/caffe-master/python/caffe/imagenet/ilsvrc_2012_mean.npy'
imageFile = '/home/caffe-master/examples/images/cat.jpg'
# Create net
caffe.set_mode_gpu()
net = caffe.Classifier(modelFile, trainedModel)
net.set_mean('data', np.load(meanFile))
net.set_raw_scale('data', 255)
net.set_channel_swap('data', (2,1,0))
# Display Net structure
print('Net structure:')
for k, v in net.blobs.items():
    print(k, v.data.shape)
# Define input image to net
inputImage = caffe.io.load_image(imageFile)
inputResized = np.reshape(caffe.io.resize_image(inputImage, net.image_dims),[1,227,227,3])
caffeInput = np.asarray([net.preprocess('data', in_) for in_ in inputResized])
# Forward
forwardOutput = net.forward(**{net.inputs[0]: caffeInput})
# Backward: compute diff
# backwardOutput['data'].shape: (1,3,227,227)
backwardOutput = net.backward(**{net.outputs[0]: forwardOutput['conv2']})
visSquare(backwardOutput['data'].transpose(0, 2, 3, 1))
However, the quality of the result is not as good as in the paper (http://arxiv.org/pdf/1312.6034.pdf). Could you help me?
Also, the paper says: "We can conclude that apart from the RELU layer, computing the approximate feature map reconstruction R_n using a DeconvNet is equivalent to computing the derivative ∂f/∂X_n using backpropagation, which is a part of our visualisation algorithms."
However, I think the code backpropagates not only through the conv and pooling layers but also through the ReLU layer. Does this affect the result?
I appreciate your time!
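The ReLU difference mentioned in that quote can be made concrete on a single layer. In the paper's description, backpropagation gates the incoming signal by the sign of the layer's forward input, while a DeconvNet gates it by the sign of the incoming signal itself. The sketch below uses made-up values; `x` and `top` are illustrative names.

```python
import numpy as np

x = np.array([-1.0, 2.0, 3.0, -4.0])   # ReLU input from the forward pass
top = np.array([0.5, -0.7, 0.9, 1.1])  # signal arriving from the layer above

# Backpropagation: pass the signal only where the forward input was positive.
backprop = top * (x > 0)

# DeconvNet: pass the signal only where the signal itself is positive.
deconvnet = np.maximum(top, 0)
```

The two rules agree only where both the forward input and the incoming signal are positive (the third entry here), so a plain backward pass through ReLU layers is expected to give a somewhat different picture from a DeconvNet reconstruction.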
Hi there,
Thanks @minhtannguyen for your explanation. I've been trying to reproduce Figure 2 for a whole week. Could you please tell me how you managed to do it? I've got some results, but I am not sure whether they are correct.
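For reference, the paper's saliency map (its Figure 2) takes the gradient of the class score with respect to the input image and, for colour images, keeps the maximum absolute value across channels at each pixel. Below is a minimal sketch of that last step only, assuming the gradient is already available as a (1, C, H, W) array such as the 'data' diff; `saliency_map` is a hypothetical helper and the gradient here is random placeholder data.

```python
import numpy as np

def saliency_map(grad):
    """Collapse a (1, C, H, W) input gradient to an (H, W) saliency map.

    Takes the absolute value and then the maximum over the channel axis.
    """
    return np.abs(grad[0]).max(axis=0)

rng = np.random.default_rng(0)
fake_grad = rng.standard_normal((1, 3, 4, 4))  # placeholder for the 'data' diff
sal = saliency_map(fake_grad)

# Tiny hand-checkable case: one row, two pixels, three channels.
small = np.array([[[[1.0, -5.0]], [[-3.0, 2.0]], [[2.0, 4.0]]]])
small_sal = saliency_map(small)
```

The resulting 2-D array can be shown directly with a grayscale colormap; comparing it against the input image is a quick sanity check that the bright regions fall on the object.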