Incubator-mxnet: Simplest way to get activations of a certain layer during training.

Created on 25 Jan 2017 · 4 comments · Source: apache/incubator-mxnet

Hello,
I would like to make a picture of the activations of a certain layer during training, after every batch.
Let's say I would add a batch_end_callback and then dump the activations as an image to disk.

What would be the easiest way to go about this?
Is this even possible? If not, why not? :)


All 4 comments

See example/python-howto/monitor_weight
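(For reference, that example boils down to attaching a Monitor that prints a statistic of every matching array each batch. The sketch below is mine, assuming the mx.mon.Monitor API; the stat function and name pattern are illustrative.)

import mxnet as mx
import numpy as np

# Log the norm of every array whose name ends in 'weight', once per batch.
mon = mx.mon.Monitor(interval=1,
                     stat_func=lambda x: mx.nd.norm(x) / np.sqrt(x.size),
                     pattern='.*weight')
# Then pass it to training, e.g. model.fit(..., monitor=mon).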

@juliandewit I usually measure the states of the mid-level layers by inserting a CustomOp that intercepts the activation and gradient, like the following:

import ast
import logging

import mxnet as mx
import numpy as np


def safe_eval(expr):
    # CustomOp keyword arguments arrive as strings; convert them back to
    # Python literals (e.g. "True" -> True) when necessary.
    if isinstance(expr, str):
        return ast.literal_eval(expr)
    else:
        return expr


class IdentityOp(mx.operator.CustomOp):
    """Pass-through operator that can log the norms of its input and gradient."""

    def __init__(self, logging_prefix="identity", input_debug=False, grad_debug=False):
        super(IdentityOp, self).__init__()
        self.logging_prefix = logging_prefix
        self.input_debug = input_debug
        self.grad_debug = grad_debug

    def forward(self, is_train, req, in_data, out_data, aux):
        if self.input_debug:
            logging.debug("%s: in_norm=%f, in_shape=%s"
                          % (self.logging_prefix, np.linalg.norm(in_data[0].asnumpy()), str(in_data[0].shape)))
        self.assign(out_data[0], req[0], in_data[0])

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        if self.grad_debug:
            logging.debug("%s: grad_norm=%f, grad_shape=%s"
                          % (self.logging_prefix, np.linalg.norm(out_grad[0].asnumpy()), str(out_grad[0].shape)))
        self.assign(in_grad[0], req[0], out_grad[0])


@mx.operator.register("identity")
class IdentityOpProp(mx.operator.CustomOpProp):
    def __init__(self, logging_prefix="identity", input_debug=False, grad_debug=False):
        super(IdentityOpProp, self).__init__(need_top_grad=True)
        self.input_debug = safe_eval(input_debug)
        self.grad_debug = safe_eval(grad_debug)
        self.logging_prefix = str(logging_prefix)

    def list_arguments(self):
        return ['data']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        # The output has exactly the same shape as the input.
        data_shape = in_shape[0]
        output_shape = in_shape[0]
        return [data_shape], [output_shape], []

    def create_operator(self, ctx, shapes, dtypes):
        return IdentityOp(input_debug=self.input_debug,
                          grad_debug=self.grad_debug,
                          logging_prefix=self.logging_prefix)


def identity(data, name="identity", logging_prefix=None,
             input_debug=False, grad_debug=False):
    # Default the logging prefix to the symbol name if none is given.
    return mx.symbol.Custom(data=data,
                            name=name,
                            logging_prefix=logging_prefix if logging_prefix is not None else name,
                            input_debug=input_debug,
                            grad_debug=grad_debug,
                            op_type="identity")

We can insert such an "identity" operator into the network, like

a = ...
a = identity(a)
b = mx.sym....(a)

You can control the inner behavior of the identity operator, e.g. print the input norm and gradient norm, or save the activations to disk.
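As a concrete (hypothetical) example, here is a small two-layer network with the pass-through op inserted after the first activation; everything except identity() is made up for illustration:

import logging
logging.basicConfig(level=logging.DEBUG)  # otherwise logging.debug output is hidden

data = mx.sym.Variable('data')
fc1 = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
act1 = mx.sym.Activation(data=fc1, act_type='relu', name='relu1')
# Probe the first activation on every forward/backward pass.
act1 = identity(act1, name='probe1', input_debug=True, grad_debug=True)
fc2 = mx.sym.FullyConnected(data=act1, num_hidden=10, name='fc2')
net = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

Every pass through probe1 now logs the activation and gradient norms; the forward method is also the natural place to dump in_data[0].asnumpy() to disk as an image.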

Another way is to group the activation layer and the final loss function together, like the following:

mid_level_layer = ...
final_loss = ...mid_level_layer...

out = mx.sym.Group([final_loss, mx.sym.BlockGrad(mid_level_layer)])

We can then use a callback function to save the second output after each batch.
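A minimal sketch of what that looks like with the Module API (the small network, random data and all names below are assumptions for illustration; only the Group/BlockGrad line comes from the answer above):

import mxnet as mx
import numpy as np

# Hypothetical network: expose an intermediate layer next to the loss.
data = mx.sym.Variable('data')
mid_level_layer = mx.sym.FullyConnected(data=data, num_hidden=64, name='mid')
fc_out = mx.sym.FullyConnected(data=mid_level_layer, num_hidden=10, name='out')
final_loss = mx.sym.SoftmaxOutput(data=fc_out, name='softmax')
out = mx.sym.Group([final_loss, mx.sym.BlockGrad(mid_level_layer)])

mod = mx.mod.Module(out, data_names=['data'], label_names=['softmax_label'],
                    context=mx.cpu())
train_iter = mx.io.NDArrayIter(np.random.rand(100, 32).astype(np.float32),
                               np.random.randint(0, 10, (100,)).astype(np.float32),
                               batch_size=10, label_name='softmax_label')
mod.bind(data_shapes=train_iter.provide_data,
         label_shapes=train_iter.provide_label)
mod.init_params()
mod.init_optimizer(optimizer='sgd')

for nbatch, batch in enumerate(train_iter):
    mod.forward(batch, is_train=True)
    mid_activations = mod.get_outputs()[1]  # second output = blocked mid-level layer
    np.save('mid_activations_batch%d.npy' % nbatch, mid_activations.asnumpy())
    mod.backward()
    mod.update()

mod.get_outputs() returns the outputs in the same order as the Group, so index 1 is the mid-level activation; BlockGrad keeps this extra output from contributing to the gradient.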

Thank you VERY much for your answers.
As I feared, it is a bit harder than expected, but your solution seems workable.

I was aware of weight monitoring, but the 2nd solution seems more appropriate for my specific problem.

Hello @sxjscience, can you please explain in detail how to retrieve the second output in a callback? I am new to MXNet and getting confused by the different approaches available. I have used mx.sym.Group([output_layer, fc_layer]). I would like to get the output of fc_layer, which is an intermediate layer in my network.
Please help.
