CNTK: Kernel dimension for convolution

Created on 18 Jul 2016 · 19 comments · Source: microsoft/CNTK

Hi,
I want to build a simple CIFAR-10-like convolutional network for binary classification, but I have a problem with the ConvReLULayer function from Macros.ndl. I've tried to rewrite it in BrainScript, but I'm obviously missing something, because an exception occurs:
EXCEPTION occurred: Convolution operation requires that kernel dim 16 <= input dim 3.

My conv.bs contains

    imageW          = 64
    imageH          = 64
    inputChannels   = 3
    labelDim        = 2

    features = ImageInput(imageW, imageH, inputChannels, tag = "feature", imageLayout="cudnn")
    featOffs = Constant(128)
    featScaled = Minus(features, featOffs)
    labels = Input(labelDim, tag='label')

    # conv1
    kW1 = 5
    kH1 = 5
    cMap1 = 16 # number of feature maps
    inWCount1 = kW1 * kH1 * inputChannels
    hStride1 = 1
    vStride1 = 1
    wScale = 0.0043
    conv1 = ConvReLULayer(featScaled, cMap1, inWCount1, kW1, kH1, hStride1, vStride1, wScale)

My macro.bs contains

ConvW(outMap, inWCount, wScale) = [
    W = Parameter(outMap, inWCount, init = "uniform", initValueScale=wScale, initOnCPUOnly=true)
].W

ConvB(outMap) = [
    b = ParameterTensor(1 : 1 : outMap) 
].b

ConvReLULayer(inp, outMap, inWCount, kW, kH, hStride, vStride, wScale) = [
    W = ConvW(outMap, inWCount, wScale)
    b = ConvB(outMap)
    c = Convolution(W, inp, (kW : kH : outMap), stride=(hStride: vStride : outMap), imageLayout = "cudnn")
    z = Plus(c, b);
    y = RectifiedLinear(z);
].y

Original ConvReLULayer function (NDL, from Macros.ndl)

ConvReLULayer(inp, outMap, inWCount, kW, kH, hStride, vStride, wScale, bValue)
[
    W = LearnableParameter(outMap, inWCount, init = Gaussian, initValueScale = wScale)
    b = ImageParameter(1, 1, outMap, init = fixedValue, value = bValue, imageLayout = $imageLayout$)
    c = Convolution(W, inp, kW, kH, outMap, hStride, vStride, zeroPadding = true, imageLayout = $imageLayout$)
    p = Plus(c, b)
    y = RectifiedLinear(p)
]


Arminea



Hi Arminea,

The issue is the kernel shape you're passing to Convolution in ConvReLULayer.
You're using a convolution kernel of (kW : kH : outMap); for conv1, that gives:

Input: [64 : 64 : 3]
Convolution kernel: [5 : 5 : 16]
=> Your kernel is trying to span more depth than the input has: 16 channels vs. 3.
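To make the shape rule concrete, here is a small Python sketch of the per-axis test that the exception message describes: each kernel dimension must not exceed the matching input dimension. This is plain Python, not CNTK's validation code, and the helper name `check_kernel_fits` is hypothetical.

```python
def check_kernel_fits(kernel, input_shape):
    """Hypothetical helper: raise if any kernel dim exceeds the input dim on the same axis."""
    if len(kernel) != len(input_shape):
        raise ValueError("kernel and input must have the same rank")
    for axis, (k, n) in enumerate(zip(kernel, input_shape)):
        if k > n:
            raise ValueError(f"axis {axis}: requires kernel dim {k} <= input dim {n}")
    return True


input_shape = (64, 64, 3)                         # [W : H : C] from conv.bs
print(check_kernel_fits((5, 5, 3), input_shape))  # True: kernel depth 3 fits
try:
    check_kernel_fits((5, 5, 16), input_shape)    # kernel depth 16 > 3 channels
except ValueError as err:
    print(err)  # axis 2: requires kernel dim 16 <= input dim 3
```

Only the depth axis fails, which is why the kernel depth has to follow the input channel count rather than the number of output maps.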

To fix it, use an ND convolution whose kernel depth is the number of input channels, passed in through a new inMap parameter:

ConvReLULayer(inp, inMap, outMap, inWCount, kW, kH, hStride, vStride, wScale) = [
    W = ConvW(outMap, inWCount, wScale)
    b = ConvB(outMap)
    c = Convolution(W, inp, (kW : kH : inMap), stride=(hStride: vStride : inMap), imageLayout = "cudnn")
    z = Plus(c, b);
    y = RectifiedLinear(z);
].y

Let us know if it fixes your issue.
Morgan

mfuntowicz on 19 Jul 2016

Hi,
Thanks for answering so quickly :) I'll try it later today. Just curious, what should the value of inMap be? I suppose 3, am I right? I was also unsure about b = ParameterTensor(1 : 1 : outMap) in the ConvB function. Is it OK?

Tereza

Arminea on 19 Jul 2016


Hi,

Yes, you're right: for the first layer, inMap is 3. More generally, the inMap of layer N is the outMap of layer N-1 :).
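The chaining rule fits in a few lines of Python. `chain_in_maps` is a hypothetical helper; the map counts 16 and 32 are the ones that appear for conv1 and conv2 in the validation log later in this thread. Pooling layers between convolutions (like pool1 in that log) shrink width and height but keep the channel count, so they don't change inMap.

```python
def chain_in_maps(input_channels, out_maps):
    """Pair each conv layer's outMap with its inMap.

    The first layer's inMap is the image channel count; every later
    layer's inMap is the previous layer's outMap.
    """
    pairs = []
    in_map = input_channels
    for out_map in out_maps:
        pairs.append((in_map, out_map))
        in_map = out_map  # the next layer consumes this layer's feature maps
    return pairs


print(chain_in_maps(3, [16, 32]))  # [(3, 16), (16, 32)]
```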

For the bias parameter, BrainScript has a predefined macro:

BS.Parameters.BiasParam(outDim), which is defined as follows:

BiasParam (dim) = ParameterTensor ((dim), init='fixedValue', value=0.0)

So you can replace your bias declaration with:

b = BS.Parameters.BiasParam(1:1:outMap)
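A [1 x 1 x outMap] bias works because Plus broadcasts it over every pixel. The Python sketch below models that shape rule. It assumes, based on shapes like [128 x *] + [128 x 1] -> [128 x 1 x *] in the validation log later in this thread, that shapes are aligned from the first axis and the shorter one is padded with trailing 1s. `broadcast_shape` is a hypothetical helper, not a CNTK function.

```python
def broadcast_shape(a, b):
    """Elementwise broadcast of two shapes.

    Assumption: shapes are aligned from the first axis and the shorter
    one is padded with trailing 1s (as the CNTK log suggests).
    """
    rank = max(len(a), len(b))
    a = tuple(a) + (1,) * (rank - len(a))
    b = tuple(b) + (1,) * (rank - len(b))
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {x} with {y}")
        out.append(max(x, y))  # a size-1 axis stretches to the other size
    return tuple(out)


# Bias [1 x 1 x 16] added to a correct conv output [64 x 64 x 16]
print(broadcast_shape((64, 64, 16), (1, 1, 16)))  # (64, 64, 16)
# Bias added to a conv output that wrongly has depth 1: no error, same result
print(broadcast_shape((64, 64, 1), (1, 1, 16)))   # (64, 64, 16)
```

The second call is why the Plus node in the later log does not flag the missing map dimension: a depth-1 convolution output broadcasts silently against a [1 x 1 x 16] bias.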

Morgan

mfuntowicz on 19 Jul 2016


I've tried your solution. It works... sort of. Here is the log:

Validating network. 27 nodes to process in pass 1.

Validating --> labels = InputValue() :  -> [2 x *]
Validating --> ol.W = LearnableParameter() :  -> [2 x 128]
Validating --> h1.W = LearnableParameter() :  -> [128 x 16 x 16 x 32]
Validating --> conv2.W.W = LearnableParameter() :  -> [32 x 400]
Validating --> conv1.W.W = LearnableParameter() :  -> [16 x 75]
Validating --> features = InputValue() :  -> [64 x 64 x 3 x *]
Validating --> featOffs = LearnableParameter() :  -> [1 x 1]
Validating --> featScaled = Minus (features, featOffs) : [64 x 64 x 3 x *], [1 x 1] -> [64 x 64 x 3 x *]
Validating --> conv1.c = Convolution (conv1.W.W, featScaled) : [16 x 75], [64 x 64 x 3 x *] -> [64 x 64 x 1 x *]
Validating --> conv1.b.b = LearnableParameter() :  -> [1 x 1 x 16]
Validating --> conv1.z = Plus (conv1.c, conv1.b.b) : [64 x 64 x 1 x *], [1 x 1 x 16] -> [64 x 64 x 16 x *]
Validating --> conv1.y = RectifiedLinear (conv1.z) : [64 x 64 x 16 x *] -> [64 x 64 x 16 x *]
Validating --> pool1 = MaxPooling (conv1.y) : [64 x 64 x 16 x *] -> [32 x 32 x 16 x *]
Validating --> conv2.c = Convolution (conv2.W.W, pool1) : [32 x 400], [32 x 32 x 16 x *] -> [32 x 32 x 1 x *]
Validating --> conv2.b.b = LearnableParameter() :  -> [1 x 1 x 32]
Validating --> conv2.z = Plus (conv2.c, conv2.b.b) : [32 x 32 x 1 x *], [1 x 1 x 32] -> [32 x 32 x 32 x *]
Validating --> conv2.y = RectifiedLinear (conv2.z) : [32 x 32 x 32 x *] -> [32 x 32 x 32 x *]
Validating --> pool2.p = Pooling (conv2.y) : [32 x 32 x 32 x *] -> [16 x 16 x 32 x *]
Validating --> h1.t = Times (h1.W, pool2.p) : [128 x 16 x 16 x 32], [16 x 16 x 32 x *] -> [128 x *]
Validating --> h1.b = LearnableParameter() :  -> [128 x 1]
Validating --> h1.z = Plus (h1.t, h1.b) : [128 x *], [128 x 1] -> [128 x 1 x *]
Validating --> h1.y = Sigmoid (h1.z) : [128 x 1 x *] -> [128 x 1 x *]
Validating --> ol.z.PlusArgs[0] = Times (ol.W, h1.y) : [2 x 128], [128 x 1 x *] -> [2 x 1 x *]
Validating --> ol.b = LearnableParameter() :  -> [2 x 1]
Validating --> ol.z = Plus (ol.z.PlusArgs[0], ol.b) : [2 x 1 x *], [2 x 1] -> [2 x 1 x *]
Validating --> ce = CrossEntropyWithSoftmax (labels, ol.z) : [2 x *], [2 x 1 x *] -> [1]
Validating --> errs = ErrorPrediction (labels, ol.z) : [2 x *], [2 x 1 x *] -> [1]

Validating network. 16 nodes to process in pass 2.


Validating network, final pass.


conv1.c: using GEMM convolution engine for geometry: Input: 64 x 64 x 3, Output: 64 x 64 x 1, Kernel: 5 x 5 x 3, Map: 1, Stride: 1 x 1 x 3, Sharing: (1), AutoPad: (1), LowerPad: 0, UpperPad: 0.
Validating --> conv1.c = Convolution (conv1.W.W, featScaled) : [16 x 75], [64 x 64 x 3 x *] -> [64 x 64 x 1 x *] FAILED


[CALL STACK]
    > Microsoft::MSR::CNTK::ComputationNetwork::  ValidateNode
    - Microsoft::MSR::CNTK::ComputationNetwork::  ValidateNodes
    - Microsoft::MSR::CNTK::ComputationNetwork::  ValidateNetwork
    - Microsoft::MSR::CNTK::ComputationNetwork::  CompileNetwork
    - Microsoft::MSR::CNTK::ComputationNetwork::  ConstructFromRoots
    - Microsoft::MSR::CNTK::ComputationNetwork::  ComputationNetwork
    - std::make_shared<Microsoft::MSR::CNTK::ComputationNetwork,std::shared_ptr<Microsoft::MSR::ScriptableObjects::IConfigRecord> const & __ptr64>  
    - Microsoft::MSR::ScriptableObjects::MakeRuntimeObject<Microsoft::MSR::CNTK::ComputationNetwork>  
    - <lambda_56a46857eb9bf0001b5ffab01c6f03ed>::  operator  ()
    - std::_Callable_obj<<lambda_56a46857eb9bf0001b5ffab01c6f03ed>,0>::_ApplyX<std::shared_ptr<Microsoft::MSR::ScriptableObjects::Object>,std::shared_ptr<Microsoft::MSR::ScriptableObjects::IConfigRecord>>  
    - std::_Func_impl<std::_Callable_obj<<lambda_56a46857eb9bf0001b5ffab01c6f03ed>,0>,std::allocator<std::_Func_class<std::shared_ptr<Microsoft::MSR::ScriptableObjects::Object>,std::shared_ptr<Microsoft::MSR::ScriptableObjects::IConfigRecord>>>,std::  shared_pt
    - std::_Func_class<std::shared_ptr<Microsoft::MSR::ScriptableObjects::Object>,std::shared_ptr<Microsoft::MSR::ScriptableObjects::IConfigRecord>>::  operator  ()
    - Microsoft::MSR::BS::  Evaluate
    - <lambda_a048d19f5114b6bccccaea8ea1203939>::  operator  ()
    - std::_Callable_obj<<lambda_a048d19f5114b6bccccaea8ea1203939>,0>::_ApplyX<Microsoft::MSR::ScriptableObjects::ConfigValuePtr>  
    - std::_Func_impl<std::_Callable_obj<<lambda_a048d19f5114b6bccccaea8ea1203939>,0>,std::allocator<std::_Func_class<Microsoft::MSR::ScriptableObjects::ConfigValuePtr>>,Microsoft::MSR::ScriptableObjects::ConfigValuePtr>::  _Do_call

EXCEPTION occurred: Convolution weight matrix conv1.W.W should have dimension [1, 75] which is [kernelCount, kernelWidth * kernelHeight * inputChannels]

Arminea on 19 Jul 2016

Can you share your conv1 definition and the macro associated with it?

The name conv1.W.W looks very strange...

mfuntowicz on 19 Jul 2016

conv.bs file

    # conv1
    kW1 = 5
    kH1 = 5
    cMap1 = 16 # number of feature maps
    inWCount1 = kW1 * kH1 * inputChannels
    hStride1 = 1
    vStride1 = 1
    wScale = 0.0043
    conv1 = ConvNDReLULayer(featScaled, kW1, kH1, inputChannels, inWCount1, cMap1, hStride1, vStride1, wScale, 1)

macros.bs file

ConvNDReLULayer(inp, kW, kH, inMap, inWCount, outMap, hStride, vStride, wScale, bValue) = [
    W = Parameter(outMap, inWCount, init = "uniform", initValueScale=wScale, initOnCPUOnly=true)
    b = ParameterTensor(1 : 1 : outMap) 
    c = Convolution(W, inp, (kW : kH : inMap), stride=(hStride : vStride : inMap),  imageLayout="cudnn")
    # sharing=(true : true : true),
    z = Plus(c, b);
    y = RectifiedLinear(z);
].y

Arminea on 19 Jul 2016

Hi, I tried it again this morning, and the strange second W went away. The exception is now "EXCEPTION occurred: Convolution weight matrix conv1.W should have dimension [1, 75] which is [kernelCount, kernelWidth * kernelHeight * inputChannels]". I have no idea why.
Here is my whole project if you want to see it https://1drv.ms/f/s!As_qGuBAm_dbhudr98UXaBKczYIw4g. You can run it yourself :)

Arminea on 20 Jul 2016

Anything new? :)

Arminea on 26 Jul 2016

The naming is a bit confusing here. The dimension of the weight matrices for convolution should be:

number of rows = output depth dimension

This is the number of outputs for each pixel = depth of feature map = kernel count = number of output channels.
Your convolution operation will run your input through this many filters. For each of the (width * height) pixel positions, you get this many values, stored in a tensor of shape [width x height x numOutputChannels].

number of cols = number of _parameters_ in each filter kernel

The filter kernel depends on two things: The number of pixels you want it to cover (kernelWidth, kernelHeight) and the depth of the feature map you are processing (inputChannels).

The filter kernel is really a rank-3 tensor, but for the weight matrix only the number of parameters must be specified. That value is the product of three values: kernelWidth * kernelHeight * inputChannels.
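The rows/columns rule above reduces to one line of arithmetic. Here it is as a small Python function (the name `conv_weight_shape` is hypothetical), checked against the two weight shapes that appear in the validation log earlier in this thread.

```python
def conv_weight_shape(kernel_w, kernel_h, input_channels, output_channels):
    """[rows, cols] of a convolution weight matrix.

    rows = output channels (number of filters / kernel count)
    cols = parameters per filter = kernelWidth * kernelHeight * inputChannels
    """
    return (output_channels, kernel_w * kernel_h * input_channels)


print(conv_weight_shape(5, 5, 3, 16))   # (16, 75)  -> conv1.W in the log
print(conv_weight_shape(5, 5, 16, 32))  # (32, 400) -> conv2.W in the log
```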

frankseide on 26 Jul 2016

I understand your explanation. But isn't it strange that the log file contains the line
Validating --> conv1.W = LearnableParameter() : -> [16 x 75]
and yet, at the end, there is an exception?
Convolution weight matrix conv1.W should have dimension [1, 75] which is [kernelCount, kernelWidth * kernelHeight * inputChannels]

I suppose that conv1.W is the same matrix the whole time, so why is the size different?

Arminea on 26 Jul 2016

Ah. The validation process runs in multiple passes (which is required due to recurrent networks, which you are not using). Only the last pass sets up the actual convolution engine, and that's where the failure is detected.

I have trouble, though, seeing what the problem is. A 5 x 5 x 3 kernel does indeed have 75 parameters.

@Alexey-Kamenev, do you have an idea what might be wrong?

conv1.c: using GEMM convolution engine for geometry: Input: 64 x 64 x 3, Output: 64 x 64 x 1, Kernel: 5 x 5 x 3, Map: 1, Stride: 1 x 1 x 3, Sharing: (1), AutoPad: (1), LowerPad: 0, UpperPad: 0.
Validating --> conv1.c = Convolution (conv1.W.W, featScaled) : [16 x 75], [64 x 64 x 3 x *] -> [64 x 64 x 1 x *] FAILED

I would expect to see

...  -> [64 x 64 x *16* x *]
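The gap between the logged [64 x 64 x 1] and the expected [64 x 64 x 16] can be worked out from the geometry line (Input 64 x 64 x 3, Kernel 5 x 5 x 3, Stride 1 x 1 x 3, AutoPad on, Map 1). The Python below is a simplified model, not CNTK's implementation: it assumes auto-padding yields ceil(n / s) output positions per axis and that the map count multiplies the last output axis. Because the kernel spans the full input depth with a depth stride of 3, the depth axis collapses to 1, and only the map count can bring it back to 16. `conv_output_shape` is a hypothetical name.

```python
def conv_output_shape(input_shape, kernel, stride, auto_pad=True, map_count=1):
    """Simplified convolution output geometry (an assumed model, not CNTK code)."""
    out = []
    for n, k, s in zip(input_shape, kernel, stride):
        if auto_pad:
            out.append((n - 1) // s + 1)  # ceil(n / s) positions with padding
        else:
            out.append((n - k) // s + 1)  # only fully covered positions
    out[-1] *= map_count  # each output map adds a slice along the last axis
    return tuple(out)


geometry = dict(input_shape=(64, 64, 3), kernel=(5, 5, 3), stride=(1, 1, 3))
print(conv_output_shape(**geometry))                # (64, 64, 1)  as in the failing log
print(conv_output_shape(**geometry, map_count=16))  # (64, 64, 16) as expected
```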

frankseide on 26 Jul 2016

In fact, you need to specify the mapDims parameter in an ND Convolution node.

Like this:

c = Convolution(W, inp, (kW:kH:inMap), mapDims=outMap, stride=(hStride:vStride:inMap))

Sorry for the delay :)
Morgan

mfuntowicz on 27 Jul 2016


I would take that as an opportunity for improvement. The weight matrix's row dimension should be sufficient to specify this.

frankseide on 27 Jul 2016

Making the change now. Once it lands, you can leave out mapDims; it will default to the row dimension of the weight matrix. It will take a few days to land in master, though. Thanks for the feedback!
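The new default and the consistency check behind the "[1, 75]" exception can be modelled in a few lines. This is a hypothetical Python sketch, not CNTK's source: `resolve_map_dims` applies the rule described above (an omitted mapDims falls back to the weight's row count), and `check_weight` requires the weight to be [mapDims, kW * kH * inMap], mirroring the exception text. With mapDims effectively 1 (the "Map: 1" in the geometry log), a [16 x 75] weight is rejected; inferring mapDims = 16 from the rows makes it pass.

```python
import math


def resolve_map_dims(weight_shape, map_dims=None):
    """Assumed rule from this thread: an omitted mapDims defaults to the
    row count of the weight matrix."""
    rows, _cols = weight_shape
    return rows if map_dims is None else map_dims


def check_weight(weight_shape, kernel, map_dims):
    """Require weight == [mapDims, kW * kH * inMap], mirroring the exception text."""
    expected = (map_dims, math.prod(kernel))
    if tuple(weight_shape) != expected:
        raise ValueError(
            f"weight should have dimension [{expected[0]}, {expected[1]}]")
    return expected


w = (16, 75)        # conv1.W from the log
kernel = (5, 5, 3)  # kW : kH : inMap
try:
    check_weight(w, kernel, map_dims=1)  # mapDims = 1, as in "Map: 1"
except ValueError as err:
    print(err)  # weight should have dimension [1, 75]
print(check_weight(w, kernel, resolve_map_dims(w)))  # (16, 75)
```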

frankseide on 27 Jul 2016

@mfuntowicz Thanks, that really helped :) Now it's working without any exception. Is there any newer documentation for BrainScript? I was looking for the Convolution function in the CNTK book, and there is only an old NDL example.

@frankseide Thanks to you too. That change will be very helpful :) I can certainly wait a few days :)

Arminea on 27 Jul 2016

Convolution() is documented here. I just updated it w.r.t. the change (which will be online soon).

frankseide on 27 Jul 2016

@frankseide Thanks :) ... I believe that's all for now. I'm closing this issue :)

Arminea on 27 Jul 2016

Hi, I'm new to CNTK.
If I want to change the dimensions, do I need to rebuild it in Visual Studio?
Or does it work if I just change the config and NDL files and then run it from the command prompt?