What I am trying to do is:
I have completed the first 4 steps without any issue. On the 5th step, I am not sure what to use. I tried the adapt command with a reference node pointing to the criterion node I added using MEL in the 4th step. However, it seems to backpropagate and update all the weights and biases, and I am not sure whether there is any way to prevent this. I thought of setting learningRateMultiplier to 0 using SetProperty in MEL, but this property does not appear to be supported.
Any suggestions? Or, are there other ways of achieving the same behavior?
PS: I saw a new top-level command named DoEncoderDecoder listed here: https://github.com/Microsoft/CNTK/wiki/Top-level-commands. Is that something I can use for this purpose?
Please expect a solution on Tuesday or Wednesday at the latest.
Unfortunately, an architectural change disabled the adapt command. I will be back in the office next week and will fix it.
Sorry I cannot help sooner.
(In reply to "Zp Bappi", Fri, Jul 15, 2016 at 8:56 AM +0200, https://github.com/Microsoft/CNTK/issues/672)
@frankseide, thank you for the update. I will wait for the changes.
Hi, as usual, we ran into some unexpected complications. This is now under review and testing.
Once trained, delete the latter half of the network (the decoder part, after the bottleneck layer).
Once it is in, your code would look something like this: define a new network, and in that definition include code like the following:
featExtNetwork = BS.Network.Load ("YOUR_TRAINED_AE_MODEL")
featExt = BS.Network.CloneFunction (
    featExtNetwork.input,   # input node that the AE model reads data from
    featExtNetwork.feat,    # output node in the AE model that holds the desired features
    parameters="constant")  # says to freeze that part of the network
# define your new network, using featExt() like any other BrainScript function, e.g.
input = Input (...)
features = featExt (input)  # this will instantiate a clone of the above network
# and the rest is just BrainScript, e.g.
h = Sigmoid (W_hid * features + b_hid)  # whatever your hidden layer looks like
z = W_out * h + b_out
ce = CrossEntropyWithSoftmax (labels, z)
criterionNodes = (ce)
The key is parameters="constant", which will lock all learnable parameters (setting learningRateMultiplier=0) inside so that they won't get updated during further training. It also locks BatchNormalization if you use it (that's where the unexpected complications came in).
Until the code is in master, you can already have a look at the documentation: https://github.com/Microsoft/CNTK/wiki/CloneFunction. And if you dare, you can try the branch fseide/clonebs, but it may be premature.
@frankseide, I dared and built from the fseide/clonebs branch. It worked. :)
One finding: if I expose a node from somewhere in the middle of the network as an output node, CloneFunction cannot find that node in the loaded network. The bottleneck node I was interested in was displayed as L4.y in the log file, even before I exposed it as outputNodes=(L4.y) in the AE training section. However, after loading the network with CloneFunction in another train action, it could not find the node. What I had to do was:
out = Constant(1) .* L4  # L4 is the extracted features layer
#... rest of the code
#...then, at the end
outputNodes = (out)
Then I was able to load the network as CloneFunction(featExtNet.features, featExtNet.out, parameters="constant").
I am not sure whether this is simply not the proper way to expose a node from a network or whether it is a known issue, but I thought I should bring it to your attention. I am closing this issue, as it solves the original problem I had.
Thank you very much for your time and very elaborate reply to all the questions I have asked. You rock. :+1:
Super!
The problem with L4.y is that a path like network.L4.y is currently not traversed correctly, since the nodes inside a network are no longer true BrainScript records. Fixing this is non-trivial.
I did, however, create a (somewhat ugly) workaround for this case. Could you try writing L4_y instead? Dots in node names can be written as _ and will still match.
Works like a charm. Thanks again for the tip. :)
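For reference, applying that tip to the CloneFunction call above gives something like the following sketch. It assumes, as in the earlier message, that the loaded AE model is bound to featExtNet and that its input node is named features; the model path is a placeholder, not a real file.

```brainscript
featExtNet = BS.Network.Load ("YOUR_TRAINED_AE_MODEL")  # placeholder path to the trained AE model
featExt = BS.Network.CloneFunction (
    featExtNet.features,    # input node the AE model reads data from (name assumed from the message above)
    featExtNet.L4_y,        # bottleneck node L4.y, with the dot written as _
    parameters="constant")  # freeze the cloned parameters during further training
```

With the underscore form, the Constant(1) .* L4 helper node and the extra outputNodes entry should no longer be needed.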
I see a lot of helpful tips and workarounds in issues. For example, someone was trying to load an already trained model for further training with more data (#680). The suggestion by @dongyu888 was to rename the final model model.dnn to model.dnn.0, delete everything else, and start training on the new data. An amazingly simple solution. I think it would be easier for people to find these tips if they were included in the documentation wiki, or at least compiled as an FAQ. It would be unfair to ask more of your time for this, but it could be community-driven as well. I see a lot of CNTK users hitting a dead end and finding a workaround in discussions. I believe they would be more than happy to contribute.
Yes, we need a tips & tricks section, "How do I...".
Hi, this is now in master.
Hi, I created a little "How do I..." text out of this Issue. Thanks again.