I followed the example in the "Neural Network with MXNet in Five Minutes" tutorial:
https://github.com/apache/incubator-mxnet/blob/master/R-package/vignettes/fiveMinutesNeuralNetwork.Rmd
I ran the regression example using the symbol system, and also an equivalent routine using the mx.mlp function. According to the documentation, these two approaches should behave the same, since mx.mlp is built on top of the symbol system. However, the results are very different, as shown by the test-set errors: 7.80 vs. 23.86. The code is shown below.
I noticed an earlier issue, https://github.com/apache/incubator-mxnet/issues/1692, where a similar problem was reported. The response suggested adding an extra hidden layer in the symbol system, and indeed that made the two test errors similar. A few questions:
(1) Why is an extra hidden layer needed in the symbol-system version, and why does that extra layer make the model much worse (a much bigger error)?
(2) How can I implement the original regression example using mx.mlp instead of the symbol system? As we saw above, calling mx.mlp with hidden_node=1 is equivalent to adding an extra hidden layer in the symbol system. We would need to remove that hidden layer with hidden_node=0, but the mx.mlp function does not accept this and gives an error message.
I hope the MXNet authors or some expert users can help with these issues. Thank you.
data(BostonHousing, package="mlbench")
# Split the 506 rows: every third row for training, the rest for testing
train.ind <- seq(1, 506, 3)
train.x <- data.matrix(BostonHousing[train.ind, -14])  # 13 predictor columns
train.y <- BostonHousing[train.ind, 14]                # medv, the target
test.x <- data.matrix(BostonHousing[-train.ind, -14])
test.y <- BostonHousing[-train.ind, 14]
# Symbol system: one fully connected layer feeding straight into the linear
# regression output, i.e. a plain linear regression on all 13 inputs
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden=1)
lro <- mx.symbol.LinearRegressionOutput(fc1)
mx.set.seed(0)
model <- mx.model.FeedForward.create(lro, X=train.x, y=train.y, ctx=mx.cpu(),
                                     num.round=50, array.batch.size=20,
                                     learning.rate=2e-6, momentum=0.9,
                                     eval.metric=mx.metric.rmse)
preds <- predict(model, test.x)
sqrt(mean((preds-test.y)^2))
[1] 7.800502356
model0 <- mx.mlp(train.x, train.y, hidden_node=1, out_node=1, out_activation="rmse",
                 num.round=50, array.batch.size=20, learning.rate=2e-6,
                 momentum=0.9, eval.metric=mx.metric.rmse)
preds0 <- predict(model0, test.x)
sqrt(mean((preds0-test.y)^2))
[1] 23.85977502
You can see the difference between the resulting models by looking at their graph representations:
graph.viz(model$symbol)
graph.viz(model0$symbol)
The resulting graphs highlight the extra hidden layer in the mx.mlp model that you are referring to.
The reason, as I see it, is that mx.mlp was designed as a helper for MLP models and assumes there is at least one hidden layer. mx.mlp would need a minor adaptation to support zero hidden layers, which would reduce to a plain linear regression. I would, however, question the value of using a deep learning engine for a linear regression.
The error is larger because the hidden layer has only one unit: all 13 variables are collapsed into a single variable, a tanh transform is applied to it, and a linear regression is then fit on that single tanh-transformed variable. This setup is more restrictive than fitting a linear regression on the 13 variables directly.
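For illustration, here is a minimal sketch of the symbol graph that mx.mlp effectively builds for hidden_node=1 (assuming the default tanh hidden activation; this is not the mx.mlp source, just an equivalent hand-built construction):
# Equivalent of mx.mlp(..., hidden_node=1, out_node=1, out_activation="rmse"),
# assuming the default tanh hidden activation
data <- mx.symbol.Variable("data")
fc1  <- mx.symbol.FullyConnected(data, num_hidden=1)  # 13 inputs -> 1 hidden unit
act1 <- mx.symbol.Activation(fc1, act_type="tanh")    # the tanh bottleneck
fc2  <- mx.symbol.FullyConnected(act1, num_hidden=1)  # output layer
lro2 <- mx.symbol.LinearRegressionOutput(fc2)         # squared-error loss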
Given the minimal effort needed to build the symbol for an MLP model, I personally favor using mx.model.FeedForward.create().
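As a quick check, training the hand-built symbol above with mx.model.FeedForward.create() should roughly reproduce the mx.mlp error (a sketch reusing the hyperparameters from the question; the exact number may vary, but issue #1692 reports the two errors come out similar):
mx.set.seed(0)
model.equiv <- mx.model.FeedForward.create(lro2, X=train.x, y=train.y, ctx=mx.cpu(),
                                           num.round=50, array.batch.size=20,
                                           learning.rate=2e-6, momentum=0.9,
                                           eval.metric=mx.metric.rmse)
preds.equiv <- predict(model.equiv, test.x)
sqrt(mean((preds.equiv - test.y)^2))  # expected to be close to the mx.mlp RMSE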
@jeremiedb is correct. mx.mlp is a helper function around the symbol system for building MLP models with at least one hidden layer.
I have written some regression examples using MxNetR recently, and I will add other advanced examples later.
https://github.com/xup6fup/MxNetR-examples
I am closing this now. Feel free to reopen.