This is not just a question of R, but I use R.
I hope to know how predict.MXFeedForwardModel work. I set three layers linear FeedForward neural networks, it like this :
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = num_hidden_first, name = "fc1")
act1 <- mx.symbol.Activation(fc1, act_type = "relu", name = "relu1")
fc2 <- mx.symbol.FullyConnected(act1, num_hidden = 28, name = "fc2")
act2 <- mx.symbol.Activation(fc2, act_type = "relu", name = "relu2")
fc3 <- mx.symbol.FullyConnected(act2, num_hidden = 1, name = "fc3")
mlp <- mx.symbol.LinearRegressionOutput(fc3, name = "mlp")
It outputs three sets variable coefficients. Each set contains weight
and bias
.
dim(model$arg.params$fc1_weight) `39*64`
length(model$arg.params$fc1_bias) `64`
dim(model$arg.params$fc2_weight) `39*28`
length(model$arg.params$fc2_bias) `28`
dim(model$arg.params$fc3_weight) `39*1`
dim(model$arg.params$fc3_bias) `1`
I try to manually substitution confficients to predict the y^, make the result equal to the function predict(model, data)
. Try the way like this:
mx.predict <- function(model, data) {
W1 <- as.array(model$arg.params$fc1_weight)
b1 <- as.array(model$arg.params$fc1_bias)
W2 <- as.array(model$arg.params$fc2_weight)
b2 <- as.array(model$arg.params$fc2_bias)
W3 <- as.array(model$arg.params$fc3_weight)
b3 <- as.array(model$arg.params$fc3_bias)
data <- as.matrix(data)
pred <- data %*% W1
for (i in length(b1)) {
pred[, i] <- pred[, i] + b1[i]
}
pred <- pred %*% W2
for (i in length(b2)) {
pred[, i] <- pred[, i] + b2[i]
}
pred <- pred %*% W3
for (i in length(b3)) {
pred[, i] <- pred[, i] + b3[i]
}
return(pred)
}
But the result is far greater than the predict(model, data)
. I know that wrong. But please tell me which way to use the coefficient, to get the correct results ?
I try to view the predict.MXFeedForwardModel code, but found that its core part seems to involve c ++, I am not familiar with c ++.
Who can help me understand how predict.MXFeedForwardModel works, and help me get the right result with the R function . Thanks!
I think you forgot the application of the activation function. Since you use "relu", you should take something like pmax(0, pred)
for act1 and act2.
Yes, I forgot that. I think that's a reason, and will try to add activation function. Thank you for your help!
@GuilongZh Please try the code below. I think 1.639859e-05
is acceptable.
library(mxnet)
data(BostonHousing, package="mlbench")
train.ind <- seq(1, 506, 3)
train.x <- data.matrix(BostonHousing[train.ind, -14])
train.y <- BostonHousing[train.ind, 14]
test.x <- data.matrix(BostonHousing[-train.ind, -14])
test.y <- BostonHousing[-train.ind, 14]
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 20, name = "fc1")
act1 <- mx.symbol.Activation(fc1, act_type = "relu", name = "relu1")
fc2 <- mx.symbol.FullyConnected(act1, num_hidden = 1, name = "fc2")
mlp <- mx.symbol.LinearRegressionOutput(fc2, name = "mlp")
mx.set.seed(0)
model <- mx.model.FeedForward.create(mlp, X=train.x, y=train.y,
ctx=mx.cpu(), num.round=50, array.batch.size=20,
learning.rate=2e-6, momentum=0.9, eval.metric=mx.metric.rmse)
preds <- predict(model, train.x)
dim(model$arg.params$fc1_weight)
dim(model$arg.params$fc1_bias)
dim(model$arg.params$fc2_weight)
dim(model$arg.params$fc2_bias)
W1 <- as.array(model$arg.params$fc1_weight)
b1 <- as.array(model$arg.params$fc1_bias)
W2 <- as.array(model$arg.params$fc2_weight)
b2 <- as.array(model$arg.params$fc2_bias)
data <- as.matrix(train.x)
pred <- data %*% W1
for (i in length(b1)) {
pred[, i] <- pred[, i] + b1[i]
}
pred = matrix(pmax(0, pred), nrow = nrow(pred))
pred <- pred %*% W2
for (i in length(b2)) {
pred[, i] <- pred[, i] + b2[i]
}
sum((pred - t(preds))^2)
# [1] 1.639859e-05
Most helpful comment
I think you forgot the application of the activation function. Since you use "relu", you should take something like
pmax(0, pred)
for act1 and act2.