If I run just 1 round of an MXNET model training with mx.model.FeedForward.create I get NaN as a training error. Is this for a purpose?
Why does one need to run just 1 round? In my case I make reinforcement learning with Q-function approximation, where I need to run a batch of examples through the network just one time.
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mxnet_0.10.1 pryr_0.1.3 quantregForest_1.3-6 RColorBrewer_1.1-2 randomForest_4.6-12 ggjoy_0.4.0 ggridges_0.4.1
[8] DT_0.2 caret_6.0-77 lattice_0.20-35 FSelector_0.21 scales_0.5.0 nnet_7.3-12 infotheo_1.2.0
[15] cluster_2.0.6 forecast_8.2 gridExtra_2.3 kableExtra_0.6.1 knitr_1.17 rmarkdown_1.8 markdown_0.8
[22] TTR_0.23-2 tseries_0.10-42 ggplot2_2.2.1 magrittr_1.5 data.table_1.10.4-3
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 class_7.3-14 rprojroot_1.2 rstudioapi_0.7 DRR_0.0.2 prodlim_1.6.1 lubridate_1.7.1 xml2_1.1.1
[9] codetools_0.2-15 splines_3.4.0 mnormt_1.5-5 robustbase_0.92-8 RcppRoll_0.2.2 jsonlite_1.5 entropy_1.2.1 rJava_0.9-9
[17] broom_0.4.3 ddalpha_1.3.1 kernlab_0.9-25 sfsmisc_1.1-1 DiagrammeR_0.9.2 readr_1.1.1 compiler_3.4.0 httr_1.3.1
[25] backports_1.1.1 assertthat_0.2.0 Matrix_1.2-9 lazyeval_0.2.1 visNetwork_2.0.1 htmltools_0.3.6 tools_3.4.0 bindrcpp_0.2
[33] igraph_1.1.2 gtable_0.2.0 glue_1.2.0 reshape2_1.4.2 dplyr_0.7.4 Rcpp_0.12.14 rgexf_0.15.3 fracdiff_1.4-2
[41] nlme_3.1-131 iterators_1.0.8 psych_1.7.8 lmtest_0.9-35 timeDate_3042.101 gower_0.1.2 stringr_1.2.0 rvest_0.3.2
[49] RWekajars_3.9.1-5 XML_3.98-1.9 DEoptimR_1.0-8 MASS_7.3-47 zoo_1.8-0 ipred_0.9-6 hms_0.4.0 parallel_3.4.0
[57] quantmod_0.4-11 curl_3.0 downloader_0.4 rpart_4.1-11 stringi_1.1.6 Rook_1.1-1 foreach_1.4.3 RWeka_0.4-36
[65] lava_1.5.1 rlang_0.1.4 pkgconfig_2.0.1 evaluate_0.10.1 purrr_0.2.4 bindr_0.1 recipes_0.1.1 htmlwidgets_0.9
[73] CVST_0.2-1 tidyselect_0.2.3 plyr_1.8.4 R6_2.2.2 dimRed_0.1.0 foreign_0.8-67 withr_2.1.0 xts_0.10-0
[81] survival_2.41-3 tibble_1.3.4 viridis_0.4.0 grid_3.4.0 influenceR_0.1.0 ModelMetrics_1.1.0 digest_0.6.12 tidyr_0.7.2
[89] brew_1.0-6 stats4_3.4.0 munsell_0.4.3 viridisLite_0.2.0 quadprog_1.5-5
Console:
Start training with 1 devices
[1] Train-mse=NaN
library(mxnet)
hidden_u_1 <- 10
activ_hidden_1 <- 'tanh'
hidden_u_2 <- 1
learn_rate <- 0.001
initializer <- mx.init.uniform(1)
optimizer <- 'rmsprop' #sgd
loss <- mx.metric.mse
device.cpu <- mx.cpu()
mini_batch <- 64 #8
rounds <- 1 #2
## data symbols
nn_data <- mx.symbol.Variable('data')
nn_label <- mx.symbol.Variable('label')
## first fully connected layer
fc1 <- mx.symbol.FullyConnected(data = nn_data
, num_hidden = hidden_u_1)
activ1 <- mx.symbol.Activation(data = fc1, act.type = activ_hidden_1)
## second fully connected layer
fc2 <- mx.symbol.FullyConnected(data = activ1, num_hidden = hidden_u_2)
q_func <- mx.symbol.LinearRegressionOutput(data = fc2, label = nn_label, name = 'regr')
# initialize NN
train.x <- matrix(rnorm(mini_batch * 10, 0, 1), ncol = 10)
train.y = rnorm(64, 0, 1)
nn_model <- mx.model.FeedForward.create(
symbol = q_func,
X = train.x,
y = train.y,
ctx = device.cpu,
num.round = rounds,
array.batch.size = mini_batch, #60
optimizer = optimizer,
eval.metric = loss,
learning.rate = learn_rate,
initializer = initializer
)
If I use 2 or more rounds, or minibatch of the size smaller than the number of samples in my dataset, I get a numeric value of train error.
Any updates on this question?
@jeremiedb would you help on this?
This looks like a bug. There should be a reset of the iterator prior to the forward-backward loop instead of the current reset that is made after exhausting the iterator. I can fix this in the current opened pull request.
@alexmosc : this bug correction will however be applied on current master version, for which there's currently no recent Windows package built. Building from source on Windows is a bit of a pain, so easiest might be to run through Linux or use a docker image on Windows.
Building from source on Windows is a bit of a pain, so easiest might be to run through Linux or use a docker image on Windows.
I can definitely try a virtual machine with Ubuntu at my PC. Thank you.
@sandeep-krishnamurthy Please add the labels - R, Bug.
@jeremiedb is the fix for this bug merged? Can you please link the PR with the bug fix to this issue.
Bug to be fixed by this open PR #9803
@alexmosc can you verify the bug fix and close the issue accordingly.
Thank you!
This is going to be closed as fixed.
Alexey
Most helpful comment
This looks like a bug. There should be a reset of the iterator prior to the forward-backward loop instead of the current reset that is made after exhausting the iterator. I can fix this in the current opened pull request.
@alexmosc : this bug correction will however be applied on current master version, for which there's currently no recent Windows package built. Building from source on Windows is a bit of a pain, so easiest might be to run through Linux or use a docker image on Windows.