Incubator-mxnet: Why does running 1 round of MXNet model training produce Train-mse=NaN?

Created on 9 Jan 2018 · 8 comments · Source: apache/incubator-mxnet

If I run just 1 round of MXNet model training with mx.model.FeedForward.create, I get NaN as the training error. Is this intentional?

Why would one need to run just 1 round? In my case I am doing reinforcement learning with Q-function approximation, where I need to run a batch of examples through the network only once.

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mxnet_0.10.1         pryr_0.1.3           quantregForest_1.3-6 RColorBrewer_1.1-2   randomForest_4.6-12  ggjoy_0.4.0          ggridges_0.4.1      
 [8] DT_0.2               caret_6.0-77         lattice_0.20-35      FSelector_0.21       scales_0.5.0         nnet_7.3-12          infotheo_1.2.0      
[15] cluster_2.0.6        forecast_8.2         gridExtra_2.3        kableExtra_0.6.1     knitr_1.17           rmarkdown_1.8        markdown_0.8        
[22] TTR_0.23-2           tseries_0.10-42      ggplot2_2.2.1        magrittr_1.5         data.table_1.10.4-3 

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2   class_7.3-14       rprojroot_1.2      rstudioapi_0.7     DRR_0.0.2          prodlim_1.6.1      lubridate_1.7.1    xml2_1.1.1        
 [9] codetools_0.2-15   splines_3.4.0      mnormt_1.5-5       robustbase_0.92-8  RcppRoll_0.2.2     jsonlite_1.5       entropy_1.2.1      rJava_0.9-9       
[17] broom_0.4.3        ddalpha_1.3.1      kernlab_0.9-25     sfsmisc_1.1-1      DiagrammeR_0.9.2   readr_1.1.1        compiler_3.4.0     httr_1.3.1        
[25] backports_1.1.1    assertthat_0.2.0   Matrix_1.2-9       lazyeval_0.2.1     visNetwork_2.0.1   htmltools_0.3.6    tools_3.4.0        bindrcpp_0.2      
[33] igraph_1.1.2       gtable_0.2.0       glue_1.2.0         reshape2_1.4.2     dplyr_0.7.4        Rcpp_0.12.14       rgexf_0.15.3       fracdiff_1.4-2    
[41] nlme_3.1-131       iterators_1.0.8    psych_1.7.8        lmtest_0.9-35      timeDate_3042.101  gower_0.1.2        stringr_1.2.0      rvest_0.3.2       
[49] RWekajars_3.9.1-5  XML_3.98-1.9       DEoptimR_1.0-8     MASS_7.3-47        zoo_1.8-0          ipred_0.9-6        hms_0.4.0          parallel_3.4.0    
[57] quantmod_0.4-11    curl_3.0           downloader_0.4     rpart_4.1-11       stringi_1.1.6      Rook_1.1-1         foreach_1.4.3      RWeka_0.4-36      
[65] lava_1.5.1         rlang_0.1.4        pkgconfig_2.0.1    evaluate_0.10.1    purrr_0.2.4        bindr_0.1          recipes_0.1.1      htmlwidgets_0.9   
[73] CVST_0.2-1         tidyselect_0.2.3   plyr_1.8.4         R6_2.2.2           dimRed_0.1.0       foreign_0.8-67     withr_2.1.0        xts_0.10-0        
[81] survival_2.41-3    tibble_1.3.4       viridis_0.4.0      grid_3.4.0         influenceR_0.1.0   ModelMetrics_1.1.0 digest_0.6.12      tidyr_0.7.2       
[89] brew_1.0-6         stats4_3.4.0       munsell_0.4.3      viridisLite_0.2.0  quadprog_1.5-5

Console:

Start training with 1 devices
[1] Train-mse=NaN

Code:

library(mxnet)

hidden_u_1 <- 10
activ_hidden_1 <- 'tanh'

hidden_u_2 <- 1

learn_rate <- 0.001

initializer <- mx.init.uniform(1)

optimizer <- 'rmsprop' #sgd

loss <- mx.metric.mse

device.cpu <- mx.cpu()

mini_batch <- 64 #8

rounds <- 1 #2


## data symbols

nn_data <- mx.symbol.Variable('data')
nn_label <- mx.symbol.Variable('label')


## first fully connected layer

fc1 <- mx.symbol.FullyConnected(data = nn_data
                                , num_hidden = hidden_u_1)

activ1 <- mx.symbol.Activation(data = fc1, act.type = activ_hidden_1)

## second fully connected layer

fc2 <- mx.symbol.FullyConnected(data = activ1, num_hidden = hidden_u_2)

q_func <- mx.symbol.LinearRegressionOutput(data = fc2, label = nn_label, name = 'regr')


# initialize NN

train.x <- matrix(rnorm(mini_batch * 10, 0, 1), ncol = 10)

train.y <- rnorm(mini_batch, 0, 1)

nn_model <- mx.model.FeedForward.create(
     symbol = q_func,
     X = train.x,
     y = train.y,
     ctx = device.cpu,
     num.round = rounds,
     array.batch.size = mini_batch, #60
     optimizer = optimizer,
     eval.metric = loss,
     learning.rate = learn_rate,
     initializer = initializer
)

What have you tried to solve it?

If I use 2 or more rounds, or a mini-batch size smaller than the number of samples in my dataset, I get a numeric value for the training error.
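The workaround can be sketched as follows (a minimal sketch reusing `q_func`, `train.x`, and `train.y` from the code above; the specific argument values are illustrative, not a definitive recipe):

```r
# Workaround sketch: either of these calls yields a numeric Train-mse.

# Option 1: run 2 rounds instead of 1.
nn_model <- mx.model.FeedForward.create(
     symbol = q_func, X = train.x, y = train.y,
     ctx = mx.cpu(), num.round = 2,
     array.batch.size = 64, optimizer = 'rmsprop',
     eval.metric = mx.metric.mse,
     learning.rate = 0.001, initializer = mx.init.uniform(1)
)

# Option 2: keep 1 round, but use a batch size smaller than
# the number of samples (here 64), so the epoch spans more
# than one batch and the metric accumulates at least once.
nn_model <- mx.model.FeedForward.create(
     symbol = q_func, X = train.x, y = train.y,
     ctx = mx.cpu(), num.round = 1,
     array.batch.size = 32, optimizer = 'rmsprop',
     eval.metric = mx.metric.mse,
     learning.rate = 0.001, initializer = mx.init.uniform(1)
)
```

Both options avoid the degenerate case of a single round over a single batch, which is where the NaN appears.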

Labels: Bug, R


All 8 comments

Any updates on this question?

@jeremiedb would you help on this?

This looks like a bug. The iterator should be reset before the forward-backward loop, rather than only after the iterator has been exhausted, as is currently done. I can fix this in the currently open pull request.

@alexmosc: this bug fix will, however, only land on the current master branch, for which there is no recent Windows package built. Building from source on Windows is a bit of a pain, so the easiest option might be to run through Linux or use a Docker image on Windows.
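The diagnosis above can be sketched as simplified pseudocode of the epoch loop (this is an illustration of the described problem, not the actual mxnet R package source):

```r
# Simplified pseudocode of the training epoch loop.
for (round in 1:num.round) {
  mx.metric.reset(metric)
  train.data$reset()                  # FIX: reset the iterator BEFORE
                                      # the forward-backward loop.
  while (train.data$iter.next()) {    # With the buggy ordering (reset
    batch <- train.data$value()       # only after exhaustion), a run of
    # forward, backward, update,      # num.round = 1 over a single batch
    # and accumulate metric here      # sees no data, so the metric is
  }                                   # never updated and reports NaN.
}
```

This explains why 2+ rounds or a smaller batch size work around the problem: in both cases the loop body executes at least once with accumulated metric state.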

> Building from source on Windows is a bit of a pain, so easiest might be to run through Linux or use a docker image on Windows.

I can definitely try a virtual machine with Ubuntu on my PC. Thank you.

@sandeep-krishnamurthy Please add the labels - R, Bug.

@jeremiedb is the fix for this bug merged? Can you please link the PR with the bug fix to this issue?

This bug is to be fixed by the open PR #9803.

@alexmosc, can you verify the bug fix and close the issue accordingly?

Thank you!

This is going to be closed as fixed.

Alexey
