Incubator-mxnet: Got random results when training cifar10 in R

Created on 10 Mar 2016 · 4 Comments · Source: apache/incubator-mxnet

I trained CIFAR-10 in R, following the Python implementation at:
https://github.com/dmlc/mxnet/tree/master/example/image-classification

My code is in:
https://github.com/ziyeqinghan/mxnet/tree/master/R-package/demo/image-classification

However, every time I run the command Rscript train_cifar10.R --gpus 0, the results are very different. Here are some of the outcomes I have seen.

In some runs, Train-accuracy and Validation-accuracy are low at the beginning and barely change after several iterations. Part of the output:

Batch [50] Train-accuracy=0.09546875
...
Batch [350] Train-accuracy=0.0966294642857143
[10] Train-accuracy=0.0969269501278772
[10] Validation-accuracy=0.0999599358974359

In other runs, Train-accuracy rises after several iterations, but Validation-accuracy falls. Part of the output:

[5] Train-accuracy=0.820612212276215
[5] Validation-accuracy=0.478866185897436
...
[6] Train-accuracy=0.839002403846154
[6] Validation-accuracy=0.328826121794872

In yet other runs, both Train-accuracy and Validation-accuracy rise after several iterations, which is the expected behavior. Part of the output:

Batch [50] Train-accuracy=0.92484375
...
Batch [350] Train-accuracy=0.928973214285714
[18] Train-accuracy=0.929008152173913
[18] Validation-accuracy=0.837740384615385
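
To compare runs without reading the logs by eye, the three patterns above could be told apart with a small script. This is only a minimal Python sketch and not part of the training code: the helper names parse_epoch_metrics and classify_run, the label strings, and both tolerances are illustrative assumptions. It relies on the fact that CIFAR-10 has 10 balanced classes, so an accuracy near 0.10 is what random guessing would give.

```python
import re

# Matches epoch-level lines such as "[5] Train-accuracy=0.8206".
# Per-batch lines ("Batch [50] Train-accuracy=...") are skipped on purpose.
EPOCH_LINE = re.compile(r"^\[(\d+)\]\s+(Train|Validation)-accuracy=([0-9.]+)\s*$")


def parse_epoch_metrics(log_text):
    """Return {epoch: {"Train": acc, "Validation": acc}} from a training log."""
    metrics = {}
    for line in log_text.splitlines():
        match = EPOCH_LINE.match(line.strip())
        if match:
            epoch, split, value = match.groups()
            metrics.setdefault(int(epoch), {})[split] = float(value)
    return metrics


def classify_run(metrics, num_classes=10, chance_tol=0.02, gap_tol=0.2):
    """Label a run from its last logged epoch.

    chance_tol and gap_tol are hypothetical thresholds chosen for this
    example; they are not values taken from MXNet.
    """
    last = metrics[max(metrics)]
    train, val = last["Train"], last["Validation"]
    chance = 1.0 / num_classes  # 0.1 for the 10 CIFAR-10 classes
    if abs(train - chance) <= chance_tol and abs(val - chance) <= chance_tol:
        return "stuck at chance"
    if train - val > gap_tol:
        return "train/validation gap"
    return "converging"


if __name__ == "__main__":
    # Excerpt from the second kind of run reported above.
    sample = """[5] Train-accuracy=0.820612212276215
[5] Validation-accuracy=0.478866185897436
[6] Train-accuracy=0.839002403846154
[6] Validation-accuracy=0.328826121794872
"""
    print(classify_run(parse_epoch_metrics(sample)))
```

Saving the output of each run to a file and passing its contents to these two functions would show how often each pattern occurs across repeated runs.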

I also saved a model trained in Python, then loaded it in R and continued training there, and ran into a similar problem.

Since the Python code works fine, I suspect there may be a bug in the R implementation. I would appreciate any help or suggestions.

Thanks so much!

ziyeqinghan

All 4 comments

That is why I am using the Python interface for my project now, sigh!

firearasi on 11 Mar 2016

👎3

Working on a paper for WABI conference.

I will come back to this after paper submission. You have my word.

thirdwing on 11 Mar 2016

👍1

Yeah, it is the same with the C/C++ interfaces. I tried them and have only just started with the Python interface. Even the most popular optimizer, RMSProp, is not implemented, so I had to reverse-engineer the Python interface to get exactly the same functionality, and dmlc/mxnetcpp is practically unusable :( The lack of support for these interfaces will make MXNet users, and even contributors, start looking for alternatives.

christallire on 16 Mar 2016

I have pushed the CIFAR-10 example. The results using LeNet are as follows:

Rscript train_cifar10.R --cpu=TRUE --network=lenet 
Loading required package: mxnet
Loading required package: methods
Loading required package: argparse
Loading required package: proto
[1] "Network used: lenet"
[01:06:27] src/io/iter_image_recordio.cc:209: ImageRecordIOParser: data/cifar10/train.rec, use 1 threads for decoding..
[01:06:27] src/io/./iter_normalize.h:103: Load mean image from data/cifar10/mean.bin
[01:06:27] src/io/iter_image_recordio.cc:68: Loaded ImageList from data/cifar10/test.lst 10000 Image records
[01:06:27] src/io/iter_image_recordio.cc:209: ImageRecordIOParser: data/cifar10/test.rec, use 1 threads for decoding..
[01:06:27] src/io/./iter_normalize.h:103: Load mean image from data/cifar10/mean.bin
[1] "Computing with CPU"
Start training with 1 devices
Batch [50] Train-accuracy=0.25234375
Batch [100] Train-accuracy=0.284609375
Batch [150] Train-accuracy=0.302239583333333
Batch [200] Train-accuracy=0.3121875
Batch [250] Train-accuracy=0.32134375
Batch [300] Train-accuracy=0.328046875
Batch [350] Train-accuracy=0.332924107142857
[1] Train-accuracy=0.336237980769231
[1] Validation-accuracy=0.358880537974684

thirdwing on 15 Oct 2016
