Cntk: Shorten training time for Fast/Faster R-CNN without changing the algorithms

Created on 15 Sep 2017  ·  12 Comments  ·  Source: microsoft/CNTK

We'd like to use Fast/Faster R-CNN, but training takes about 30 minutes for a set of images in an NC6 Azure environment.

As far as I checked, training time did not improve noticeably when I changed only the environment (NCxx); i.e., the same procedure also took about 30 minutes on NC12 or NC24.

Questions:

  • [Scale-up strategy] If we want to shorten the training time for these procedures on NC12 or NC24, what parameter settings are needed?
  • [Other strategies] Can we shorten the training time with any other settings?
pull in progress

All 12 comments

We know Faster R-CNN's speed can be improved by writing custom C++ layers rather than Python layers, and by using a GPU implementation for non-max suppression. This is ongoing work, and we will integrate it gradually.

@cha-zhang Thanks. Can we use multi-GPU processes with higher NCxx at present? Or, should we wait for the above implementation?

Multi-GPU would certainly help if you need it immediately. NCCL 2 is integrated in v2.2 (releasing today), so multi-machine training should work well.

@cha-zhang Really? :) Once the release is out, can you share the tutorial link here?

Will post release notes on main page once it's out. Or, follow us on Twitter @mscntk.

@cha-zhang @pkranen I tried to train Fast R-CNN as follows:

mpiexec -n 2 python A2_RunWithPyModel.py

If NC24 is selected, 2 GPUs are used, but the processing time is almost the same as with the normal run:
python A2_RunWithPyModel.py
Looking at the console log, each GPU seems to run the same computation in parallel, so the runs merely duplicate each other and may waste resources. Is that right?

1) Can we shorten the processing time with the mpiexec command for this Python module?
2) Is the situation the same for FasterRCNN_train.py?

This script is not ready for distributed learning. Check scripts like this one:
https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Classification/ResNet/Python/TrainResNet_CIFAR10_Distributed.py
to see how to make things distributed.

Following the comment above, I first tried running Fast R-CNN in a distributed version. Here are the conditions. The related script is this

Conditions:

  • Environment: NC24 (with 4 GPUs), Windows Server in Azure
  • original script in Fast R-CNN: A2_RunWithPyModel.py
## original learner
learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)
  • changed script in Fast R-CNN: A2_RunWithPyModel_distributed.py
## preparation of distributed learning
from cntk import distributed
 :
## the original learner is renamed to local_learner and passed to data_parallel_distributed_learner
local_learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)
learner = distributed.data_parallel_distributed_learner(local_learner, num_quantization_bits=1, distributed_after=1)
  • sample sizes (Grocery: the default sample data set in this module)

    • training samples: 25

    • test samples: 5

  • number of epochs: 20

  • command: mpiexec -n 4 python A2_RunWithPyModel_distributed.py

My Questions:

1) As seen in the distributed log, there are 4 similar blocks. Is this the usual result? I understood data-parallel training as an architecture that divides the data set across the GPUs and aggregates the results, so shouldn't the log be aggregated into 1 block? I wonder if this log structure is correct.
2) The sample size was 40 in every epoch except the 1st (= 25). Normally it should be the number of training samples. What causes the difference between 25 and 40 (from the 2nd epoch to the last)?
3) Mean AP differed between the original run (Mean AP = 0.8524) and the distributed run (Mean AP = 0.8837). I assumed that the only difference is the distribution and that Mean AP should be the same. Is this the correct setting?

Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.268s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.110s ( 12.9 samples/s);
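
The epoch lines quoted in this thread can also be compared mechanically. The following is a minimal sketch, assuming the progress lines keep exactly the `Finished Epoch[...]` format shown in the logs here; `parse_epoch_line` and `EPOCH_RE` are hypothetical helper names and not part of CNTK.

```python
import re

# Pattern for the progress lines as they appear in this thread
# (the format is assumed from the logs, not taken from CNTK documentation).
EPOCH_RE = re.compile(
    r"Finished Epoch\[(\d+) of (\d+)\]: \[Training\] "
    r"loss = ([\d.]+) \* (\d+), metric = ([\d.]+)% \* (\d+) "
    r"([\d.]+)s \(\s*([\d.]+) samples/s\);"
)

def parse_epoch_line(line):
    """Return epoch number, sample count, seconds and throughput, or None."""
    m = EPOCH_RE.search(line)
    if m is None:
        return None
    return {
        "epoch": int(m.group(1)),
        "samples": int(m.group(4)),
        "seconds": float(m.group(7)),
        "samples_per_s": float(m.group(8)),
    }

# Epoch 2 of the single-process run and of the 4-process run, copied from the logs.
original = parse_epoch_line(
    "Finished Epoch[2 of 20]: [Training] loss = 251.568496 * 25, "
    "metric = 2.41% * 25 6.971s (  3.6 samples/s);"
)
distributed = parse_epoch_line(
    "Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, "
    "metric = 2.37% * 40 3.020s ( 13.2 samples/s);"
)

# Ratio of wall-clock time per epoch between the two runs.
speedup = original["seconds"] / distributed["seconds"]
print(f"epoch 2 wall-clock ratio: {speedup:.2f}x")
print(f"samples reported: {original['samples']} vs {distributed['samples']}")
```

Note that the reported samples/s figure is not directly comparable between the two runs, because the distributed run reports 40 samples per epoch instead of 25; the wall-clock seconds per epoch are the safer basis for comparison.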
  • Original log
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A2_RunWithPyModel.py
--------------------------------------------------------------
2017-09-23 12:49:33
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Selected GPU[1] Tesla K80 as the process wide default device.
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Learning rate per 1 samples: 1e-05
Momentum per 1 samples: 0.9048374180359595
Finished Epoch[1 of 20]: [Training] loss = 3153.350937 * 25, metric = 21.16% * 25 10.572s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 251.568496 * 25, metric = 2.41% * 25 6.971s (  3.6 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 147.409863 * 25, metric = 1.94% * 25 6.982s (  3.6 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 101.552354 * 25, metric = 1.70% * 25 6.965s (  3.6 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 79.782490 * 25, metric = 1.37% * 25 6.994s (  3.6 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 68.687617 * 25, metric = 1.25% * 25 6.964s (  3.6 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 60.549863 * 25, metric = 1.11% * 25 6.986s (  3.6 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 54.716392 * 25, metric = 0.99% * 25 6.976s (  3.6 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 50.048423 * 25, metric = 0.97% * 25 7.013s (  3.6 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 40.542100 * 25, metric = 0.71% * 25 7.001s (  3.6 samples/s);
Learning rate per 1 samples: 1e-06
Finished Epoch[11 of 20]: [Training] loss = 35.926621 * 25, metric = 0.60% * 25 6.995s (  3.6 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.639031 * 25, metric = 0.56% * 25 7.010s (  3.6 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 33.962507 * 25, metric = 0.54% * 25 6.990s (  3.6 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 33.616445 * 25, metric = 0.55% * 25 7.003s (  3.6 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 33.219561 * 25, metric = 0.53% * 25 7.016s (  3.6 samples/s);
Learning rate per 1 samples: 1e-07
Finished Epoch[16 of 20]: [Training] loss = 32.881428 * 25, metric = 0.53% * 25 7.012s (  3.6 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 32.816619 * 25, metric = 0.53% * 25 7.009s (  3.6 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 32.781428 * 25, metric = 0.52% * 25 6.999s (  3.6 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 32.750801 * 25, metric = 0.52% * 25 7.003s (  3.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 32.719143 * 25, metric = 0.52% * 25 7.016s (  3.6 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A3_ParseAndEvaluateOutput.py
--------------------------------------------------------------
2017-09-23 12:52:19
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Parsing CNTK output for image set: test
Parsing cntk output file, image 0 of 5
Parsing cntk output file, image 1 of 5
Parsing cntk output file, image 2 of 5
Parsing cntk output file, image 3 of 5
Parsing cntk output file, image 4 of 5
test.cache ss roidb loaded from C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\proc\Grocery_2000\cntkFiles\test.cache_selective_search_roidb.pkl
   Processing image 0 of 5..
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\cntk_helpers.py:813: RuntimeWarning: overflow encountered in exp
  e = np.exp(w)
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\cntk_helpers.py:814: RuntimeWarning: invalid value encountered in true_divide
  dist = e / np.sum(e, axis=1)[:, np.newaxis]
Number of rois before non-maxima surpression: 3183
Number of rois  after non-maxima surpression: 461
Evaluating detections
AP for         avocado = 0.5556
AP for          orange = 1.0000
AP for          butter = 1.0000
AP for       champagne = 1.0000
AP for          eggBox = 0.7500
AP for          gerkin = 1.0000
AP for         joghurt = 0.6667
AP for         ketchup = 0.6667
AP for     orangeJuice = 1.0000
AP for           onion = 1.0000
AP for          pepper = 1.0000
AP for          tomato = 1.0000
AP for           water = 0.5000
AP for            milk = 1.0000
AP for         tabasco = 0.5000
AP for         mustard = 1.0000
Mean AP = 0.8524
DONE.
  • Distributed log
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ mpiexec -n 4 python A2_RunWithPyModel_distributed.py
Selected GPU[0] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[2] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[3] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[1] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (1) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (0) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (3) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (2) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.268s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.110s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.065s ( 13.1 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.004s ( 13.3 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.056s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.000s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.992s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.035s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.052s ( 13.1 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.033s ( 13.2 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.042s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.069s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.040s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.768s (  2.3 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.038s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.033s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.061s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.043s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.069s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.007s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 11.277s (  2.2 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.016s ( 13.3 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.039s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.032s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.992s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.060s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.047s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.064s ( 13.1 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 9.772s (  2.6 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.016s ( 13.3 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.039s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.032s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.061s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.042s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.070s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A3_ParseAndEvaluateOutput.py
--------------------------------------------------------------
2017-09-23 12:48:18
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Parsing CNTK output for image set: test
Parsing cntk output file, image 0 of 5
Parsing cntk output file, image 1 of 5
Parsing cntk output file, image 2 of 5
Parsing cntk output file, image 3 of 5
Parsing cntk output file, image 4 of 5
test.cache ss roidb loaded from C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\proc\Grocery_2000\cntkFiles\test.cache_selective_search_roidb.pkl
   Processing image 0 of 5..
Number of rois before non-maxima surpression: 3184
Number of rois  after non-maxima surpression: 487
Evaluating detections
AP for         avocado = 0.5556
AP for          orange = 1.0000
AP for          butter = 1.0000
AP for       champagne = 1.0000
AP for          eggBox = 0.7500
AP for          gerkin = 1.0000
AP for         joghurt = 0.6667
AP for         ketchup = 0.6667
AP for     orangeJuice = 1.0000
AP for           onion = 1.0000
AP for          pepper = 1.0000
AP for          tomato = 1.0000
AP for           water = 0.5000
AP for            milk = 1.0000
AP for         tabasco = 1.0000
AP for         mustard = 1.0000
Mean AP = 0.8837
DONE.

@kyoro1 Thanks for the detailed info. To answer your questions:

As seen in the distributed log, there are 4 similar blocks. Is this the usual result? I understood data-parallel training as an architecture that divides the data set across the GPUs and aggregates the results, so shouldn't the log be aggregated into 1 block? I wonder if this log structure is correct.

It is indeed a little bit strange, although your result seems ok.
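
One plausible reading of why the four blocks show identical loss values: in synchronous data-parallel SGD, each worker computes gradients on its own shard, the gradients are aggregated, and every worker then applies the same update, so each process holds (and prints) the same aggregated numbers. The toy sketch below illustrates that idea in plain Python. It does not use CNTK or MPI, and all names (`local_gradient`, `all_reduce_sum`) are hypothetical.

```python
def local_gradient(w, shard):
    """Sum of d/dw (w*x - y)^2 over the samples of one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)

def all_reduce_sum(values):
    """Stand-in for an MPI/NCCL all-reduce: every worker receives the same total."""
    total = sum(values)
    return [total for _ in values]

# Toy data set: 8 samples of y = 3x, split round-robin over 4 workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
num_workers = 4
shards = [data[rank::num_workers] for rank in range(num_workers)]

lr = 0.01
weights = [0.0] * num_workers  # each worker keeps its own copy of w
for step in range(50):
    # Each worker computes a gradient on its own shard only.
    grads = [local_gradient(weights[r], shards[r]) for r in range(num_workers)]
    # After aggregation, all workers see the same summed gradient.
    reduced = all_reduce_sum(grads)
    weights = [weights[r] - lr * reduced[r] / len(data) for r in range(num_workers)]

# Single-process reference: the same update computed on the full data set.
w_ref = 0.0
for step in range(50):
    w_ref -= lr * local_gradient(w_ref, data) / len(data)

print("per-worker weights:", weights)
print("single-process weight:", w_ref)
```

Every worker ends with the same weight, which matches the single-process result up to floating-point rounding. Under this reading, the work is split across GPUs even though each process writes its own copy of the log.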

The sample size was 40 in every epoch except the 1st (= 25). Normally it should be the number of training samples. What causes the difference between 25 and 40 (from the 2nd epoch to the last)?

In the first epoch, CNTK auto-tunes the convolution algorithms, and there is also overhead from allocating buffers, validating the model architecture, etc.

Mean AP differed between the original run (Mean AP = 0.8524) and the distributed run (Mean AP = 0.8837). I assumed that the only difference is the distribution and that Mean AP should be the same. Is this the correct setting?

With such a small data set, fluctuation is normal.
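
To see how small the fluctuation is, the per-class AP values from the two evaluation logs in this thread can be compared directly. This is a minimal sketch that assumes Mean AP is the unweighted average of the 16 per-class values, which is consistent with the numbers printed in the logs.

```python
# Per-class AP values copied from the original (single-process) evaluation log.
original_ap = {
    "avocado": 0.5556, "orange": 1.0, "butter": 1.0, "champagne": 1.0,
    "eggBox": 0.75, "gerkin": 1.0, "joghurt": 0.6667, "ketchup": 0.6667,
    "orangeJuice": 1.0, "onion": 1.0, "pepper": 1.0, "tomato": 1.0,
    "water": 0.5, "milk": 1.0, "tabasco": 0.5, "mustard": 1.0,
}
# The distributed evaluation log lists the same values except for tabasco.
distributed_ap = dict(original_ap, tabasco=1.0)

def mean_ap(ap_by_class):
    """Unweighted mean over classes (assumed definition of Mean AP here)."""
    return sum(ap_by_class.values()) / len(ap_by_class)

changed = [c for c in original_ap if original_ap[c] != distributed_ap[c]]
mean_original = mean_ap(original_ap)
mean_distributed = mean_ap(distributed_ap)

print("classes that differ:", changed)
print(f"original Mean AP:    {mean_original:.4f}")
print(f"distributed Mean AP: {mean_distributed:.4f}")
print(f"difference:          {mean_distributed - mean_original:.4f}")
```

The whole gap comes from a single class (tabasco, 0.5 versus 1.0), which moves the mean by 0.5 / 16. With only 5 test images, a change in one class is enough to shift Mean AP by that amount.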

@cha-zhang Thanks for your comment. So the trial above behaves mostly as expected, except for the log structure, doesn't it?
Also, do you plan to develop distributed Fast R-CNN scripts in the near future? Or should I send a pull request to master?

We will not be working on this in the short term. It would be great if you could send us a PR. :)

Here is the 1st step for Fast R-CNN with distributed learning. https://github.com/Microsoft/CNTK/commit/1312bf83574be29d5ee882165a4a7605a99ad7cc
