CNTK: Shorten training time for Fast/Faster R-CNN without changing the algorithm

We know Faster R-CNN's speed can be improved by writing custom C++ layers instead of Python layers and by using a GPU implementation of non-max suppression. This is ongoing work, and we will integrate it gradually.

cha-zhang on 15 Sep 2017

@cha-zhang Thanks. Can we use multi-GPU processing on a larger NC-series VM (NCxx) for now? Or should we wait for the implementation above?

kyoro1 on 15 Sep 2017

Multi-GPU would certainly help if you need it immediately. NCCL 2 is integrated in v2.2 (releasing today), so multi-machine training should work well.

cha-zhang on 15 Sep 2017

@cha-zhang Really? :) Once the release is complete, could you share the tutorial link here?

kyoro1 on 15 Sep 2017

We will post release notes on the main page once it's out. Or follow us on Twitter @mscntk.

cha-zhang on 15 Sep 2017

@cha-zhang @pkranen I tried to train Fast R-CNN as follows:

mpiexec -n 2 python A2_RunWithPyModel.py

With NC24 selected, 2 GPUs are used, but the processing time is almost the same as a normal run:
python A2_RunWithPyModel.py
Judging from the console log, each GPU seems to run the same computation in parallel, so the two processes merely duplicate each other. Could this be a waste of resources?

1) Can we shorten the processing time of this Python module with the mpiexec command?
2) Is the situation the same for FasterRCNN_train.py?

kyoro1 on 21 Sep 2017

This script is not ready for distributed learning. Check scripts like this one:
https://github.com/Microsoft/CNTK/blob/master/Examples/Image/Classification/ResNet/Python/TrainResNet_CIFAR10_Distributed.py
to see how to make things distributed.

cha-zhang on 21 Sep 2017

Following the comment above, I first tried running Fast R-CNN in a distributed version. Here are the conditions and the related script.

Conditions:

Environment: NC24 (with 4 GPUs), Windows Server on Azure
Original script for Fast R-CNN: A2_RunWithPyModel.py

## original learner
learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)

Modified script for Fast R-CNN: A2_RunWithPyModel_distributed.py

## preparation of distributed learning
from cntk import distributed
 :
## the original learner is renamed to local_learner and wrapped by data_parallel_distributed_learner
local_learner = momentum_sgd(frcn_output.parameters, lr_schedule, mm_schedule, l2_regularization_weight=l2_reg_weight)
learner = distributed.data_parallel_distributed_learner(local_learner, num_quantization_bits=1, distributed_after=1)

Sample sizes (Grocery: the default dataset in this module)
- training samples: 25
- test samples: 5
Number of epochs: 20
Command: mpiexec -n 4 python A2_RunWithPyModel_distributed.py
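
For readers unfamiliar with `num_quantization_bits=1` in the learner above: it requests 1-bit gradient quantization. The toy sketch below shows the general idea only (reduce each gradient to two levels and carry the quantization error into the next step). It is plain Python written for illustration; the function name `one_bit_quantize` is hypothetical, and this is not CNTK's actual implementation.

```python
def one_bit_quantize(gradient, residual):
    """Quantize a gradient to two levels, keeping the error for the next step.

    Toy illustration of 1-bit quantization with error feedback.
    This is an assumption-level sketch, not CNTK's code.
    """
    # Add the error left over from the previous step before quantizing.
    corrected = [g + r for g, r in zip(gradient, residual)]

    # One reconstruction level per sign: the mean of the values with that sign.
    positives = [v for v in corrected if v >= 0]
    negatives = [v for v in corrected if v < 0]
    pos_level = sum(positives) / len(positives) if positives else 0.0
    neg_level = sum(negatives) / len(negatives) if negatives else 0.0

    # Each element is now described by a single bit (its sign).
    quantized = [pos_level if v >= 0 else neg_level for v in corrected]

    # What quantization lost is fed back into the next minibatch.
    new_residual = [v - q for v, q in zip(corrected, quantized)]
    return quantized, new_residual


gradient = [0.5, -0.1, 0.3, -0.7]
quantized, residual = one_bit_quantize(gradient, [0.0] * len(gradient))
print(quantized)
print(residual)
```

Because each step transmits only a coarse version of the gradient, a noisier per-epoch loss curve than in the unquantized single-GPU run would not be surprising. Setting `num_quantization_bits=32` is the usual way to disable quantization when comparing runs.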

My Questions:

1) As the distributed log shows, there are 4 similar blocks. Is this the usual result? I understood data parallelism as an architecture that splits the dataset across GPUs and aggregates the results, so shouldn't the log be aggregated into 1 block? I wonder whether this log structure is correct.
2) The sample count was 40 in every epoch except the 1st (25). Normally it should equal the number of training samples. What causes the difference between 25 and 40 (from the 2nd epoch through the last)?
3) Mean AP differed between the original run (Mean AP = 0.8524) and the distributed run (Mean AP = 0.8837). I assumed that distribution is the only difference and that Mean AP should be the same. Is this the correct setting?

Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.268s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.110s ( 12.9 samples/s);
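
A small sketch for comparing such log lines across runs. It assumes the exact line format shown above; the `parse_epoch_line` helper and the `LOG_LINES` name are hypothetical and not part of CNTK.

```python
import re

# Matches the per-epoch summary lines printed in the logs of this thread.
EPOCH_LINE = re.compile(
    r"Finished Epoch\[(?P<epoch>\d+) of (?P<total>\d+)\]: \[Training\] "
    r"loss = (?P<loss>[\d.]+) \* (?P<samples>\d+), "
    r"metric = (?P<metric>[\d.]+)% \* \d+ "
    r"(?P<seconds>[\d.]+)s"
)


def parse_epoch_line(line):
    """Return epoch number, sample count, loss and duration, or None."""
    match = EPOCH_LINE.search(line)
    if match is None:
        return None
    return {
        "epoch": int(match["epoch"]),
        "samples": int(match["samples"]),
        "loss": float(match["loss"]),
        "seconds": float(match["seconds"]),
    }


LOG_LINES = [
    "Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.268s (  2.4 samples/s);",
    "Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);",
    "Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.110s ( 12.9 samples/s);",
]

for line in LOG_LINES:
    record = parse_epoch_line(line)
    throughput = record["samples"] / record["seconds"]
    print(record["epoch"], record["samples"], round(throughput, 1))
```

Running the same parser over the single-GPU log makes the comparison explicit: about 7.0 s per epoch there versus about 3.0 s per epoch here, although the reported sample counts (25 versus 40) differ, so the samples/s figures are not directly comparable.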

Original log

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A2_RunWithPyModel.py
--------------------------------------------------------------
2017-09-23 12:49:33
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Selected GPU[1] Tesla K80 as the process wide default device.
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Learning rate per 1 samples: 1e-05
Momentum per 1 samples: 0.9048374180359595
Finished Epoch[1 of 20]: [Training] loss = 3153.350937 * 25, metric = 21.16% * 25 10.572s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 251.568496 * 25, metric = 2.41% * 25 6.971s (  3.6 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 147.409863 * 25, metric = 1.94% * 25 6.982s (  3.6 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 101.552354 * 25, metric = 1.70% * 25 6.965s (  3.6 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 79.782490 * 25, metric = 1.37% * 25 6.994s (  3.6 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 68.687617 * 25, metric = 1.25% * 25 6.964s (  3.6 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 60.549863 * 25, metric = 1.11% * 25 6.986s (  3.6 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 54.716392 * 25, metric = 0.99% * 25 6.976s (  3.6 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 50.048423 * 25, metric = 0.97% * 25 7.013s (  3.6 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 40.542100 * 25, metric = 0.71% * 25 7.001s (  3.6 samples/s);
Learning rate per 1 samples: 1e-06
Finished Epoch[11 of 20]: [Training] loss = 35.926621 * 25, metric = 0.60% * 25 6.995s (  3.6 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.639031 * 25, metric = 0.56% * 25 7.010s (  3.6 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 33.962507 * 25, metric = 0.54% * 25 6.990s (  3.6 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 33.616445 * 25, metric = 0.55% * 25 7.003s (  3.6 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 33.219561 * 25, metric = 0.53% * 25 7.016s (  3.6 samples/s);
Learning rate per 1 samples: 1e-07
Finished Epoch[16 of 20]: [Training] loss = 32.881428 * 25, metric = 0.53% * 25 7.012s (  3.6 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 32.816619 * 25, metric = 0.53% * 25 7.009s (  3.6 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 32.781428 * 25, metric = 0.52% * 25 6.999s (  3.6 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 32.750801 * 25, metric = 0.52% * 25 7.003s (  3.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 32.719143 * 25, metric = 0.52% * 25 7.016s (  3.6 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A3_ParseAndEvaluateOutput.py
--------------------------------------------------------------
2017-09-23 12:52:19
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Parsing CNTK output for image set: test
Parsing cntk output file, image 0 of 5
Parsing cntk output file, image 1 of 5
Parsing cntk output file, image 2 of 5
Parsing cntk output file, image 3 of 5
Parsing cntk output file, image 4 of 5
test.cache ss roidb loaded from C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\proc\Grocery_2000\cntkFiles\test.cache_selective_search_roidb.pkl
   Processing image 0 of 5..
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\cntk_helpers.py:813: RuntimeWarning: overflow encountered in exp
  e = np.exp(w)
C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\cntk_helpers.py:814: RuntimeWarning: invalid value encountered in true_divide
  dist = e / np.sum(e, axis=1)[:, np.newaxis]
Number of rois before non-maxima surpression: 3183
Number of rois  after non-maxima surpression: 461
Evaluating detections
AP for         avocado = 0.5556
AP for          orange = 1.0000
AP for          butter = 1.0000
AP for       champagne = 1.0000
AP for          eggBox = 0.7500
AP for          gerkin = 1.0000
AP for         joghurt = 0.6667
AP for         ketchup = 0.6667
AP for     orangeJuice = 1.0000
AP for           onion = 1.0000
AP for          pepper = 1.0000
AP for          tomato = 1.0000
AP for           water = 0.5000
AP for            milk = 1.0000
AP for         tabasco = 0.5000
AP for         mustard = 1.0000
Mean AP = 0.8524
DONE.

Distributed log

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ mpiexec -n 4 python A2_RunWithPyModel_distributed.py
Selected GPU[0] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[2] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[3] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
Selected GPU[1] Tesla K80 as the process wide default device.
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (1) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (0) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (3) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (2) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
Build info:

                Built time: Sep 15 2017 07:42:32
                Last modified date: Fri Sep 15 04:28:56 2017
                Build type: Release
                Build target: GPU
                With 1bit-SGD: yes
                With ASGD: yes
                Math lib: mkl
                CUDA version: 8.0.0
                CUDNN version: 6.0.21
                Build Branch: HEAD
                Build SHA1: 23878e5d1f73180d6564b6f907b14fe5f53513bb
                MPI distribution: Microsoft MPI
                MPI version: 7.0.12437.6
-------------------------------------------------------------------
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.268s (  2.4 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.110s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.065s ( 13.1 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.004s ( 13.3 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.056s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.000s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.992s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.035s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.052s ( 13.1 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.033s ( 13.2 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.042s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.069s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.040s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 10.768s (  2.3 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.020s ( 13.2 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.038s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.033s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.061s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.043s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.069s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.007s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 11.277s (  2.2 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.016s ( 13.3 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.039s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.032s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.992s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.060s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.047s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.064s ( 13.1 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.
--------------------------------------------------------------
2017-09-23 12:46:04
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Training Fast R-CNN model for 20 epochs.
Training 54603793 parameters in 6 parameter tensors.
Finished Epoch[1 of 20]: [Training] loss = 4711.033750 * 25, metric = 21.03% * 25 9.772s (  2.6 samples/s);
Finished Epoch[2 of 20]: [Training] loss = 401.341235 * 40, metric = 2.37% * 40 3.016s ( 13.3 samples/s);
Finished Epoch[3 of 20]: [Training] loss = 232.146558 * 40, metric = 1.55% * 40 3.108s ( 12.9 samples/s);
Finished Epoch[4 of 20]: [Training] loss = 201.895728 * 40, metric = 2.42% * 40 3.039s ( 13.2 samples/s);
Finished Epoch[5 of 20]: [Training] loss = 145.298535 * 40, metric = 1.85% * 40 3.032s ( 13.2 samples/s);
Finished Epoch[6 of 20]: [Training] loss = 140.297620 * 40, metric = 2.07% * 40 3.055s ( 13.1 samples/s);
Finished Epoch[7 of 20]: [Training] loss = 79.770465 * 40, metric = 1.12% * 40 3.001s ( 13.3 samples/s);
Finished Epoch[8 of 20]: [Training] loss = 92.765979 * 40, metric = 1.63% * 40 3.057s ( 13.1 samples/s);
Finished Epoch[9 of 20]: [Training] loss = 55.900171 * 40, metric = 1.08% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[10 of 20]: [Training] loss = 72.423962 * 40, metric = 1.37% * 40 2.991s ( 13.4 samples/s);
Finished Epoch[11 of 20]: [Training] loss = 48.268195 * 40, metric = 0.89% * 40 3.036s ( 13.2 samples/s);
Finished Epoch[12 of 20]: [Training] loss = 34.338052 * 40, metric = 0.64% * 40 2.990s ( 13.4 samples/s);
Finished Epoch[13 of 20]: [Training] loss = 40.116125 * 40, metric = 0.73% * 40 3.024s ( 13.2 samples/s);
Finished Epoch[14 of 20]: [Training] loss = 38.849127 * 40, metric = 0.66% * 40 3.061s ( 13.1 samples/s);
Finished Epoch[15 of 20]: [Training] loss = 37.413214 * 40, metric = 0.77% * 40 3.042s ( 13.1 samples/s);
Finished Epoch[16 of 20]: [Training] loss = 40.390021 * 40, metric = 0.77% * 40 3.070s ( 13.0 samples/s);
Finished Epoch[17 of 20]: [Training] loss = 25.550015 * 40, metric = 0.48% * 40 3.008s ( 13.3 samples/s);
Finished Epoch[18 of 20]: [Training] loss = 21.753720 * 40, metric = 0.43% * 40 3.006s ( 13.3 samples/s);
Finished Epoch[19 of 20]: [Training] loss = 35.532837 * 40, metric = 0.62% * 40 3.172s ( 12.6 samples/s);
Finished Epoch[20 of 20]: [Training] loss = 26.353195 * 40, metric = 0.54% * 40 3.041s ( 13.2 samples/s);
Stored trained model at C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\Output\frcn_py.model
Evaluating Fast R-CNN model for 5 images.

C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript (master)
(ver2.2sgd) λ python A3_ParseAndEvaluateOutput.py
--------------------------------------------------------------
2017-09-23 12:48:18
PARAMETERS: datasetName = Grocery
PARAMETERS: cntk_nrRois = 2000
Parsing CNTK output for image set: test
Parsing cntk output file, image 0 of 5
Parsing cntk output file, image 1 of 5
Parsing cntk output file, image 2 of 5
Parsing cntk output file, image 3 of 5
Parsing cntk output file, image 4 of 5
test.cache ss roidb loaded from C:\git\ver2.2\CNTK\Examples\Image\Detection\FastRCNN\BrainScript\proc\Grocery_2000\cntkFiles\test.cache_selective_search_roidb.pkl
   Processing image 0 of 5..
Number of rois before non-maxima surpression: 3184
Number of rois  after non-maxima surpression: 487
Evaluating detections
AP for         avocado = 0.5556
AP for          orange = 1.0000
AP for          butter = 1.0000
AP for       champagne = 1.0000
AP for          eggBox = 0.7500
AP for          gerkin = 1.0000
AP for         joghurt = 0.6667
AP for         ketchup = 0.6667
AP for     orangeJuice = 1.0000
AP for           onion = 1.0000
AP for          pepper = 1.0000
AP for          tomato = 1.0000
AP for           water = 0.5000
AP for            milk = 1.0000
AP for         tabasco = 1.0000
AP for         mustard = 1.0000
Mean AP = 0.8837
DONE.

kyoro1 on 23 Sep 2017

@kyoro1 Thanks for the detailed info. To answer your questions:

As the distributed log shows, there are 4 similar blocks. Is this the usual result? I understood data parallelism as an architecture that splits the dataset across GPUs and aggregates the results, so shouldn't the log be aggregated into 1 block? I wonder whether this log structure is correct.

It is indeed a little bit strange, although your result seems ok.
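
One possible reading of the four near-identical blocks, stated as an assumption rather than a confirmed explanation: with `mpiexec -n 4`, four separate Python processes run the same script, and in synchronous data-parallel training every worker applies the same aggregated gradient, so each process would print its own copy of matching statistics. The toy simulation below (plain Python, no CNTK; all names are hypothetical) illustrates why the workers stay in lockstep.

```python
def shard(samples, num_workers, rank):
    """Give each worker every num_workers-th sample, starting at its rank."""
    return samples[rank::num_workers]


def local_gradient(weight, samples):
    """Summed gradient of 0.5 * (weight * x - y)**2 over one worker's shard."""
    return sum((weight * x - y) * x for x, y in samples)


def data_parallel_step(weights, samples, learning_rate):
    """One synchronous step: local gradients, aggregation, identical update."""
    num_workers = len(weights)
    local = [
        local_gradient(weights[rank], shard(samples, num_workers, rank))
        for rank in range(num_workers)
    ]
    # Stand-in for the all-reduce: every worker receives the same average.
    aggregated = sum(local) / len(samples)
    return [w - learning_rate * aggregated for w in weights]


# Fit y = 2 * x with 4 simulated workers that all start from the same weight.
samples = [(float(x), 2.0 * x) for x in range(1, 9)]
weights = [0.0] * 4
for _ in range(50):
    weights = data_parallel_step(weights, samples, learning_rate=0.01)

print(weights)
```

Each simulated worker sees only its own quarter of the data, yet all four end every step with the same weight, so any statistic derived from the aggregated values would be printed identically by each process.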

The sample count was 40 in every epoch except the 1st (25). Normally it should equal the number of training samples. What causes the difference between 25 and 40 (from the 2nd epoch through the last)?

In the first epoch, CNTK auto-tunes the convolution algorithms, and there is also overhead from allocating buffers, validating the model architecture, etc.

Mean AP differed between the original run (Mean AP = 0.8524) and the distributed run (Mean AP = 0.8837). I assumed that distribution is the only difference and that Mean AP should be the same. Is this the correct setting?

With such a small data set, fluctuation is normal.
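
To make the size of that fluctuation concrete, the per-class AP values from the two logs above can be compared directly. This sketch only redoes the arithmetic on numbers copied from the logs; the variable names are illustrative.

```python
classes = [
    "avocado", "orange", "butter", "champagne", "eggBox", "gerkin",
    "joghurt", "ketchup", "orangeJuice", "onion", "pepper", "tomato",
    "water", "milk", "tabasco", "mustard",
]

# Per-class AP copied from the "Original log" section.
original_ap = [
    0.5556, 1.0, 1.0, 1.0, 0.75, 1.0, 0.6667, 0.6667,
    1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 0.5, 1.0,
]

# Per-class AP copied from the "Distributed log" section.
distributed_ap = [
    0.5556, 1.0, 1.0, 1.0, 0.75, 1.0, 0.6667, 0.6667,
    1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0,
]


def mean_ap(values):
    """Unweighted mean over classes, as reported in the 'Mean AP' line."""
    return sum(values) / len(values)


changed = [
    name
    for name, a, b in zip(classes, original_ap, distributed_ap)
    if a != b
]

print(round(mean_ap(original_ap), 4))
print(round(mean_ap(distributed_ap), 4))
print(changed)
```

In these logs only the tabasco class differs (0.5 versus 1.0). With 5 test images and 16 classes, a single class changing by 0.5 moves the mean by about 0.03, which accounts for the whole gap between 0.8524 and 0.8837.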

cha-zhang on 26 Sep 2017

@cha-zhang Thanks for your comment. So the trial above behaves as expected, apart from the log structure, doesn't it?
Also, do you plan to develop distributed Fast R-CNN scripts in the near future? Or should I send a pull request to master?

kyoro1 on 26 Sep 2017

We will not be working on this in the short term. It would be great if you could send us a PR. :)

cha-zhang on 26 Sep 2017

Here is the first step toward Fast R-CNN with distributed learning: https://github.com/Microsoft/CNTK/commit/1312bf83574be29d5ee882165a4a7605a99ad7cc

kyoro1 on 3 Nov 2017
