Q1. I see that there are many hyper-parameters related to data augmentation in the config file, for example saturation=1.5, exposure=1.5, hue=.1, jitter=0.3, scales=.1,.1, and so on. But I don't know exactly what these values (1.5, .1, 0.3, .1,.1) mean. Could you explain them, or point me to a file/description that explains them?
Q2. I don't know whether the training dataset is doubled, tripled, or increased by 20% through data augmentation. Also, how do I set the ratio of crop, flip, and the other methods? For example, if I have 6 training images (img1, img2, img3, img4, img5, img6), decide to double the training dataset through data augmentation, and set crop=0.5 and flip=0.5, then I would get 12 images (img1, img2, img3, img4, img5, img6, crop_img1, crop_img2, crop_img3, flip_img4, flip_img5, flip_img6). In this case, I just want to know where I set the ratio of crop and flip (crop=0.5, flip=0.5).
If you don't understand my question, please let me know! Thanks!
- If `saturation=1.5`, then saturation will be changed: `saturation = init_value * rand(1/1.5, 1.5)`
- If `exposure=1.5`, then exposure will be changed: `exposure = init_value * rand(1/1.5, 1.5)`
- If `hue=0.1`, then hue will be changed: `hue = init_value + rand(-0.1, 0.1)`

Also about cfg-parameters: https://github.com/AlexeyAB/darknet/issues/279#issuecomment-347002399

- `jitter` is used instead of `crop`; data augmentation will generate an infinite number of augmented (changed) images
- `flip` can be 0 or 1: if `flip=1` (the default), horizontal flipping will be applied randomly; if `flip=0`, it won't be used
- `jitter` can be in [0.0, 1.0] - how it works: https://github.com/AlexeyAB/darknet/blob/5a2efd5e5327c56a362442dce70bb3e46201cb89/src/data.c#L679-L697
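To illustrate what those formulas do, here is a minimal standalone C sketch of this style of HSV randomization. The helper names `rand_uniform` and `rand_scale` mirror darknet's utilities, but the code below is a simplified illustration, not the exact implementation in `src/image.c`/`src/utils.c`:

```c
#include <stdlib.h>

/* Uniform random float in [lo, hi]. */
static float rand_uniform(float lo, float hi)
{
    return lo + (hi - lo) * ((float)rand() / (float)RAND_MAX);
}

/* For saturation/exposure: pick a multiplier in [1/s, s], so the value
   is scaled up or down with equal probability. */
static float rand_scale(float s)
{
    float scale = rand_uniform(1.0f, s);
    return (rand() % 2) ? scale : 1.0f / scale;
}

/* Example with the cfg values saturation=1.5, exposure=1.5, hue=0.1:
   the S and V channels of the HSV image are multiplied by dsat/dexp,
   and dhue is added to the H channel. */
void random_distort_params(float saturation, float exposure, float hue,
                           float *dsat, float *dexp, float *dhue)
{
    *dsat = rand_scale(saturation);   /* in [1/1.5, 1.5] */
    *dexp = rand_scale(exposure);     /* in [1/1.5, 1.5] */
    *dhue = rand_uniform(-hue, hue);  /* in [-0.1, 0.1]  */
}
```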
Thanks for your answer! I have two other questions.
Q1. Is there a file that explains (or describes) the training hyper-parameters or the data augmentation hyper-parameters?
Q2. The TensorFlow object detection API supports setting the probability of data augmentation; for example, "the probability of flipping the image is 50%."
Can I set the probability of flipping or of jitter?
There is no document with an explanation of the hyper-parameters.
If flip=0, the probability of flipping is 0%; if flip=1, the probability of flipping is 50%.
The probability of jitter is 100%; you can only change the maximum amount of random coordinate change used for cropping the image, according to this code: https://github.com/AlexeyAB/darknet/blob/5a2efd5e5327c56a362442dce70bb3e46201cb89/src/data.c#L679-L697
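As a rough sketch of what `jitter` controls, following the pattern in the linked `data.c` (variable names and border handling are simplified here, so this is an illustration rather than the exact darknet code):

```c
#include <stdlib.h>

static float rand_uniform(float lo, float hi)
{
    return lo + (hi - lo) * ((float)rand() / (float)RAND_MAX);
}

/* jitter in [0.0, 1.0]: each crop border may move by up to
   jitter * (original width or height) in either direction. */
void random_crop_region(int ow, int oh, float jitter,
                        int *pleft, int *ptop, int *swidth, int *sheight)
{
    int dw = (int)(ow * jitter);
    int dh = (int)(oh * jitter);

    *pleft     = (int)rand_uniform((float)-dw, (float)dw);
    int pright = (int)rand_uniform((float)-dw, (float)dw);
    *ptop      = (int)rand_uniform((float)-dh, (float)dh);
    int pbot   = (int)rand_uniform((float)-dh, (float)dh);

    /* The region (pleft, ptop, swidth, sheight) is cropped/padded out of
       the original image and then resized to the network input size. */
    *swidth  = ow - *pleft - pright;
    *sheight = oh - *ptop  - pbot;
}
```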
Q1. I can see "flip" in data.c, but not in the config file (yolov3.cfg). Even though there is no flip hyper-parameter in yolov3.cfg, is flip data augmentation still applied through the data.c code above?
Q2. I don't know what scales=.1,.1 means. Is it a scale range of 0.1~0.1?
Hi @AlexeyAB ,
Can I know how many times each image is augmented for a given set of augmentation parameters?
For example, I would like to understand approximately: if my training data is, say, 1000 images, how many times do these 1000 images get augmented when the batch size is set to 64 and the number of iterations I run is 1000?
Also, does one image in a batch get augmented with only one type at a time, or can it be augmented with a combination of types at once (the types being color, flip, hue, crop, saturation, etc.)?
And what sampling strategy is followed to pick the batch of 64 images from the 1000 images in the above example - is it sampling with replacement or sampling without replacement? I am assuming the latter, because that would guarantee a walk through all images?
@kmsravindra Hi,
1. `batch*iterations = 64*1000 = 64 000` random images will be loaded in total. So each of your 1000 images will be loaded approximately 64 times, and therefore each of your images will be randomly augmented approximately 64 times.
2. Each image is always augmented using all augmentations together: color, flip, hue, crop, saturation, etc.
3. Sampling is just random; there is no guarantee of a walk through all images.
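In other words, batch selection behaves like uniform sampling with replacement. A minimal sketch of that idea (an illustration only; darknet's real loader in `data.c` does its own random selection):

```c
#include <stdio.h>
#include <stdlib.h>

/* Pick `batch` image indices uniformly at random, with replacement.
   Over many iterations each image is expected to appear about
   batch * iterations / N times, but nothing guarantees that every
   image is seen. */
static void pick_batch(int N, int batch, int *indices)
{
    for (int i = 0; i < batch; ++i)
        indices[i] = rand() % N;   /* the same image may be picked twice */
}

int main(void)
{
    int indices[64];
    pick_batch(1000, 64, indices);   /* e.g. 1000 images, batch=64 */
    printf("first picked index: %d\n", indices[0]);
    return 0;
}
```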
@89douner
flip=1 by default, even if it is absent from the cfg file.
scales=.1,.1 - the multipliers by which the learning_rate will be multiplied when the iteration number reaches the values in steps= (a parameter in the cfg file).
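For example, this is how the `policy=steps` schedule behaves with `steps=` and `scales=` (a simplified sketch of the rule darknet applies internally, not its exact code):

```c
/* Learning rate under policy=steps: each time the current iteration
   passes one of the step values, the base learning_rate is multiplied
   by the corresponding scale. */
float steps_learning_rate(float base_lr, int iteration,
                          const int *steps, const float *scales, int n)
{
    float lr = base_lr;
    for (int i = 0; i < n; ++i) {
        if (iteration >= steps[i]) lr *= scales[i];
    }
    return lr;
}

/* With the usual yolov3.cfg values learning_rate=0.001,
   steps=400000,450000 and scales=.1,.1:
     iteration 100000 -> 0.001
     iteration 420000 -> 0.0001
     iteration 460000 -> 0.00001 */
```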
Hi @AlexeyAB
From your answer to the first question, can I say that one image is augmented 64 times (according to the batch size)?
To be clear, with a batch size of 64 and, say, subdivisions of 8:
1) 64 images from the training set are loaded and 8 images are passed to the GPU at a time - where does the data augmentation happen?
2) Are these 8 images in a subdivision augmented and then sent, which would mean the network is trained on 8*64 augmented images once the next batch is sent?
I really need your reply, please reply ASAP.
@prateekgupta891
1. 64 images are loaded, then all 64 images are augmented, then in a loop 8 batches of 8 images each are sent to the GPU for processing.
2. The model is trained on 64 images per iteration.
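Schematically, one iteration looks like the sketch below. The helper functions here (`load_random_images`, `augment_images`, `forward_backward_on_gpu`, `update_weights`) are hypothetical stand-ins for darknet's loader and trainer in `data.c`/`network.c`; the point is only the ordering of the steps:

```c
#include <stdio.h>

/* Hypothetical stand-ins for darknet's loader/augmenter/trainer. */
static void load_random_images(int n)      { printf("load %d random images\n", n); }
static void augment_images(int n)          { printf("augment all %d images\n", n); }
static void forward_backward_on_gpu(int n) { printf("process %d images on GPU\n", n); }
static void update_weights(void)           { printf("update weights once\n"); }

/* One iteration with batch=64 and subdivisions=8: every image in the
   batch is loaded and augmented once, the GPU processes 8 images at a
   time, and the weights are updated once per iteration (per 64 images). */
int main(void)
{
    const int batch = 64, subdivisions = 8;
    const int mini_batch = batch / subdivisions;   /* 8 images per GPU pass */

    load_random_images(batch);
    augment_images(batch);
    for (int s = 0; s < subdivisions; ++s)
        forward_backward_on_gpu(mini_batch);
    update_weights();
    return 0;
}
```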
So, after being loaded, how many times does 1 image (from the training set) get augmented? Only 1 time, or a number equal to the batch size?
1 time only
I am a little bit confused here. You previously said that "each of your images will be randomly augmented approximately 64 times."
However, you recently said "1 time only."
So which one is correct?
@Ujang24 Hi,
What he was trying to say is that a batch of images is loaded according to your batch size (which here is 64, and all the images are different); they are then augmented and passed to the GPU in sub-batches of 8 (subdivisions is 8, so the GPU is loaded 8 times per batch).
So, within one iteration, each image is augmented only once (no copies are created); the earlier answer just meant that 64 images were loaded in the batch and all of them were augmented. Over many iterations the same image is loaded again and again, receiving a different random augmentation each time, which is where the "approximately 64 times" comes from.
Hope that clears it up!
Yes, I can understand clearly what you've said. Thank you.
By the way, could you please also clarify what he meant by "data augmentation will generate an infinite number of augmented (changed) images"? That was the answer to @89douner's Q2: "I don't know whether the training dataset is doubled, tripled, or increased by 20% through data augmentation. Also, how do I set the ratio of crop, flip, and other methods?"
https://github.com/AlexeyAB/darknet/issues/1842#issuecomment-433918329
Thanks
Look up "online data augmentation", because that is what it is doing!
In every epoch, due to the augmentations, the whole set looks like a new one (hence "infinite").
There is a cfg file for the network definition; that is where you need to set the values.
The AlexeyAB repo helps a lot:
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
Also refer to the last 10 lines of this .cfg file:
https://github.com/AlexeyAB/darknet/blob/3d2d0a7c98dbc8923d9ff705b81ff4f7940ea6ff/cfg/yolov3.cfg#L17