Hi folks, I experimented with OSOD on very small datasets (1-2 images), and it takes a long time to prepare the dataset, because each input picture is augmented onto ~900 backgrounds. I can also see that this is done on the CPU, regardless of the GPU configuration.
So the question is: can I prepare (augment) the dataset on a CPU-optimized machine, save it into an SFrame, and then use this SFrame to create the model on a GPU-heavy machine?
The reason is that machines with GPUs are significantly more expensive than CPU-only ones, so running the augmentation on them is kind of a waste of money. From what I understand, this is already done in two steps, but I can't see how to properly extract the result of the first step.
Hi @MaximBazarov, you're correct that there's currently no way to save the augmented images between the augmentation and training steps. You said the primary reason you want to save the augmented SFrame is that re-augmenting on the CPU is slower than what would be possible on a GPU. Are there any other reasons you'd like to save it?
Hi @syoutsey, it's not about slower/faster in general; tbh I'm not sure whether it would be faster on a GPU. My main concern is that if the augmentation is done on the CPU anyway, it makes a lot of sense to run it on a CPU-optimized machine (which is cheaper) and run the second part on a GPU-optimized machine.
Yeah, that's the only concern I have.
Got it. Thanks for the feedback! We'll consider this as part of future One-Shot Object Detection work.
Hi @MaximBazarov, while there is no way to directly save the augmented data and load it using the One-Shot Object Detection toolkit, you can do so with the Object Detection toolkit. Here is a workaround that could work for now:
# On the CPU-optimized machine
import turicreate as tc
starter_images = ...  # fill in here
target = ...  # fill in here
backgrounds = ...  # fill in here if you have any (optional)
synthetic_data_path = ...  # fill in here
# Generate the augmented (synthetic) training data and save it as an SFrame
synthetic_data = tc.one_shot_object_detector.util._augmentation.preview_synthetic_training_data(starter_images, target, backgrounds=backgrounds)
synthetic_data.save(synthetic_data_path)
# On the GPU-optimized machine
import turicreate as tc
synthetic_data_path = ...  # fill in here
# Load the augmented data saved on the CPU machine
synthetic_data = tc.SFrame(synthetic_data_path)
batch_size = ...  # fill in here (optional)
max_iterations = ...  # fill in here (optional)
# Train a regular object detector on the pre-augmented data
model = tc.object_detector.create(synthetic_data, batch_size=batch_size, max_iterations=max_iterations)
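Since `batch_size` and `max_iterations` are marked optional above, here is a minimal sketch of the GPU-side step that passes only the options you actually set. The helper names `optional_kwargs` and `train_from_saved` are hypothetical (not part of Turi Create), and the sketch assumes that `tc.object_detector.create` falls back to its own defaults for any argument that is omitted.

```python
def optional_kwargs(**options):
    """Keep only the options that were explicitly set (i.e. are not None)."""
    return {name: value for name, value in options.items() if value is not None}


def train_from_saved(synthetic_data_path, batch_size=None, max_iterations=None):
    """Hypothetical helper: load the saved SFrame and train an object detector.

    Assumes the SFrame at synthetic_data_path was saved on the CPU machine
    with synthetic_data.save(...), as in the workaround above.
    """
    # Imported lazily so the helper above can be used without Turi Create installed.
    import turicreate as tc

    synthetic_data = tc.SFrame(synthetic_data_path)
    # Unset options are dropped, so the toolkit's defaults apply to them.
    extra = optional_kwargs(batch_size=batch_size, max_iterations=max_iterations)
    return tc.object_detector.create(synthetic_data, **extra)
```

For example, `train_from_saved(synthetic_data_path, max_iterations=500)` would leave the batch size to the toolkit; the value 500 is only an illustration.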
Thank you @shantanuchhabra, will try that.