Describe the bug
Training with fastai runs very slowly on a CPU.
To Reproduce
from fastai.vision import *  # fastai v1-style import: models, untar_data, URLs, ImageDataBunch, Learner, accuracy

model = models.WideResNet(num_groups=3,
                          N=3,
                          num_classes=10,
                          k=6,
                          drop_p=0.)
path = untar_data(URLs.MNIST_TINY)
data = ImageDataBunch.from_folder(path, bs=10)
if data.device.type == 'cpu':
    learn = Learner(data, model, metrics=accuracy)
else:  # GPU: train in mixed precision
    learn = Learner(data, model, metrics=accuracy).to_fp16()
learn.fit_one_cycle(1, 3e-3, wd=0.4, div_factor=10, pct_start=0.5)
Expected behavior
I would expect to be able to train on a dataset of at least a few hundred examples, but I have to trim it down to a few dozen (or fewer) to get something I can test on a CPU.
Am I doing something wrong, or is fastai on the CPU just really slow (much slower than Keras, for example)? Is this a PyTorch issue?
Could you show code and timings for Keras and fastai for this example, please? I haven't seen any documented examples of PyTorch being slower than Keras, so I'm sure the PyTorch team would be very interested to see if that's the case.
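A minimal sketch of a timing harness that could wrap the same training step in each library, so the numbers are comparable. `time_callable` is a hypothetical helper, not part of fastai or Keras; the idea is to pass it something like `lambda: learn.fit_one_cycle(1, 3e-3)` for fastai or `lambda: model.fit(x, y, epochs=1)` for Keras. Warm-up runs are discarded because first calls often pay one-off setup costs.

```python
import statistics
import time
from typing import Callable, List


def time_callable(fn: Callable[[], object], repeats: int = 3, warmup: int = 1) -> dict:
    """Run `fn` several times and report wall-clock timings in seconds.

    Hypothetical helper for comparing frameworks on the same workload.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    times: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return {"min": min(times), "median": statistics.median(times), "runs": times}


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without fastai or Keras installed.
    result = time_callable(lambda: sum(i * i for i in range(100_000)), repeats=5)
    print(f"min {result['min']:.4f}s, median {result['median']:.4f}s")
```

Reporting the minimum alongside the median helps separate the framework's cost from background noise on the machine.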
When providing timings, please also mention which BLAS library each framework uses, which Keras backend you're using, and what CPU you have. On CPU, speed differences tend to come mainly from the BLAS library each framework is linked against.
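One way to gather those details is sketched below, assuming a Python environment where torch, numpy and Keras may or may not be installed (each section is skipped if its import fails). `describe_environment` is a hypothetical helper; `torch.__config__.show()` returns PyTorch's build configuration, which includes the BLAS and OpenMP it was built with, and `keras.backend.backend()` returns the name of the active Keras backend.

```python
import os
import platform


def describe_environment() -> dict:
    """Collect details worth posting alongside CPU timings (a sketch)."""
    info = {
        "python": platform.python_version(),
        # platform.processor() can be empty on Linux, so fall back to the architecture.
        "cpu": platform.processor() or platform.machine(),
        "logical_cores": os.cpu_count(),
    }
    try:
        import numpy as np
        info["numpy"] = np.__version__  # np.show_config() prints its BLAS/LAPACK build info
    except ImportError:
        info["numpy"] = None
    try:
        import torch
        info["torch"] = torch.__version__
        info["torch_build"] = torch.__config__.show()  # build config string, incl. BLAS
    except ImportError:
        info["torch"] = None
    try:
        import keras
        info["keras"] = keras.__version__
        info["keras_backend"] = keras.backend.backend()
    except ImportError:
        info["keras"] = None
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```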
There may also be a problem with the number of OpenMP threads being set incorrectly, so that only one CPU core gets used.
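A quick way to check for that failure mode, assuming PyTorch is installed (the check degrades to `None` if it isn't): compare `torch.get_num_threads()` with the machine's core count. `check_cpu_threads` is a hypothetical helper. Note that `OMP_NUM_THREADS` generally needs to be set before torch is imported, while `torch.set_num_threads(n)` changes the intra-op thread count at runtime.

```python
import os


def check_cpu_threads() -> dict:
    """Report PyTorch's CPU thread count versus the available logical cores."""
    cores = os.cpu_count() or 1
    report = {
        "logical_cores": cores,
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "torch_threads": None,
    }
    try:
        import torch
        report["torch_threads"] = torch.get_num_threads()
    except ImportError:
        pass
    # PyTorch often defaults to the physical core count, so fewer threads than
    # logical cores is normal; exactly one thread on a multi-core box is the red flag.
    report["single_threaded"] = report["torch_threads"] == 1 and cores > 1
    return report


if __name__ == "__main__":
    r = check_cpu_threads()
    print(r)
    if r["single_threaded"]:
        print("PyTorch is using one thread; try torch.set_num_threads(os.cpu_count())")
```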
Anyway - certainly interested in digging in to this question with you! :)
@dsblank any update on this?
I'm putting together some reproducible code and details about the environments I have tested. I should have an update in a few days.
As this is not being followed up on, I'm closing this. If you still have the problem and have code to help us reproduce it, please reopen.