Thanks for your wonderful work! However, when I train ShuffleNetV2 1.0x, the accuracy drops if I use DALI. Here are the details:
Accuracy in paper: 69.40%
Accuracy if using DALI: 68.29%
Accuracy without DALI: 68.86% (This is because some images in my ImageNet copy are broken)
Data augmentation code (using DALI):
class HybridTrainPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data_dir, crop, dali_cpu=False, local_rank=4, world_size=1):
        super(HybridTrainPipe, self).__init__(batch_size, num_threads, device_id, seed=12)
        dali_device = "gpu"
        self.input = ops.FileReader(file_root=data_dir, shard_id=device_id, num_shards=world_size,
                                    shuffle_after_epoch=True)
        self.decode = ops.ImageDecoder(device="mixed", output_type=types.RGB)
        self.res = ops.RandomResizedCrop(device="gpu", size=crop, random_area=[0.08, 1])
        self.cmnp = ops.CropMirrorNormalize(device="gpu",
                                            crop=(224, 224),
                                            output_dtype=types.FLOAT,
                                            output_layout=types.NCHW,
                                            image_type=types.RGB,
                                            # mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                            # std=[0.229 * 255, 0.224 * 255, 0.225 * 255]
                                            )
        self.color = ops.ColorTwist(device='gpu', brightness=uniform(0.6, 1.4))
        self.contrast = ops.Contrast(device='gpu', contrast=uniform(0.6, 1.4))
        self.saturation = ops.Saturation(device='gpu', saturation=uniform(0.6, 1.4))
        self.coin = ops.CoinFlip(probability=0.5)
        print('DALI "{0}" variant'.format(dali_device))


class HybridValPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data_dir, crop, size, local_rank=0, world_size=1):
        super(HybridValPipe, self).__init__(batch_size, num_threads, device_id, seed=12 + device_id, prefetch_queue_depth=1)
        self.input = ops.FileReader(file_root=data_dir, shard_id=device_id, num_shards=world_size, random_shuffle=False)
        self.decode = ops.ImageDecoder(device="mixed", output_type=types.RGB)
        self.res = ops.Resize(device="gpu", resize_shorter=size, interp_type=types.INTERP_TRIANGULAR)
        self.cmnp = ops.CropMirrorNormalize(device="gpu", output_dtype=types.FLOAT, output_layout=types.NCHW, crop=(crop, crop), image_type=types.RGB)

    def define_graph(self):
        self.jpegs, self.labels = self.input(name="Reader")
        images = self.decode(self.jpegs)
        images = self.res(images)
        output = self.cmnp(images)
        return [output, self.labels]
Data augmentation code (without DALI):
class OpencvResize(object):
    def __init__(self, size=256):
        self.size = size

    def __call__(self, img):
        assert isinstance(img, PIL.Image.Image)
        img = np.asarray(img)  # (H,W,3) RGB
        img = img[:,:,::-1]  # 2 BGR
        img = np.ascontiguousarray(img)
        H, W, _ = img.shape
        target_size = (int(self.size/H * W + 0.5), self.size) if H < W else (self.size, int(self.size/W * H + 0.5))
        img = cv2.resize(img, target_size, interpolation=cv2.INTER_LINEAR)
        img = img[:,:,::-1]  # 2 RGB
        img = np.ascontiguousarray(img)
        img = Image.fromarray(img)
        return img


class ToBGRTensor(object):
    def __call__(self, img):
        assert isinstance(img, (np.ndarray, PIL.Image.Image))
        if isinstance(img, PIL.Image.Image):
            img = np.asarray(img)
        img = img[:,:,::-1]  # 2 BGR
        img = np.transpose(img, [2, 0, 1])  # 2 (3, H, W)
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img).float()
        return img


assert os.path.exists(args.train_dir)
train_dataset = datasets.ImageFolder(
    args.train_dir,
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
        transforms.RandomHorizontalFlip(0.5),
        ToBGRTensor(),
    ])
)
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.batch_size, shuffle=True,
    num_workers=1, pin_memory=use_gpu)
train_dataprovider = DataIterator(train_loader)

assert os.path.exists(args.val_dir)
val_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(args.val_dir, transforms.Compose([
        OpencvResize(256),
        transforms.CenterCrop(224),
        ToBGRTensor(),
    ])),
    batch_size=200, shuffle=False,
    num_workers=4, pin_memory=use_gpu
)
val_dataprovider = DataIterator(val_loader)
I think these two data augmentation pipelines are equivalent, but they give different accuracy.
Another problem is that when I use DALI, CUDA_VISIBLE_DEVICES stops taking effect. For example, if I set os.environ["CUDA_VISIBLE_DEVICES"] = "3,4,5,6,7" and model = nn.DataParallel(model,device_ids='1,2,3'), the model still trains on GPUs 1, 2, 3 instead of GPUs 4, 5, 6. Can you help solve these two problems?
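For reference, the remapping expected in the example above can be written down as a small sketch. It only illustrates how logical device indices are usually resolved against CUDA_VISIBLE_DEVICES; `physical_gpu_ids` is a hypothetical helper, not a PyTorch or DALI function. It also assumes the variable is set before anything in the process initializes CUDA, since as far as I know a later change is not picked up.

```python
def physical_gpu_ids(visible_devices, logical_ids):
    """Map logical device indices to physical GPU ids.

    visible_devices: the CUDA_VISIBLE_DEVICES string, e.g. "3,4,5,6,7".
    logical_ids: the indices the framework sees (0 is the first visible GPU).
    """
    visible = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return [visible[i] for i in logical_ids]


# Logical devices 1, 2, 3 should land on the 2nd, 3rd and 4th visible GPUs.
mapping = physical_gpu_ids("3,4,5,6,7", [1, 2, 3])
print(mapping)
```

If training really runs on physical GPUs 1, 2, 3, one thing worth checking is whether the environment variable is assigned before the first import that touches CUDA.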
Hi,
Regarding "CUDA_VISIBLE_DEVICES": this is a question for the PyTorch community rather than something related to DALI. I see some thread about it here.
Regarding the data pipeline, I can share a couple of observations:
* Your original pipeline uses `transforms.RandomResizedCrop` for training but `OpencvResize` for validation. I would strongly recommend using `transforms.Resize` for validation. The problem might be that OpenCV and torchvision may have different definitions of the pixel center and, most importantly, torchvision uses a triangular window for interpolation when it scales down (even if you ask for bilinear), while OpenCV uses plain bilinear. As a result, different resize methods are applied during training and validation, and the network is sensitive to this kind of anomaly, which may come from resize artifacts (like aliasing).
* When you use DALI, you use INTERP_TRIANGULAR for the validation pipeline (which resembles the default behavior of `transforms.Resize`), while for training you use the default value, which is INTERP_LINEAR.
* Small nitpick: ColorTwist can cover saturation, contrast, and brightness in one go, so there is no need for separate operators for saturation and contrast augmentation in this case.
* A similar question was raised in #400; you can follow the discussion there.
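To make the resize point concrete, here is a rough 1-D NumPy sketch of the two downscaling strategies. It is not the actual OpenCV, torchvision, or DALI kernel: `downscale_linear` and `downscale_triangular` are hypothetical helpers, and the half-pixel center convention is an assumption. The first function samples with a fixed two-tap linear filter; the second widens the triangular window with the scale factor, which acts as an anti-aliasing filter.

```python
import numpy as np


def _centers(n_in, n_out):
    # Source coordinate of each output sample (half-pixel centers assumed).
    scale = n_in / n_out
    return (np.arange(n_out) + 0.5) * scale - 0.5, scale


def downscale_linear(signal, n_out):
    """Two-tap linear interpolation, regardless of the scale factor."""
    centers, _ = _centers(len(signal), n_out)
    return np.interp(centers, np.arange(len(signal)), signal)


def downscale_triangular(signal, n_out):
    """Triangular window whose support grows with the downscale factor."""
    centers, scale = _centers(len(signal), n_out)
    support = max(scale, 1.0)
    k = np.arange(len(signal))[None, :]
    weights = np.clip(1.0 - np.abs(k - centers[:, None]) / support, 0.0, None)
    weights /= weights.sum(axis=1, keepdims=True)  # normalize each output tap set
    return weights @ signal


# A pattern too fine to survive a 4x downscale (0.3 cycles per source pixel).
fine_detail = np.cos(2 * np.pi * 0.3 * np.arange(256))
lin = downscale_linear(fine_detail, 64)
tri = downscale_triangular(fine_detail, 64)

# The linear result keeps a strong spurious pattern; the triangular one is nearly flat.
print("std after linear resize:    ", lin.std())
print("std after triangular resize:", tri.std())
```

The energy left in the linear output is aliasing: detail that cannot be represented at the lower resolution shows up as a false low-frequency pattern. A network trained on one kind of output and validated on the other sees slightly different image statistics.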
Thanks for your timely reply! I will follow your suggestions and retrain my network right away.
Don't worry. In my case, the results for both ShuffleNet V1 and V2 with DALI are higher than in the original paper. The reason is that OpencvResize is used only on the validation set, so validation preprocessing differs from training preprocessing.
Yeah, now my accuracy is 69.20%, higher than the official code.
I guess the main reason is that the color jitter was not random. I referred to https://github.com/NVIDIA/DALI/issues/336 and edited my code. Of course, the different resize methods for the training and validation datasets are also a reason.
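A plain-Python sketch of what "not random" may mean here. If the `uniform(0.6, 1.4)` call in the pipeline's `__init__` is Python's `random.uniform` (an assumption, since the import is not shown), it returns one float when the pipeline is built, and every image then gets the same factor. `FixedJitter` and `PerSampleJitter` are hypothetical classes that only illustrate the difference; they are not DALI operators.

```python
import random


class FixedJitter:
    """Factor drawn once at construction time (the suspected problem)."""

    def __init__(self, low, high, rng):
        self.factor = rng.uniform(low, high)

    def __call__(self, value):
        return value * self.factor


class PerSampleJitter:
    """Factor drawn again on every call, which is what augmentation needs."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng

    def __call__(self, value):
        return value * self.rng.uniform(self.low, self.high)


rng = random.Random(12)
fixed = FixedJitter(0.6, 1.4, rng)
per_sample = PerSampleJitter(0.6, 1.4, rng)

fixed_out = [fixed(100.0) for _ in range(5)]
varied_out = [per_sample(100.0) for _ in range(5)]
print("distinct values, fixed:     ", len(set(fixed_out)))
print("distinct values, per sample:", len(set(varied_out)))
```

In DALI the per-sample draw is, as far as I recall, done with a random-number operator whose output is passed to the augmentation inside `define_graph`; the linked issue #336 is the place to check the exact form.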
Could you post your data augmentation code right here? Thanks in advance.
I will post it after I come back to school. But it will be a few days, maybe 15 days or more.