Apex: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

Created on 18 Mar 2020 · 5 Comments · Source: NVIDIA/apex

Hi,

I receive the error shown below when I try FP16 training (opt_level="O3"). When training with opt_level="O1", everything seems to work fine. I have attached a snippet of the code with the relevant parts. I believe I have followed your documentation, but maybe I am missing something.

Thanks for the help.

PyTorch: 1.4.0
CUDA: 10.1

Code snippet:
```python
...
from apex.fp16_utils import *
from apex import amp, optimizers
...

model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, eps=1e-7)

# O3 casts the whole model to FP16 ("pure FP16 training").
model, optimizer = amp.initialize(model, optimizer, opt_level="O3")

for epoch in range(nb_epoch):
    optimizer = lr_scheduler(optimizer, epoch)

    for i, inputs in enumerate(train_loader):
        inputs = inputs.permute(0, 1, 4, 2, 3)
        inputs = inputs.cuda()
        errors = model(inputs)  # the RuntimeError below is raised here
        errors = errors.float()
        loc_batch = errors.size(0)
        errors = torch.mm(errors.view(-1, nt), time_loss_weights)
        errors = torch.mm(errors.view(loc_batch, -1), layer_loss_weights)
        errors = torch.mean(errors)
        optimizer.zero_grad()
        with amp.scale_loss(errors, optimizer) as scaled_loss:
            scaled_loss.backward()
            # errors.backward()
        optimizer.step()
```
Error:

Selected optimization level O3:  Pure FP16 training.
Defaults for this optimization level are:
enabled                : True
opt_level              : O3
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : False
master_weights         : False
loss_scale             : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O3
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : False
master_weights         : False
loss_scale             : 1.0
Traceback (most recent call last):
  File "train_t_1.py", line 96, in <module>
    errors = model(inputs)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/home/bernard/Projects/PrednetConvLSTMPytorch/PredNetOriginal.py", line 141, in forward
    Rep, Cell = cell(tmp, Cell)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernard/Projects/PrednetConvLSTMPytorch/ConvLSTMCellPredNet.py", line 41, in forward
    i_t = torch.sigmoid(self.W_i(inputs)) #Bias included in self.W_.. initialization
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/bernard/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

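
The traceback shows apex's patched forward casting the model inputs, yet the convolution inside the cell still receives a FloatTensor. One possible cause, not confirmed anywhere in this thread, is a tensor created in FP32 inside `forward` (for example an initial hidden state built with `torch.zeros`). The sketch below shows how such a state can inherit the input's dtype instead. `ToyCell` and `init_state` are hypothetical names, not the code from the issue, and float64 on CPU stands in for FP16 on GPU so the example runs anywhere.

```python
import torch
import torch.nn as nn


class ToyCell(nn.Module):
    """Hypothetical recurrent cell; a stand-in, not the ConvLSTM cell from the issue."""

    def __init__(self, channels: int):
        super().__init__()
        self.channels = channels
        self.conv_x = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def init_state(self, x: torch.Tensor) -> torch.Tensor:
        # new_zeros inherits dtype and device from x, so the state follows
        # whatever precision the input was cast to.
        return x.new_zeros(x.size(0), self.channels, x.size(2), x.size(3))

    def forward(self, x, state=None):
        if state is None:
            state = self.init_state(x)
        return torch.tanh(self.conv_x(x) + self.conv_h(state))


# float64 on CPU stands in for FP16 on GPU here: the rule that inputs and
# weights must share a dtype is the same.
cell = ToyCell(4).double()
x = torch.randn(2, 4, 8, 8, dtype=torch.float64)
out = cell(x)
print(out.dtype)  # torch.float64

# The same helper follows an FP16 input as well.
half_state = cell.init_state(torch.zeros(1, 4, 2, 2, dtype=torch.float16))
print(half_state.dtype)  # torch.float16
```

With default settings, a plain `torch.zeros(...)` call without `dtype=` returns float32, which is how a mismatch of this kind could arise under O3.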

All 5 comments

Facing the same error while using 'O2'. Have you found a solution, @BenQLange?
I found that this error only appears when I run the code on multiple GPUs; it works fine on a single GPU. Also, with multiple GPUs, even with 'O1', memory usage seems to be much higher than on a single GPU.

In my case, the reason is that I used a non-official SyncBatchNorm implementation and it seems apex couldn't deal with it.
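
For readers in the same situation, a minimal sketch of switching to PyTorch's built-in `torch.nn.SyncBatchNorm` is given here. The small `nn.Sequential` model is a made-up placeholder, and whether this resolves the error for any particular model is an assumption, since the comment above does not say which implementation was used.

```python
import torch
import torch.nn as nn

# Placeholder model; any module tree containing BatchNorm layers works.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Replace every BatchNorm*d layer with torch.nn.SyncBatchNorm.
# The conversion itself does not need a process group; training with the
# converted model is meant to happen on GPU under DistributedDataParallel.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(type(model[1]).__name__)  # SyncBatchNorm
```

Apex also ships its own helper, `apex.parallel.convert_syncbn_model`, which may be the more natural choice when the rest of the training setup already relies on apex.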

I hit the same error message when feeding weights trained on the Global Wheat (Kaggle) data into prediction.

I encountered exactly the same error. Does anyone have a good solution? Thanks!
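
Since the thread reports several different causes, it can help to locate the first module whose input dtype differs from its weight dtype. The helper below is a rough debugging sketch using forward pre-hooks. `find_dtype_mismatches` is a hypothetical name, and float64 on CPU again stands in for FP16 on GPU.

```python
import torch
import torch.nn as nn


def find_dtype_mismatches(model, *inputs):
    """Run one forward pass and record (module name, input dtype, weight dtype)
    for every module whose floating-point input differs from its weight dtype."""
    mismatches = []
    handles = []

    def make_hook(name):
        def hook(module, args):
            weight = getattr(module, "weight", None)
            if not isinstance(weight, torch.Tensor):
                return
            for a in args:
                if (isinstance(a, torch.Tensor) and a.is_floating_point()
                        and a.dtype != weight.dtype):
                    mismatches.append((name, a.dtype, weight.dtype))
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_pre_hook(make_hook(name)))
    try:
        model(*inputs)
    except RuntimeError:
        pass  # the mismatch itself usually raises; the record is what matters
    finally:
        for h in handles:
            h.remove()
    return mismatches


# Weights in float64, input in float32: the first conv is reported.
net = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.ReLU()).double()
report = find_dtype_mismatches(net, torch.randn(1, 3, 8, 8))
print(report)
```

The reported module name points at the layer to inspect, for instance to see whether its input was created inside `forward` or came from a custom layer that amp did not cast.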

I'm also facing the same issue. I trained a PyTorch YOLOv5 model and then tried to integrate it with a Flask API:

import torch
# torch_utils comes from the YOLOv5 repository (its import is not shown in the original post).

class Model(object):
    def __init__(self, model):
        self.device = torch_utils.select_device()
        print(self.device)
        # Load the checkpoint and take the stored model object.
        model = torch.load(model, map_location=self.device)['model']

        # Note: "False and ..." always evaluates to False, so FP16 inference is disabled.
        self.half = False and self.device.type != 'cpu'
        print('half = ' + str(self.half))

        if self.half:
            model.half()

        model = model.to(self.device).eval()
        self.loaded_model = model

    def predict(self, img):
        # img = torch.zeros((1, 3, 640, 640), device=self.device)
        img1 = torch.from_numpy(img).to(self.device)
        img = img1.permute(0, 3, 1, 2)  # NHWC -> NCHW; reshape would scramble the pixels
        img = img.half() if self.half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        print(img.ndimension())
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
        print(self.loaded_model)
        img = img.to(self.device)

        self.preds = self.loaded_model(img, augment=False)[0]
        return self.preds

In my camera.py file, I read the video frame by frame and get predictions as shown below:

model = FacecoverDetectModel("weights/best.pt")

class Camera(object):
    def __init__(self):
        self.video = cv2.VideoCapture(0)  # default webcam

    def __del__(self):
        self.video.release()

    def get_frame(self):
        _, fr = self.video.read()
        loader = transforms.Compose([transforms.ToTensor()])

        # Resize to the 640x640 input and add a batch dimension (NHWC, uint8).
        image = cv2.resize(fr, (640, 640), interpolation=cv2.INTER_AREA)
        input_im = image.reshape(1, 640, 640, 3)

        # Note: this PIL/ToTensor path is computed but never passed to the model.
        pil_im = Image.fromarray(fr)
        image = loader(pil_im).float()
        # image = Variable(image, requires_grad=True)
        image = image.unsqueeze(0)

        pred = model.predict(input_im)
        print(pred)
        _, jpeg = cv2.imencode('.jpg', fr)
        return jpeg.tobytes()

Any ideas, please?
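
In the code above, `self.half` is always `False`, so `model.half()` is never called, but nothing casts the loaded model to FP32 either. If the checkpoint stored its weights in FP16 (an assumption; the post does not say), the weights stay in half precision while the image is cast with `img.float()`, which would produce exactly this error. A minimal sketch of making both sides agree follows. `match_precision` and `prepare_input` are hypothetical helpers, and a small `nn.Conv2d` stands in for the loaded model.

```python
import torch
import torch.nn as nn


def match_precision(model: nn.Module, use_half: bool) -> nn.Module:
    """Cast all floating-point parameters and buffers to FP16 or FP32 explicitly."""
    return model.half() if use_half else model.float()


def prepare_input(img: torch.Tensor, model: nn.Module) -> torch.Tensor:
    """Move the input to the device and dtype of the model's first parameter."""
    param = next(model.parameters())
    return img.to(device=param.device, dtype=param.dtype)


# Stand-in for a checkpoint whose weights were stored in FP16.
model = nn.Conv2d(3, 8, kernel_size=3).half()
print(next(model.parameters()).dtype)  # torch.float16

# Explicitly cast back to FP32 because half-precision inference is disabled.
model = match_precision(model, use_half=False).eval()
img = torch.randint(0, 256, (1, 3, 32, 32), dtype=torch.uint8)
x = prepare_input(img, model) / 255.0  # uint8 0..255 -> float 0.0..1.0
with torch.no_grad():
    out = model(x)
print(out.dtype)  # torch.float32
```

Applied to the class above, this would amount to calling `model.float()` in the branch where `self.half` is `False`, right after loading the checkpoint.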
