Pytorch-lightning: Data is on GPU, some of the nn.Module still on CPU

Created on 16 Jul 2020 · 2 Comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

A PyTorch module consisting of a Python list containing multiple nn.Conv1d objects plus 3 fully-connected layers raises the following error:

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

The error is raised when trying to do a forward pass through the first nn.Conv1d contained in the Python list.

To Reproduce

PyTorch module

import math

import torch.nn as nn
import torch.nn.functional as F


class Conv1DClassifier(nn.Module):

    def __init__(self, num_classes):
        super(Conv1DClassifier, self).__init__()
        encoder_out_channels = 256
        encoder_kernel_size = 64
        encoder_stride = 64
        n_layers = int(math.log2(encoder_out_channels))
        self.conv_layers = []
        for layer_idx in range(n_layers):
            self.conv_layers.append(
                nn.Conv1d(
                    in_channels=2 ** layer_idx,
                    out_channels=2 ** (layer_idx + 1),
                    kernel_size=encoder_kernel_size,
                    stride=encoder_stride
                )
            )
        self.fc1 = nn.Linear(encoder_out_channels, 128)  
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, num_classes)

    def forward(self, x):

        for conv_layer in self.conv_layers:
            # on the first pass, the error is raised here!
            x = conv_layer(x)
        x, _ = x.max(2)  # max pooling over the sequence dim; drop sequence axis
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

The LightningModule I used is just the Lit Module presented in the tutorial.

Steps to reproduce the behavior:

  1. Try to do a forward pass on the presented PyTorch module with a GPU.
  2. Watch the console log to notice that the data is on the GPU but some of the model parameters are still on the CPU.

I think it has something to do with the Python list that contains the many convolution layers: it should be transferred to the GPU, but it isn't. Since the CPU-to-GPU (and vice versa) transfers are handled by Lightning, I think I can get help here.
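As a quick check of that hypothesis, here is a minimal sketch (using the Conv1DClassifier defined above; the device part only runs on a CUDA machine) showing that the list-held layers are never registered and therefore never moved:

import torch

model = Conv1DClassifier(num_classes=2)
# Layers stored in a plain Python list are not registered as submodules,
# so they do not show up in model.parameters() at all.
print(sum(p.numel() for p in model.parameters()))  # counts only fc1/fc2/fc3

if torch.cuda.is_available():
    model = model.cuda()
    print(model.fc1.weight.device)             # cuda:0
    print(model.conv_layers[0].weight.device)  # still cpu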

bug / fix help wanted


All 2 comments

Use nn.ModuleList. It's PyTorch-specific.
Try:

model = Conv1DClassifier(num_classes=2)
print(model)

You won't see the conv layers in the model.

Change it to self.conv_layers = nn.ModuleList()
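For completeness, a small self-contained sketch (the class names here are made up for illustration) of why the swap works: nn.ModuleList registers its contents as submodules, so .to()/.cuda(), and therefore Lightning's automatic device transfer, can reach them:

import torch.nn as nn

class WithList(nn.Module):
    def __init__(self):
        super().__init__()
        # plain Python list: invisible to nn.Module's registration machinery
        self.layers = [nn.Conv1d(1, 2, kernel_size=64, stride=2)]

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: each entry becomes a registered submodule
        self.layers = nn.ModuleList([nn.Conv1d(1, 2, kernel_size=64, stride=2)])

print(len(list(WithList().parameters())))        # 0, so .to(device) has nothing to move
print(len(list(WithModuleList().parameters())))  # 2 (weight + bias)
print(WithModuleList())                          # the conv layer now appears in the repr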

Just for further information: I unwrapped the Python list into a series of individual nn.Conv1d attributes and it worked, so the issue has to be the Python list. I'll try @rohitgr7's solution in a minute.

Edit:
I replaced the Python list with nn.ModuleList() and it worked as intended.
Thank you so much!

The final architecture looks as follows:

class Conv1DClassifier(nn.Module):
    def __init__(self, num_classes):
        super(Conv1DClassifier, self).__init__()
        encoder_out_channels = 256
        encoder_kernel_size = 64
        encoder_stride = 2
        n_layers = int(math.log2(encoder_out_channels))
        self.conv_layers = nn.ModuleList()
        for layer_idx in range(n_layers):
            self.conv_layers.append(
                nn.Conv1d(
                    in_channels=2 ** layer_idx,
                    out_channels=2 ** (layer_idx + 1),
                    kernel_size=encoder_kernel_size,
                    stride=encoder_stride
                )
            )
        self.fc1 = nn.Linear(encoder_out_channels, 128)  
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, num_classes)

    def forward(self, x):
        for conv_layer in self.conv_layers:
            x = conv_layer(x)
        x, _ = x.max(2)  # max pooling over the sequence dim; drop sequence axis
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
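As a final sanity check, here is a minimal sketch (the input shape is made up for illustration; any sequence long enough to survive the eight stride-2 convolutions will do) of a forward pass that now runs on either device:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Conv1DClassifier(num_classes=2).to(device)
x = torch.randn(4, 1, 20_000, device=device)  # (batch, in_channels=1, sequence length)
logits = model(x)
print(logits.shape)  # torch.Size([4, 2])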
