Pytorch-lightning: Enable pytorch lightning with non-trainable models?

Created on 14 May 2020 · 10 Comments · Source: PyTorchLightning/pytorch-lightning

I'm trying to use the PyTorch Lightning infrastructure with a non-deep-learning model.
I'm validating multiple models, some of which are "classic" (non-neural), but they all share the same trainers, preprocessing, GPUs, etc., so it would really benefit me to use the same code for all of them :)

If I simply run a model with no weights I get the following error:

    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Since PL doesn't allow skipping the configure_optimizers function, I tried overriding optimizer_step as well:

    def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_i, second_order_closure=None):
        # no-op: skip the optimizer update entirely
        pass

But I got the same error. Overriding backward did not help either.

If someone has a solution, even a hacky one, I'll be happy to hear.

Labels: discussion, enhancement, question, won't fix

All 10 comments

Hi! Thanks for your contribution, great first issue!

How would you expect this to work?

Can you give me pseudocode of what you want to do? Is this for something like an SVM?

It should be possible to just return None from configure_optimizers (https://pytorch-lightning.readthedocs.io/en/stable/lightning-module.html#lightningmodule-class). Does that solve the grad error?
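
For reference, that would look roughly like this (a minimal sketch; the class name is a placeholder and only the relevant method is shown):

    import pytorch_lightning as pl

    class NonTrainableModule(pl.LightningModule):
        def configure_optimizers(self):
            # Returning None tells Lightning that no optimizer is needed.
            return None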

Another hacky solution: add a dummy Linear layer to your module (so model.parameters() won't be empty), and in forward() just return torch.tensor(0.0, requires_grad=True).
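
For illustration, a rough sketch of that workaround (the class name, the training_step body, and the lr=0 dummy optimizer are placeholders, not code from this thread; it uses the 0.7-era dict return convention for training_step):

    import torch
    from torch import nn
    import pytorch_lightning as pl

    class ClassicModelWrapper(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # Dummy layer so that self.parameters() is not empty and the
            # optimizer plumbing has something to hold on to.
            self.dummy = nn.Linear(1, 1)

        def forward(self, x):
            # Constant "loss" that requires grad, so loss.backward()
            # does not raise even though nothing is actually trained.
            return torch.tensor(0.0, requires_grad=True)

        def training_step(self, batch, batch_idx):
            # Fit/update the classic (non-differentiable) model here,
            # e.g. a partial_fit on an sklearn estimator, then return
            # the dummy loss from forward().
            loss = self(batch)
            return {'loss': loss}

        def configure_optimizers(self):
            # lr=0 so the dummy layer never actually changes.
            return torch.optim.SGD(self.parameters(), lr=0.0)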

@ethanwharris - returning None from configure_optimizers does not help, as loss.backward() is still called.

@williamFalcon - An SVM, but not only that; for example, fitting a GMM on top of a pretrained pipeline with fixed weights from earlier training.
In this scenario I would set max_epochs to 1, fit on the training data, and test on the validation set.
Since 99% of the infrastructure stays the same, including the loaders, etc., using the same code for this would be preferable for me, even if PL is not natively built for that purpose :)

@festeh I liked the idea, but currently I'm getting
assert all(map(lambda i: i.is_cuda, inputs)) when doing so. I'll continue to investigate why. Probably a shape thing...

Probably the award-winning hack of the year:
I'm setting
accumulate_grad_batches=<any number larger than the number of batches>
and inside the model:

    def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_i, second_order_closure=None):
        # no-op: never apply an optimizer update
        pass

    def backward(self, use_amp, loss, optimizer, another):
        # no-op: skip the backward pass entirely
        pass
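
For context, the Trainer side of that hack might look roughly like this (a sketch, not tested; the exact value is arbitrary as long as it exceeds the number of batches in an epoch, and model is assumed to be a LightningModule with the no-op overrides above):

    from pytorch_lightning import Trainer

    # With accumulate_grad_batches larger than the number of batches per
    # epoch, the optimizer step is never due, and the no-op optimizer_step
    # and backward overrides skip the rest of the gradient machinery.
    trainer = Trainer(max_epochs=1, accumulate_grad_batches=100000)
    trainer.fit(model)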

@dvirginz Love the hack, haha. What would be a good way for Lightning to support this by default? We could add support for the loss being None? It would be nice if this use case were better supported :)

Maybe add a flag,
differentiate=False?

OK, I think this is super interesting. My suggestion is to add a flag:

differentiable=False

which would be used for non-differentiable models.
Then we turn off all the gradient machinery.

@tdvginz want to submit a PR? We can help test and merge.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
