Pytorch-lightning: Enable pytorch lightning with non-trainable models?

Created on 14 May 2020 · 10 Comments · Source: PyTorchLightning/pytorch-lightning

I'm trying to use the PyTorch Lightning infrastructure with a non-deep-learning model.
I'm validating multiple models, some of which are "classic" (non-neural), but they all share the same trainers, preprocessing, GPUs, etc., so it would really benefit me to use the same code for all of them :)

If I simply run a model with no weights I get the following error:

    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Since PL doesn't allow skipping the configure_optimizers function, I tried overriding optimizer_step as well:

    def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_i, second_order_closure=None):
        # no-op: skip the optimizer update entirely
        pass

But I got the same error. Overriding backward did not help either.

If someone has a solution, even a hacky one, I'll be happy to hear.

Labels: discussion, enhancement, question, won't fix

All 10 comments

Hi! Thanks for your contribution, great first issue!

How would you expect this to work?

Can you give me pseudocode of what you want to do? Is this for something like an SVM?

It should be possible to just return None from configure_optimizers (https://pytorch-lightning.readthedocs.io/en/stable/lightning-module.html#lightningmodule-class). Does that solve the grad error?
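
For reference, that would look roughly like this (a minimal sketch; the class name is a placeholder and only the relevant method is shown):

    import pytorch_lightning as pl

    class NonTrainableModule(pl.LightningModule):
        def configure_optimizers(self):
            # Returning None tells Lightning that no optimizer is needed.
            return None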

Another hacky solution: add a dummy Linear layer to your module (so model.parameters() won't be empty), and in forward() just return torch.tensor(0.0, requires_grad=True).
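
For illustration, a rough sketch of that workaround (the class name, the training_step body, and the lr=0 dummy optimizer are placeholders, not code from this thread; it uses the 0.7-era dict return convention for training_step):

    import torch
    from torch import nn
    import pytorch_lightning as pl

    class ClassicModelWrapper(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # Dummy layer so that self.parameters() is not empty and the
            # optimizer plumbing has something to hold on to.
            self.dummy = nn.Linear(1, 1)

        def forward(self, x):
            # Constant "loss" that requires grad, so loss.backward()
            # does not raise even though nothing is actually trained.
            return torch.tensor(0.0, requires_grad=True)

        def training_step(self, batch, batch_idx):
            # Fit/update the classic (non-differentiable) model here,
            # e.g. a partial_fit on an sklearn estimator, then return
            # the dummy loss from forward().
            loss = self(batch)
            return {'loss': loss}

        def configure_optimizers(self):
            # lr=0 so the dummy layer never actually changes.
            return torch.optim.SGD(self.parameters(), lr=0.0)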

@ethanwharris - returning None from configure_optimizers does not help, as loss.backward() is still called.

@williamFalcon - An SVM, but not only that; for example, fitting a GMM on top of a pretrained pipeline with fixed weights from earlier training.
In this scenario I would set max_epochs to 1, fit on the training data, and test on the validation set.
Since 99% of the infrastructure stays the same, including the loaders, etc., using the same code for this would be preferable for me, even if PL is not natively built for that purpose :)

@festeh I liked the idea, but currently I'm getting
assert all(map(lambda i: i.is_cuda, inputs)) when doing so. I'll continue to investigate why. Probably a shape thing...

Probably the award-winning hack of the year:
I'm setting
accumulate_grad_batches=<any number larger than the number of batches>
and inside the model:

    def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_i, second_order_closure=None):
        # no-op: never apply an optimizer update
        pass

    def backward(self, use_amp, loss, optimizer, another):
        # no-op: skip the backward pass entirely
        pass
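
For context, the Trainer side of that hack might look roughly like this (a sketch, not tested; the exact value is arbitrary as long as it exceeds the number of batches in an epoch, and model is assumed to be a LightningModule with the no-op overrides above):

    from pytorch_lightning import Trainer

    # With accumulate_grad_batches larger than the number of batches per
    # epoch, the optimizer step is never due, and the no-op optimizer_step
    # and backward overrides skip the rest of the gradient machinery.
    trainer = Trainer(max_epochs=1, accumulate_grad_batches=100000)
    trainer.fit(model)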

@dvirginz Love the hack, haha. What would be a good way for Lightning to support this by default? We could add support for the loss being None? It would be nice if this use case were better supported :)

Maybe add a flag,
differentiate=False?

OK, I think this is super interesting. My suggestion is to add a flag:

differentiable=False

which would be used for non-differentiable models.
Then we turn off all the gradient machinery.

@tdvginz want to submit a PR? We can help test and merge.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
