A method to make predictions on new data.
In a machine learning project, once the model has been trained and evaluated (using validation_step and test_step), it would be useful to have a method to make predictions on new, unlabelled data.
Making predictions for a single observation is straightforward: just call model(new_data). However, to predict on a large dataset, we need to create a dataloader and loop through it while concatenating the outputs. It would be great to integrate that into PyTorch Lightning to take advantage of the ease of implementation, especially for multi-GPU setups.
Load a pre-trained model and use it for prediction by calling something like:
model = myLightning_model.load_from_checkpoint("path/to/checkpoint")
predicted_labels = model.predict(newdata_dataloader)
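To make the idea concrete, here is a rough, single-device sketch of what such a predict method could do inside the LightningModule (keeping the myLightning_model / predict names from the snippet above; the toy Linear layer is just a placeholder, and this deliberately ignores the multi-GPU / distributed gathering part that Lightning would be expected to handle):

```python
import torch
import pytorch_lightning as pl


class myLightning_model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(10, 2)  # placeholder network

    def forward(self, x):
        return self.net(x)

    def predict(self, dataloader):
        """Loop over the dataloader and concatenate the outputs (single device only)."""
        self.eval()
        outputs = []
        with torch.no_grad():
            for batch in dataloader:
                outputs.append(self(batch.to(self.device)).cpu())
        return torch.cat(outputs)
```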
The standard PyTorch way to do it, with the usual issues of managing devices and parallel processing yourself:
model.eval()
prediction_list = []
with torch.no_grad():
    for batch in dataloader:
        batch = batch.to(device)  # manual device management
        output = model(batch)
        output = model.proba(output)  # if not part of forward already
        prediction_list.append(output.cpu())
predictions = torch.cat(prediction_list)
Sounds really useful. My random thoughts:
I needed something similar recently. I should say it was pretty easy to implement for my purposes, but due to issue #1243 I had to hack the on_test_step_end method in my model. The way I wanted to implement it is to create a new Trainer with a Callback that gathers the prediction results in its state. Then one just needs to call the test method and get the results.
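Roughly what I mean, as a sketch (assuming the batch outputs are actually passed to the callback hook, which is what #1243 is about; the hook signature and the test_dataloaders argument may differ between versions):

```python
import torch
from pytorch_lightning import Callback, Trainer


class PredictionCollector(Callback):
    """Gathers the outputs returned by test_step in the callback's state."""

    def __init__(self):
        self.predictions = []

    def on_test_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        # assumes `outputs` is the tensor returned by test_step for this batch
        self.predictions.append(outputs.detach().cpu())


collector = PredictionCollector()
trainer = Trainer(callbacks=[collector])
trainer.test(model, test_dataloaders=newdata_dataloader)
predicted_labels = torch.cat(collector.predictions)
```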
My $0.02 on "in certain contexts, 'predict' may not be a good name":
From my limited point of view, I can use the word "predict" for all my use-cases, even though it may sometimes be slightly inaccurate. Use-cases:
These are all the use-cases I can think of where you have a single input and want the corresponding output from the model.
"estimate" vs predict:
I consider to name the function "estimate" instead of "predict". It make sense to write m.estimate(x) instead of predict(x) e.g. for reinforcement value function to estimate the random parameter of the RL model.
However, I concluded I can always say I predict random variable distribution Y if I ignore how it is further use in more complicated model M. If I talk with respect to model M - I will say I estimated its parameter Y That's what I understood from https://stats.stackexchange.com/a/17789/79340
"infer" vs predict:
I just feel that infer is much more vague word than predict. I do not like it.
Another benefit of using "predict" would be consistency with machine learning frameworks like sklearn.
One small concern with "estimate" (for stats users): it could suggest that we are estimating the parameters, i.e. training the model again.