A method to make predictions on new data.
In a machine learning project, once the model has been trained and evaluated (using validation_step and test_step), it would be useful to have a method to make predictions on new, unlabelled data.
Making predictions for a single observation is straightforward: just call model(new_data). However, to predict on a large dataset, we need to create a dataloader and loop through it while concatenating the outputs. It would be great to integrate that into PyTorch Lightning to take advantage of the ease of implementation, especially for multi-GPU setups.
Load a pre-trained model and use it for prediction by calling something like:
model = myLightning_model.load_from_checkpoint("path/to/checkpoint")
predicted_labels = model.predict(newdata_dataloader)
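To make the idea concrete, here is a rough, single-device sketch of what such a predict method could do inside the LightningModule (keeping the myLightning_model / predict names from the snippet above; the toy Linear layer is just a placeholder, and this deliberately ignores the multi-GPU / distributed gathering part that Lightning would be expected to handle):

```python
import torch
import pytorch_lightning as pl


class myLightning_model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(10, 2)  # placeholder network

    def forward(self, x):
        return self.net(x)

    def predict(self, dataloader):
        """Loop over the dataloader and concatenate the outputs (single device only)."""
        self.eval()
        outputs = []
        with torch.no_grad():
            for batch in dataloader:
                outputs.append(self(batch.to(self.device)).cpu())
        return torch.cat(outputs)
```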
The standard PyTorch way to do it, with the usual issues of managing devices and parallel processing yourself:
model.eval()
prediction_list = []
with torch.no_grad():
    for batch in dataloader:
        batch = batch.to(device)  # manual device management
        output = model(batch)
        output = model.proba(output)  # if not part of forward already
        prediction_list.append(output.cpu())
predictions = torch.cat(prediction_list)
Sounds really useful. My random thoughts:
I needed something similar recently. I should say it was pretty easy to implement for my purposes, but due to issue #1243 I had to hack the on_test_step_end method in my model. The way I wanted to implement it is to create a new Trainer with a Callback that gathers the prediction results in its state. Then one just needs to call the test method and get the results.
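Roughly what I mean, as a sketch (assuming the batch outputs are actually passed to the callback hook, which is what #1243 is about; the hook signature and the test_dataloaders argument may differ between versions):

```python
import torch
from pytorch_lightning import Callback, Trainer


class PredictionCollector(Callback):
    """Gathers the outputs returned by test_step in the callback's state."""

    def __init__(self):
        self.predictions = []

    def on_test_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        # assumes `outputs` is the tensor returned by test_step for this batch
        self.predictions.append(outputs.detach().cpu())


collector = PredictionCollector()
trainer = Trainer(callbacks=[collector])
trainer.test(model, test_dataloaders=newdata_dataloader)
predicted_labels = torch.cat(collector.predictions)
```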
My $0.02 on "in certain contexts, 'predict' may not be a good name":
From my limited point of view, I can use the word "predict" for all my use-cases, even though it may sometimes be slightly inaccurate. Use-cases:
These are all the use-cases I can think of where you have a single input and want the corresponding output from the model.
"estimate" vs predict:
I consider to name the function "estimate" instead of "predict". It make sense to write m.estimate(x) instead of predict(x) e.g. for reinforcement value function to estimate the random parameter of the RL model.
However, I concluded I can always say I predict random variable distribution Y if I ignore how it is further use in more complicated model M. If I talk with respect to model M - I will say I estimated its parameter Y That's what I understood from https://stats.stackexchange.com/a/17789/79340
"infer" vs predict:
I just feel that infer is much more vague word than predict. I do not like it.
Another benefit of using "predict" would be consistency with machine learning frameworks like sklearn.
One small concern with "estimate" (for stats users): it could suggest that we are estimating the parameters, i.e. training the model again.