Following the discussion in #168, here idea is to provide a base structure to compute a metric cumulating model's output during whole epoch.
However, model's output and targets are restricted :
(batch_size, n_classes) or (batch_size, ) <=> avoid to store a lot of data(batch_size, n_classes) and n_classes > 1 than it should be binary: e.g. [[0, 1, 0, 1], ] <=> a use-case for mAP The implementation is rather straight forward, I think this should be an abstract class like Metric but it implements the method update:
class EpochMetric(Metric):
def reset(self):
self._predictions = torch.tensor([], dtype=torch.float32)
self._targets = torch.tensor([], dtype=torch.long)
def update(self, output):
y_pred, y = output
assert 1 <= y_pred.ndimension() <= 2, "Predictions should be of shape (batch_size, n_classes)"
assert 1 <= y.ndimension() <= 2, "Targets should be of shape (batch_size, n_classes)"
if y.ndimension() == 2:
assert torch.equal(y ** 2, y), 'Targets should be binary (0 or 1)'
if y_pred.ndimension() == 2 and y_pred.shape[1] == 1:
y_pred = y_pred.squeeze(dim=-1)
if y.ndimension() == 2 and y.shape[1] == 1:
y = y.squeeze(dim=-1)
y_pred = y_pred.type_as(self._predictions)
y = y.type_as(self._targets)
self._predictions = torch.cat([self._predictions, y_pred], dim=0)
self._targets = torch.cat([self._targets, y], dim=0)
@abstractmethod
def compute(self):
pass
and user can implement something like this:
from sklearn.metrics import roc_auc_score
from sklearn.metrics import average_precision_score
class AUC(EpochMetric):
def compute(self):
y_true = self._targets.numpy()
y_pred = self._predictions.numpy()
return roc_auc_score(y_true, y_pred)
class mAP(EpochMetric):
def compute(self):
y_true = self._targets.numpy()
y_pred = self._predictions.numpy()
return average_precision_score(y_true, y_pred)
Not sure about mAP and average_precision_score but anyway.
What do you guys think about this ?
cc @pkdogcom
I wonder if it might even be worth have EpochMetric take a compute_fn at init time. That way, users wouldn't have to subclass it every time. For example, AUC would just become EpochMetric(compute_fn= roc_auc_score). What do you think?
@jasonkriss great idea! Sure that this could avoid user to create custom classes.
If we provide compute_fn, what would be a signature of the function ?
As example of compute_fn can accept torch.tensor that are cumulated during the epoch:
self._predictions ~ torch.tensor([], dtype=torch.float32)
self._targets ~ torch.tensor([], dtype=torch.long)
self.compute_fn(self._predictions, self._targets)
Also, we probably need to check the signature before starting accumulation, otherwise if user is somehow provided something bad he/she will understand this only at the end of the epoch...
Any thoughts?
I think checking the arity upfront is a good idea. My vote would be to have the signature be just as you have it (torch.tensor, torch.tensor). Since this is a pytorch library, that seems like the right default. If people want to use sklearn metrics, they can write a small wrapper to convert to numpy arrays.
@vfdev-5 The implementation looks good to me. We're actually using it in our project already. I was wondering if we should include the current implementation and leave further improvement later so that people can start using and getting benefits from your work without having to delve into details in issues discussions.
As for sklearn dependencies, I think it is ok to me to have dependencies on some other common ML libraries, from a user perspectives, especially considering pytorch itself is a research/development oriented platform rather than deployment platform. If there is a principle of keeping the core library clean and smart, it is also possible to leave such 3rd-party dependent implementations in a separate project like 'ignite-contrib', so that people have the freedom to choose.
@pkdogcom thank you for the feedback! Yes, we have to continue with this issue. What do you think about Jason Kriss' proposition to have a compute_fn without a need of subclassing EpochMetric?
In this case we can avoid any dependencies on sklearn or whatever ...
I think either way works. It's just a matter of which direction the project wants to go, more feature rich/complete with possibly more dependencies v.s. lean & compact but requiring some more customization on users' sides. Or a better question may be where to draw the line between the two directions.
FYI, I think in actual implementation the mAP metric should be written as this (in the subclass case), by performing softmax on predictions:
class AveragePrecision(EpochMetric):
def compute(self):
y_true = self._targets.numpy()
y_pred = F.softmax(self._predictions, 1).numpy()
return average_precision_score(y_true, y_pred[:, 1])
So in this case, leaving the compute_fn implementation to user may require more work (not too complicated but still need some trials and errors) and knowledge (knowing that sklearn can be used here instead of having to hand-craft a mAP computation theirselves) from users but making the softmax and adaptation to sklearn mAP computation more explicit and flexible (especially in case that they've already done softmax in the model module).
Help needed with this issue?
@anmolsjoshi thanks, but there is already a PR on this :)
Added in #235