Ignite: [Feature Request] Adding Average Precision Metric

Created on 4 May 2018 · 16 Comments · Source: pytorch/ignite

Currently the built-in Metrics omit Average Precision, which is widely used in classification and object detection as a standard evaluation metric. I suggest adding it, as it is usually a better metric than accuracy, especially when the data have imbalanced classes.

enhancement

pkdogcom

Most helpful comment

@vfdev-5 By definition, Average Precision is the area under the precision-recall curve, which shows how precision changes as recall increases (usually by varying the confidence-score threshold). A single precision value only describes performance at one recall value that is not known in advance (in the current precision metric implementation, the threshold for binary classification is 0.5).

Take face detection as an example. The detection pipeline usually produces a binary face/non-face classification score. If the precision metric is used rather than average precision, one has to choose a confidence-score threshold manually, which in most cases will simply be 0.5. It could then be that, under this threshold, 99% of the bounding boxes classified as faces are actual faces (i.e. precision 0.99, which looks reasonably good), while only 50% of all faces are classified as faces (i.e. recall 0.5, which may not be good enough in some applications). If the application requires the model to detect most faces, say 95%, one has to lower the threshold, which in turn lets in some false faces and thus lowers the precision. However, since only a single precision value is reported, it is unknown what the precision will be at recall 0.95, so the overall performance of the model is not fully evaluated. That's why most object detection and many classification tasks use AP (and mAP, which is the mean of the average precision across all classes).
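The trade-off described above can be illustrated with a small, dependency-free sketch. The function names and the toy scores are made up for illustration, and the AP variant shown is the plain non-interpolated one (mean of the precision at the rank of each positive), which assumes no tied scores; detection benchmarks often use interpolated variants instead.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    n_pos = sum(labels)
    # Convention (an assumption): precision is 1.0 when nothing is predicted positive
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / n_pos
    return precision, recall


def average_precision(scores, labels):
    """Non-interpolated AP: mean of precision measured at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += tp / rank  # precision among the top `rank` predictions
    return ap / n_pos


# Toy data: 4 positives, 2 negatives (hypothetical values)
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1]

print(precision_recall_at(scores, labels, 0.5))   # precision 1.0, recall 0.5
print(precision_recall_at(scores, labels, 0.05))  # recall 1.0, lower precision
print(average_precision(scores, labels))          # one number for the whole curve
```

At threshold 0.5 the precision looks perfect while half of the positives are missed; AP summarises the behaviour across all thresholds in a single value.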

pkdogcom on 4 May 2018

👍3

All 16 comments

@pkdogcom could you please provide more details on your feature request and explain why the built-in precision metric cannot be used?
For the classification task, the built-in precision metric can compute the precision for each class, and the average can easily be computed and reported in a handler, for example like here.

vfdev-5 on 4 May 2018

pkdogcom on 4 May 2018

👍3

@pkdogcom thanks for the explanation!

vfdev-5 on 4 May 2018

@vfdev-5 You're very welcome! I think we can use TensorFlow's implementation as a reference, which can be found here and here.

pkdogcom on 4 May 2018

👍1

@pkdogcom I think this is a good idea. Somewhat related, I think we should add metrics for AUC/ROC as well. These metrics (along with existing Precision/Recall) will likely share quite a bit of logic.

jasonkriss on 4 May 2018

Sounds like a great idea. @pkdogcom can you send a PR?

alykhantejani on 8 May 2018

We can take inspiration from tnt/meters and from skorch.

vfdev-5 on 8 May 2018

@jasonkriss @alykhantejani I was thinking about adding AUC/ROC and mAP metrics to ignite, so I first tried to adapt the code from tnt/aucmeter. The implementation can be adapted easily, but there are some problems with it:

this part of the computation is slow and gives wrong results when several samples share the same score (probabilities, logits, or whatever the model provides)

Another possibility: instead of rewriting these functions, we can provide a sort of EpochMetric, as in skorch, that accumulates predictions and targets so that sklearn metrics can be used on them. We can avoid a new dependency by just asking for a callable.
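A minimal sketch of the accumulate-then-call idea, written in plain Python so that it runs without ignite or sklearn. The class name `EpochMetricSketch` and its `compute_fn` argument are hypothetical and chosen for illustration; this is not ignite's or skorch's API, only the shape such a metric could take (same `reset` / `update` / `compute` methods as the existing metrics).

```python
class EpochMetricSketch:
    """Hypothetical sketch: buffer every batch, call `compute_fn` once per epoch."""

    def __init__(self, compute_fn):
        if not callable(compute_fn):
            raise TypeError("compute_fn must be callable")
        self.compute_fn = compute_fn
        self.reset()

    def reset(self):
        # Start a new epoch with empty buffers
        self._predictions = []
        self._targets = []

    def update(self, output):
        # `output` is a (y_pred, y) pair for one batch
        y_pred, y = output
        if len(y_pred) != len(y):
            raise ValueError("predictions and targets must have the same length")
        self._predictions.extend(y_pred)
        self._targets.extend(y)

    def compute(self):
        if not self._targets:
            raise RuntimeError("at least one example is needed before computing")
        # The user-supplied callable sees the whole epoch at once
        return self.compute_fn(self._predictions, self._targets)


def thresholded_accuracy(y_pred, y):
    """Example callable: accuracy after thresholding scores at 0.5."""
    return sum(int(p >= 0.5) == t for p, t in zip(y_pred, y)) / len(y)


metric = EpochMetricSketch(thresholded_accuracy)
metric.update(([0.9, 0.2], [1, 0]))
metric.update(([0.6, 0.7], [0, 1]))
print(metric.compute())  # 3 of 4 correct -> 0.75
```

With sklearn installed, the callable could be something like `lambda y_pred, y: roc_auc_score(y, y_pred)`; note that `sklearn.metrics.roc_auc_score` takes the true labels first. The trade-off is memory: the buffers grow with the size of the epoch.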

What do you think about this?

vfdev-5 on 17 Jun 2018

I'm not too familiar with skorch. How would this look in ignite (i.e. what would the user code that uses this metric look like)?

Either way, I think these are useful metrics to have :)

alykhantejani on 18 Jun 2018

Personally, I would like to have these functions as built-ins (at least AUC/ROC) without having to bring in sklearn. That being said, some sort of EpochMetric sounds like it could be a good idea. Could probably refactor some of the current metrics to utilize that and it would also allow us to piggyback on sklearn where there are gaps in our metrics. At least until we add them directly to ignite.

jasonkriss on 18 Jun 2018

An EpochMetric implementation that collects predictions is not that complicated; for AUC it could look like this:

import torch
from ignite.exceptions import NotComputableError
from ignite.metrics import Metric


class AUC(Metric):
    def reset(self):
        self._scores = torch.tensor([], dtype=torch.float32)
        self._targets = torch.tensor([], dtype=torch.long)

    def update(self, output):
        y_pred, y = output

        assert y_pred.ndimension() == 2, "Predictions should be of shape (batch_size, 1)"
        assert y.ndimension() == 1, "Targets should be of shape (batch_size,)"        
        assert torch.equal(y**2, y), 'Targets should be binary (0 or 1)'

        y_pred = y_pred.squeeze(dim=-1).to('cpu')
        y = y.to('cpu')

        self._scores = torch.cat([self._scores, y_pred.type_as(self._scores)], dim=0)
        self._targets = torch.cat([self._targets, y], dim=0)  

    def compute(self):                

        n_samples = self._scores.shape[0]
        if n_samples == 0:
            raise NotComputableError('AUC must have at least one example before it can be computed')

        # sorting the arrays
        scores, sortind = torch.sort(self._scores, dim=0, descending=True)

        # creating the roc curve
        n = n_samples + 1
        tpr = torch.zeros(n, dtype=torch.float64)
        fpr = torch.zeros(n, dtype=torch.float64)

        # NOTE: the loop below is slow (one Python iteration per sample) and
        # not correct when several samples share the same score: tied samples
        # are stepped one at a time, so the curve depends on their sort order.
        for i in range(1, n):
            if self._targets[sortind[i - 1]] == 1:
                tpr[i] = tpr[i - 1] + 1
                fpr[i] = fpr[i - 1]
            else:
                tpr[i] = tpr[i - 1]
                fpr[i] = fpr[i - 1] + 1
        # End of the slow, tie-unaware part

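        n_pos = self._targets.sum().item() * 1.0
        n_neg = n_samples - n_pos
        # Without both classes, TPR or FPR would be divided by zero
        if n_pos == 0 or n_neg == 0:
            raise NotComputableError('AUC needs at least one positive and one negative example')
        tpr /= n_pos
        fpr /= n_neg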

        # calculating area under curve using trapezoidal rule
        n = tpr.shape[0]
        h = fpr[1:n] - fpr[0:n - 1]
        sum_h = torch.zeros_like(fpr)
        sum_h[0:n - 1] = h
        sum_h[1:n] += h
        area = (sum_h * tpr).sum().item() / 2.0

        return area, tpr, fpr

So the idea was to ask the user to write their own compute function.
But I agree that a built-in function can be better than an sklearn dependency...
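The tie problem mentioned for the per-sample loop can be avoided by emitting one ROC point per distinct score instead of one per sample. The following is a minimal pure-Python sketch of that idea, not the tnt or ignite implementation; the function name `roc_auc_with_ties` is hypothetical.

```python
def roc_auc_with_ties(scores, targets):
    """Area under the ROC curve; samples with equal scores form a single point."""
    n_pos = sum(targets)
    n_neg = len(targets) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Sort by score, highest first
    pairs = sorted(zip(scores, targets), key=lambda p: p[0], reverse=True)

    tp = fp = 0            # running counts at the current curve point
    prev_tp = prev_fp = 0  # counts at the previous curve point
    area = 0.0
    i = 0
    while i < len(pairs):
        # Consume every sample that shares the current score
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        # Trapezoid between the previous and the new point, in raw counts
        area += (fp - prev_fp) * (tp + prev_tp) / 2.0
        prev_tp, prev_fp = tp, fp
        i = j

    # Normalise counts to rates: TPR = tp / n_pos, FPR = fp / n_neg
    return area / (n_pos * n_neg)


print(roc_auc_with_ties([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # perfect ranking: 1.0
print(roc_auc_with_ties([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # all tied: 0.5
```

When every score is identical, a per-sample loop returns a value that depends on the order in which the tied samples happen to be sorted, whereas grouping ties gives the chance-level value 0.5.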

vfdev-5 on 19 Jun 2018

Sorry, I don't think I communicated that very clearly. I was agreeing with you for the most part. I was saying an EpochMetric where users can provide their own compute functions sounds like a good idea.

Eventually we can add the built-in metrics, but in the meantime the EpochMetric can make things easier than they are today.

jasonkriss on 20 Jun 2018

@jasonkriss thanks for the further explanation. So, I can go ahead with code similar to what I proposed above and we'll see...

vfdev-5 on 20 Jun 2018

@vfdev-5 are you planning on implementing some of these metrics? Perhaps we should open a new issue and just gather a list of metrics we want to implement for the next release

alykhantejani on 1 Jul 2018

@alykhantejani I'm planning to first provide something like the EpochMetric described here, and maybe later go on with built-in AUC/mAP metrics.
I agree with you on opening new issues; we can go ahead with that too.
I can start by creating an issue on EpochMetric if we are ok with this approach.

vfdev-5 on 1 Jul 2018

👍1

This has now been added in #235

alykhantejani on 22 Aug 2018
