I tried to re-implement ROC_AUC like the one in contrib.metrics.roc_auc, passing some extra kwargs to support multiclass and multilabel classification. However, when I ran my code, scikit-learn raised an error: "Number of classes in y_true not equal to the number of columns in 'y_score'". I checked the y_true of the whole epoch, and the number of unique values was equal to the number of columns of y_score. Next, I checked the source code of EpochMetric and found that the update function tries to execute compute_fn at the first iteration. I suppose this is the cause: the y_true of a single batch might not contain all the classes of the dataset, so scikit-learn throws the error mentioned above. The same problem occurred for AveragePrecision.
conda, pip, source): pip
@sandylaker thanks for the report!
Yes, it's true that on the first iteration we check whether compute_fn can compute a value. This is a kind of sanity check. But, as you can see in the code:
https://github.com/pytorch/ignite/blob/ebd1876a12ebe16403889e0ede6de61d84c1b44b/ignite/metrics/epoch_metric.py#L73-L78
it is wrapped in try/except and only warns if there is a problem. I do not quite understand why you get an error instead of a warning...
Could you please provide a minimal snippet to reproduce the issue? Thanks!
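For readers following along, here is a minimal sketch of the pattern described above: compute_fn is tried once on the first batch, and a failure becomes a warning rather than an exception. The class `SketchEpochMetric` and the helper `strict_accuracy` are hypothetical stand-ins written for this illustration. They are not ignite's code, and the warning text is made up.

```python
import warnings


class SketchEpochMetric:
    """Hypothetical stand-in for illustration; not ignite's EpochMetric."""

    def __init__(self, compute_fn):
        self.compute_fn = compute_fn
        self._preds, self._targets = [], []

    def update(self, y_pred, y):
        is_first_update = not self._preds
        self._preds.extend(y_pred)
        self._targets.extend(y)
        if is_first_update:
            # Sanity check: try compute_fn once, on the first batch only.
            try:
                self.compute_fn(self._preds, self._targets)
            except Exception as exc:
                # A failure is downgraded to a warning, not re-raised.
                warnings.warn(
                    f"compute_fn failed on the first batch: {exc}", RuntimeWarning
                )

    def compute(self):
        # The real value is computed on everything accumulated so far.
        return self.compute_fn(self._preds, self._targets)


def strict_accuracy(preds, targets, n_classes=3):
    # Toy compute_fn that, like a strict multiclass metric, refuses
    # to run unless every class appears in the targets.
    if len(set(targets)) != n_classes:
        raise ValueError("not all classes present in targets")
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


metric = SketchEpochMetric(strict_accuracy)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    metric.update([0, 1], [0, 1])  # first batch lacks class 2: warning only
    metric.update([2, 2], [2, 0])  # later batches are not checked
result = metric.compute()          # all three classes are present by now
```

In this sketch the epoch-level result is unaffected by the first-batch failure; only a warning is emitted.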
Sorry, I apologize for my wrong description. Yes, it was a RuntimeWarning instead of an error. But for AveragePrecision, the warning said "RuntimeWarning: invalid value encountered in true_divide recall = tps / tps[-1]".
The evaluation took a whole night, while it should have been computed within several minutes. After I removed the signature check block, this problem was eliminated. So I hope that you guys could improve the EpochMetric class.
Here is my code. It does nothing special, it just adds more fixed arguments:
import numpy as np
from torch import Tensor
from sklearn.metrics import average_precision_score, roc_auc_score
from ignite.metrics import EpochMetric

def roc_auc_compute_fn(y_preds: Tensor, y_targets: Tensor):
    y_true = y_targets.numpy()
    y_pred = y_preds.numpy()
    return roc_auc_score(y_true, y_pred, average='weighted', multi_class='ovr')

def average_precision_compute_fn(y_preds: Tensor, y_targets: Tensor):
    y_targets = y_targets.numpy()
    y_pred = y_preds.numpy()
    # One-hot encode the integer class targets.
    y_true = np.zeros_like(y_pred, dtype=int)
    y_true[np.arange(y_true.shape[0]), y_targets] = 1
    return average_precision_score(y_true, y_pred, average='weighted')

class ROC_AUC(EpochMetric):
    def __init__(self, output_transform=lambda x: x):
        super(ROC_AUC, self).__init__(roc_auc_compute_fn, output_transform)

class AveragePrecision(EpochMetric):
    def __init__(self, output_transform=lambda x: x):
        super(AveragePrecision, self).__init__(average_precision_compute_fn, output_transform)
Thanks for the update and sorry for the inconvenience with this sanity check. Let me reproduce it.
In the master version we replaced RuntimeWarning with a specific warning class so that the warning can be ignored. Maybe we can introduce an argument to disable the check, if the user knows what he/she is doing...
EDIT: @sandylaker do you think it would help to introduce such kwargs into the implementations of ROC_AUC and AveragePrecision, or is it OK for you to rewrite roc_auc_compute_fn and average_precision_compute_fn?
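To illustrate why a dedicated warning class helps, here is a small sketch using only Python's standard `warnings` module. The class name `SanityCheckWarning` and the function `noisy_step` are hypothetical, chosen for this example, and are not the names used in ignite. The point is that a specific category can be silenced without hiding unrelated RuntimeWarnings.

```python
import warnings


class SanityCheckWarning(UserWarning):
    """Hypothetical dedicated category for the first-batch check."""


def noisy_step():
    # One warning from the sanity check, one unrelated numeric warning.
    warnings.warn("compute_fn failed on the first batch", SanityCheckWarning)
    warnings.warn("invalid value encountered", RuntimeWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Silence only the dedicated category; this filter is inserted in
    # front of the "always" filter, so it takes precedence.
    warnings.simplefilter("ignore", SanityCheckWarning)
    noisy_step()

categories = [w.category for w in caught]
```

With a plain RuntimeWarning for both, filtering the sanity check would also hide the numeric warning.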
@sandylaker could you please provide some random data that can help reproduce the issue ? Thanks
My dataset contains 13 classes, so maybe you can generate some random labels. If the batch size is set to a number smaller than 13, let's say 8, the first batch output cannot contain all 13 classes. The output tensor has shape (8, 13), while the number of unique classes in y_target is at most 8. If we try to compute the metrics on the first batch output, sklearn's roc_auc_score and average_precision_score (for multiclass) raise errors.
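The counting argument above can be sketched with random labels and the standard library only. The function `check_classes` is a hypothetical stand-in for the kind of consistency check described in this thread. It is not scikit-learn's code and only reuses the error message quoted earlier.

```python
import random

NUM_CLASSES = 13
BATCH_SIZE = 8

rng = random.Random(0)
# Random integer targets for the first batch only.
first_batch_targets = [rng.randrange(NUM_CLASSES) for _ in range(BATCH_SIZE)]

seen = set(first_batch_targets)
# With 8 samples, at least 13 - 8 = 5 classes must be missing.
missing = sorted(set(range(NUM_CLASSES)) - seen)


def check_classes(y_true, n_score_columns):
    """Hypothetical stand-in for a strict multiclass consistency check."""
    if len(set(y_true)) != n_score_columns:
        raise ValueError(
            "Number of classes in y_true not equal to the number of "
            "columns in 'y_score'"
        )


try:
    check_classes(first_batch_targets, NUM_CLASSES)
    first_batch_ok = True
except ValueError:
    first_batch_ok = False
```

Whatever the seed, `first_batch_ok` is False here, because 8 samples can never cover 13 classes.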
OK, I could reproduce the warning.
Anyway, let's introduce an argument to disable this check. By default it is True (run the check), so that a user who knows what he/she is doing can disable it.
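A rough sketch of what such an argument could look like follows. The class `CheckedEpochMetric`, the flag name `check_compute_fn`, and the helper functions are assumptions made for this illustration. Please check the released API for the actual name and signature.

```python
import warnings


class CheckedEpochMetric:
    """Hypothetical sketch; not ignite's actual class or signature."""

    def __init__(self, compute_fn, check_compute_fn=True):
        self.compute_fn = compute_fn
        self.check_compute_fn = check_compute_fn  # assumed flag name
        self._preds, self._targets = [], []

    def update(self, y_pred, y):
        is_first_update = not self._preds
        self._preds.extend(y_pred)
        self._targets.extend(y)
        # The first-batch check runs only when the flag is left enabled.
        if is_first_update and self.check_compute_fn:
            try:
                self.compute_fn(self._preds, self._targets)
            except Exception as exc:
                warnings.warn(
                    f"compute_fn failed on the first batch: {exc}", RuntimeWarning
                )


def needs_two_classes(preds, targets):
    # Toy compute_fn that fails when only one class is present.
    if len(set(targets)) < 2:
        raise ValueError("only one class present")
    return 1.0


def count_warnings(check):
    metric = CheckedEpochMetric(needs_two_classes, check_compute_fn=check)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        metric.update([0, 0], [0, 0])  # single-class first batch
    return len(caught)
```

Here `count_warnings(True)` sees the warning from the first batch, while `count_warnings(False)` skips the check entirely.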
@sandylaker sorry for the delay with this issue. We just merged a PR that should introduce the argument as discussed here. This functionality will be available in the nightly release in 12h and in stable 0.4.0 in 1.5-2 weeks.