I think the sequence tagger also has the same issue as #439, because of this code in `_evaluate_sequence_tagger()`:

```python
for tag, gold in gold_tags:
    if (tag, gold) not in predicted_tags:
        metric.add_fn(tag)
    else:
        metric.add_tn(tag)  # a gold tag that was found is a true positive, not a true negative
```
I am getting incorrect results (note that _tn_ always equals _tp_):

```
tp: 2834 - fp: 671 - fn: 1057 - tn: 2834 - precision: 0.8086 - recall: 0.7283 - accuracy: 0.7664 - f1-score: 0.7664
```
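To make the symmetry concrete, here is a minimal standalone sketch (not the actual flair code; the pair lists are made up) showing that this loop computes _tn_ from exactly the same condition that the true-positive loop uses for _tp_:

```python
# Hypothetical toy data, standing in for the (tag, span) pairs that
# _evaluate_sequence_tagger() builds from the gold and predicted annotations.
gold_tags = [("PER", "span-1"), ("ORG", "span-2"), ("LOC", "span-3")]
predicted_tags = [("PER", "span-1"), ("ORG", "span-2"), ("ORG", "span-4")]

# tp is counted over predicted_tags: predicted pairs that appear in gold.
tp = sum(1 for pair in predicted_tags if pair in gold_tags)

# The buggy else-branch counts tn over gold_tags: gold pairs that appear in
# predicted_tags - the same intersection, just traversed from the other side.
tn_as_counted = sum(1 for pair in gold_tags if pair in predicted_tags)

assert tp == tn_as_counted == 2  # same identity behind tp == tn == 2834 above
```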
Ah, thanks for spotting this! Care to do another pull request? :)
I am quite new to GitHub, but let me try my best :) I believe it is a great learning process for me.
Generally speaking, moving the Metric class to a new file, separating evaluation from the trainer, and adding some unit tests would be very helpful. Is it OK if I introduce such refactoring and unit tests in a future PR, @alanakbik?
@kubapok Sure, that would be great! How would you separate evaluation from the trainer?
@alanakbik I gave it a second thought. Does it really make sense to calculate _tn_ for the sequence tagger at all? If so, how would we define _tn_ for each type of tag? It is different from text classification, where _tn_ for each label can be easily calculated; for sequence tagging, the set of spans that are correctly *not* tagged is essentially unbounded.
Yes, I guess that is the problem - perhaps we should not count true negatives for sequence tagging. I'll close the issue then, but feel free to reopen if you have further comments!
Yes, we should not count true negatives. We can ignore _tn_ in the output for now, or we can simply comment out `metric.add_tn(tag)`, so that _tn_ stays at its initial value of 0.
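For concreteness, a minimal sketch of that change, assuming the surrounding loop in `_evaluate_sequence_tagger()` stays as it is today:

```python
for tag, gold in gold_tags:
    if (tag, gold) not in predicted_tags:
        metric.add_fn(tag)
    # else-branch removed: with add_tn never called, tn stays at 0, and the
    # reported precision / recall / f1 are unaffected since none of them use tn.
```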