Looking at the current code, it seems to me that the loss function is evaluated with the same weight for every class, which is fine for balanced data. For highly imbalanced data, is there any plan to support a different weight for each class in the loss function? I am thinking of something like this on the command line:
fasttext -input XXX -output XXX -weight_class1 10 -weight_class2 1 -weight_class3 3
or simply
fasttext -weight_balanced
if the weight is inversely proportional to the number of instances in that class?
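
To make the "inversely proportional" idea concrete, here is a minimal sketch of how such weights could be computed from a training file. It is not part of fastText: the function name `balanced_class_weights` is hypothetical, the input is assumed to use the usual `__label__` prefix, and the normalization (total label count divided by number of classes times class count) is just one common choice.

```python
from collections import Counter


def balanced_class_weights(lines, label_prefix="__label__"):
    """Return a weight per label, inversely proportional to its frequency.

    Hypothetical helper, not a fastText API. Assumes labels are tokens
    starting with `label_prefix`.
    """
    counts = Counter()
    for line in lines:
        for token in line.split():
            if token.startswith(label_prefix):
                counts[token] += 1

    total = sum(counts.values())
    n_classes = len(counts)
    # weight = total / (n_classes * count): a perfectly balanced data set
    # gives every class a weight of 1.0.
    return {label: total / (n_classes * c) for label, c in counts.items()}


# Made-up example data: one "spam" line and three "ham" lines.
sample = [
    "__label__spam buy now",
    "__label__ham see you at lunch",
    "__label__ham meeting moved to 3pm",
    "__label__ham thanks for the update",
]
weights = balanced_class_weights(sample)
```

With this normalization the rare class gets a weight above 1 and the frequent class a weight below 1, so the average weight per example stays at 1.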
+1
+1
Hello @kuangchen,
This is part of future work. For now you can balance classes by subsampling or upsampling (e.g. duplicating) datapoints. Indeed, a simple heuristic that takes the label count into account, as you mentioned, could already be helpful. Stay tuned for updates, and feel free to reopen this issue if you don't see such changes released.
Thanks,
Christian
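
For anyone who wants to try the upsampling workaround mentioned above, a rough sketch follows. It assumes exactly one `__label__`-prefixed label per line; the function name `upsample_by_duplication` and the fixed seed are illustrative choices, not anything fastText provides.

```python
import random
from collections import defaultdict


def upsample_by_duplication(lines, label_prefix="__label__", seed=0):
    """Duplicate minority-class lines until every class matches the largest.

    Hypothetical helper for preprocessing a training file before it is
    passed to fastText. Assumes a single label per line.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for line in lines:
        labels = [t for t in line.split() if t.startswith(label_prefix)]
        if len(labels) != 1:
            raise ValueError("this sketch assumes exactly one label per line")
        by_label[labels[0]].append(line)

    if not by_label:
        return []

    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the largest class.
        balanced.extend(rng.choices(group, k=target - len(group)))

    # Shuffle so duplicated lines are not clustered together.
    rng.shuffle(balanced)
    return balanced


# Made-up example data: one "spam" line and three "ham" lines.
data = [
    "__label__spam buy now",
    "__label__ham see you at lunch",
    "__label__ham meeting moved to 3pm",
    "__label__ham thanks for the update",
]
balanced = upsample_by_duplication(data)
```

The result can be written back to a text file, one line per example, and used as the training input. Duplication makes training longer and can encourage overfitting on the repeated examples, so subsampling the majority class is the alternative when data is plentiful.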
+1
This is something I'm also very interested in. @cpuhrsch, has it made it anywhere into the roadmap?
+1
+1
+1
+1
+1
+1
+1
I think that class imbalance is a very typical scenario, and I hope this initiative proceeds! :)
+1
+1
+1
+1
@cpuhrsch, is there any update on this?