Fasttext: Loss - OVA model - Not predicting sigmoid output in Ubuntu 16.04

Created on 21 Jun 2019 · 15 Comments · Source: facebookresearch/fastText

Install Log:

c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/args.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/matrix.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/dictionary.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/loss.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/productquantizer.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/densematrix.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/quantmatrix.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/vector.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/model.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/utils.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/meter.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/fasttext.cc
src/fasttext.cc: In member function ‘void fasttext::FastText::quantize(const fasttext::Args&)’:
src/fasttext.cc:323:16: warning: ‘std::vector fasttext::FastText::selectEmbeddings(int32_t) const’ is deprecated: selectEmbeddings is being deprecated. [-Wdeprecated-declarations]
auto idx = selectEmbeddings(qargs.cutoff);
^
src/fasttext.cc:293:22: note: declared here
std::vector FastText::selectEmbeddings(int32_t cutoff) const {
^
src/fasttext.cc:323:45: warning: ‘std::vector fasttext::FastText::selectEmbeddings(int32_t) const’ is deprecated: selectEmbeddings is being deprecated. [-Wdeprecated-declarations]
auto idx = selectEmbeddings(qargs.cutoff);
^
src/fasttext.cc:293:22: note: declared here
std::vector FastText::selectEmbeddings(int32_t cutoff) const {
^
src/fasttext.cc: In member function ‘void fasttext::FastText::lazyComputeWordVectors()’:
src/fasttext.cc:551:5: warning: ‘void fasttext::FastText::precomputeWordVectors(fasttext::DenseMatrix&)’ is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
precomputeWordVectors(wordVectors_);
^
src/fasttext.cc:534:6: note: declared here
void FastText::precomputeWordVectors(DenseMatrix& wordVectors) {
^
src/fasttext.cc:551:40: warning: ‘void fasttext::FastText::precomputeWordVectors(fasttext::DenseMatrix&)’ is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
precomputeWordVectors(wordVectors_);
^
src/fasttext.cc:534:6: note: declared here
void FastText::precomputeWordVectors(DenseMatrix& wordVectors) {
^
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops args.o matrix.o dictionary.o loss.o productquantizer.o densematrix.o quantmatrix.o vector.o model.o utils.o meter.o fasttext.o src/main.cc -o fasttext

The output is not sigmoid. It's still the same as with softmax.
Args:
dim 100
ws 5
epoch 1
minCount 1
neg 5
wordNgrams 3
loss one-vs-all
model sup
bucket 1000000
minn 3
maxn 3
lrUpdateRate 100
t 0.0001

bug

All 15 comments

Sample output with k = -1:
__label__1 0.212079 __label__2 0.144159 __label__3 0.0675567 __label__4 0.0251888 __label__6 0.0197291 __label__5 0.0197291

The others are lower than this. I don't understand why it's not giving independent probabilities.

Hi @giriannamalai,
Thank you for reporting.
Can you provide the exact command lines you are using?

Regards,
Onur

@Celebio
To train:
fasttext supervised -input train.txt -loss ova -minn 3 -dim 100 -bucket 1000000 -epoch 10000 -maxn 3 -minCount 1 -lr 0.005 -wordNgrams 3 -output model

Quantize:
fasttext quantize -output model -input train.txt -qnorm -retrain -epoch 1 -cutoff 100000

I also tried without quantizing the model. The output is still the same as with softmax.

@giriannamalai Re your example above, how do you know the probabilities are not independent?
In my case (#830) the probabilities simply add up to one, but in your example I don't see this happening.
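
One quick way to check this is to sum the probabilities in a prediction line. The sketch below parses a line in the "label probability" format of the sample output above. The helper name parse_prediction_line is made up for illustration and is not part of fastText.

```python
def parse_prediction_line(line):
    """Parse 'label prob label prob ...' into a list of (label, prob) pairs."""
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating label/probability tokens")
    return [(tokens[i], float(tokens[i + 1])) for i in range(0, len(tokens), 2)]


# Sample line copied from the output posted earlier in this thread.
sample = ("__label__1 0.212079 __label__2 0.144159 __label__3 0.0675567 "
          "__label__4 0.0251888 __label__6 0.0197291 __label__5 0.0197291")
pairs = parse_prediction_line(sample)
total = sum(prob for _, prob in pairs)
print(f"{len(pairs)} labels, probabilities sum to {total:.4f}")
```

Note that a partial list cannot settle the question either way: these six values sum to about 0.49, which is compatible both with independent sigmoids and with a softmax whose remaining mass is spread over the other labels. The check is only conclusive when run over the full output for all labels.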

It's just a sample. I have more than 100 labels.

Hi @giriannamalai, @hminooei,
I can't reproduce the issue. With fastText at the latest commit, I get:

>>> import fastText
>>> model = fastText.train_supervised("data/cooking.train", wordNgrams=2, lr=0.5, dim=50, loss='ova')
>>> model.predict("Which baking dish is best to bake a banana bread ?", k=10)
((u'__label__baking', u'__label__bread', u'__label__equipment', u'__label__oven', u'__label__rising', u'__label__temperature', u'__label__crust', u'__label__baking-powder', u'__label__muffins', u'__label__yeast'), array([0.9944551 , 0.97069776, 0.32767832, 0.10375863, 0.03847619,
       0.03847619, 0.03022459, 0.02932223, 0.02443309, 0.02097424]))

on Mac OS X, with this training data.

Do you both use Ubuntu? Can you try the commands above on your system?

Regards,
Onur

Hi

I use macOS Sierra.

With the cooking data, I also get multi-label results for 'ova':
model = fastText.train_supervised("cooking.train", wordNgrams=2, lr=0.5, dim=50, loss='ova')
model.predict("Which baking dish is best to bake a banana bread ?", k=10)
(('__label__baking', '__label__bread', '__label__equipment', '__label__oven', '__label__crust', '__label__temperature', '__label__pie', '__label__cooking-time', '__label__cookies', '__label__muffins'), array([ 0.99194801, 0.76630366, 0.16027603, 0.09010299, 0.03022459, 0.02844604, 0.02676929, 0.02097424, 0.01591639, 0.01407363]))

However, if I use 'ns', it also outputs multi-label results:
model = fastText.train_supervised("cooking.train", wordNgrams=2, lr=0.5, dim=50, loss='ns')
model.predict("Which baking dish is best to bake a banana bread ?", k=10)
(('__label__baking', '__label__bread', '__label__cake', '__label__equipment', '__label__dough', '__label__oven', '__label__cookies', '__label__flour', '__label__yeast', '__label__sourdough'), array([ 0.92193186, 0.89626139, 0.69265199, 0.52343035, 0.51562995, 0.45327184, 0.41490886, 0.34159252, 0.3208313 , 0.30075559]))
Is 'ns' intended to be multi-label or multi-class?


In the case where I see 'ova' not producing multi-label output, my training data is essentially labeled in a binary classification fashion (i.e. there are only two labels, and each line has exactly one label). Here is an example of the outputs:
trained_model = train_supervised(
    input=train_data_path,
    lr=1,
    dim=100,
    ws=5,
    epoch=5,
    minCount=1,
    minCountLabel=0,
    minn=2,
    maxn=3,
    neg=5,
    wordNgrams=2,
    loss="ova",
    bucket=200000,
    lrUpdateRate=100,
    t=1e-4,
    label="__label__",
    verbose=2,
    pretrainedVectors="",
)
trained_model.predict("it's not a cool software but i really like it", k=-1)
trained_model.predict("yeah..", k=-1)
trained_model.predict("he loves you. he hates you", k=-1)
(('__label__0', '__label__1'), array([ 0.83974397, 0.15611489]))
(('__label__1', '__label__0'), array([ 0.76630366, 0.22816648]))
(('__label__1', '__label__0'), array([ 0.97483116, 0.02443309]))

Hi @hminooei,
Thank you for your answer.
In your train data, do __label__0 and __label__1 appear exclusively? I mean, for each sample, do you have either __label__0 or __label__1 but never both? In such a case, the independent classifiers of "ova" will indeed be complementary and their probabilities sum to 1.
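
Whether the labels are exclusive can be checked directly on the training file. The following is a minimal sketch, assuming the usual fastText format where label tokens start with `__label__`. The function name labels_per_line and the in-memory sample lines are hypothetical and only serve the illustration.

```python
from collections import Counter


def labels_per_line(lines, prefix="__label__"):
    """Map 'number of labels on a line' to 'how many lines have that many'."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        counts[sum(tok.startswith(prefix) for tok in tokens)] += 1
    return counts


# Hypothetical sample; with a real file, pass an open file handle instead,
# e.g. labels_per_line(open("train.txt", encoding="utf-8")).
example = [
    "__label__0 it's not a cool software",
    "__label__1 yeah..",
    "__label__1 he loves you",
]
counts = labels_per_line(example)
exclusive = set(counts) == {1}  # True when every line has exactly one label
print(dict(counts), "exclusive:", exclusive)
```

If every line carries exactly one label, the data is effectively multi-class, even when the model is trained with the one-vs-all loss.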

That's right, they appear exclusively.
So, essentially in this case, 'ova' behaves in a multi-class fashion. Does this happen only in the case of 2 classes? In other words, if you have more than 2 classes (e.g. 3 classes) with exclusive labels for each training example, do you get multi-class classification again? If the answer is yes, I think we could just mention this as expected behavior in the tutorial doc; otherwise, IMHO, it looks like a bug.

ova is always multi-label classification: it trains an independent sigmoid for each label, no matter how many labels you have.

When two labels appear exclusively, __label__0 and __label__1 provide the same information: __label__1 just means "absence of __label__0". That's why you end up with complementary probabilities.
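
The difference between the two kinds of output can be illustrated with plain Python, independent of fastText. This is a sketch with made-up scores, not fastText's internal code: independent sigmoids need not sum to 1, a softmax always does, and two sigmoids happen to sum to 1 when their scores are exact opposites. That last condition is an idealization. In the two-label outputs posted earlier in this thread, the pairs sum to roughly 0.996, 0.994 and 0.999 rather than exactly 1.

```python
import math


def sigmoid(x):
    """Logistic function, applied to one label's score independently."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Normalize all scores jointly so the outputs sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, -1.0]  # made-up per-label scores
ova = [sigmoid(s) for s in scores]
sm = softmax(scores)
print("sigmoid outputs:", [round(p, 4) for p in ova], "sum:", round(sum(ova), 4))
print("softmax outputs:", [round(p, 4) for p in sm], "sum:", round(sum(sm), 4))

# Two mutually exclusive labels: assume the learned scores are exact opposites.
s = 1.5  # made-up score for the first label
pair = [sigmoid(s), sigmoid(-s)]
print("exclusive pair:", [round(p, 4) for p in pair], "sum:", round(sum(pair), 4))
```

So a sum close to 1 on a two-label, one-label-per-line dataset does not by itself indicate that softmax is being used.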

I can see that, thanks!
Is 'ns' multi-label behavior also expected?

@Celebio I'm using 64-bit Ubuntu.
But when I tried in a Docker container, the OVA model worked well. It's also Ubuntu 16.04.
