fastText: Loss - OVA model - Not predicting sigmoid output on Ubuntu 16.04

Created on 21 Jun 2019 · 15 comments · Source: facebookresearch/fastText

Install Log:

c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/args.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/matrix.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/dictionary.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/loss.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/productquantizer.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/densematrix.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/quantmatrix.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/vector.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/model.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/utils.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/meter.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/fasttext.cc
src/fasttext.cc: In member function ‘void fasttext::FastText::quantize(const fasttext::Args&)’:
src/fasttext.cc:323:16: warning: ‘std::vector fasttext::FastText::selectEmbeddings(int32_t) const’ is deprecated: selectEmbeddings is being deprecated. [-Wdeprecated-declarations]
auto idx = selectEmbeddings(qargs.cutoff);
^
src/fasttext.cc:293:22: note: declared here
std::vector FastText::selectEmbeddings(int32_t cutoff) const {
^
src/fasttext.cc:323:45: warning: ‘std::vector fasttext::FastText::selectEmbeddings(int32_t) const’ is deprecated: selectEmbeddings is being deprecated. [-Wdeprecated-declarations]
auto idx = selectEmbeddings(qargs.cutoff);
^
src/fasttext.cc:293:22: note: declared here
std::vector FastText::selectEmbeddings(int32_t cutoff) const {
^
src/fasttext.cc: In member function ‘void fasttext::FastText::lazyComputeWordVectors()’:
src/fasttext.cc:551:5: warning: ‘void fasttext::FastText::precomputeWordVectors(fasttext::DenseMatrix&)’ is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
precomputeWordVectors(wordVectors_);
^
src/fasttext.cc:534:6: note: declared here
void FastText::precomputeWordVectors(DenseMatrix& wordVectors) {
^
src/fasttext.cc:551:40: warning: ‘void fasttext::FastText::precomputeWordVectors(fasttext::DenseMatrix&)’ is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
precomputeWordVectors(wordVectors_);
^
src/fasttext.cc:534:6: note: declared here
void FastText::precomputeWordVectors(DenseMatrix& wordVectors) {
^
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops args.o matrix.o dictionary.o loss.o productquantizer.o densematrix.o quantmatrix.o vector.o model.o utils.o meter.o fasttext.o src/main.cc -o fasttext

The output is not sigmoid; it is still the same as softmax.
Args:
dim 100
ws 5
epoch 1
minCount 1
neg 5
wordNgrams 3
loss one-vs-all
model sup
bucket 1000000
minn 3
maxn 3
lrUpdateRate 100
t 0.0001
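Here is the difference the report expects. With softmax, label probabilities compete and always sum to 1. With one-vs-all (`-loss ova`), each label's score goes through its own sigmoid, so several labels can score high at the same time. The minimal Python sketch below applies both to the same made-up scores. The scores and function names are illustrative assumptions, not fastText internals.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability; outputs sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def independent_sigmoids(scores):
    # One-vs-all style: each label gets its own probability in (0, 1),
    # with no constraint on the sum across labels.
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical raw scores for four labels (not taken from a real model).
scores = [2.0, 1.5, -1.0, -3.0]
print("softmax: ", [round(p, 4) for p in softmax(scores)])
print("sigmoids:", [round(p, 4) for p in independent_sigmoids(scores)])
```

With these scores, only the top label exceeds 0.5 under softmax. As independent sigmoids, the top two labels both exceed 0.5.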

bug


giriannamalai

All 15 comments

Sample output with k = -1:
__label__1 0.212079 __label__2 0.144159 __label__3 0.0675567 __label__4 0.0251888 __label__6 0.0197291 __label__5 0.0197291

The other labels score lower than these. I don't understand why it isn't giving independent probabilities.

giriannamalai on 21 Jun 2019

Hi @giriannamalai ,
Thank you for reporting.
Can you provide the exact command lines you are using?

Regards,
Onur

Celebio on 21 Jun 2019

@Celebio
To train:
fasttext supervised -input train.txt -loss ova -minn 3 -dim 100 -bucket 1000000 -epoch 10000 -maxn 3 -minCount 1 -lr 0.005 -wordNgrams 3 -output model

Quantize:
fasttext quantize -output model -input train.txt -qnorm -retrain -epoch 1 -cutoff 100000

giriannamalai on 22 Jun 2019

I also tried without quantizing the model. The output is still the same as softmax.

giriannamalai on 22 Jun 2019

@giriannamalai Re your example above, how do you know the probabilities are not independent?
(In my case, #830, the probabilities simply add up to one, but in your example I don't see this happening.)

hminooei on 22 Jun 2019
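hminooei's sum-to-one check can be written in a few lines of Python. The sketch parses a line of alternating label/probability tokens, the same format as the sample output above, and tests whether the probabilities add up to about 1. The function names and the tolerance are assumptions for illustration.

The check is only meaningful when every label is printed (k = -1). A sum near 1 also does not prove that softmax was used: as discussed later in this thread, OVA with two mutually exclusive labels produces complementary probabilities too.

```python
def parse_predictions(line):
    """Split 'label prob label prob ...' into (label, probability) pairs."""
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating label/probability tokens")
    return [(tokens[i], float(tokens[i + 1])) for i in range(0, len(tokens), 2)]

def sums_to_one(pairs, tol=1e-2):
    """Heuristic: True if the probabilities add up to roughly 1.

    The tolerance is loose because printed probabilities are rounded.
    """
    total = sum(p for _, p in pairs)
    return abs(total - 1.0) <= tol

# The two-label ova probabilities reported later in this thread, in line format.
pairs = parse_predictions("__label__0 0.83974397 __label__1 0.15611489")
print(pairs, sums_to_one(pairs))
```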

It's just a sample. I have more than 100 labels.

giriannamalai on 23 Jun 2019

Hi @giriannamalai , @hminooei ,
I can't reproduce the issue. With fastText at the latest commit, I get:

>>> import fastText
>>> model = fastText.train_supervised("data/cooking.train", wordNgrams=2, lr=0.5, dim=50, loss='ova')
>>> model.predict("Which baking dish is best to bake a banana bread ?", k=10)
((u'__label__baking', u'__label__bread', u'__label__equipment', u'__label__oven', u'__label__rising', u'__label__temperature', u'__label__crust', u'__label__baking-powder', u'__label__muffins', u'__label__yeast'), array([0.9944551 , 0.97069776, 0.32767832, 0.10375863, 0.03847619,
       0.03847619, 0.03022459, 0.02932223, 0.02443309, 0.02097424]))

on Mac OS X, with this training data.

Do you both use Ubuntu? Can you try the commands above on your system?

Regards,
Onur

Celebio on 25 Jun 2019

I use macOS Sierra.

With the cooking data, I also get multi-label results for 'ova':
model = fastText.train_supervised("cooking.train", wordNgrams=2, lr=0.5, dim=50, loss='ova')
model.predict("Which baking dish is best to bake a banana bread ?", k=10)
(('__label__baking', '__label__bread', '__label__equipment', '__label__oven', '__label__crust', '__label__temperature', '__label__pie', '__label__cooking-time', '__label__cookies', '__label__muffins'), array([ 0.99194801, 0.76630366, 0.16027603, 0.09010299, 0.03022459, 0.02844604, 0.02676929, 0.02097424, 0.01591639, 0.01407363]))

However, if I use 'ns', it also outputs multi-label results:
model = fastText.train_supervised("cooking.train", wordNgrams=2, lr=0.5, dim=50, loss='ns')
model.predict("Which baking dish is best to bake a banana bread ?", k=10)
(('__label__baking', '__label__bread', '__label__cake', '__label__equipment', '__label__dough', '__label__oven', '__label__cookies', '__label__flour', '__label__yeast', '__label__sourdough'), array([ 0.92193186, 0.89626139, 0.69265199, 0.52343035, 0.51562995, 0.45327184, 0.41490886, 0.34159252, 0.3208313 , 0.30075559]))
Is 'ns' intended to be multi-label or multi-class?

In my case, where 'ova' does not produce multi-label output, my training data is essentially labeled for binary classification (i.e. there are only two labels, and each line has exactly one label). Here are some example outputs:
trained_model = train_supervised(
    input=train_data_path, lr=1, dim=100, ws=5, epoch=5, minCount=1, minCountLabel=0,
    minn=2, maxn=3, neg=5, wordNgrams=2, loss="ova", bucket=200000, lrUpdateRate=100,
    t=1e-4, label="__label__", verbose=2, pretrainedVectors="",
)
trained_model.predict("it's not a cool software but i really like it", k=-1)
(('__label__0', '__label__1'), array([ 0.83974397, 0.15611489]))
trained_model.predict("yeah..", k=-1)
(('__label__1', '__label__0'), array([ 0.76630366, 0.22816648]))
trained_model.predict("he loves you. he hates you", k=-1)
(('__label__1', '__label__0'), array([ 0.97483116, 0.02443309]))

hminooei on 25 Jun 2019

Hi @hminooei ,
Thank you for your answer.
In your training data, do __label__0 and __label__1 appear exclusively? I mean, for each sample, do you have either __label__0 or __label__1 but never both? In such a case, the independent classifiers of "ova" will indeed be complementary and their probabilities sum to 1.

Celebio on 25 Jun 2019

That's right, they appear exclusively.
So, essentially, in this case 'ova' behaves in a multi-class fashion. Does this happen only in the 2-class case? In other words, if you have more than 2 classes (e.g. 3 classes) with exclusive labels for each training example, do you get multi-class classification again? If the answer is yes, I think we could simply document this as expected behavior in the tutorial; otherwise, IMHO it looks like a bug.

hminooei on 25 Jun 2019

ova is always multi-label classification: it trains an independent sigmoid for each label, no matter how many labels you have.

When two labels appear exclusively, __label__0 and __label__1 provide the same information: __label__1 just means "absence of __label__0", which is why you end up with complementary probabilities.

Celebio on 26 Jun 2019
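Celebio's explanation can be checked with a toy model. The sketch trains one independent logistic (sigmoid) classifier per label on a few examples with mutually exclusive labels, then compares the per-label probabilities. It is a simplified illustration, not fastText's training code. The feature vectors are fixed and hand-picked, whereas fastText also learns the input embeddings shared by all labels. The data, learning rate, and epoch count are arbitrary assumptions.

With two exclusive labels and zero-initialized weights, the gradient for __label__1 is the negation of the gradient for __label__0. The weights therefore stay mirrored, and the two probabilities sum to 1 for any input. With three exclusive labels, no such symmetry forces the sum to 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_one_vs_all(examples, num_labels, dim, lr=0.5, epochs=200):
    """Train an independent logistic classifier per label (toy OVA).

    examples: list of (feature_vector, gold_label_index) with exclusive labels.
    Features are fixed here; fastText would also learn them.
    """
    weights = [[0.0] * dim for _ in range(num_labels)]
    for _ in range(epochs):
        for x, gold in examples:
            for j in range(num_labels):
                target = 1.0 if j == gold else 0.0
                # Gradient of this label's binary cross-entropy w.r.t. its score.
                grad = sigmoid(dot(weights[j], x)) - target
                weights[j] = [wi - lr * grad * xi for wi, xi in zip(weights[j], x)]
    return weights

def predict(weights, x):
    # One independent sigmoid per label; no normalization across labels.
    return [sigmoid(dot(w, x)) for w in weights]

# Two exclusive labels: label 1 carries no more information than "not label 0".
two = [([1.0, 0.2], 0), ([0.3, 1.0], 1), ([0.9, 0.5], 0), ([0.1, 0.8], 1)]
w2 = train_one_vs_all(two, num_labels=2, dim=2)
p2 = predict(w2, [0.6, 0.7])  # an input not seen in training
print(p2, sum(p2))  # complementary: the sum is 1 up to rounding

# Three exclusive labels: the independent sigmoids are not tied together.
three = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
w3 = train_one_vs_all(three, num_labels=3, dim=3)
p3 = predict(w3, [1.0, 0.0, 0.0])
print(p3, sum(p3))
```

In the three-label run, the sum lands somewhat above 1 because nothing in the objective couples the sigmoids. It can approach 1 with more training only because each classifier grows more confident, not because the outputs are normalized.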


I see it now, thanks!
Is the multi-label behavior of 'ns' also expected?

hminooei on 26 Jun 2019

@Celebio I'm using 64-bit Ubuntu.
However, when I tried in a Docker container, also running Ubuntu 16.04, the OVA model works well.

giriannamalai on 26 Jun 2019

Celebio on 27 Jun 2019

