Install Log:
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/args.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/matrix.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/dictionary.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/loss.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/productquantizer.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/densematrix.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/quantmatrix.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/vector.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/model.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/utils.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/meter.cc
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops -c src/fasttext.cc
src/fasttext.cc: In member function ‘void fasttext::FastText::quantize(const fasttext::Args&)’:
src/fasttext.cc:323:16: warning: ‘std::vector
auto idx = selectEmbeddings(qargs.cutoff);
^
src/fasttext.cc:293:22: note: declared here
std::vector
^
src/fasttext.cc:323:45: warning: ‘std::vector
auto idx = selectEmbeddings(qargs.cutoff);
^
src/fasttext.cc:293:22: note: declared here
std::vector
^
src/fasttext.cc: In member function ‘void fasttext::FastText::lazyComputeWordVectors()’:
src/fasttext.cc:551:5: warning: ‘void fasttext::FastText::precomputeWordVectors(fasttext::DenseMatrix&)’ is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
precomputeWordVectors(wordVectors_);
^
src/fasttext.cc:534:6: note: declared here
void FastText::precomputeWordVectors(DenseMatrix& wordVectors) {
^
src/fasttext.cc:551:40: warning: ‘void fasttext::FastText::precomputeWordVectors(fasttext::DenseMatrix&)’ is deprecated: precomputeWordVectors is being deprecated. [-Wdeprecated-declarations]
precomputeWordVectors(wordVectors_);
^
src/fasttext.cc:534:6: note: declared here
void FastText::precomputeWordVectors(DenseMatrix& wordVectors) {
^
c++ -pthread -std=c++0x -march=native -O3 -funroll-loops args.o matrix.o dictionary.o loss.o productquantizer.o densematrix.o quantmatrix.o vector.o model.o utils.o meter.o fasttext.o src/main.cc -o fasttext
The output is not sigmoid. It's still the same as with softmax.
Args:
dim 100
ws 5
epoch 1
minCount 1
neg 5
wordNgrams 3
loss one-vs-all
model sup
bucket 1000000
minn 3
maxn 3
lrUpdateRate 100
t 0.0001
Sample output with k = -1:
__label__1 0.212079 __label__2 0.144159 __label__3 0.0675567 __label__4 0.0251888 __label__6 0.0197291 __label__5 0.0197291
The other labels score lower than these. I don't understand why it is not giving independent probabilities.
Hi @giriannamalai ,
Thank you for reporting.
Can you provide the exact command lines you are using?
Regards,
Onur
@Celebio
To train:
fasttext supervised -input train.txt -loss ova -minn 3 -dim 100 -bucket 1000000 -epoch 10000 -maxn 3 -minCount 1 -lr 0.005 -wordNgrams 3 -output model
Quantize:
fasttext quantize -output model -input train.txt -qnorm -retrain -epoch 1 -cutoff 100000
I even tried without quantizing the model. The output is still the same as with softmax.
@giriannamalai Re your example above, how do you know the probabilities are not independent?
In my case (#830) the probabilities simply add up to one, but I don't see that happening in your example.
It's just a sample. I have more than 100 labels.
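One way to check this, as a minimal sketch: request every label with k=-1 and sum the returned probabilities. A softmax-style output sums to about 1, while independent sigmoids generally do not. The helper name `looks_normalized` and the tolerance are assumptions made for illustration, not part of fastText; the values are the six quoted in the sample output above.

```python
def looks_normalized(probs, tol=0.02):
    """Return True if the probabilities sum to roughly 1 (softmax-like)."""
    return abs(sum(probs) - 1.0) <= tol

# The six values quoted in the sample output above.
# This is only part of the 100+ labels, so the sum is a partial sum.
sample = [0.212079, 0.144159, 0.0675567, 0.0251888, 0.0197291, 0.0197291]

partial_sum = sum(sample)
print(partial_sum)
print(looks_normalized(sample))
```

A partial list like this one is inconclusive either way: the remaining labels could still bring the total to 1 or well past it, so the check only means something on the full k=-1 output.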
Hi @giriannamalai , @hminooei ,
I can't reproduce the issue. With fastText at the latest commit, I get :
>>> import fastText
>>> model = fastText.train_supervised("data/cooking.train", wordNgrams=2, lr=0.5, dim=50, loss='ova')
>>> model.predict("Which baking dish is best to bake a banana bread ?", k=10)
((u'__label__baking', u'__label__bread', u'__label__equipment', u'__label__oven', u'__label__rising', u'__label__temperature', u'__label__crust', u'__label__baking-powder', u'__label__muffins', u'__label__yeast'), array([0.9944551 , 0.97069776, 0.32767832, 0.10375863, 0.03847619,
0.03847619, 0.03022459, 0.02932223, 0.02443309, 0.02097424]))
on Mac OS X, with this training data.
Do you both use Ubuntu? Can you try the commands above on your system?
Regards,
Onur
Hi
I use macOS Sierra.
With the cooking data, I also get multi-label results for 'ova':
model = fastText.train_supervised("cooking.train", wordNgrams=2, lr=0.5, dim=50, loss='ova')
model.predict("Which baking dish is best to bake a banana bread ?", k=10)
(('__label__baking',
'__label__bread',
'__label__equipment',
'__label__oven',
'__label__crust',
'__label__temperature',
'__label__pie',
'__label__cooking-time',
'__label__cookies',
'__label__muffins'),
array([ 0.99194801, 0.76630366, 0.16027603, 0.09010299, 0.03022459,
0.02844604, 0.02676929, 0.02097424, 0.01591639, 0.01407363]))
However, if I use 'ns', it also outputs multi-label results:
model = fastText.train_supervised("cooking.train", wordNgrams=2, lr=0.5, dim=50, loss='ns')
model.predict("Which baking dish is best to bake a banana bread ?", k=10)
(('__label__baking',
'__label__bread',
'__label__cake',
'__label__equipment',
'__label__dough',
'__label__oven',
'__label__cookies',
'__label__flour',
'__label__yeast',
'__label__sourdough'),
array([ 0.92193186, 0.89626139, 0.69265199, 0.52343035, 0.51562995,
0.45327184, 0.41490886, 0.34159252, 0.3208313 , 0.30075559]))
Is 'ns' intended to be multi-label or multi-class?
In the case where I see 'ova' not producing multi-label output, my training data is essentially labeled for binary classification (i.e. there are only two labels, and each line has exactly one label). Here is an example of the outputs:
from fastText import train_supervised  # import assumed; it was not shown in the original post

trained_model = train_supervised(
    input=train_data_path,
    lr=1,
    dim=100,
    ws=5,
    epoch=5,
    minCount=1,
    minCountLabel=0,
    minn=2,
    maxn=3,
    neg=5,
    wordNgrams=2,
    loss="ova",
    bucket=200000,
    lrUpdateRate=100,
    t=1e-4,
    label="__label__",
    verbose=2,
    pretrainedVectors="",
)
trained_model.predict("it's not a cool software but i really like it", k=-1)
trained_model.predict("yeah..", k=-1)
trained_model.predict("he loves you. he hates you", k=-1)
(('__label__0', '__label__1'), array([ 0.83974397, 0.15611489]))
(('__label__1', '__label__0'), array([ 0.76630366, 0.22816648]))
(('__label__1', '__label__0'), array([ 0.97483116, 0.02443309]))
Hi @hminooei ,
Thank you for your answer.
In your training data, do __label__0 and __label__1 appear exclusively? I mean, for each sample, do you have either __label__0 or __label__1 but never both? In such a case, the independent classifiers of "ova" will indeed be complementary and their probabilities will sum to 1.
That's right, they appear exclusively.
So, essentially, in this case 'ova' behaves in a multi-class fashion. Does this happen only with 2 classes? In other words, if you have more than 2 classes (e.g. 3 classes) with exactly one label per training example, do you get multi-class classification again? If the answer is yes, I think we could just mention this as expected behavior in the tutorial doc; otherwise, IMHO, it looks like a bug.
ova is always multi-label classification: it trains an independent sigmoid for each label, no matter how many labels you have.
When two labels appear exclusively, with __label__0 and __label__1 you are providing the same information: __label__1 just means "absence of __label__0", that's why you end up with complementary probabilities.
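The explanation above can be illustrated with a small sketch. This is not fastText's code: the raw scores are made up, and the assumption that two mutually exclusive labels end up with exactly opposite scores (z and -z) is an idealization. The `thread_pairs` values are the three two-label outputs posted earlier in this thread.

```python
import math

def sigmoid(z):
    """Logistic function: maps one raw score to a probability on its own."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Normalizes all scores jointly, so the outputs always sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three labels that can co-occur (multi-label case).
scores = [5.0, 3.5, -0.7]
independent = [sigmoid(s) for s in scores]  # each label judged separately
normalized = softmax(scores)                # labels compete for probability mass

# Idealized exclusive case: one label's score is the negation of the other's.
z = 1.7
complementary = [sigmoid(z), sigmoid(-z)]   # sigmoid(z) + sigmoid(-z) == 1

# Two-label outputs reported earlier in the thread.
thread_pairs = [
    (0.83974397, 0.15611489),
    (0.76630366, 0.22816648),
    (0.97483116, 0.02443309),
]
thread_sums = [a + b for a, b in thread_pairs]

print(sum(independent), sum(normalized), sum(complementary))
print(thread_sums)
```

The thread's pairs sum to roughly 0.996, 0.994 and 0.999, close to 1 but not exactly 1. That is what two separately trained sigmoids on exclusive labels would be expected to produce, whereas a softmax would sum to 1 up to floating-point error.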
I see that now, thanks!
Is 'ns' multi-label behavior also expected?
@Celebio I'm using 64-bit Ubuntu.
But when I tried it in a Docker container, the OVA model worked well. That is also Ubuntu 16.04.