Fasttext: Can fastText be used for multiple labels per doc?

Created on 7 Sep 2016 · 8 Comments · Source: facebookresearch/fastText

This is not an issue; the usage is just not clear to me: whether and how multiple labels per document are supported.

All 8 comments

If a doc has multiple labels, fastText randomly selects one label to update the model. Once the model is trained, the predict function can compute the probability of each possible output label and output the top k of them in sorted order.

As @heleifz mentioned, fastText supports multiple labels per document. At test time, use
./fasttext predict MODEL TEST_FILE k
to predict the k most probable labels. The predict-prob and test commands also support top-k prediction.
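To make the top-k idea concrete, here is a minimal Python sketch of ranking labels by probability. It is not fastText's implementation: it assumes a plain softmax over per-label scores, and the function name top_k_labels and the example scores are made up for illustration.

```python
import math


def top_k_labels(scores, k):
    """Return the k most probable labels as (label, probability) pairs, highest first."""
    # Softmax over raw scores; subtracting the max keeps exp() numerically stable.
    max_score = max(scores.values())
    exps = {label: math.exp(score - max_score) for label, score in scores.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    # Sort by probability, descending, and keep the first k entries.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical raw scores for one document (not real model output).
example_scores = {"__label__1": 2.0, "__label__2": 1.5, "__label__3": -1.0}
top2 = top_k_labels(example_scores, 2)
for label, prob in top2:
    print(label, round(prob, 3))
```

For a multi-label document, one would typically pick k (or a probability threshold) to match how many labels a document is expected to carry.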

Thanks @heleifz @EdouardGrave for the reply. In the training stage, is the following format correct? Do the labels have to be comma-separated and marked with the predefined label prefix?

     __label__1, __label__2, text body ... 

@Tracy2014 If you put the , directly after the label without a space, you will get 1, as the label: __label__1, and __label__1 are considered different labels.

Check out the data downloaded and transformed by the classification-example.sh script.

__label__4 , antonio meza cuadra , antonio meza cuadra bisso ( born 12 september 1982 ) is a peruvian footballer who plays as a striker for fbc melgar in the torneo descentralizado . he is also currently the club ' s team captain .
__label__8 , toller down , toller down is one of the highest hills in the county of dorset england . it stands 252 metres ( 827 feet ) high and is just 200 metres west of the main a356 road from dorchester to crewkerne . it is part of the dorset downs . its prominence of just under 100 metres classifies it as a sub-hump . the summit is about 2 kilometres south of the village of corscombe . there are standing stones ( the hore stones ) just north of the summit by the junction of the a356 with a minor road .
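The effect of the missing space can be illustrated with a small Python sketch. It only approximates the behaviour described above, assuming tokens are split on whitespace and any token starting with the __label__ prefix counts as a label; split_labels_and_words is a hypothetical helper, not part of fastText.

```python
LABEL_PREFIX = "__label__"  # assumed default label prefix


def split_labels_and_words(line):
    """Split a training line on whitespace into label tokens and ordinary word tokens."""
    labels, words = [], []
    for token in line.split():
        if token.startswith(LABEL_PREFIX):
            labels.append(token)
        else:
            words.append(token)
    return labels, words


# Comma glued to the label: the comma becomes part of the label name.
labels_no_space, _ = split_labels_and_words("__label__1, __label__2, text body")

# Comma separated by spaces: labels stay clean, the commas are ordinary tokens.
labels_spaced, words_spaced = split_labels_and_words("__label__1 , __label__2 , text body")

print(labels_no_space)
print(labels_spaced)
```

Under this assumption the first line yields labels that end in a comma, while the second yields the intended labels, which matches the spacing used in the sample data above.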

If a doc has multiple labels, fastText randomly selects one label to update the model.

Would multi-label training work better if the same text were used multiple times with different labels? For example, say text1 is marked with two labels, __label__1 and __label__2, and we write the training set as

__label__1 , text1
__label__2 , text1
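A training file in this shape could be generated with something like the Python sketch below. It only shows the mechanical expansion; whether this actually trains better than listing all labels on one line is the open question here. The helper name expand_multilabel and the sample document are hypothetical.

```python
def expand_multilabel(labels, text, prefix="__label__"):
    """Return one training line per label, each repeating the same text."""
    return [f"{prefix}{label} , {text}" for label in labels]


# Hypothetical document carrying two labels.
lines = expand_multilabel(["1", "2"], "text1")
for line in lines:
    print(line)
```

Note that a document with n labels then appears n times in the training set, so heavily labelled documents carry more weight than single-label ones.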

@alexeypetrushin Hmm, great question. Note that this might unbalance the dataset when some docs have multiple labels and some do not. I have not looked deeply enough into how fastText works to tell whether this gets compensated for automatically. I'd just give it a go if I had the dataset ready.

I wonder what other folks have found on this.

@alexeypetrushin Is this still the recommended way to handle multiple labels?

Seems like there's a better option now: https://github.com/facebookresearch/StarSpace
But it has very little documentation.
