fastText: Continuing vector training

Created on 1 May 2017 · 10 Comments · Source: facebookresearch/fastText

I'd like to make pre-trained vectors in stages. That is, I'd like to give fastText my corpus and settings, run for an epoch, save the resulting vectors, then continue training those vectors for another epoch and save them, etc.

It isn't clear to me whether fastText handles this at the moment. On the one hand, the skipgram and cbow commands have a -pretrainedVectors parameter. On the other hand, when I tried this and resumed training, the reported loss started out above the final loss of the prior epoch.

I think the loss is actually reporting the true loss times the learning rate, and that it is continuing to train the vectors. But I'd like to know for sure.

All 10 comments

Hi @elbamos,

Unfortunately, it is not really possible to train word vectors in stages that way. When you restart training, part of the model is randomly initialized (e.g. the "output" matrix and the vectors corresponding to character n-grams). Moreover, the learning rate is reset to its initial value, meaning that large updates are performed (thus "overriding" the previous values).
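To illustrate the learning-rate point, here is a minimal Python sketch, assuming a schedule that decays linearly from the initial rate to zero over the planned number of tokens. The function names, the token counts, and the rate 0.05 are made up for illustration, and fastText's real schedule may differ in detail. It compares one continuous 3-epoch run with three chained 1-epoch runs, each of which starts over at the initial rate.

```python
def linear_decay_lr(initial_lr, tokens_seen, total_tokens):
    """Learning rate after `tokens_seen` of `total_tokens` planned tokens."""
    progress = min(tokens_seen / total_tokens, 1.0)
    return initial_lr * (1.0 - progress)


def schedule(initial_lr, tokens_per_epoch, epochs, steps_per_epoch=4):
    """Sample the learning rate at evenly spaced points of a single run."""
    total_tokens = tokens_per_epoch * epochs
    rates = []
    for step in range(epochs * steps_per_epoch):
        tokens_seen = step * tokens_per_epoch // steps_per_epoch
        rates.append(linear_decay_lr(initial_lr, tokens_seen, total_tokens))
    return rates


# Hypothetical settings: 1000 tokens per epoch, initial rate 0.05.
continuous = schedule(0.05, 1000, epochs=3)   # one run of 3 epochs
staged = schedule(0.05, 1000, epochs=1) * 3   # three restarted 1-epoch runs

for i, (c, s) in enumerate(zip(continuous, staged)):
    print(f"sample {i:2d}  continuous={c:.4f}  staged={s:.4f}")
```

In the staged column the rate jumps back to the initial value at the start of every restart, which is the source of the large updates described above.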

May I ask what the use case is for this training scheme?

@EdouardGrave The use case is that I'm using vectors in a large project, and as part of that I want to run a series of experiments. One of them is to evaluate the quality of the vectors (using various metrics, including how well a classifier based on the vectors performs) at different stages of the training process, with different dataset mixtures, and so on.

The learning rate problem doesn't bother me because it's straightforward to change that in the code.
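For comparing vectors saved at different stages, a rough Python sketch follows. It assumes each snapshot is stored in the textual .vec layout (a header line with the word count and dimension, then one word and its values per line). The two tiny snapshots and the helper names `load_vec` and `cosine` are hypothetical, and cosine similarity stands in for whatever metric the experiment actually uses.

```python
import io
import math


def load_vec(stream):
    """Parse a .vec-style text stream into a {word: [floats]} dictionary."""
    _count, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical snapshots after the first and second stage of training.
stage1 = load_vec(io.StringIO("2 3\ncat 1.0 0.0 0.0\ndog 0.0 1.0 0.0\n"))
stage2 = load_vec(io.StringIO("2 3\ncat 1.0 0.0 0.0\ndog 1.0 1.0 0.0\n"))

sim1 = cosine(stage1["cat"], stage1["dog"])
sim2 = cosine(stage2["cat"], stage2["dog"])
print(f"cat/dog similarity: stage 1 = {sim1:.3f}, stage 2 = {sim2:.3f}")
```

With real snapshots, `io.StringIO(...)` would be replaced by an opened file, and the same dictionary could feed a downstream classifier.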

When we start supervised training with pretrainedVectors, are these vectors also updated during training, or do they remain fixed so that only the output matrix is trained?

Hello @elbamos,

It sounds like you want to write out checkpoints at various stages during training and use them in subsequent applications. This isn't something we've needed thus far for reasons @EdouardGrave has described. We are in the process of developing this internally for other applications (such as checkpointing), but it hasn't really been a priority yet. Stay tuned for upcoming updates.

Thanks,
Christian

I was able to solve this a while back by modifying the source. Thanks!

Hi @elbamos,

Is your modified source code publicly available? I'd be interested in trying it out.

Regards,

Chris

I have to ask my old company for permission.

Any updates on this, @elbamos?

Any updates on this? Can we checkpoint now?

I haven’t thought about this in 18 months and don’t intend to return to it soon.
