I'm using Word2Vec to train an initial embedding for my dataset. This is for a labeling task, which requires that I pad the input sequences. In my Word2Vec model, index 0 (the standard masking value) maps to the word "the", which I would rather not mask out. Due to the optimization methods Word2Vec uses to store its indices, it is nontrivial to change this value.
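To make the collision concrete, here is a minimal sketch (the word-to-index mapping is illustrative). Keras's `pad_sequences` fills with 0 by default, so the padding value is indistinguishable from the index of "the":

```python
from keras.preprocessing.sequence import pad_sequences

# Illustrative mapping: Word2Vec happened to assign "the" the index 0.
word_index = {"the": 0, "cat": 1, "sat": 2}

seqs = [[0, 1, 2],  # "the cat sat"
        [1, 2]]     # "cat sat"

# pad_sequences pads with 0 by default, so after padding the two
# sequences are identical: padding and "the" can't be told apart.
print(pad_sequences(seqs, maxlen=4))
# [[0 0 1 2]
#  [0 0 1 2]]
```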
In accordance with CONTRIBUTING.md, I would like to propose supporting arbitrary masking values. This is already possible with the standard Masking layer, but the Embedding layer can only be used as the first layer of a model (per the documentation), which makes that route non-viable here.
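For reference, this is what the existing Masking layer supports today (a sketch; the shapes are illustrative). It accepts an arbitrary `mask_value`, but it operates on a dense `(timesteps, features)` input tensor, and since Embedding must be the first layer there is no way to place Masking in front of it to mask integer word indices:

```python
from keras.models import Sequential
from keras.layers import Masking, LSTM

# Masking already takes an arbitrary mask_value, but only on dense
# inputs; it cannot sit before an Embedding layer.
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(10, 128)))
model.add(LSTM(32))
```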
I have prepared a commit that implements this functionality while retaining the mask_zero parameter for backwards compatibility:
https://github.com/StephanHeijl/keras/commit/1ebb646cd0fdd0eba34018b18d80c1f3841f31ac
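Roughly, usage would look like the sketch below. The parameter name and values here are illustrative, not necessarily what the commit uses; the linked commit is authoritative:

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
# mask_zero=True keeps working as before; a mask_value-style argument
# (name illustrative) would let any index serve as the padding marker.
model.add(Embedding(input_dim=5000, output_dim=128,
                    input_length=50, mask_value=4999))
model.add(LSTM(64))
model.add(Dense(10, activation='softmax'))
```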
Any thoughts on this?
> Due to the optimization methods Word2Vec uses to store its indices, it is nontrivial to change this value.
Changing a mapping is basic CS, and I don't really think Keras should have to accommodate that.
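For what it's worth, the remapping in question would look something like this (`word_index` and `w2v_vectors` are illustrative names for the Word2Vec vocabulary and embedding matrix):

```python
import numpy as np

# Shift every Word2Vec index up by one so 0 is free for padding, and
# add a matching all-zero row at the top of the embedding matrix.
shifted_index = {word: i + 1 for word, i in word_index.items()}
padded_vectors = np.vstack([np.zeros((1, w2v_vectors.shape[1])),
                            w2v_vectors])
```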
Keras already accommodates this with the core Masking layer (http://keras.io/layers/core/#masking). This would merely extend existing functionality to this particular use case.