In my NLP pipeline, I'm using multiple embeddings like BERT and GloVe and combining them using StackedEmbeddings.
Now, I have lists of words like location names, company names, first names, last names, etc., which the other embeddings treat as unknown. I want to use feature vectors for these words in my pipeline.
My intuition: I can create a one-hot embedding using the word lists as the vocabulary and then combine it with the other embeddings.
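For illustration, the one-hot idea could look roughly like this (the word list below is hypothetical, standing in for the location/company/name lists):

```python
import numpy as np

# Hypothetical gazetteer built from the word lists (names are illustrative only)
gazetteer = ["London", "Paris", "Acme", "Alice"]
word_to_index = {word: i for i, word in enumerate(gazetteer)}

def one_hot(word: str) -> np.ndarray:
    """One-hot vector for a listed word; all zeros for anything else."""
    vec = np.zeros(len(gazetteer), dtype=np.float32)
    idx = word_to_index.get(word)
    if idx is not None:
        vec[idx] = 1.0
    return vec

print(one_hot("Paris"))  # -> [0. 1. 0. 0.]
```

Words not in the lists simply get a zero vector, so the extra dimensions only activate for the gazetteer entries.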
Do you think this is right, @alanakbik?
What is your suggestion?
Please comment if anything needs clarification.
Thanks in advance :)
Hello @KushalVijay, do you mean you want to use these lists like gazetteers? If so, we currently do not have support for gazetteers, but this is something we are thinking of adding in the near future.
Thank you for your response @alanakbik. Is there any workaround I can use in the meantime? Suggestions welcome.

Like this: Paper Link
Hi, I've done something similar for a Sequence Tagging task related to NER.
The procedure (for uni-grams) can be summed up as follows:
from gensim.models import KeyedVectors

kv = KeyedVectors(vector_size)
for sentence in sentences:
    for term in sentence:
        # You can add vectors in bulk using add_vectors
        # Note that different Gensim versions use slightly different naming for this function (i.e., add)
        kv.add_vector(term["key"], term["features"])
kv.save(keyed_vectors_path)
corpus: Corpus = ColumnCorpus(
    data_folder,
    columns={0: "text", 1: "label", 2: "key"},  # THIS
    train_file="train.iob",
    dev_file="dev.iob",
    test_file="test.iob",
)
embedding_types = [
    WordEmbeddings(embeddings=keyed_vectors_path, field="key"),  # THIS
    ...
]
Note that the procedure can be extended to bi-grams (or n-grams) by simply taking into account the bi-grams of the known named entities for each class when computing the word vectors.
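As a rough sketch of that bi-gram extension (the entity lists and the class-indicator features below are hypothetical; any real feature scheme would replace them):

```python
# Hypothetical known named entities, grouped by class
entities = {"LOC": ["New York", "Hong Kong"], "ORG": ["Acme Corporation"]}
classes = sorted(entities)  # ['LOC', 'ORG']

# For every known entity, emit a key for each adjacent token pair,
# mapped to a feature vector indicating the entity's class
features = {}
for label, names in entities.items():
    indicator = [1.0 if c == label else 0.0 for c in classes]
    for name in names:
        tokens = name.split()
        for first, second in zip(tokens, tokens[1:]):
            features[f"{first} {second}"] = indicator

print(sorted(features))          # -> ['Acme Corporation', 'Hong Kong', 'New York']
print(features["New York"])      # -> [1.0, 0.0]
```

These bi-gram keys and vectors would then be added to the KeyedVectors in the same way as the uni-gram ones.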
You can actually use the same procedure to add anything you want.
In my case I only use the embeddings as input to a CRF with no layers in between, as that works better for my task.
I suspect projecting these kinds of embeddings would perform poorly, because the information gets lost when mixing them with, say, BERT or Flair embeddings.
Hope this can be of help.
Thanks for sharing such amazing stuff @AmenRa. I'm also working with uni-grams only, so I'll try your approach and hope it gives better results.
We are testing some hypotheses, so it doesn't matter too much whether it performs well or not.
The thread is still open; please share any further suggestions.