Allennlp: Hello everyone. I need to cluster word vectors using DBSCAN. I tokenised all the words in the corpus and removed the stopwords. I then collected all the words into a single list, so my tokenised words are no longer sentences but individual tokens in a list. After that, I used ELMo to compute the word vectors. But I feel that this is not the right approach. Could anyone who is doing something similar help me? Thanks.

Created on 9 May 2019  ·  7 Comments  ·  Source: allenai/allennlp

All 7 comments

Hi @SallyAfua! The main issue with the approach you described is that ELMo is designed to produce word vectors for words in the context of their sentences. So your intuition is correct: running ELMo on isolated words usually isn't the best approach. I'd suggest that you apply ELMo to your original sentences, and keep the stopwords too. After that you'll have one word vector for each use of a word, and you can cluster these however you'd like.

We have a tool for producing these embeddings in bulk that may help you. You can find it here: https://github.com/allenai/allennlp/blob/master/allennlp/commands/elmo.py Good luck!
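As a rough illustration of the suggestion above (not the exact code from the linked tool), the sketch below keeps sentences intact, collects one vector per token occurrence, and clusters those vectors with DBSCAN. It assumes AllenNLP 0.8.x, where `ElmoEmbedder.embed_sentence` returns an array of shape (3, number of tokens, 1024), plus scikit-learn and NumPy. The helper names (`collect_token_vectors`, `make_elmo_embedder`, `cluster_occurrences`), the choice of the top ELMo layer, and the `eps` and `min_samples` defaults are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Occurrence = Tuple[str, int, Vector]  # (token, sentence index, contextual vector)


def collect_token_vectors(
    sentences: Sequence[Sequence[str]],
    embed_sentence: Callable[[Sequence[str]], Sequence[Vector]],
) -> List[Occurrence]:
    """Embed each sentence as a whole and keep one vector per token occurrence."""
    occurrences: List[Occurrence] = []
    for sent_idx, tokens in enumerate(sentences):
        vectors = embed_sentence(tokens)  # one vector per token, computed in context
        if len(vectors) != len(tokens):
            raise ValueError("embedder must return one vector per token")
        for token, vector in zip(tokens, vectors):
            occurrences.append((token, sent_idx, vector))
    return occurrences


def make_elmo_embedder() -> Callable[[Sequence[str]], Sequence[Vector]]:
    """Build an embedder backed by ELMo (assumes AllenNLP 0.8.x is installed)."""
    from allennlp.commands.elmo import ElmoEmbedder  # imported lazily: heavy dependency

    elmo = ElmoEmbedder()  # uses the default pretrained weights

    def embed(tokens: Sequence[str]) -> Sequence[Vector]:
        layers = elmo.embed_sentence(list(tokens))  # shape: (3, len(tokens), 1024)
        return layers[-1]  # assumption: use only the top layer

    return embed


def cluster_occurrences(
    occurrences: Sequence[Occurrence], eps: float = 0.5, min_samples: int = 5
) -> List[int]:
    """Cluster the occurrence vectors with DBSCAN; label -1 marks noise points."""
    import numpy as np
    from sklearn.cluster import DBSCAN

    matrix = np.array([vector for _, _, vector in occurrences])
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine")
    return model.fit_predict(matrix).tolist()
```

With this layout, the same word appearing in two sentences yields two separate entries, so `cluster_occurrences(collect_token_vectors(sentences, make_elmo_embedder()))` can place different uses of a word in different clusters. The `eps` and `min_samples` values will almost certainly need tuning for a real corpus.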

Closing for now. Please comment if you have any followup questions. Thanks!

Thanks, Brendan. I will run your code and give you feedback.

On Fri, 10 May 2019 at 22:09, Brendan Roof notifications@github.com wrote:

Closing for now. Please comment if you have any followup questions. Thanks!

Brendan, I ran the code but got this error: "NameError: name 'self' is not defined". Could you please help me fix it? Thank you.
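The thread does not show the code that raised this error, so the following is only a guess at one common cause: lines copied out of a class method and run at module level or in a plain function, where `self` does not exist. The `Embedder` class and both functions below are hypothetical and exist only to reproduce the message.

```python
from typing import List, Sequence


class Embedder:
    """Hypothetical class standing in for code that uses `self` inside methods."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def zeros(self, tokens: Sequence[str]) -> List[List[float]]:
        # Inside a method, `self` is the instance the method was called on.
        return [[0.0] * self.dim for _ in tokens]


def pasted_method_body(tokens: Sequence[str]) -> List[List[float]]:
    # Body copied from Embedder.zeros: there is no `self` in this scope,
    # so calling this function raises NameError.
    return [[0.0] * self.dim for _ in tokens]


def fixed(tokens: Sequence[str], dim: int = 4) -> List[List[float]]:
    # Fix: create an instance and call the method on it.
    embedder = Embedder(dim)
    return embedder.zeros(tokens)
```

Calling `pasted_method_body(["a"])` fails with `NameError: name 'self' is not defined`, while `fixed(["a"])` works because the method is called on an instance.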

This particular error makes me think you need a basic python tutorial, not specific help with allennlp.
We unfortunately don't have the bandwidth to teach you python, but there are plenty of other online resources that should help here.

Thank you. I was able to fix it.

This particular error makes me think you need a basic python tutorial, not specific help with allennlp.
We unfortunately don't have the bandwidth to teach you python, but there are plenty of other online resources that should help here.

This is such a friendly reply! I want to thank the whole team for taking the time for each and every issue that's submitted in here. This made my day better!!
