Flair: Contextual embeddings don't seem to work

Created on 11 Jun 2019 · 11 comments · Source: flairNLP/flair

Trying to explore the contextual side of Flair embeddings with a simple example:

import torch
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, DocumentPoolEmbeddings

# your query
query = 'The capital of Washington'

# some texts
sentences = [
    'George Washington addressed his supporters',
    'Taking a flight to Washington tonight',
    'Arkansaw is a lovely state',
    'George Washington was a great president',
]

# first, declare how you want to embed
embeddings = DocumentPoolEmbeddings([FlairEmbeddings('news-forward'), 
                                     FlairEmbeddings('news-backward')
                                    ])

# embed
q = Sentence(query)
embeddings.embed(q)

# use cosine similarity
cos = torch.nn.CosineSimilarity(dim=0, eps=1e-6)

for sentence in sentences:
    s = Sentence(sentence)
    embeddings.embed(s)
    prox = cos(q.embedding, s.embedding)
    print(query, ' - ', sentence, ' - ', prox)

Results:

The capital of Washington  -  George Washington addressed his supporters  -  0.3869
The capital of Washington  -  Taking a flight to Washington tonight  -  0.4389
The capital of Washington  -  Arkansaw is a lovely state  -  0.3746
The capital of Washington  -  George Washington was a great president  -  0.3629

I would have expected much higher scores on the geographic-context sentences.
Am I doing something wrong?
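For reference, the quantity being compared above is plain cosine similarity; written out without torch (a self-contained sketch with toy vectors, not flair code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```

Note that cosine similarity ignores vector magnitude, so it only captures the direction of the pooled document vectors.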

question

All 11 comments

Technically, the code looks good. Here are some other comparisons with BERT and ELMo:

| LM | Sentence | Similarity |
| ------------------------ | ------------------------------------------ | ----------- |
| BERT (bert-base-uncased) | George Washington addressed his supporters | 0.6652
| BERT (bert-base-uncased) | Taking a flight to Washington tonight | 0.6186
| BERT (bert-base-uncased) | Arkansaw is a lovely state | 0.5656
| BERT (bert-base-uncased) | George Washington was a great president | 0.6955
| BERT (bert-base-cased) | George Washington addressed his supporters | 0.8641
| BERT (bert-base-cased) | Taking a flight to Washington tonight | 0.8477
| BERT (bert-base-cased) | Arkansaw is a lovely state | 0.8385
| BERT (bert-base-cased) | George Washington was a great president | 0.8622
| BERT (bert-large-uncased)| George Washington addressed his supporters | 0.7823
| BERT (bert-large-uncased)| Taking a flight to Washington tonight | 0.7476
| BERT (bert-large-uncased)| Arkansaw is a lovely state | 0.7185
| BERT (bert-large-uncased)| George Washington was a great president | 0.8058
| BERT (bert-large-cased) | George Washington addressed his supporters | 0.8190
| BERT (bert-large-cased) | Taking a flight to Washington tonight | 0.7761
| BERT (bert-large-cased) | Arkansaw is a lovely state | 0.7934
| BERT (bert-large-cased) | George Washington was a great president | 0.8424
| ELMo | George Washington addressed his supporters | 0.3986
| ELMo | Taking a flight to Washington tonight | 0.4577
| ELMo | Arkansaw is a lovely state | 0.3902
| ELMo | George Washington was a great president | 0.3886
| GPT-1 | George Washington addressed his supporters | 0.8232
| GPT-1 | Taking a flight to Washington tonight | 0.8396
| GPT-1 | Arkansaw is a lovely state | 0.7307
| GPT-1 | George Washington was a great president | 0.8003
| Transformer-XL | George Washington addressed his supporters | 0.2481
| Transformer-XL | Taking a flight to Washington tonight | 0.1841
| Transformer-XL | Arkansaw is a lovely state | 0.3009
| Transformer-XL | George Washington was a great president | 0.2997

ELMo looks quite similar to the result with Flair Embeddings :)

Spelling Arkansas correctly may help the model realize that it's a geolocation.

Also, given the four sentences, Flair correctly ranks "Taking a flight to Washington tonight" as the most similar, so I don't see a problem. Maybe you'd just like the margin between the similarities to be larger.
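One possible reason the margins are small: DocumentPoolEmbeddings averages all token vectors (mean pooling by default), so a single contextually distinctive word contributes only 1/n of the document vector. A toy sketch with made-up 2-d vectors (not real embeddings) showing how pooling shrinks the gap between two documents that differ in one maximally different word:

```python
import math

def mean_pool(token_vectors):
    """Average a list of token vectors into one document vector."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# two toy 'documents' sharing three of four token vectors;
# their fourth token vectors are orthogonal (maximally different)
shared = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_a = mean_pool(shared + [[4.0, 0.0]])
doc_b = mean_pool(shared + [[0.0, 4.0]])

print(cosine(doc_a, doc_b))  # ~0.6: still fairly similar after pooling
```

Even though the distinctive tokens are orthogonal, the pooled documents still come out at roughly 0.6 similarity, which matches the narrow spread of scores seen above.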

I'd like to see how Transformer-XL and GPT-2 do on this, and maybe even word2vec / fastText.

@stefan-it - thanks for the table, that's quite interesting.

@Hellisotherpeople Arkansaw is a town in Wisconsin. I'd expect it to pick up on that as a geolocation too.

Sorry, just realised I said it's a lovely state in the example - I see how that's misleading.

As requested, I added the scores for GPT-1 and Transformer-XL.

Thanks @stefan-it

Tried some other examples with Flair - these actually work well:

the bucket and mop are in the closet  -  he kicked the bucket  -  0.5848
the bucket and mop are in the closet  -  i have yet to cross-off all the items on my bucket list  -  0.5263
the bucket and mop are in the closet  -  the bucket was filled with water  -  0.6970
he is currently resting at home  -  the dog sleeps in the kennel  -  0.4730
he is currently resting at home  -  he lived in a beautiful mansion  -  0.5347
he is currently resting at home  -  the home office issued penalties for late filing  -  0.4030
he is currently resting at home  -  press the home button on your phone  -  0.3302

Anyone have any further insights or ideas?
If not, I'll close this out later on.

Hello @eliehamouche @stefan-it thanks for sharing these results!

Another idea would be not to use the cosine similarity of document vectors as the measure, but a measure that computes document similarity directly from the word embeddings. An example of this is the word mover's distance: like document pool embeddings, it need not be trained, so it can be used without supervision. We don't have it in Flair yet, but it should not be difficult to implement and experiment with. It would be interesting to see how well the word mover's distance works with different types of contextualized word embeddings.
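Not flair code, but a minimal sketch of the idea under a simplifying assumption: instead of solving the full optimal-transport problem, each word is greedily matched to its nearest word in the other document (a relaxed word mover's distance, a lower bound on the true WMD; here a symmetric variant averaging both directions). Toy 2-d vectors stand in for real word embeddings:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relaxed_wmd(doc_a, doc_b):
    """Symmetric relaxed word mover's distance over lists of word vectors.
    Each word is matched to its nearest neighbour in the other document;
    the two directional averages are then averaged together."""
    def one_way(src, dst):
        return sum(min(euclidean(v, w) for w in dst) for v in src) / len(src)
    return 0.5 * (one_way(doc_a, doc_b) + one_way(doc_b, doc_a))

# toy 2-d 'embeddings'
doc1 = [[0.0, 0.0], [1.0, 0.0]]
doc2 = [[0.0, 0.0], [1.0, 0.0]]   # identical document -> distance 0
doc3 = [[0.0, 3.0], [1.0, 3.0]]   # every word shifted -> distance 3

print(relaxed_wmd(doc1, doc2))  # 0.0
print(relaxed_wmd(doc1, doc3))  # 3.0
```

Swapping the toy vectors for per-token flair embeddings would give a word-level alternative to comparing pooled document vectors.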

Hey @alanakbik - sorry for the delay, I missed the notification.

That looks quite interesting actually; I'll do a quick comparison and report back.

@alanakbik @eliehamouche Hello, thank you for providing the Transformer-XL embeddings. I have a question though: if I train my own Transformer-XL model, it seems it cannot be integrated into Flair's embeddings the way ELMo can, where I just provide the option_file and the weight_file.

@songtaoshi I'll push a follow-up PR very soon that allows passing custom models into the newly added embeddings (I've also trained a few XLNet models) :)

@stefan-it Wow, great! Thanks for your reply. Really looking forward to the new PR.
