Transformers: XLNet Embeddings

Created on 16 Jul 2019 · 21 Comments · Source: huggingface/transformers

How can I retrieve contextual word vectors for my dataset using XLNet ?
The usage and examples in the documentation do not include any guide to use XLNet.
Thanks.

All 21 comments

I'm currently finishing up the documentation, but in the meantime you can just use XLNetModel in place of BertModel in the BertModel usage example.

Thanks a lot, @thomwolf for the quick reply. I'll try it out.

@thomwolf, I tried the following snippet. The similarity score changes every time I run the cell. That is, the embeddings or the weights are changing every time. Is this related to dropout?

import torch
from numpy import dot
from numpy.linalg import norm
from pytorch_transformers import XLNetConfig, XLNetModel, XLNetTokenizer

config = XLNetConfig.from_pretrained('xlnet-large-cased')
tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
model = XLNetModel(config)

input_ids = torch.tensor(tokenizer.encode("The apple juice is sour.")).unsqueeze(0)
input_ids_2 = torch.tensor(tokenizer.encode("The orange juice is sweet.")).unsqueeze(0)

outputs = model(input_ids)
outputs_2 = model(input_ids_2)
last_hidden_states = outputs[0]
last_hidden_states_2 = outputs_2[0]

# Vectors for "apple" and "orange" (token position 1 in each sentence)
apple = last_hidden_states[0][1]
orange = last_hidden_states_2[0][1]

x = apple.detach().numpy()
y = orange.detach().numpy()
cos_sim = dot(x, y) / (norm(x) * norm(y))
print(cos_sim)

For me the logits values change as well ... using exactly the same settings as in the example.

Have you found a way to fix that?

@Oxi84 put model.eval() before you make the predictions. This fixed the problem of changing weights for me.
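Putting the two fixes together, the earlier snippet becomes deterministic once the model is loaded with `from_pretrained` (instantiating `XLNetModel(config)` gives randomly initialised weights) and switched to eval mode to disable dropout. A minimal sketch, assuming `pytorch-transformers` is installed; the heavy part is kept inside a function because calling it downloads the large checkpoint:

```python
import numpy as np

def cosine(x, y):
    # Plain cosine similarity between two 1-D numpy vectors.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def compare_apple_orange():
    # Calling this downloads the xlnet-large-cased checkpoint (~1.4 GB).
    import torch
    from pytorch_transformers import XLNetModel, XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
    model = XLNetModel.from_pretrained('xlnet-large-cased')  # pretrained weights, not a random init
    model.eval()  # disable dropout so repeated runs give identical outputs
    with torch.no_grad():
        ids_1 = torch.tensor(tokenizer.encode("The apple juice is sour.")).unsqueeze(0)
        ids_2 = torch.tensor(tokenizer.encode("The orange juice is sweet.")).unsqueeze(0)
        apple = model(ids_1)[0][0, 1].numpy()   # token at position 1 of sentence 1
        orange = model(ids_2)[0][0, 1].numpy()
    return cosine(apple, orange)

# print(compare_apple_orange())
```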

Thanks. For me it works when I call it like this:

 tokenizer = XLNetTokenizer.from_pretrained("xlnet-large-cased")
 model = XLNetLMHeadModel.from_pretrained("xlnet-large-cased")
 model.eval()

However, accuracy seems to be much lower than for BERT - with the code I wrote here: https://github.com/huggingface/pytorch-transformers/issues/846

Did you find that the accuracy is good or bad? I compared with BERT on a few examples of masked word prediction, and most of XLNet's highest-probability predicted words do not fit at all.

@kushalj001 hi, how can I get the sentence vector?
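One common way (not the only one) to turn token-level hidden states into a sentence vector is to mean-pool the last hidden states over all tokens. A hedged sketch, assuming `pytorch-transformers`; the model call is kept inside a function so nothing is downloaded until you invoke it:

```python
import numpy as np

def mean_pool(token_vectors):
    # token_vectors: (seq_len, hidden_size) array -> (hidden_size,) sentence vector
    return token_vectors.mean(axis=0)

def sentence_vector(text, model, tokenizer):
    # Average XLNet's last hidden states over all tokens of `text`.
    import torch
    model.eval()
    with torch.no_grad():
        ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
        last_hidden = model(ids)[0]        # shape: (1, seq_len, hidden_size)
    return mean_pool(last_hidden[0].numpy())

# Usage (downloads the checkpoint on first run):
# from pytorch_transformers import XLNetModel, XLNetTokenizer
# tok = XLNetTokenizer.from_pretrained('xlnet-base-cased')
# mdl = XLNetModel.from_pretrained('xlnet-base-cased')
# vec = sentence_vector("The apple juice is sour.", mdl, tok)
```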

Hi, so it seems that creating a model with a configuration is primarily the problem here:
model = XLNetLMHeadModel.from_pretrained("xlnet-large-cased")
yields consistent outputs, but
config = XLNetConfig.from_pretrained("xlnet-large-cased")
model = XLNetModel(config)
does not at all.
My question is, how is it possible to set configuration states (like getting hidden states of the model). I have run the glue STS-B fine tuning code to customize the model which is stored at ./proc_data/sts-b-100, but when I load the model using code like this to get hidden states:

from pytorch_transformers import XLNetConfig, XLNetTokenizer, XLNetForSequenceClassification

config = XLNetConfig.from_pretrained('./proc_data/sts-b-110/')
config.output_hidden_states = True
tokenizer = XLNetTokenizer.from_pretrained('./proc_data/sts-b-110/')
model = XLNetForSequenceClassification(config)

I get results that vary wildly across runs.

Specifically, I would like to get the hidden states of each layer from the fine tuned model and correlate it to the actual text similarity. I was thinking I'd load the model with XLNetForSequenceClassification, get all the hidden states setting the configuration to output hidden states and do such a correlation. Is my approach incorrect?
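The approach itself seems fine; the likely culprit is that `XLNetForSequenceClassification(config)` builds a freshly initialised model, so the fine-tuned weights are never loaded and every run starts from a different random classifier. A hedged sketch of loading the fine-tuned checkpoint with `from_pretrained` instead (the path is the one used above; `config=` is passed so `output_hidden_states=True` takes effect):

```python
def last_layer(hidden_states):
    # With output_hidden_states=True the model returns a tuple of
    # (num_layers + 1) tensors; the last entry is the final layer.
    return hidden_states[-1]

def load_finetuned(path='./proc_data/sts-b-110/'):
    # Loads config, tokenizer and the fine-tuned *weights* from `path`.
    from pytorch_transformers import (XLNetConfig, XLNetTokenizer,
                                      XLNetForSequenceClassification)
    config = XLNetConfig.from_pretrained(path)
    config.output_hidden_states = True
    tokenizer = XLNetTokenizer.from_pretrained(path)
    # from_pretrained restores the fine-tuned weights;
    # XLNetForSequenceClassification(config) would re-initialise them randomly.
    model = XLNetForSequenceClassification.from_pretrained(path, config=config)
    model.eval()
    return model, tokenizer
```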

Looking at run_glue, it seems that actually outputs[1] is used for prediction? This is confusing because all the examples use [0] and the documentation is not very clear.
outputs = model(**inputs)
tmp_eval_loss, logits = outputs[:2]
From run_glue.py

Ok, I figured the logits and loss issue out - the issue is that for XLNetForSequenceClassification, the second index does in fact have logits while the first has loss.
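In other words, the output tuple of `XLNetForSequenceClassification` depends on the call: with `labels` it is `(loss, logits, ...)`, without them it is `(logits, ...)`. A tiny illustration of the unpacking, with strings standing in for the real tensors:

```python
# Dummy stand-ins for the tensors the model would actually return.
outputs_with_labels = ('loss', 'logits', 'mems')    # model(input_ids, labels=labels)
outputs_without_labels = ('logits', 'mems')         # model(input_ids)

loss, logits = outputs_with_labels[:2]   # run_glue.py style: loss first, logits second
logits_only = outputs_without_labels[0]  # inference style: logits come first
```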

@thomwolf @Oxi84 while calculating word-embeddings of a document, i.e multiple sentences, is it necessary to pass the document sentence-wise? For my dataset, I removed punctuation as a part of the pre-processing step. So now, my whole document goes into the model. Does this hurt the model's performance? Does it make a lot of difference in terms of capturing the context of words?
Thanks

It should improve accuracy if the text is longer, but for me BERT is still way better ... on 20-40 word texts.

> It should improve accuracy if the text is longer, but for me BERT is still way better ... on 20-40 word texts.

Yeah, even in my experiments, BERT simply outperforms XLNet. Still don't know why, though.
When you say "it should improve accuracy", you mean that feeding sentences (rather than whole documents) to compute word vectors would be better, right?

Did you manage to try the TensorFlow version of XLNet? There is a chance it might differ from the PyTorch version.

Maybe there is some bug, but it's unlikely, since the benchmark results with the XLNet PyTorch port are the same. But I guess this would be the first thing to recheck.

> Did you manage to try the TensorFlow version of XLNet? There is a chance it might differ from the PyTorch version.

Any simple way of doing this?

any updates regarding this issue?

@kushalj001 why remove the punctuation? Is it domain-specific, or to improve accuracy?

> @kushalj001 why remove the punctuation? Is it domain-specific, or to improve accuracy?

My dataset had a lot of random punctuation, i.e. misplaced single and double quotes.
But also, does punctuation add any valuable information to the text? Apart from the period (which can be used to break a large paragraph into sentences), does keeping other punctuation symbols make sense?
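For the pre-processing step itself, one plain-Python way to drop all punctuation except sentence-final periods is `str.translate` (a sketch independent of the model; which marks to keep is up to you):

```python
import string

def strip_punct(text, keep='.'):
    # Remove all ASCII punctuation except the characters in `keep`.
    drop = ''.join(ch for ch in string.punctuation if ch not in keep)
    return text.translate(str.maketrans('', '', drop))

print(strip_punct('He said, "the juice is sour." Then he left!'))
# -> He said the juice is sour. Then he left
```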

I will close this issue, which dates back to before we had the clean documentation up here: https://huggingface.co/pytorch-transformers/

Please open a new issue with a clear explanation of your specific problem if you have related issues.
