Can we use this project to calculate the probability that an input text is a real/reasonable sentence, based on the corpus we trained on?
@frankniujc That is helpful, but a better way might be to score all the tokens of the sentence together, rather than only predicting the next token.
The probability of a sentence factorises by the chain rule: P(s0 s1 s2 ... sn) = P(s1 | s0) * P(s2 | s0 s1) * P(s3 | s0 s1 s2) * ... * P(sn | s0 s1 ... sn-1). In practice you sum log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow, which is what the code below does.
So you can do something like this:

import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2').to('cuda')

def sentence_probability(sent):
    # Prepend the BOS token so the first real token is also scored.
    bos = tokenizer.encode('<|endoftext|>')
    tokens = bos + tokenizer.encode(sent)
    input_ids = torch.tensor(tokens).unsqueeze(0).to('cuda')
    sent_probs = []
    for i, next_word in enumerate(tokens[1:]):
        # Score the prefix up to position i; take the logits at its last position.
        next_word_logits = model(input_ids[:, :i + 1])[0][0, -1].detach()
        # Log-probability the model assigns to the actual next token.
        next_word_prob = F.log_softmax(next_word_logits, dim=0)[next_word].item()
        sent_probs.append(next_word_prob)
    # Sum of per-token log-probabilities = log P(sentence).
    return sum(sent_probs)
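As a quick sanity check (assuming the GPT-2 model and tokenizer loaded above), a fluent sentence should get a higher, i.e. less negative, total log-probability than a scrambled one; dividing by the token count gives a length-normalised score if you want to compare sentences of different lengths:

fluent = sentence_probability('The cat sat on the mat.')
scrambled = sentence_probability('Mat the on cat sat the.')
print(fluent, scrambled)  # fluent is expected to be less negative than scrambled

# Length-normalised (average per-token) log-probability for fairer comparisons.
print(fluent / len(tokenizer.encode('The cat sat on the mat.')))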
@loveJasmine Have a look at lm-scorer.
It is a tiny wrapper around transformers I wrote that allows you to get sentence probabilities using models that support it (only GPT2 models are implemented at the time of writing).
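For reference, basic usage is roughly the sketch below; the class and method names (AutoLMScorer, sentence_score, the reduce argument) are recalled from the lm-scorer README and may differ in newer versions:

import torch
from lm_scorer.models.auto import AutoLMScorer as LMScorer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
scorer = LMScorer.from_pretrained("gpt2", device=device, batch_size=1)

# Sentence score as the product of per-token probabilities (use log=True for log-probabilities).
print(scorer.sentence_score("I like this package.", reduce="prod"))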