Transformers: Does this project have this function?

Created on 6 Mar 2020 · 5 Comments · Source: huggingface/transformers

🚀 Feature request

Can we use this project to calculate the probability that an input text is a real, reasonable sentence, based on the corpus we trained on?

wontfix

All 5 comments

@frankniujc It is helpful, but maybe a better way is to take all the tokens as a whole instead of predicting the next token.

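The probability of a sentence factorizes by the chain rule into a product of next-token probabilities:

$$P(s_0 s_1 s_2 \dots s_n) = P(s_1 \mid s_0)\, P(s_2 \mid s_0 s_1)\, P(s_3 \mid s_0 s_1 s_2) \cdots P(s_n \mid s_0 s_1 \dots s_{n-1}) = \prod_{i=1}^{n} P(s_i \mid s_0 \dots s_{i-1})$$

where s_0 is the beginning-of-text token. It is given rather than predicted, so P(s_0) = 1 and it contributes no factor of its own.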
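To make the chain rule concrete, here is a small sketch that uses a made-up bigram table in place of a real language model. The table, its probabilities, and the `<BOS>` marker are all hypothetical. A real model such as GPT-2 conditions on the whole prefix s_0 ... s_{i-1}, not just the previous token. The sketch shows that multiplying the conditionals and summing their logs give the same result, which is why it is safe to work in log space.

```python
import math

# Hypothetical conditional probabilities P(next | previous). A real LM
# conditions on the full prefix s_0 ... s_{i-1}, not only the last token.
BIGRAM = {
    ("<BOS>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.4,
}

def chain_rule_probability(tokens):
    """Product of P(s_i | s_{i-1}) for i = 1..n; tokens[0] is the BOS marker."""
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= BIGRAM.get((prev, nxt), 0.0)
    return prob

def chain_rule_log_probability(tokens):
    """The same quantity in log space: sum of log P(s_i | s_{i-1})."""
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = BIGRAM.get((prev, nxt), 0.0)
        if p == 0.0:
            return float("-inf")  # log(0): the sentence is impossible under the table
        total += math.log(p)
    return total

sentence = ["<BOS>", "the", "cat", "sat"]
print(chain_rule_probability(sentence))
print(math.exp(chain_rule_log_probability(sentence)))
```

Up to floating-point rounding, both lines evaluate to 0.5 × 0.2 × 0.4 = 0.04. For long sentences the product of many small numbers can underflow, and the log-space sum avoids that.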

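So you can do something like this. Note that the function returns the log-probability of the sentence (the sum of the log conditional probabilities). Exponentiate the result to get the probability itself: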

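import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Assumes the GPT-2 'gpt2' checkpoint (<|endoftext|> is GPT-2's special token)
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device).eval()

def sentence_probability(sent):
    # Prepend <|endoftext|> so that the first real token is also scored
    bos = tokenizer.encode('<|endoftext|>')
    tokens = bos + tokenizer.encode(sent)
    input_ids = torch.tensor(tokens).unsqueeze(0).to(device)

    sent_probs = []
    with torch.no_grad():  # inference only, no gradients needed
        for i, next_word in enumerate(tokens[1:]):
            # Logits at the last position of the prefix tokens[:i+1]
            next_word_logits = model(input_ids[:, :i+1])[0][0, -1]
            # Log-probability of the token that actually comes next
            next_word_prob = F.log_softmax(next_word_logits, dim=0)[next_word].item()
            sent_probs.append(next_word_prob)

    # Sum of log-probabilities = log P(sentence)
    return sum(sent_probs)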
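The function above runs one forward pass per token. A causal model like GPT-2 already returns logits for every position in a single pass, and the logits at position i predict token i + 1. That means the same sum can be computed from one pass by shifting the targets by one. Below is a minimal, framework-free sketch of that shift. It assumes `logits` is a plain list of per-position score lists (one row per input token), standing in for `model(input_ids)[0][0]`. The helper names are hypothetical. The sketch also derives perplexity, exp of the negative mean token log-probability, which is often more convenient than the raw probability when comparing sentences of different lengths.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def sequence_log_prob(logits, token_ids):
    """Sum of log P(token_ids[i+1] | token_ids[:i+1]) from a single pass.

    logits[i] scores the token that follows position i, so row i is paired
    with token_ids[i + 1]; the last row predicts past the end and is unused.
    """
    total = 0.0
    for row, target in zip(logits[:-1], token_ids[1:]):
        total += log_softmax(row)[target]
    return total

def perplexity(logits, token_ids):
    """exp(-mean token log-probability); lower means more expected text."""
    n_scored = len(token_ids) - 1  # the BOS token itself is not scored
    if n_scored <= 0:
        raise ValueError("need at least one token after BOS")
    return math.exp(-sequence_log_prob(logits, token_ids) / n_scored)

# Toy example: vocabulary of 3 tokens, 3 input positions (BOS + 2 tokens).
toy_logits = [
    [0.0, 2.0, 0.0],  # after position 0: favours token 1
    [0.0, 0.0, 2.0],  # after position 1: favours token 2
    [1.0, 1.0, 1.0],  # after position 2: unused
]
toy_ids = [0, 1, 2]
print(sequence_log_prob(toy_logits, toy_ids))
print(perplexity(toy_logits, toy_ids))
```

With GPT-2 in PyTorch, one would obtain all per-position logits from a single call such as `model(input_ids)[0][0]` inside `torch.no_grad()`. The explicit Python loops here only spell out the indexing.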

@loveJasmine Have a look at lm-scorer.

It is a tiny wrapper around transformers that I wrote. It lets you get sentence probabilities using models that support it (only GPT-2 models are implemented at the time of writing).

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
