Transformers: AlbertForQuestionAnswering

Created on 28 Nov 2019  ·  14 comments  ·  Source: huggingface/transformers

Hello! Thanks for adding ALBERT so quickly! I have a problem getting ALBERT to answer a simple question with the Hugging Face default example:

import torch
from transformers import AlbertTokenizer, AlbertForQuestionAnswering

tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForQuestionAnswering.from_pretrained('albert-base-v2')
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
input_text = "[CLS] " + question + " [SEP] " + text + " [SEP]"
input_ids = tokenizer.encode(input_text)
# For ALBERT, the [SEP] token has id 3; everything up to and including the
# first [SEP] belongs to segment 0, the rest to segment 1.
token_type_ids = [0 if i <= input_ids.index(3) else 1 for i in range(len(input_ids))]
start_scores, end_scores = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
print(' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores) + 1]))

It actually prints empty output because

torch.argmax(start_scores), torch.argmax(end_scores)+1
## (tensor(7), tensor(6))
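
The predicted start index (7) lands after the predicted end index (6), and a Python slice whose start is past its end is simply empty:

all_tokens[7 : 6 + 1]  # == [], since the slice start exceeds the slice end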

With other versions of ALBERT I also get nonsense results :(
Thanks in advance!


All 14 comments

Hi! The ALBERT checkpoints only include the base model (the transformer), not the separate heads for each task (classification, question answering, ...).

For question answering, you would first have to fine-tune the model on this specific task, as the question answering head is initialized randomly. You can do so with the run_squad.py example.
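
You can see the random initialization directly (a minimal sketch: because the head is freshly initialized on every load, two loads of the same checkpoint agree on the base weights but not on the head):

import torch
from transformers import AlbertForQuestionAnswering

# Load the same checkpoint twice: the pretrained base weights match exactly,
# but the qa_outputs head is randomly re-initialized on each load.
m1 = AlbertForQuestionAnswering.from_pretrained('albert-base-v2')
m2 = AlbertForQuestionAnswering.from_pretrained('albert-base-v2')
print(torch.equal(m1.albert.embeddings.word_embeddings.weight,
                  m2.albert.embeddings.word_embeddings.weight))  # True
print(torch.equal(m1.qa_outputs.weight, m2.qa_outputs.weight))   # False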

This should be explained in the run_squad.py example, thank you for raising this issue! I'll change that.

Ok! Thanks a lot!

It would be really nice if you could release pretrained checkpoints for the specific tasks... I know it's a big ask, but it would save so many watts of energy all over the world....

The model needing fine-tuning for a downstream task is deliberate: the released model is general and task-agnostic. If task-specific weights were released, what extra thing would you need to do? Also, if released, it would not be called a “pretrained model”, since all the training would already be finished.....


I mean fine-tuned on SQuAD 2.0, for example. I would like to play with its capabilities, but the fine-tuning process is a tad daunting....


I totally agree that it would be nice to have the weights for ALBERT fine-tuned on SQuAD available.

I have found a Facebook model pretrained (oh sorry, fine-tuned :) on SQuAD 2.0 at https://github.com/facebookresearch/SpanBERT.
It is compatible with the Hugging Face models, so you can get it with:
wget http://dl.fbaipublicfiles.com/fairseq/models/spanbert_squad2.tar.gz
and extract it into, say, a directory named spanbert.
I use it something like this:

import torch
from transformers import BertTokenizer, BertForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertForQuestionAnswering.from_pretrained('./spanbert')
q = "who am i?"
doc = "my name is slim shady"
input_text = "[CLS] " + q + " [SEP] " + doc + " [SEP]"
input_ids = tokenizer.encode(input_text)
# For BERT, the [SEP] token has id 102.
token_type_ids = [0 if i <= input_ids.index(102) else 1 for i in range(len(input_ids))]
start_scores, end_scores = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
res = all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores) + 1]
if not res or res[0] == "[CLS]":
    print("MISSING")
else:
    # Merge '##' wordpiece continuations back into their preceding token.
    for i, t in enumerate(res):
        if t.startswith("##"):
            res[i - 1] += t[2:]
            res[i] = ""
    print(" ".join([x for x in res if x != ""]))

I am including the snippet here as it is so hard to find minimal working examples of BERT on single inputs, especially for Q&A.
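
As an aside (my assumption, not something the snippet depends on): BertTokenizer.convert_tokens_to_string performs the same '##' merge, so the loop above can be replaced with:

# convert_tokens_to_string joins wordpieces and strips '##' continuation markers.
print(tokenizer.convert_tokens_to_string(res))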

Thanks a lot!

@mosheliv - isn't that just for BERT, not ALBERT?

Yes, it is, but it was the only SQuAD 2.0 fine-tuned model I could find.



Can we assume that whenever there's a [CLS] in the answer, it basically means no answer? I'm asking since, depending on how we treat such cases, it can affect the performance evaluation. Please take a look at my question asked here on SO: https://stackoverflow.com/questions/60133236/what-does-berts-special-characters-appearance-in-squads-qa-answers-mean
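
For reference, a minimal sketch of the usual SQuAD 2.0 convention (an assumption about how such checkpoints are trained, reusing start_scores and end_scores from a forward pass like the one above): the model is taught to point both start and end at [CLS] (index 0) when there is no answer, so the null-span score can be compared against the best non-null span:

# Null-span score: [CLS] as both start and end (index 0).
null_score = start_scores[0, 0] + end_scores[0, 0]
# Best non-null score, ignoring the start <= end constraint for brevity.
best_non_null = start_scores[0, 1:].max() + end_scores[0, 1:].max()
if null_score > best_non_null:  # a tuned threshold is often added here
    print("no answer")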

Also for folks who might be looking for a running example of fine-tuned ALBERT on SQuAD v2.0, you might find this helpful:

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("ktrapeznikov/albert-xlarge-v2-squad-v2")
model = AutoModelForQuestionAnswering.from_pretrained("ktrapeznikov/albert-xlarge-v2-squad-v2")
question = "Where is the capital of the USA?"
text = "Capital of the USA is the beautiful Washington D.C."

input_dict = tokenizer.encode_plus(question, text, return_tensors="pt")
input_ids = input_dict["input_ids"].tolist()
start_scores, end_scores = model(**input_dict)

all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
# ALBERT uses SentencePiece, so word boundaries appear as '▁' markers.
answer = ''.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores) + 1]).replace('▁', ' ').strip()
print(answer)
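
As a possible simplification (an assumption on my part, not from the model card): tokenizer.decode handles the SentencePiece '▁' markers itself, so the manual replace can be avoided:

# Decode the winning ids directly; decode() restores normal spacing.
start, end = int(torch.argmax(start_scores)), int(torch.argmax(end_scores))
print(tokenizer.decode(input_ids[0][start : end + 1]))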

No expert on this model, but yes, this is how I used it.
Thanks for the ALBERT example, will try it later on!



Hi! Thanks for this, I'm only a beginner and this really saved me a lot of trouble! I had a small question, however. Apparently the page for this model [https://huggingface.co/ktrapeznikov/albert-xlarge-v2-squad-v2] shows there is a way to get the 'scores' of the spans in addition to the answer itself, but I couldn't get it to work myself. The code is supposed to be along the lines of:

start_scores, end_scores = model(input_ids)
# Log-probability of every (start, end) pair: shape [batch, seq_len, seq_len].
span_scores = start_scores.softmax(dim=1).log()[:, :, None] + end_scores.softmax(dim=1).log()[:, None, :]
ignore_score = span_scores[:, 0, 0]  # score of the ([CLS], [CLS]) span, i.e. "no answer"

But this doesn't return a single score. What am I missing?
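
For what it's worth, a sketch of one way to reduce this to a single best-span score per example (my reading of the model card, assuming span_scores has shape [batch, seq_len, seq_len] with entry [b, i, j] scoring the span from i to j): mask the invalid spans where j < i, then take the max:

import torch

seq_len = span_scores.size(1)
valid = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool))  # keep j >= i
flat = span_scores.masked_fill(~valid, float('-inf')).view(span_scores.size(0), -1)
best_score, flat_idx = flat.max(dim=1)                 # one score per example
start_idx, end_idx = flat_idx // seq_len, flat_idx % seq_len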
