I am working on an SNLI-style sentence-pair labeling task. Essentially, I have one premise and arbitrarily many hypotheses, of which only one can be correct. My variable-size batch would look like
[(premise, hypothesis1)]
[(premise, hypothesis2)]
[(premise, hypothesis3)]
....
[(premise, hypothesisn)]
Again, n can vary. I was planning to create a custom iterator to generate such a batch. I want to compute scores for all the instances in the batch and then apply a softmax along the batch axis.
I am expecting a softmax vector of size n for the above example which I want to compare with the batch labels and compute the batch loss accordingly. Is it possible to implement this sort of batch-softmax and batch-loss in allennlp/pytorch?
It sounds like you're structuring your data such that you have one Instance being a (premise, hypothesis) pair. It'd probably be simpler for you to structure your data such that each Instance is a (premise, [hypotheses]) pair. Then the loss computation is simple, and everything matches what you want. Does this make sense?
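To make the "choice over the hypotheses" idea concrete, here is a minimal plain-Python sketch of the arithmetic, assuming the model has already produced one scalar score per hypothesis. The names `softmax`, `choice_loss` and `hypothesis_scores` are illustrative and not part of AllenNLP; in a real model the same computation would be done on tensors rather than lists.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one list of scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choice_loss(scores: List[float], gold_index: int) -> float:
    """Negative log-probability assigned to the correct hypothesis."""
    return -math.log(softmax(scores)[gold_index])


# One premise with three hypotheses; the second one (index 1) is correct.
# The scores are made up for illustration.
hypothesis_scores = [0.5, 2.0, -1.0]
probabilities = softmax(hypothesis_scores)
loss = choice_loss(hypothesis_scores, gold_index=1)
```

With this framing, n can differ from instance to instance, because the softmax only ever runs over the hypotheses of a single premise.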
The problem with that is that the number of hypotheses can be huge (as many as 2000+). Secondly, I'm using the decomposable attention model (decomposable_attention.py), and I am not sure how to update that model if I change the Instance to the (premise, [hypotheses]) format.
Are you expecting to have a batch size of 2000 when doing it in the way that you're thinking?
My point is that, conceptually, your input appears to be a premise and a list of hypotheses, with the output being a choice over the hypotheses. If this is true, you really want your Instances to be structured this way. First figure out what the right way is to think about your problem, _then_ figure out how to structure your code to match that, instead of seeing what code is available and trying to shoehorn your task in that format. Once you come up with a format that you're happy with and that matches the structure of your problem, if you have questions about how to implement that format in AllenNLP, we can try to answer your questions.
Hi,
Thank you for the response.
I thought about the problem. I want to use an architecture similar to the decomposable_attention_model; however, I am now arranging the Instance as you suggested: (premise, [hypotheses]). This is the code I have so far:
class QuestionResponseSoftmaxReader(DatasetReader):
    def __init__(self,
                 tokenizer: Tokenizer = None,
                 token_indexers: Dict[str, TokenIndexer] = None,
                 lazy: bool = False) -> None:
        super().__init__(lazy)
        self._tokenizer = tokenizer or WordTokenizer()
        self._token_indexers = token_indexers or {'tokens': SingleIdTokenIndexer()}

    def _read(self, file_path: str):
        # if `file_path` is a URL, redirect to the cache
        file_path = cached_path(file_path)
        with open(file_path, 'r') as features_file:
            logger.info("Reading Generated Responses and questions instances from features file: %s", file_path)
            current_qa = None
            current_responses = list()
            current_labels = list()
            for i, line in enumerate(features_file):
                # TODO: remove this after debugging
                # if i==10000:
                #     break
                line = line.strip()
                row = re.split('\t|\\t', line)
                q = row[0].strip()
                q = q.lower()
                a = row[1].strip()
                a = a.lower()
                if current_qa != (q,a):
                    # send the previous batch
                    if len(current_responses) > 1:
                        yield self.text_to_instance(current_qa[0], current_responses, current_labels)
                    current_qa = (q,a)
                    current_responses = list()
                    current_labels = list()
                r = row[2].strip()
                r = r.lower()
                rule = row[3].strip()
                count = row[-1]
                if int(count) > 0:
                    label = "1"
                else:
                    label = "0"
                current_responses.append(r)
                current_labels.append(label)
            # yield the last batch
            if len(current_responses) > 1:
                yield self.text_to_instance(current_qa[0], current_responses, current_labels)
            current_qa = None
            current_responses = list()
            current_labels = list()

    def text_to_instance(self,  # type: ignore
                         premise: str,
                         hypotheses: List[str],
                         labels: List[str] = None) -> Instance:
        # pylint: disable=arguments-differ
        fields: Dict[str, Field] = {}
        premise_tokens = self._tokenizer.tokenize(premise)
        fields['premise'] = TextField(premise_tokens, self._token_indexers)
        all_hypotheses_fields = list()
        for hypothesis in hypotheses:
            hypothesis_tokens = self._tokenizer.tokenize(hypothesis)
            all_hypotheses_fields.append(TextField(hypothesis_tokens, self._token_indexers))
        fields['hypotheses'] = ListField(all_hypotheses_fields)
        if labels:
            all_labels_fields = list()
            for label in labels:
                all_labels_fields.append(LabelField(label))
            fields['labels'] = ListField(label)
        # metadata = {"premise_tokens": [x.text for x in premise_tokens],
        #             "hypothesis_tokens": [x.text for x in hypothesis_tokens]}
        # fields["metadata"] = MetadataField(metadata)
        return Instance(fields)
Am I creating the instance correctly here? Also I am getting the following error when I try to read my train data:
Traceback (most recent call last):
  File "decomposable_attention_model_softmax_training.py", line 149, in <module>
    vocab = Vocabulary.from_instances(train_dataset + val_dataset)
  File "/home/baheti/anaconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/data/vocabulary.py", line 397, in from_instances
    instance.count_vocab_items(namespace_token_counts)
  File "/home/baheti/anaconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/data/instance.py", line 57, in count_vocab_items
    field.count_vocab_items(counter)
  File "/home/baheti/anaconda3/envs/allennlp/lib/python3.6/site-packages/allennlp/data/fields/list_field.py", line 47, in count_vocab_items
    field.count_vocab_items(counter)
AttributeError: 'str' object has no attribute 'count_vocab_items'
What would be the correct Fields to use for the list hypotheses and the list labels? Kindly help me with this.
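The reader's _read() method groups consecutive rows that share the same (question, answer) pair. As a hedged aside, that grouping step on its own could be sketched with itertools.groupby from the standard library. The helper name `group_rows` and the sample rows are hypothetical, and the sketch skips the stripping and lower-casing the reader performs.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def group_rows(rows: Iterable[List[str]]) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (question, responses, labels) for each run of rows sharing (question, answer)."""
    for (question, _answer), run in groupby(rows, key=lambda row: (row[0], row[1])):
        responses, labels = [], []
        for row in run:
            responses.append(row[2])
            # Same labeling rule as the reader: a positive count means "1".
            labels.append("1" if int(row[-1]) > 0 else "0")
        if len(responses) > 1:  # same filter as the reader: skip single-response groups
            yield question, responses, labels


# Hypothetical rows in the column order: question, answer, response, rule, count.
rows = [
    ["q1", "a1", "r1", "rule", "0"],
    ["q1", "a1", "r2", "rule", "3"],
    ["q2", "a2", "r3", "rule", "1"],
]
groups = list(group_rows(rows))
```

Each yielded tuple has the shape that text_to_instance() expects: one premise string, a list of hypothesis strings, and a parallel list of label strings.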
you are creating a list of LabelFields but not using them, where you have
fields['labels'] = ListField(label)
it should be
fields['labels'] = ListField(all_labels_fields)
@joelgrus Thank you for the correction. I also want some help in the model development part. As I mentioned before I want to keep the components from Decomposable Attention Model intact and just change the objective and loss computation. I have copied the code from DecomposableAttention(Model) and tried to implement it in the forward() function like this:
class DecomposableAttentionSoftmax(Model):
    """
    This ``Model`` implements the Decomposable Attention model described in `"A Decomposable
    Attention Model for Natural Language Inference"
    <https://www.semanticscholar.org/paper/A-Decomposable-Attention-Model-for-Natural-Languag-Parikh-T%C3%A4ckstr%C3%B6m/07a9478e87a8304fc3267fa16e83e9f3bbd98b27>`_
    by Parikh et al., 2016, with some optional enhancements before the decomposable attention
    actually happens.  Parikh's original model allowed for computing an "intra-sentence" attention
    before doing the decomposable entailment step.  We generalize this to any
    :class:`Seq2SeqEncoder` that can be applied to the premise and/or the hypothesis before
    computing entailment.

    The basic outline of this model is to get an embedded representation of each word in the
    premise and hypothesis, align words between the two, compare the aligned phrases, and make a
    final entailment decision based on this aggregated comparison.  Each step in this process uses
    a feedforward network to modify the representation.

    Parameters
    ----------
    vocab : ``Vocabulary``
    text_field_embedder : ``TextFieldEmbedder``
        Used to embed the ``premise`` and ``hypothesis`` ``TextFields`` we get as input to the
        model.
    attend_feedforward : ``FeedForward``
        This feedforward network is applied to the encoded sentence representations before the
        similarity matrix is computed between words in the premise and words in the hypothesis.
    similarity_function : ``SimilarityFunction``
        This is the similarity function used when computing the similarity matrix between words in
        the premise and words in the hypothesis.
    compare_feedforward : ``FeedForward``
        This feedforward network is applied to the aligned premise and hypothesis representations,
        individually.
    aggregate_feedforward : ``FeedForward``
        This final feedforward network is applied to the concatenated, summed result of the
        ``compare_feedforward`` network, and its output is used as the entailment class logits.
    premise_encoder : ``Seq2SeqEncoder``, optional (default=``None``)
        After embedding the premise, we can optionally apply an encoder.  If this is ``None``, we
        will do nothing.
    hypothesis_encoder : ``Seq2SeqEncoder``, optional (default=``None``)
        After embedding the hypothesis, we can optionally apply an encoder.  If this is ``None``,
        we will use the ``premise_encoder`` for the encoding (doing nothing if ``premise_encoder``
        is also ``None``).
    initializer : ``InitializerApplicator``, optional (default=``InitializerApplicator()``)
        Used to initialize the model parameters.
    regularizer : ``RegularizerApplicator``, optional (default=``None``)
        If provided, will be used to calculate the regularization penalty during training.
    """
    def __init__(self, vocab: Vocabulary,
                 text_field_embedder: TextFieldEmbedder,
                 attend_feedforward: FeedForward,
                 similarity_function: SimilarityFunction,
                 compare_feedforward: FeedForward,
                 aggregate_feedforward: FeedForward,
                 premise_encoder: Optional[Seq2SeqEncoder] = None,
                 hypothesis_encoder: Optional[Seq2SeqEncoder] = None,
                 initializer: InitializerApplicator = InitializerApplicator(),
                 regularizer: Optional[RegularizerApplicator] = None) -> None:
        super(DecomposableAttention, self).__init__(vocab, regularizer)
        self._text_field_embedder = text_field_embedder
        self._attend_feedforward = TimeDistributed(attend_feedforward)
        self._matrix_attention = LegacyMatrixAttention(similarity_function)
        self._compare_feedforward = TimeDistributed(compare_feedforward)
        self._aggregate_feedforward = aggregate_feedforward
        self._premise_encoder = premise_encoder
        self._hypothesis_encoder = hypothesis_encoder or premise_encoder
        self._num_labels = vocab.get_vocab_size(namespace="labels")
        check_dimensions_match(text_field_embedder.get_output_dim(), attend_feedforward.get_input_dim(),
                               "text field embedding dim", "attend feedforward input dim")
        check_dimensions_match(aggregate_feedforward.get_output_dim(), self._num_labels,
                               "final output dimension", "number of labels")
        self._accuracy = CategoricalAccuracy()
        self._loss = torch.nn.CrossEntropyLoss()
        initializer(self)
    def forward(self,  # type: ignore
                premise: Dict[str, torch.LongTensor],
                hypotheses: List[Dict[str, torch.LongTensor]],
                labels: List[torch.IntTensor] = None,
                metadata: List[Dict[str, Any]] = None) -> Dict[str, torch.Tensor]:
        # pylint: disable=arguments-differ
        """
        Parameters
        ----------
        premise : Dict[str, torch.LongTensor]
            From a ``TextField``
        hypothesis : Dict[str, torch.LongTensor]
            From a ``TextField``
        label : torch.IntTensor, optional, (default = None)
            From a ``LabelField``
        metadata : ``List[Dict[str, Any]]``, optional, (default = None)
            Metadata containing the original tokenization of the premise and
            hypothesis with 'premise_tokens' and 'hypothesis_tokens' keys respectively.

        Returns
        -------
        An output dictionary consisting of:
        label_logits : torch.FloatTensor
            A tensor of shape ``(batch_size, num_labels)`` representing unnormalised log
            probabilities of the entailment label.
        label_probs : torch.FloatTensor
            A tensor of shape ``(batch_size, num_labels)`` representing probabilities of the
            entailment label.
        loss : torch.FloatTensor, optional
            A scalar loss to be optimised.
        """
        embedded_premise = self._text_field_embedder(premise)
        premise_mask = get_text_field_mask(premise).float()
        if self._premise_encoder:
            embedded_premise = self._premise_encoder(embedded_premise, premise_mask)
        projected_premise = self._attend_feedforward(embedded_premise)
        all_label_logits = list()
        all_label_probs = list()
        all_h2p_attention = list()
        all_p2h_attention = list()
        for hypothesis in hypotheses:
            embedded_hypothesis = self._text_field_embedder(hypothesis)
            hypothesis_mask = get_text_field_mask(hypothesis).float()
            if self._hypothesis_encoder:
                embedded_hypothesis = self._hypothesis_encoder(embedded_hypothesis, hypothesis_mask)
            projected_hypothesis = self._attend_feedforward(embedded_hypothesis)
            # Shape: (batch_size, premise_length, hypothesis_length)
            similarity_matrix = self._matrix_attention(projected_premise, projected_hypothesis)
            # Shape: (batch_size, premise_length, hypothesis_length)
            p2h_attention = masked_softmax(similarity_matrix, hypothesis_mask)
            all_p2h_attention.append(p2h_attention)
            # Shape: (batch_size, premise_length, embedding_dim)
            attended_hypothesis = weighted_sum(embedded_hypothesis, p2h_attention)
            # Shape: (batch_size, hypothesis_length, premise_length)
            h2p_attention = masked_softmax(similarity_matrix.transpose(1, 2).contiguous(), premise_mask)
            all_h2p_attention.append(h2p_attention)
            # Shape: (batch_size, hypothesis_length, embedding_dim)
            attended_premise = weighted_sum(embedded_premise, h2p_attention)
            premise_compare_input = torch.cat([embedded_premise, attended_hypothesis], dim=-1)
            hypothesis_compare_input = torch.cat([embedded_hypothesis, attended_premise], dim=-1)
            compared_premise = self._compare_feedforward(premise_compare_input)
            compared_premise = compared_premise * premise_mask.unsqueeze(-1)
            # Shape: (batch_size, compare_dim)
            compared_premise = compared_premise.sum(dim=1)
            compared_hypothesis = self._compare_feedforward(hypothesis_compare_input)
            compared_hypothesis = compared_hypothesis * hypothesis_mask.unsqueeze(-1)
            # Shape: (batch_size, compare_dim)
            compared_hypothesis = compared_hypothesis.sum(dim=1)
            aggregate_input = torch.cat([compared_premise, compared_hypothesis], dim=-1)
            label_logit = self._aggregate_feedforward(aggregate_input)
            all_label_logits.append(label_logits)
        # How to apply softmax on all_label_logits
        label_probs = torch.nn.functional.softmax(label_logits, dim=-1)
        output_dict = {"label_logits": all_label_logits,
                       "label_probs": label_probs,
                       "h2p_attention": all_h2p_attention,
                       "p2h_attention": all_p2h_attention}
        # How to compute correct loss here?
        if labels is not None:
            loss = self._loss(label_logits, labels.long().view(-1))
            self._accuracy(label_logits, label)
            output_dict["loss"] = loss
        # if metadata is not None:
        #     output_dict["premise_tokens"] = [x["premise_tokens"] for x in metadata]
        #     output_dict["hypothesis_tokens"] = [x["hypothesis_tokens"] for x in metadata]
        return output_dict
Specifically, I have the following issues:
- Are hypotheses and labels correct in the forward() function?
- How to apply softmax on the list of all_label_logits. Is creating a list the correct thing to do here?
- How to compute the loss after taking the softmax (what would be the correct syntax for using the list of labels)?
- Using a for-loop to iterate over all the hypotheses. I am guessing that using a for-loop will make it very inefficient.
Hmm, we definitely need better documentation on how putting things in lists works. I'm planning on making some better docs soon, and I'll be sure to include this.
When you make a ListField[TextField], you still get a Dict[str, Tensor] in the model, it's just that each Tensor has an extra dimension. You need to accompany this with passing the argument num_wrapping_dims=1 to the call to self._text_field_embedder(hypothesis). In the end, you'll have a tensor of shape (batch_size, num_hypotheses, num_hypothesis_tokens, embedding_dim), and you'll have to structure your operations to work with the extra dimension, and aggregate across the num_hypotheses dimension somehow when computing your loss.
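One way to "aggregate across the num_hypotheses dimension" is a softmax that ignores padded hypotheses, since instances in the same batch can have different numbers of real hypotheses. The following is a plain-Python sketch of that idea only, assuming one score per hypothesis and a 0/1 mask marking real entries. The function name `masked_choice_probs` and the numbers are illustrative; a tensor version would apply the same arithmetic along the second dimension.

```python
import math
from typing import List


def masked_choice_probs(scores: List[float], mask: List[int]) -> List[float]:
    """Softmax over the real entries only; padded entries get probability 0."""
    real_scores = [s for s, m in zip(scores, mask) if m]
    largest = max(real_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical batch of two instances padded to num_hypotheses = 3.
# The second instance has only two real hypotheses; its third score is padding.
batch_scores = [[1.0, 3.0, 0.0], [2.0, 0.5, 9.9]]
batch_mask = [[1, 1, 1], [1, 1, 0]]
batch_probs = [masked_choice_probs(s, m) for s, m in zip(batch_scores, batch_mask)]
```

Without the mask, the padded score 9.9 in the second row would dominate the distribution even though it corresponds to no real hypothesis.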
@joelgrus @matt-gardner I was busy with something else, so I couldn't work on this for a while. I am facing an issue with fields['labels']. Although I set this to a ListField, I am getting a single tensor of size 1 when I load the batch. Could you tell me what I'm doing wrong here?
def forward(self,  # type: ignore
            premise: Dict[str, torch.LongTensor],
            hypotheses: Dict[str, torch.LongTensor],
            labels: List[torch.IntTensor] = None,
            metadata: List[Dict[str, Any]] = None) -> Dict[str, torch.Tensor]:
    print(hypotheses.keys())
    print("Hypotheses")
    print(type(hypotheses["tokens"]))
    print(hypotheses["tokens"].shape)
    print("Premise")
    print(type(premise["tokens"]))
    print(premise["tokens"].shape)
    print("Labels")
    print(type(labels))
    print(len(labels))
    exit()
This gives the output:
dict_keys(['tokens'])
Hypotheses
<class 'torch.Tensor'>
torch.Size([1, 25, 11])
Premise
<class 'torch.Tensor'>
torch.Size([1, 13])
Labels
<class 'torch.Tensor'>
1
I was expecting labels to be a List[torch.IntTensor]. Kindly help me with this.
I also tried to send the labels from the metadata field and it is able to send the labels like I want. Essentially in my text_to_instance() function of reader I added metadata as follows
metadata = {"labels": all_labels_fields}
fields["metadata"] = MetadataField(metadata)
In my model's forward() function, I found metadata to be of type list. It had only one element, as I had set the batch size to 1. The first element was a dictionary with a labels key and a List[torch.IntTensor] as its value (which is exactly what I want in the labels field).
Currently my forward() function looks like:
def forward(self,  # type: ignore
            premise: Dict[str, torch.LongTensor],
            hypotheses: Dict[str, torch.LongTensor],
            labels: List[torch.IntTensor] = None,
            metadata: List[Dict[str, Any]] = None) -> Dict[str, torch.Tensor]:
    print(hypotheses.keys())
    print("Hypotheses")
    print(type(hypotheses["tokens"]))
    print(hypotheses["tokens"].shape)
    print("Premise")
    print(type(premise["tokens"]))
    print(premise["tokens"].shape)
    print("Labels")
    print(type(labels))
    print(len(labels))
    print("Metadata")
    print(type(metadata[0]))
    print(len(metadata[0]['labels']))
    exit()
and it gives the following output:
dict_keys(['tokens'])
Hypotheses
<class 'torch.Tensor'>
torch.Size([1, 46, 17])
Premise
<class 'torch.Tensor'>
torch.Size([1, 11])
Labels
<class 'torch.Tensor'>
1
Metadata
<class 'dict'>
46
Here I received 46 labels for 46 different hypotheses of my input. Kindly inform me of the correct way to do this.
in general, a ListField doesn't generate a list of tensors, it stacks all the tensors together into a single tensor with one added dimension
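The stacking behaviour can be pictured with nested lists standing in for tensors. This is only an illustrative sketch, not AllenNLP's implementation: `pad_and_stack` and `shape` are hypothetical helpers, and the token ids are made up.

```python
from typing import List, Tuple


def pad_and_stack(sequences: List[List[int]], padding_value: int = 0) -> List[List[int]]:
    """Pad every sequence to the longest length so they form one rectangular array."""
    longest = max(len(seq) for seq in sequences)
    return [seq + [padding_value] * (longest - len(seq)) for seq in sequences]


def shape(array) -> Tuple[int, ...]:
    """Shape of a non-empty rectangular nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)


# Three hypotheses with token-id lists of different lengths.
hypotheses = [[4, 8, 15], [16, 23], [42]]
stacked = pad_and_stack(hypotheses)  # one instance: (num_hypotheses, num_tokens)
batch = [stacked]                    # batch of one: (batch_size, num_hypotheses, num_tokens)
batch_shape = shape(batch)
```

The three separate lists end up as one rectangular array, with the list dimension sitting right after the batch dimension, which mirrors the torch.Size([1, 46, 17]) printed for the hypotheses above.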
Thank you for updating. However, I had provided the ListField with a list of integers, and for some reason it squashed it into a single tensor.
This is how I assign the 'labels' field in my instance creation
all_labels_fields = list()
for label in labels:
    all_labels_fields.append(LabelField(label))
fields['labels'] = ListField(all_labels_fields)
However, I get a torch.Tensor of size 1 in my forward function (kindly refer to my previous 2 comments). Could you explain why this is happening?
labels is a tensor; you shouldn't be printing len(labels), you should be printing labels.size(). You have batch size one, which is why you are seeing len(labels) == 1, because len(tensor) will tell you the size of the first dimension. If you were to print labels.size(), you should see something like torch.Size([1, 46]).
Also, your type annotation is wrong - it should be labels: torch.IntTensor, not labels: List[torch.IntTensor]. We always batch things together into tensors for you, for efficient GPU computations, and the outer dimension is always the batch dimension. A ListField will add a dimension _after_ the batch dimension, as you see in the hypothesis field, which has shape (batch_size, num_hypotheses, hypothesis_length).
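The len() versus size() point, and one possible next step, can be sketched with nested lists in place of tensors. This assumes labels are laid out as (batch_size, num_hypotheses), that the correct hypothesis is indexed as 1, and that exactly one hypothesis per instance is correct; the actual index assigned to the "1" label depends on the vocabulary, so that part is an assumption. The helper `gold_indices` is hypothetical.

```python
from typing import List

# Hypothetical labels for a batch of one instance with five hypotheses,
# laid out as (batch_size, num_hypotheses). 1 marks the correct hypothesis.
labels = [[0, 0, 1, 0, 0]]

batch_size = len(labels)         # size of the first (batch) dimension only
num_hypotheses = len(labels[0])  # size of the dimension the ListField added


def gold_indices(label_rows: List[List[int]]) -> List[int]:
    """Position of the correct hypothesis in each instance."""
    return [row.index(1) for row in label_rows]


targets = gold_indices(labels)
```

A per-instance index like this is the kind of target a softmax over the hypotheses would be compared against when computing the loss.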
@matt-gardner Thank you for the prompt responses. I deeply appreciate it. Your comments make a lot of sense. I will now try to figure out how to get the softmax working. Hopefully, I won't face too many obstructions now that I have figured out all the dimensions. Thank you again for the prompt clarification!
Pretty sure the issue here has been solved; closing this issue.