Describe the bug
Creating a list field of sequence fields in my data encoder throws an assertion error.
AssertionError: ListFields must contain a single field type, found set()
To Reproduce
from typing import List
from allennlp.data.fields import Field, ListField, SequenceLabelField

srl_tags: List[Field] = []
for frame, tags in srl.items():
    srl_tags.append(SequenceLabelField(tags, premise_field, label_namespace="SRL_tags"))
srl_tags_field: ListField = ListField(srl_tags)
fields['srl_tags'] = srl_tags_field
Expected behavior
No exception, since every field in the list is of the same type.
The found set() message indicates that the ListField is empty (i.e. your srl_tags list is empty). I agree that the error message is not helpful here. (I'm undecided on whether this should be an error or not)
Is it intentional that your srl_tags is empty?
I ran into this exception a couple of days ago. In my case, ListField was sometimes being instantiated with an empty list []. I believe you have a similar problem: for some instances srl_tags is an empty list, even though whenever it is non-empty all of its elements are of the same type. Can you check whether this is true? If so, I can help fix it.
ListField([]) (instantiation with an empty list) does not work because the list field must know what type of field it contains. This information is essential for batching and padding. Hence, instead of calling my_list_field = ListField([]), use my_list_field = stub_list_field.empty_field(). You can check the usage of empty_field in list_field_test.py.
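To make the failure mode concrete, here is a minimal, self-contained sketch. The classes below are simplified stand-ins written for illustration, not AllenNLP's actual implementation; only the names ListField, SequenceLabelField and empty_field come from this thread, and the helper try_empty_list is hypothetical.

```python
from typing import List


class Field:
    """Hypothetical stand-in for a base field class."""

    def empty_field(self) -> "Field":
        raise NotImplementedError


class SequenceLabelField(Field):
    """Stand-in that holds one label per token."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = labels

    def empty_field(self) -> "SequenceLabelField":
        return SequenceLabelField([])


class ListField(Field):
    """Stand-in for a list of fields that must all share one type."""

    def __init__(self, field_list: List[Field]) -> None:
        field_types = {type(field) for field in field_list}
        # An empty list yields an empty set, so the inner type is unknown.
        assert len(field_types) == 1, (
            "ListFields must contain a single field type, found " + str(field_types)
        )
        self.field_list = field_list

    def empty_field(self) -> "ListField":
        # An existing instance knows its inner type, so it can build an empty one.
        return ListField([self.field_list[0].empty_field()])


def try_empty_list() -> str:
    """Return the assertion message produced by ListField([])."""
    try:
        ListField([])
    except AssertionError as err:
        return str(err)
    return "no error"


error_message = try_empty_list()
stub_list_field = ListField([SequenceLabelField(["B-ARG0", "O"])])
my_list_field = stub_list_field.empty_field()
```

In this sketch the empty list field still carries one (empty) SequenceLabelField, which is what lets it report its inner type when instances are later batched and padded.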
Thank you, that seemed to work. It would be nice, though, if there were a way to call stub_list_field.empty_field() without first having to create an instance; creating an instance of certain fields, such as a sequence field, is a pain.
Now I'm getting another error. I am a little new to allennlp; is this because I need to mark the field as optional in my model? If so, is there a reference?
```
2018-06-18 23:15:52,367 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 1001
0%| | 0/1563 [00:00, ?it/s]
2018-06-18 23:15:52,367 - INFO - allennlp.training.trainer - Training
Traceback (most recent call last):
  File "/home/aripc/anaconda2/envs/allennlp/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/aripc/anaconda2/envs/allennlp/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/aripc/Documents/nlu/allennlp/allennlp/run.py", line 18, in <module>
    main(prog="allennlp")
  File "/home/aripc/Documents/nlu/allennlp/allennlp/commands/__init__.py", line 65, in main
    args.func(args)
  File "/home/aripc/Documents/nlu/allennlp/allennlp/commands/train.py", line 101, in train_model_from_args
    args.recover)
  File "/home/aripc/Documents/nlu/allennlp/allennlp/commands/train.py", line 131, in train_model_from_file
    return train_model(params, serialization_dir, file_friendly_logging, recover)
  File "/home/aripc/Documents/nlu/allennlp/allennlp/commands/train.py", line 296, in train_model
    metrics = trainer.train()
  File "/home/aripc/Documents/nlu/allennlp/allennlp/training/trainer.py", line 679, in train
    train_metrics = self._train_epoch(epoch)
  File "/home/aripc/Documents/nlu/allennlp/allennlp/training/trainer.py", line 453, in _train_epoch
    for batch in train_generator_tqdm:
  File "/home/aripc/anaconda2/envs/allennlp/lib/python3.6/site-packages/tqdm/_tqdm.py", line 930, in __iter__
    for obj in iterable:
  File "/home/aripc/Documents/nlu/allennlp/allennlp/data/iterators/data_iterator.py", line 54, in __call__
    yield from self._yield_one_epoch(instances, shuffle, cuda_device)
  File "/home/aripc/Documents/nlu/allennlp/allennlp/data/iterators/data_iterator.py", line 66, in _yield_one_epoch
    for batch in batches:
  File "/home/aripc/Documents/nlu/allennlp/allennlp/data/iterators/bucket_iterator.py", line 83, in _create_batches
    grouped_instances = list(super()._create_batches(instance_list, shuffle=False))
  File "/home/aripc/Documents/nlu/allennlp/allennlp/data/iterators/basic_iterator.py", line 133, in _create_batches
    yield Batch(batch_instances)
  File "/home/aripc/Documents/nlu/allennlp/allennlp/data/dataset.py", line 32, in __init__
    self._check_types()
  File "/home/aripc/Documents/nlu/allennlp/allennlp/data/dataset.py", line 43, in _check_types
    raise ConfigurationError("You cannot construct a Batch with non-homogeneous Instances.")
allennlp.common.checks.ConfigurationError: 'You cannot construct a Batch with non-homogeneous Instances.'
```
Can you paste your updated code? All the field names and field types must be the same for all instances that are batched together. I believe the instance that had an empty ListField is causing the problem. This could happen if your stub list is not of the same type as it should be.
I can paste a snippet
if (len(srl_tags)>0):
    fields['srl_tags'] = ListField(srl_tags)
else:
    # in my actual code dummyseq is a populated SequenceLabelField
    fields['srl_tags'] = dummyseq.empty_field()
Are you sure dummyseq is a ListField of the same type as srl_tags?
You might want to check the source of this error:
def _check_types(self) -> None:
    """
    Check that all the instances have the same types.
    """
    all_instance_fields_and_types: List[Dict[str, str]] = [{k: v.__class__.__name__
                                                            for k, v in x.fields.items()}
                                                           for x in self.instances]
    # Check all the field names and Field types are the same for every instance.
    if not all([all_instance_fields_and_types[0] == x for x in all_instance_fields_and_types]):
        raise ConfigurationError("You cannot construct a Batch with non-homogeneous Instances.")
If dummyseq.empty_field().__class__.__name__ is not the same as ListField(srl_tags).__class__.__name__, then this can happen. The keys of the fields dict also have to be the same across instances, but from this snippet that doesn't seem to be the problem.
Yes, it's a SequenceLabelField.
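The comparison that _check_types performs can be reproduced in isolation. The sketch below uses empty stand-in classes (only their names matter for the check) and two hypothetical helpers, field_type_names and is_homogeneous; it is an illustration of the idea, not the library's code.

```python
from typing import Dict, List


class SequenceLabelField:
    """Hypothetical stand-in; only the class name matters here."""


class ListField:
    """Hypothetical stand-in; only the class name matters here."""


def field_type_names(fields: Dict[str, object]) -> Dict[str, str]:
    # Map each field name to the class name of its value.
    return {name: value.__class__.__name__ for name, value in fields.items()}


def is_homogeneous(instances: List[Dict[str, object]]) -> bool:
    # Every instance must have the same field names and the same field types.
    signatures = [field_type_names(fields) for fields in instances]
    return all(signature == signatures[0] for signature in signatures)


with_frames = {"srl_tags": ListField()}
bare_empty = {"srl_tags": SequenceLabelField()}  # empty field used without a ListField around it
wrapped_empty = {"srl_tags": ListField()}        # empty field wrapped in a ListField

mismatch = is_homogeneous([with_frames, bare_empty])
match = is_homogeneous([with_frames, wrapped_empty])
```

Here mismatch is False because the same key maps to two different class names, while match is True once both instances store a ListField under srl_tags.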
@aribornstein I think you want:
if (len(srl_tags)>0):
    fields['srl_tags'] = ListField(srl_tags)
else:
    # in my actual code dummyseq is a populated SequenceLabelField
    fields['srl_tags'] = ListField([dummyseq.empty_field()])
As Harsh was saying, each key must map to the same field type in every instance, so we can batch them together.
Additionally, you may wish to look at this in a slightly different way. If I've inferred correctly that you are trying to predict all frames for all verbs at the same time, and that you are using a BIO encoding for each verb's frame, you might want to consider using a sequence labelling of all "O" for sentences which have no frame, rather than not using one at all. You can see a concrete example of this here: https://github.com/allenai/allennlp/blob/master/allennlp/data/dataset_readers/semantic_role_labeling.py#L66
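A rough sketch of the all-"O" idea follows. It assumes, as the loop in the original report suggests, that srl maps each verb to a list of BIO tags; the helper name frames_or_all_outside and the example sentence are made up for illustration.

```python
from typing import Dict, List


def frames_or_all_outside(tokens: List[str], srl: Dict[str, List[str]]) -> List[List[str]]:
    """Return one tag sequence per frame, or a single all-"O" sequence if there are none."""
    if srl:
        return [list(tags) for tags in srl.values()]
    # No frames: emit one sequence that labels every token as outside any argument.
    return [["O"] * len(tokens)]


tokens = ["It", "rained", "."]
no_frames = frames_or_all_outside(tokens, {})
one_frame = frames_or_all_outside(tokens, {"rained": ["O", "B-V", "O"]})
```

With this approach every instance yields at least one tag sequence, so the list passed to ListField is never empty and no stub field is needed.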
Yes, that worked! Thank you all. I might have a few more questions a little later, but this is amazing progress, thank you!
This code works, thank you very much.
The ListField __init__() method should accept a parameter that specifies the field type when the list is empty.