While fine-tuning BERT with the new script, I run into the following error:
Traceback (most recent call last):
File "run_mlm.py", line 310, in <module>
main()
File "run_mlm.py", line 259, in main
load_from_cache_file=not data_args.overwrite_cache,
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/dataset_dict.py", line 300, in map
for k, dataset in self.items()
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/dataset_dict.py", line 300, in <dictcomp>
for k, dataset in self.items()
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 1256, in map
update_data=update_data,
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 156, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 158, in wrapper
self._fingerprint, transform, kwargs_for_fingerprint
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 105, in update_fingerprint
hasher.update(transform_args[key])
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 57, in update
self.m.update(self.hash(value).encode("utf-8"))
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 53, in hash
return cls.hash_default(value)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/fingerprint.py", line 46, in hash_default
return cls.hash_bytes(dumps(value))
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/utils/py_utils.py", line 367, in dumps
dump(obj, file)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/datasets/utils/py_utils.py", line 339, in dump
Pickler(file, recurse=True).dump(obj)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 446, in dump
StockPickler.dump(self, obj)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 1438, in save_function
obj.__dict__, fkwdefaults), obj=obj)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 1170, in save_cell
pickler.save_reduce(_create_cell, (f,), obj=obj)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 605, in save_reduce
save(cls)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 1365, in save_type
obj.__bases__, _dict), obj=obj)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 933, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/site-packages/dill/_dill.py", line 933, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 507, in save
self.save_global(obj, rv)
File "/home/ai-students/anaconda3/envs/env_nesara/lib/python3.6/pickle.py", line 927, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle typing.Union[str, NoneType]: it's not the same object as typing.Union
I am running the same script on the wikitext dataset mentioned earlier, but it fails with the error above.
@sgugger Could you please help me resolve this error?
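The last line of the traceback comes from an identity check in pickle: an object that is saved by reference must be the very same object that pickle finds when it looks the name up in its module. The sketch below illustrates that general mechanism only; it does not reproduce the typing.Union case from this thread. The module name demo_mod and the class name Thing are hypothetical and exist only for the illustration.

```python
import pickle
import sys
import types

# Hypothetical module, registered so that pickle can import it by name.
demo = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo

# A class that claims to live in demo_mod under the name "Thing".
Thing = type("Thing", (), {"__module__": "demo_mod"})
demo.Thing = Thing


def pickling_error(obj):
    """Return the PicklingError message for obj, or None if pickling works."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return str(exc)
    return None


# While demo_mod.Thing is the same object as Thing, pickling by reference works.
first = pickling_error(Thing)

# Rebind the module attribute to a different class with the same name.
# pickle now finds another object under demo_mod.Thing and refuses to save
# the original class.
demo.Thing = type("Thing", (), {"__module__": "demo_mod"})
second = pickling_error(Thing)

print("before rebinding:", first)
print("after rebinding:", second)
```

In the traceback above, the object that fails this check is typing.Union[str, NoneType], reached while dill walks the function being hashed for the datasets fingerprint.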
Can you share everything you can about your environment, including the output of pip list? I'm also pinging @lhoestq.
If you can reproduce the error in a Google Colab or another shareable environment, that would be ideal for debugging.
@VictorSanh got a similar issue once. Did you install transformers using pip install -e . ?
@thomwolf Please have a look at the colab. It is also reproducing the same error as before.
@lhoestq Yes, I have installed it from source using:
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
I also tried installing as suggested here in examples as:
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
pip install -r ./examples/requirements.txt
I am trying to train a RoBERTa model from scratch using run_mlm.py, but I am facing the same issue.
Didn't find file ./model_output/tokenizer.json. We won't load it.
Didn't find file ./model_output/added_tokens.json. We won't load it.
Didn't find file ./model_output/special_tokens_map.json. We won't load it.
Didn't find file ./model_output/tokenizer_config.json. We won't load it.
loading file ./model_output/vocab.json
loading file ./model_output/merges.txt
loading file None
loading file None
loading file None
loading file None
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Traceback (most recent call last):
File "transformers/examples/language-modeling/run_mlm.py", line 310, in <module>
main()
File "transformers/examples/language-modeling/run_mlm.py", line 259, in main
load_from_cache_file=not data_args.overwrite_cache,
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/dataset_dict.py", line 300, in map
for k, dataset in self.items()
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/dataset_dict.py", line 300, in <dictcomp>
for k, dataset in self.items()
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 1256, in map
update_data=update_data,
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 156, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/fingerprint.py", line 158, in wrapper
self._fingerprint, transform, kwargs_for_fingerprint
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/fingerprint.py", line 105, in update_fingerprint
hasher.update(transform_args[key])
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/fingerprint.py", line 57, in update
self.m.update(self.hash(value).encode("utf-8"))
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/fingerprint.py", line 53, in hash
return cls.hash_default(value)
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/fingerprint.py", line 46, in hash_default
return cls.hash_bytes(dumps(value))
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/utils/py_utils.py", line 367, in dumps
dump(obj, file)
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/datasets/utils/py_utils.py", line 339, in dump
Pickler(file, recurse=True).dump(obj)
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/dill/_dill.py", line 446, in dump
StockPickler.dump(self, obj)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/dill/_dill.py", line 1438, in save_function
obj.__dict__, fkwdefaults), obj=obj)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/dill/_dill.py", line 1170, in save_cell
pickler.save_reduce(_create_cell, (f,), obj=obj)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 605, in save_reduce
save(cls)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/dill/_dill.py", line 1365, in save_type
obj.__bases__, _dict), obj=obj)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/dill/_dill.py", line 933, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/dill/_dill.py", line 933, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 507, in save
self.save_global(obj, rv)
File "/anaconda/envs/azureml_py36/lib/python3.6/pickle.py", line 927, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle typing.Union[str, NoneType]: it's not the same object as typing.Union
I have the same problem. How can I fix it?
Yes @naturecreator, I had the same error last week. I worked around it by dropping editable mode when installing with pip (switching from pip install -e . to a standard pip install .).
That worked for me on both Python 3.6 and 3.7.
After the recent commit made to the script, it is running as expected without any errors.
Hey, I forked the repo and followed the solution given by @VictorSanh, but I am still getting this error. I am loading a custom dataset (a text file), not a predefined one, with roberta-base, and I am also using the --line_by_line parameter. Any ideas why this may be happening?
I tried removing the --line_by_line parameter. That works, but the script no longer reads the input line by line. For now I have processed the file as JSON instead. Is there a fix that works with --line_by_line?
This is an error that none of us on the team has managed to fully reproduce, so if you could give us your full environment, that would be super helpful.
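For anyone else asked for their environment, a small script along these lines can gather the relevant versions in one go. It is only a sketch: the function name collect_environment and the default package list are assumptions made for this example, not part of transformers or datasets.

```python
import pickle
import platform

try:  # Python 3.8 and newer
    from importlib.metadata import PackageNotFoundError
    from importlib.metadata import version as _dist_version
except ImportError:  # Python 3.6 / 3.7, the versions discussed in this thread
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def _dist_version(name):
        return pkg_resources.get_distribution(name).version


def collect_environment(packages=("transformers", "datasets", "dill", "torch")):
    """Return a dict of interpreter and package versions for a bug report."""
    report = {
        "python": platform.python_version(),
        "pickle_format": pickle.format_version,
    }
    for name in packages:
        try:
            report[name] = _dist_version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting this output together with the exact command line gives the maintainers most of what they need to rebuild the same environment.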
I would love to help. I am a bit new to this, so do let me know if any more specifics are required. The versions of the relevant libraries and the language are:
Python - 3.6.7
transformers - 3.4.0
pickle - 4.0
The command I ran was -
python3 run_mlm.py \
--model_name_or_path roberta-base \
--train_file train.txt \
--validation_file test.txt \
--do_train \
--do_eval \
--output_dir results/ \
--line_by_line
Ahah! Can reproduce! This will make investigation easier.
For future reference, here is how I create an env reproducing the bug, and the command that shows it (self-contained to the repo):
pyenv install 3.6.7
pyenv virtualenv 3.6.7 picklebug
pyenv activate picklebug
pip install --upgrade pip
pip install transformers[torch]
pip install datasets
cd git/transformers # Adapt to your local path to the cloned repo
pip install -e .
python examples/language-modeling/run_mlm.py \
--model_name_or_path roberta-base \
--train_file ./tests/fixtures/sample_text.txt \
--validation_file ./tests/fixtures/sample_text.txt \
--do_train \
--do_eval \
--output_dir /tmp/test=clm \
--line_by_line
The bug disappears for me with Python 3.7.9, so if you can upgrade your Python version, you should be good to go.
Narrowing it down further: the bug appears in all Python versions <= 3.6.12 but disappears in Python 3.7.0.
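If you want a script to warn early rather than fail deep inside pickle, a version guard based on the finding above could look like the following. This is a minimal sketch: the function name has_known_pickle_bug is hypothetical, and the 3.7.0 cutoff is taken only from the versions reported in this thread.

```python
import sys


def has_known_pickle_bug(version_info=None):
    """Return True for interpreters older than 3.7.0, where the error was seen."""
    if version_info is None:
        version_info = sys.version_info
    # Compare (major, minor, micro) against the first version reported as working.
    return tuple(version_info[:3]) < (3, 7, 0)


if has_known_pickle_bug():
    print(
        "Warning: this Python version is older than 3.7.0; "
        "run_mlm.py may fail with a PicklingError when combined with "
        "an editable install or --line_by_line."
    )
```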
Thanks, this was really helpful !!!