Hi,
I wanted to convert the TensorFlow checkpoint for the lm1b model to PyTorch with the convert_transfo_xl_checkpoint_to_pytorch.py script.
I downloaded the checkpoint with the download.sh script.
Then I called the convert script with:
$ python3 convert_transfo_xl_checkpoint_to_pytorch.py --pytorch_dump_folder_path converted --tf_checkpoint_path /mnt/transformer-xl/tf/sota/pretrained_xl/tf_lm1b/model/checkpoint
Then the following error message is returned:
2019-02-19 22:46:54.693060: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /mnt/transformer-xl/tf/sota/pretrained_xl/tf_lm1b/model/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
File "convert_transfo_xl_checkpoint_to_pytorch.py", line 116, in <module>
args.transfo_xl_dataset_file)
File "convert_transfo_xl_checkpoint_to_pytorch.py", line 81, in convert_transfo_xl_checkpoint_to_pytorch
model = load_tf_weights_in_transfo_xl(model, config, tf_path)
File "/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 141, in load_tf_weights_in_transfo_xl
init_vars = tf.train.list_variables(tf_path)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 95, in list_variables
reader = load_checkpoint(ckpt_dir_or_file)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 64, in load_checkpoint
return pywrap_tensorflow.NewCheckpointReader(filename)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 382, in NewCheckpointReader
return CheckpointReader(compat.as_bytes(filepattern), status)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 548, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /mnt/transformer-xl/tf/sota/pretrained_xl/tf_lm1b/model/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I'm using the 0.6.1 version of pytorch-pretrained-BERT and the latest tf-nightly-gpu package that ships TensorFlow 1.13dev.
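A note on the error above: in a TensorFlow 1.x checkpoint directory, the file literally named `checkpoint` is a small text index rather than a weights file, which is consistent with the "not an sstable" message. `tf.train.list_variables` expects a directory or a prefix such as `model.ckpt-<step>`. Below is a minimal sketch of resolving that prefix by hand, assuming the index contains the usual `model_checkpoint_path: "..."` line. The helper name `resolve_checkpoint_prefix` is hypothetical and not part of either library.

```python
import os
import re


def resolve_checkpoint_prefix(model_dir):
    """Return the checkpoint prefix named in `<model_dir>/checkpoint`.

    Assumes the index file contains a line of the form
    model_checkpoint_path: "model.ckpt-1234".
    """
    index_path = os.path.join(model_dir, "checkpoint")
    with open(index_path, "r", encoding="utf-8") as f:
        for line in f:
            match = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
            if match:
                prefix = match.group(1)
                # Relative entries are resolved against the directory itself.
                if not os.path.isabs(prefix):
                    prefix = os.path.join(model_dir, prefix)
                return prefix
    raise ValueError("no model_checkpoint_path entry in " + index_path)
```

The returned prefix is what would go to `--tf_checkpoint_path`. When TensorFlow is installed, `tf.train.latest_checkpoint(model_dir)` serves the same purpose.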
Hm, I guess I was using the wrong checkpoint file? When I pass /mnt/transformer-xl/tf/sota/pretrained_xl/tf_lm1b/model/model.ckpt-1191000 instead, the weights are loaded, but another error occurs:
Loading TF weight transformer/r_r_bias/Adam_1 with shape [24, 16, 80]
Loading TF weight transformer/r_w_bias with shape [24, 16, 80]
Loading TF weight transformer/r_w_bias/Adam with shape [24, 16, 80]
Loading TF weight transformer/r_w_bias/Adam_1 with shape [24, 16, 80]
Traceback (most recent call last):
File "convert_transfo_xl_checkpoint_to_pytorch.py", line 116, in <module>
args.transfo_xl_dataset_file)
File "convert_transfo_xl_checkpoint_to_pytorch.py", line 81, in convert_transfo_xl_checkpoint_to_pytorch
model = load_tf_weights_in_transfo_xl(model, config, tf_path)
File "/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 169, in load_tf_weights_in_transfo_xl
assert pointer.shape == array.shape
AssertionError: (torch.Size([3, 1024]), (3, 1280))
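The log above already hints at what the default configuration gets wrong: the mismatch is 1024 versus 1280 in the last dimension, and `transformer/r_w_bias` has shape [24, 16, 80]. Here is a rough sketch of reading a few hyper-parameters off the variable list, assuming that untied relative biases are stored as [n_layer, n_head, d_head]. `summarize_variables` is a hypothetical helper, and its input is the kind of `(name, shape)` list that `tf.train.list_variables` returns.

```python
def summarize_variables(variables):
    """Drop optimizer slots and guess a few dimensions from r_w_bias."""
    # Adam keeps two slot variables per weight, suffixed /Adam and /Adam_1;
    # they are optimizer state, not model weights.
    weights = {
        name: tuple(shape)
        for name, shape in variables
        if not name.endswith(("/Adam", "/Adam_1"))
    }
    guess = {}
    bias_shape = weights.get("transformer/r_w_bias")
    if bias_shape is not None and len(bias_shape) == 3:
        # Assumption: untied biases are stored as [n_layer, n_head, d_head].
        guess["n_layer"], guess["n_head"], guess["d_head"] = bias_shape
    return weights, guess


# The four entries visible in the log above.
logged = [
    ("transformer/r_r_bias/Adam_1", [24, 16, 80]),
    ("transformer/r_w_bias", [24, 16, 80]),
    ("transformer/r_w_bias/Adam", [24, 16, 80]),
    ("transformer/r_w_bias/Adam_1", [24, 16, 80]),
]
weights, guess = summarize_variables(logged)
print(sorted(weights))
print(guess)
```

Under that assumption the log suggests 24 layers, 16 heads and a head size of 80, so the defaults of `TransfoXLConfig` cannot be used unchanged for lm1b.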
Ok, the TransfoXLConfig for the lm1b model is a bit different. I tried:
config = TransfoXLConfig(vocab_size_or_config_json_file=793472,
                         cutoffs=[0, 60000, 100000, 640000, 793472],
                         d_model=1280,
                         d_embed=1280,
                         n_head=16,
                         d_head=80,
                         d_inner=8192,
                         div_val=4,
                         pre_lnorm=False,
                         n_layer=24,
                         tgt_len=32,
                         ext_len=0,
                         mem_len=128,
                         clamp_len=-1,
                         same_length=True,
                         proj_share_all_but_first=False,
                         attn_type=0,
                         sample_softmax=-1,
                         adaptive=True,
                         tie_weight=True,
                         dropout=0.0,
                         dropatt=0.0,
                         untie_r=True,
                         init="normal",
                         init_range=0.01,
                         proj_init_std=0.01,
                         init_std=0.02)
which does not seem to be 100% correct. Where do I get the model's JSON configuration from, so that I can easily pass it to the convert_transfo_xl_checkpoint_to_pytorch.py script?
Hi Stefan,
You do indeed have to create the configuration yourself 🙂
I usually do it by looking at the training parameters in the TensorFlow code for the model you are trying to load.
The vocab cutoffs were wrong. I changed the configuration to:
config = TransfoXLConfig(vocab_size_or_config_json_file=793472,
                         cutoffs=[60000, 100000, 640000],
                         d_model=1280,
                         d_embed=1280,
                         n_head=16,
                         d_head=80,
                         d_inner=8192,
                         div_val=4,
                         pre_lnorm=False,
                         n_layer=24,
                         tgt_len=32,
                         ext_len=0,
                         mem_len=128,
                         clamp_len=-1,
                         same_length=True,
                         proj_share_all_but_first=False,
                         attn_type=0,
                         sample_softmax=-1,
                         adaptive=True,
                         tie_weight=True,
                         dropout=0.0,
                         dropatt=0.0,
                         untie_r=True,
                         init="normal",
                         init_range=0.01,
                         proj_init_std=0.01,
                         init_std=0.02,
                         )
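The cutoff change is easier to see with the cluster boundaries written out. The sketch below assumes the model itself prepends 0 and appends the vocabulary size to the interior cutoffs when forming cluster ranges, so only the interior values belong in `cutoffs`. It also assumes each cluster's embedding width is `d_embed // div_val**i`, as in the Transformer-XL adaptive embedding. `cluster_layout` is a hypothetical helper used only for illustration.

```python
def cluster_layout(n_token, cutoffs, d_embed, div_val):
    """Return (start, end, embedding_width) for each vocabulary cluster."""
    # Interior cutoffs only; the outer boundaries are added here.
    ends = [0] + list(cutoffs) + [n_token]
    layout = []
    for i in range(len(ends) - 1):
        layout.append((ends[i], ends[i + 1], d_embed // (div_val ** i)))
    return layout


for start, end, width in cluster_layout(793472, [60000, 100000, 640000], 1280, 4):
    print(start, end, width)
```

With these values the sketch yields four clusters, so any per-cluster setting would need four entries under the same assumptions.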
And then the transformer/adaptive_softmax/cutoff_0/proj key wasn't found in the tf_weights dict:
transformer/adaptive_softmax/cutoff_0/proj
Traceback (most recent call last):
File "convert_transfo_xl_checkpoint_to_pytorch.py", line 142, in <module>
args.transfo_xl_dataset_file)
File "convert_transfo_xl_checkpoint_to_pytorch.py", line 107, in convert_transfo_xl_checkpoint_to_pytorch
model = load_tf_weights_in_transfo_xl(model, config, tf_path)
File "/mnt/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling_transfo_xl.py", line 150, in load_tf_weights_in_transfo_xl
assert name in tf_weights
AssertionError
That's probably a matter of weight or projection tying. Try setting tie_weight or proj_share_all_but_first to False (the correct value should be indicated in the Google/CMU hyper-parameters for lm1b).
(I can convert this model later if you don't manage to, but not before next week, unfortunately.)
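One way to narrow this down is to check which projection variables the checkpoint actually contains before asserting on them. Here is a minimal sketch, assuming the variable names follow the `transformer/adaptive_softmax/cutoff_<i>/proj` pattern seen in the failing assertion. The helper `missing_projections` and the example name set are hypothetical.

```python
def missing_projections(checkpoint_names, n_clusters):
    """List the per-cluster projection names absent from the checkpoint."""
    present = set(checkpoint_names)
    expected = [
        "transformer/adaptive_softmax/cutoff_%d/proj" % i
        for i in range(n_clusters)
    ]
    return [name for name in expected if name not in present]


# Illustrative name set only, not the real lm1b variable list.
names = {
    "transformer/adaptive_softmax/cutoff_1/proj",
    "transformer/adaptive_softmax/cutoff_2/proj",
}
print(missing_projections(names, 4))
```

A name that is missing for some cluster would suggest that its projection is tied or absent in the TensorFlow model, which is the kind of detail the tying flags are meant to express.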
Thanks for your help @thomwolf ! I'll try to find the correct configuration settings.
We are currently trying to integrate the Transformer-XL model into flair, and we would really like to use a larger (in terms of training size) model for downstream tasks like NER :)
Here's the last configuration I tried:
```json
{
"adaptive": true,
"attn_type": 0,
"clamp_len": -1,
"cutoffs": [
60000,
100000,
640000
],
"d_embed": 1280,
"d_head": 80,
"d_inner": 8192,
"d_model": 1280,
"div_val": 4,
"dropatt": 0.0,
"dropout": 0.1,
"ext_len": 0,
"init": "normal",
"init_range": 0.01,
"init_std": 0.02,
"mem_len": 32,
"n_head": 16,
"n_layer": 24,
"n_token": 793472,
"pre_lnorm": false,
"proj_init_std": 0.01,
"same_length": true,
"sample_softmax": -1,
"tgt_len": 32,
"tie_weight": true,
"untie_r": true,
"proj_share_all_but_first": false,
"proj_same_dim": false,
"tie_projs": [
true,
false,
true
]
}
```
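A quick consistency check on a configuration like this one can be sketched as follows. It assumes one `tie_projs` flag per vocabulary cluster, that is `len(cutoffs) + 1` entries. That is an assumption about how the model consumes the list, and the thread does not establish whether it relates to the error. `check_config` is a hypothetical helper, and only the relevant keys are reproduced.

```python
import json


def check_config(cfg):
    """Return a list of human-readable warnings for a config dict."""
    warnings = []
    cutoffs = cfg.get("cutoffs", [])
    if cutoffs != sorted(set(cutoffs)):
        warnings.append("cutoffs should be strictly increasing")
    if cutoffs and cutoffs[-1] >= cfg["n_token"]:
        warnings.append("last cutoff should be below n_token")
    # Assumption: one tie_projs flag per cluster, i.e. len(cutoffs) + 1.
    tie_projs = cfg.get("tie_projs")
    if tie_projs is not None and len(tie_projs) != len(cutoffs) + 1:
        warnings.append(
            "tie_projs has %d entries, expected %d"
            % (len(tie_projs), len(cutoffs) + 1)
        )
    return warnings


cfg = json.loads("""
{"cutoffs": [60000, 100000, 640000], "n_token": 793472,
 "tie_projs": [true, false, true]}
""")
print(check_config(cfg))
```

Under that assumption, the configuration above would be flagged for having three `tie_projs` entries where four clusters are implied by three cutoffs.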
Unfortunately, an error is thrown. @thomwolf it would be awesome if you could take a look at this :)
Did you manage to convert this model @stefan-it?
Sadly, I couldn't manage to convert it (I tried several options).
@stefan-it did you ever manage to convert this model?
Hi @irugina, unfortunately, I wasn't able to convert the model 😞