Transformers: Transformer-XL: Convert lm1b model to PyTorch

Created on 19 Feb 2019 · 12 Comments · Source: huggingface/transformers

Hi,

I wanted to convert the TensorFlow checkpoint for the lm1b model to PyTorch with the convert_transfo_xl_checkpoint_to_pytorch.py script.

I downloaded the checkpoint with the download.sh script.

Then I called the convert script with:

$ python3 convert_transfo_xl_checkpoint_to_pytorch.py --pytorch_dump_folder_path converted --tf_checkpoint_path /mnt/transformer-xl/tf/sota/pretrained_xl/tf_lm1b/model/checkpoint

Then the following error message is returned:

2019-02-19 22:46:54.693060: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /mnt/transformer-xl/tf/sota/pretrained_xl/tf_lm1b/model/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
  File "convert_transfo_xl_checkpoint_to_pytorch.py", line 116, in <module>
    args.transfo_xl_dataset_file)
  File "convert_transfo_xl_checkpoint_to_pytorch.py", line 81, in convert_transfo_xl_checkpoint_to_pytorch
    model = load_tf_weights_in_transfo_xl(model, config, tf_path)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 141, in load_tf_weights_in_transfo_xl
    init_vars = tf.train.list_variables(tf_path)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 95, in list_variables
    reader = load_checkpoint(ckpt_dir_or_file)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 64, in load_checkpoint
    return pywrap_tensorflow.NewCheckpointReader(filename)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 382, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern), status)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 548, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /mnt/transformer-xl/tf/sota/pretrained_xl/tf_lm1b/model/checkpoint: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

I'm using the 0.6.1 version of pytorch-pretrained-BERT and the latest tf-nightly-gpu package that ships TensorFlow 1.13dev.

Help wanted wontfix

stefan-it

All 12 comments

Hm, I guess I was using the wrong checkpoint file? When I use /mnt/transformer-xl/tf/sota/pretrained_xl/tf_lm1b/model/model.ckpt-1191000 instead, the weights are loaded, but another error occurs:

Loading TF weight transformer/r_r_bias/Adam_1 with shape [24, 16, 80]
Loading TF weight transformer/r_w_bias with shape [24, 16, 80]
Loading TF weight transformer/r_w_bias/Adam with shape [24, 16, 80]
Loading TF weight transformer/r_w_bias/Adam_1 with shape [24, 16, 80]
Traceback (most recent call last):
  File "convert_transfo_xl_checkpoint_to_pytorch.py", line 116, in <module>
    args.transfo_xl_dataset_file)
  File "convert_transfo_xl_checkpoint_to_pytorch.py", line 81, in convert_transfo_xl_checkpoint_to_pytorch
    model = load_tf_weights_in_transfo_xl(model, config, tf_path)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 169, in load_tf_weights_in_transfo_xl
    assert pointer.shape == array.shape
AssertionError: (torch.Size([3, 1024]), (3, 1280))
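
A note on the first error, as a hedged sketch rather than anything the conversion script does itself: the file literally named `checkpoint` is a small text index, while the weights live under a prefix such as `model.ckpt-1191000`. Assuming the usual TensorFlow layout of `<prefix>.index` plus `<prefix>.data-*` shards, the candidate prefixes in a directory could be listed like this (`find_checkpoint_prefixes` is a hypothetical helper, not part of any library):

```python
from pathlib import Path


def find_checkpoint_prefixes(model_dir):
    """Return checkpoint prefixes (e.g. '.../model.ckpt-1191000') in model_dir.

    Assumption: each checkpoint is stored as '<prefix>.index' plus one or
    more '<prefix>.data-XXXXX-of-XXXXX' shards. The file named 'checkpoint'
    is only a text index and is ignored here.
    """
    model_dir = Path(model_dir)
    prefixes = []
    for index_file in sorted(model_dir.glob("*.index")):
        prefix = index_file.name[: -len(".index")]
        # Keep only prefixes that also have at least one data shard.
        if any(model_dir.glob(prefix + ".data-*")):
            prefixes.append(str(model_dir / prefix))
    return prefixes
```

One of the returned prefixes, not the `checkpoint` file, would then be the value for --tf_checkpoint_path.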

stefan-it on 19 Feb 2019

Ok, the TransfoXLConfig for the lm1b model is a bit different. I tried:

config = TransfoXLConfig(vocab_size_or_config_json_file=793472,
                 cutoffs=[0, 60000, 100000, 640000, 793472],
                 d_model=1280,
                 d_embed=1280,
                 n_head=16,
                 d_head=80,
                 d_inner=8192,
                 div_val=4,
                 pre_lnorm=False,
                 n_layer=24,
                 tgt_len=32,
                 ext_len=0,
                 mem_len=128,
                 clamp_len=-1,
                 same_length=True,
                 proj_share_all_but_first=False,
                 attn_type=0,
                 sample_softmax=-1,
                 adaptive=True,
                 tie_weight=True,
                 dropout=0.0,
                 dropatt=0.0,
                 untie_r=True,
                 init="normal",
                 init_range=0.01,
                 proj_init_std=0.01,
                 init_std=0.02)

which does not seem to be 100% correct. Where do I get the model's JSON configuration from (so I can easily pass it to the convert_transfo_xl_checkpoint_to_pytorch.py script)? 🤔

stefan-it on 20 Feb 2019

Hi Stefan,
You have to create the configuration yourself indeed 🙂
I usually do it by looking at the training parameters in the TensorFlow code for the model you are trying to load.
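
As a rough illustration of building the configuration by hand, the sketch below collects hyper-parameters in a plain dict, sanity-checks the adaptive-softmax cutoffs, and serializes the result to JSON. The values are copied from this thread and are not a verified lm1b configuration; `check_cutoffs` is a hypothetical helper, and the rule it enforces (cutoffs are interior boundaries strictly between 0 and the vocabulary size) is an assumption based on the cutoff fix reported later in this thread.

```python
import json


def check_cutoffs(cutoffs, n_token):
    """Return a list of problems with the cutoffs (empty if none).

    Assumption: cutoffs are strictly increasing interior boundaries,
    so 0 and n_token themselves must not be listed.
    """
    problems = []
    for value in cutoffs:
        if not 0 < value < n_token:
            problems.append(
                f"cutoff {value} is not strictly between 0 and {n_token}"
            )
    if any(a >= b for a, b in zip(cutoffs, cutoffs[1:])):
        problems.append("cutoffs are not strictly increasing")
    return problems


# Values taken from this thread; unverified for lm1b.
config = {
    "n_token": 793472,
    "cutoffs": [60000, 100000, 640000],
    "d_model": 1280,
    "d_embed": 1280,
    "n_head": 16,
    "d_head": 80,
    "d_inner": 8192,
    "div_val": 4,
    "n_layer": 24,
}

problems = check_cutoffs(config["cutoffs"], config["n_token"])
config_json = json.dumps(config, indent=2, sort_keys=True)
```

If `problems` is empty, `config_json` can be written to a file and handed to the conversion script instead of editing the Python source.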

thomwolf on 20 Feb 2019

The vocab cutoffs were wrong. I changed the configuration to:

config = TransfoXLConfig(vocab_size_or_config_json_file=793472,
                 cutoffs=[60000, 100000, 640000],
                 d_model=1280,
                 d_embed=1280,
                 n_head=16,
                 d_head=80,
                 d_inner=8192,
                 div_val=4,
                 pre_lnorm=False,
                 n_layer=24,
                 tgt_len=32,
                 ext_len=0,
                 mem_len=128,
                 clamp_len=-1,
                 same_length=True,
                 proj_share_all_but_first=False,
                 attn_type=0,
                 sample_softmax=-1,
                 adaptive=True,
                 tie_weight=True,
                 dropout=0.0,
                 dropatt=0.0,
                 untie_r=True,
                 init="normal",
                 init_range=0.01,
                 proj_init_std=0.01,
                 init_std=0.02,
                 )

And then the transformer/adaptive_softmax/cutoff_0/proj key wasn't found in the tf_weights dict:

transformer/adaptive_softmax/cutoff_0/proj
Traceback (most recent call last):
  File "convert_transfo_xl_checkpoint_to_pytorch.py", line 142, in <module>
    args.transfo_xl_dataset_file)
  File "convert_transfo_xl_checkpoint_to_pytorch.py", line 107, in convert_transfo_xl_checkpoint_to_pytorch
    model = load_tf_weights_in_transfo_xl(model, config, tf_path)
  File "/mnt/pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling_transfo_xl.py", line 150, in load_tf_weights_in_transfo_xl
    assert name in tf_weights
AssertionError
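
The bare assertion stops at the first missing key and does not say what the checkpoint contains instead. A possible debugging aid, sketched under the assumption that both sides can be collected as lists of names, is to diff them while skipping the Adam slot variables seen in the log above. Both functions are hypothetical helpers, not part of pytorch_pretrained_bert:

```python
def is_optimizer_slot(name):
    """True for Adam slot variables such as 'transformer/r_w_bias/Adam_1'."""
    last = name.rsplit("/", 1)[-1]
    return last in ("Adam", "Adam_1")


def diff_weight_names(expected, available):
    """Compare names the loader expects with names found in the checkpoint.

    Returns (missing, unused): names that are expected but absent, and
    non-optimizer checkpoint names that nothing asked for.
    """
    model_vars = {n for n in available if not is_optimizer_slot(n)}
    missing = sorted(set(expected) - model_vars)
    unused = sorted(model_vars - set(expected))
    return missing, unused
```

Here `available` could be the names returned by tf.train.list_variables(tf_path), the call that appears in the first traceback. Names that show up in `unused` next to a `missing` entry usually hint at which configuration flag is off.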

stefan-it on 20 Feb 2019

That's probably a question of weights or projection tying, try to set tie_weight or proj_share_all_but_first to False (the correct value should be indicated in Google/CMU hyper-parameters for lm1b).

(I can convert this model later if you don't manage to but not before next week unfortunately)
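
One way to work through the tying suggestion systematically, shown only as a sketch: generate every combination of the two flags from a base configuration and try the conversion with each. `tying_variants` is a hypothetical helper, and the base dict here is deliberately abbreviated.

```python
import itertools


def tying_variants(base_config):
    """Yield copies of base_config covering every combination of the two flags."""
    flags = ("tie_weight", "proj_share_all_but_first")
    for values in itertools.product((True, False), repeat=len(flags)):
        variant = dict(base_config)  # copy, so the base stays unchanged
        variant.update(zip(flags, values))
        yield variant


# Abbreviated base configuration; the other keys would be carried along as-is.
base = {"tie_weight": True, "proj_share_all_but_first": False, "div_val": 4}
variants = list(tying_variants(base))
```

This yields four candidate configurations, each of which can be dumped to JSON and passed to the conversion script in turn.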

thomwolf on 20 Feb 2019

Thanks for your help @thomwolf ! I'll try to find the correct configuration settings.

We are currently trying to integrate the Transformer-XL model into flair, and we would really like to use a larger (in terms of training size) model for downstream tasks like NER :)

stefan-it on 20 Feb 2019

Here's the last configuration I tried:

```json
{
"adaptive": true,
"attn_type": 0,
"clamp_len": -1,
"cutoffs": [
60000,
100000,
640000
],
"d_embed": 1280,
"d_head": 80,
"d_inner": 8192,
"d_model": 1280,
"div_val": 4,
"dropatt": 0.0,
"dropout": 0.1,
"ext_len": 0,
"init": "normal",
"init_range": 0.01,
"init_std": 0.02,
"mem_len": 32,
"n_head": 16,
"n_layer": 24,
"n_token": 793472,
"pre_lnorm": false,
"proj_init_std": 0.01,
"same_length": true,
"sample_softmax": -1,
"tgt_len": 32,
"tie_weight": true,
"untie_r": true,
"proj_share_all_but_first": false,
"proj_same_dim": false,
"tie_projs": [
true,
false,
true
]
}
```

Unfortunately, an error is thrown. @thomwolf it would be awesome if you could take a look at this :)

stefan-it on 25 Feb 2019

Did you manage to convert this model @stefan-it?

thomwolf on 6 Mar 2019

Sadly, I couldn't manage to convert it (I tried several options).

stefan-it on 6 Mar 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.