Bert: How to get distributed checkpoints to reduce the size of model only for prediction

Created on 11 Nov 2018 · 9 Comments · Source: google-research/bert

All 9 comments

I'm not sure what this means. The BERT-Base model has about 110M parameters and is about 440MB, which should fit comfortably on most devices.

@jacobdevlin-google Yes, the released model is small, but after running run_classifier.py I get a 1.2GB model. How can I reduce its size to 400MB?

@ZizhenWang The reason we get a bigger model file is explained here: https://github.com/google-research/bert/issues/63

@ZizhenWang I have exactly the same issue: I would like the model to be small when doing prediction. As stated by @jacobdevlin-google in #63, the weight file contains the Adam momentum ('adam_m') and variance ('adam_v') tensors. I then found a solution elsewhere that excludes all Adam variables when re-saving the checkpoint:

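import tensorflow as tf  # TF 1.x API

with tf.Session() as sess:
    # Rebuild the training graph from the .meta file and load all weights.
    imported_meta = tf.train.import_meta_graph('./model.ckpt-322.meta')
    imported_meta.restore(sess, './model.ckpt-322')
    # Keep everything except the Adam moment estimates.
    my_vars = [var for var in tf.global_variables()
               if 'adam_v' not in var.name and 'adam_m' not in var.name]
    # Save only the filtered variables to a new, smaller checkpoint.
    saver = tf.train.Saver(my_vars)
    saver.save(sess, './model.ckpt')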

There must be some tidier solutions, but at least this one works for me, and the size of the weight file drops from 1.3GB to 400MB.
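
The 1.3GB and 400MB figures are consistent with simple arithmetic: Adam keeps two extra tensors ('adam_m' and 'adam_v') for each trainable weight, so a training checkpoint is roughly three times the size of the weights alone. Here is a minimal sketch of that accounting. The function name checkpoint_size_report and the toy shapes are illustrative assumptions, and every tensor is assumed to be float32. In practice the (name, shape) pairs could come from tf.train.list_variables on the checkpoint path.

```python
from math import prod  # Python 3.8+

BYTES_PER_FLOAT32 = 4


def checkpoint_size_report(variables):
    """Split checkpoint bytes into model weights and Adam slots.

    `variables` is an iterable of (name, shape) pairs, the same structure
    that tf.train.list_variables(checkpoint_path) returns.
    Assumption: every tensor is stored as float32.
    """
    report = {"model": 0, "adam": 0}
    for name, shape in variables:
        n_bytes = prod(shape) * BYTES_PER_FLOAT32
        is_adam = "adam_m" in name or "adam_v" in name
        report["adam" if is_adam else "model"] += n_bytes
    return report


# Toy example (hypothetical shapes): one weight plus its two Adam slots.
example = [
    ("bert/embeddings/LayerNorm/beta", [768]),
    ("bert/embeddings/LayerNorm/beta/adam_m", [768]),
    ("bert/embeddings/LayerNorm/beta/adam_v", [768]),
]
report = checkpoint_size_report(example)
print(report)
```

In the toy example the Adam slots take twice as many bytes as the weight itself, which is the same 2:1 ratio seen between the stripped and unstripped checkpoint sizes.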

@ymcdull Good solution for stripping the Adam-related variables out of the ckpt file. The shrunken ckpt works well in inference mode (estimator.predict()). However, when I try to use it as the latest ckpt within the model_dir to resume training, it raises:

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key bert/embeddings/LayerNorm/beta/adam_m not found in checkpoint
         [[node save/RestoreV2 (defined at /home/xuanhua/zhangjinhe/berts/bert_recipes/recipes/recipes/ner/berts.py:937)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Any idea?

Hi @longbowking, in my understanding, if you want to continue training you will need the Adam-related variables, since they are part of the optimizer state. Stripping out the Adam variables is only useful when you want to serve the model without any further training.
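
If the goal is to fine-tune further from the stripped weights (rather than to resume the exact optimizer state), one option is to treat the small checkpoint as an initial checkpoint instead of placing it in model_dir. Only the variables present in the checkpoint are then loaded, and the Adam slots start fresh. The BERT scripts load the pretrained model this way through --init_checkpoint. The sketch below shows only the name-matching idea. build_assignment_map is a hypothetical helper, not the repository's function, and the dict it returns is the kind of mapping that tf.train.init_from_checkpoint accepts in TF 1.x.

```python
def build_assignment_map(graph_var_names, ckpt_var_names):
    """Map checkpoint names to graph names for variables found in both.

    Graph variable names may carry a ':0' suffix, which checkpoints omit.
    Variables missing from the checkpoint (e.g. Adam slots) are skipped,
    so they keep whatever initializer the graph gives them.
    """
    available = set(ckpt_var_names)
    assignment_map = {}
    for name in graph_var_names:
        base = name.split(":")[0]
        if base in available:
            assignment_map[base] = base
    return assignment_map


# Hypothetical example: the graph expects Adam slots, the checkpoint has none.
graph_names = [
    "bert/embeddings/LayerNorm/beta:0",
    "bert/embeddings/LayerNorm/beta/adam_m:0",
    "bert/embeddings/LayerNorm/beta/adam_v:0",
]
ckpt_names = ["bert/embeddings/LayerNorm/beta"]
assignment_map = build_assignment_map(graph_names, ckpt_names)
print(assignment_map)
```

Training started this way begins with newly initialized Adam moments, so it is not identical to resuming the interrupted run.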

Your solution works perfectly in TF 1.x versions. But in TF 2.x I don't have a ckpt.meta file in my checkpoint folder, because of eager execution. Do you know how to do the same steps in TF 2.1x without a .meta file in the checkpoint folder?
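
One possible approach that does not need the .meta file is to read the tensors directly from the checkpoint by name and write the non-Adam ones into a new checkpoint. The sketch below rests on several assumptions: the checkpoint is name-based (TF 1.x style, such as the ones run_classifier.py produces), the tf.compat.v1 API is available under TF 2.x, and the total weight size stays under the 2GB graph limit, since the values are embedded as initializers. It has not been verified against every TF 2.x release. strip_adam_variables and keep_variable are hypothetical names. Checkpoints written by tf.train.Checkpoint use a different, object-based key layout and would need a different filter.

```python
def keep_variable(name):
    """Return True for variables that are not Adam moment estimates."""
    return "adam_m" not in name and "adam_v" not in name


def strip_adam_variables(src_ckpt, dst_ckpt):
    """Copy every non-Adam variable from src_ckpt into dst_ckpt.

    Returns the list of variable names that were kept.
    """
    # Imported lazily so the name filter above can be used without TensorFlow.
    import tensorflow as tf

    tf1 = tf.compat.v1
    kept_names = []
    with tf.Graph().as_default():
        new_vars = []
        for name, _shape in tf.train.list_variables(src_ckpt):
            if not keep_variable(name):
                continue
            # Read the stored value and recreate a variable under the same name.
            value = tf.train.load_variable(src_ckpt, name)
            new_vars.append(tf1.Variable(value, name=name))
            kept_names.append(name)
        saver = tf1.train.Saver(new_vars)
        with tf1.Session() as sess:
            sess.run(tf1.global_variables_initializer())
            # Skip the meta graph: it would embed the initial values again.
            saver.save(sess, dst_ckpt, write_meta_graph=False)
    return kept_names
```

A call such as strip_adam_variables('./model.ckpt-322', './model.ckpt') would mirror the TF 1.x snippet, using the paths from that example.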

Did you find a solution to this?

@ymcdull Using your code snippet, the model shrinks to 390MB, but when I reload the new small checkpoint and convert it to SavedModel format, I get the following error:
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value opt/bert/embeddings/word_embeddings/Adam
I tried printing tf.global_variables() and still got Adam-related variables. Any solutions?
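
The variable in this error, opt/bert/embeddings/word_embeddings/Adam, contains neither 'adam_m' nor 'adam_v', so the name filter in the snippet would not have matched it. Slot names ending in /Adam and /Adam_1 (plus beta1_power and beta2_power) are what tf.train.AdamOptimizer typically creates. That suggests, as an assumption, that this model was trained with the stock optimizer rather than BERT's own. A broader name filter might look like the sketch below. is_optimizer_variable is a hypothetical helper, and the pattern list is a guess that should be checked against the names actually stored in the checkpoint.

```python
import re

# Assumed naming patterns: BERT's custom optimizer (adam_m / adam_v) and
# the stock Adam optimizer (Adam, Adam_1, beta1_power, beta2_power).
_OPTIMIZER_PATTERN = re.compile(
    r"(/adam_[mv]$)|(/Adam(_\d+)?$)|(^|/)beta[12]_power$"
)


def is_optimizer_variable(name):
    """Return True if the variable name looks like Adam optimizer state."""
    base = name.split(":")[0]  # drop a trailing ':0' if present
    return _OPTIMIZER_PATTERN.search(base) is not None


names = [
    "bert/embeddings/word_embeddings",
    "opt/bert/embeddings/word_embeddings/Adam",
    "opt/bert/embeddings/word_embeddings/Adam_1",
    "bert/embeddings/LayerNorm/beta/adam_m",
    "beta1_power",
]
model_only = [n for n in names if not is_optimizer_variable(n)]
print(model_only)
```

Filtering the saved variables does not remove the optimizer nodes from a graph imported from the .meta file. The graph used for the SavedModel export would therefore likely also need to be built in inference mode, without the optimizer.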
