Transformers: Where does the pre-trained BERT model get cached on my system by default?

Created on 26 Dec 2019 · 6 comments · Source: huggingface/transformers

โ“ Questions & Help


I used model_class.from_pretrained('bert-base-uncased') to download and use the model. The next time I run this command, it picks the model up from the cache. But when I look inside the cache directory, I see several files over 400 MB with long random names, and I can't tell which one is the bert-base-uncased or distilbert-base-uncased model. Maybe I am looking in the wrong place.

All 6 comments

AFAIK, the cache folder is hidden. You can also download the files manually and save them to a location of your choice. The two files to download are config.json and the .bin weights file; you can then load them with from_pretrained. For example, to instantiate BERT, call BertForMaskedLM.from_pretrained('/Users/<your location>/<your folder name>'), as sketched below.
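A minimal sketch of that manual workflow, assuming the two files were saved into a local folder (the path below is a placeholder, and the weights file is saved as pytorch_model.bin, which is the filename from_pretrained looks for inside a local directory):

from transformers import BertForMaskedLM

# Placeholder folder containing the manually downloaded config.json
# and the weights file saved as pytorch_model.bin.
local_model_dir = '/Users/<your location>/<your folder name>'

# from_pretrained accepts a directory path as well as a model name.
model = BertForMaskedLM.from_pretrained(local_model_dir)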

Each file in the cache comes with a .json file describing what's inside.

_This isn't part of transformers' public API and may change at any time in the future._

Anyway, here's how you can locate a specific file:

$ cd ~/.cache/torch/transformers
$ grep /bert-base-uncased *.json
26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084.json:{"etag": "\"64800d5d8528ce344256daf115d4965e\"", "url": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt"}
4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.bf3b9ea126d8c0001ee8a1e8b92229871d06d36d8808208cc2449280da87785c.json:{"etag": "\"74d4f96fdabdd865cbdbe905cd46c1f1\"", "url": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json"}
d667df51ec24c20190f01fb4c20a21debc4c4fc12f7e2f5441ac0a99690e3ee9.4733ec82e81d40e9cf5fd04556267d8958fb150e9339390fc64206b7e5a79c83.h5.json:{"etag": "\"41a0e56472bad33498744818c8b1ef2c-64\"", "url": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-tf_model.h5"}

Here, bert-base-uncased-tf_model.h5 is cached as d667df51ec24c20190f01fb4c20a21debc4c4fc12f7e2f5441ac0a99690e3ee9.4733ec82e81d40e9cf5fd04556267d8958fb150e9339390fc64206b7e5a79c83.h5.
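If you would rather do this from Python than with grep, here is a small sketch along the same lines; it only relies on the default cache location shown above and on the "url" key visible in the sidecar files, and the helper name list_cached_files is made up for this example:

import json
import os

def list_cached_files(cache_dir=os.path.expanduser('~/.cache/torch/transformers')):
    # Every cached blob <hash> has a companion <hash>.json sidecar
    # whose "url" field records where it was downloaded from.
    for name in sorted(os.listdir(cache_dir)):
        if not name.endswith('.json'):
            continue
        with open(os.path.join(cache_dir, name)) as f:
            meta = json.load(f)
        # The cached file itself is the sidecar name minus the .json suffix.
        print(name[:-len('.json')], '<-', meta.get('url'))

list_cached_files()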

The discussion in #2157 could be useful too.

Hi!
What if I use Colab? How can I find the cache file there? @aaugustin

For anyone landing here wondering whether the cache directory can be changed globally: set the PYTORCH_TRANSFORMERS_CACHE environment variable in your shell before starting the Python interpreter.
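If it is easier to do from inside Python than from the shell, the sketch below sets the variable via os.environ instead; this assumes transformers reads the variable when it is imported, so the assignment has to happen before the import, and the cache path is a placeholder:

import os

# Must run before `import transformers`, because the library reads this
# variable at import time to choose its default cache directory.
os.environ['PYTORCH_TRANSFORMERS_CACHE'] = '/path/to/my/cache'  # placeholder

from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased')  # cached under the new directory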

You can also find it the same way transformers does it:

from transformers.file_utils import hf_bucket_url, cached_path

pretrained_model_name = 'DeepPavlov/rubert-base-cased'
# Build the download URL for this model's weights file.
archive_file = hf_bucket_url(
    pretrained_model_name,
    filename='pytorch_model.bin',
    use_cdn=True,
)
# Download the file (or reuse the cached copy) and return its local path.
resolved_archive_file = cached_path(archive_file)
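Here resolved_archive_file is the full local path of the cached copy on disk, so you can inspect or copy that exact file.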