It would be nice to make CACHE_ROOT configurable.
For example, its value could be passed through an environment variable:
CACHE_ROOT = os.path.expanduser(os.getenv('FLAIR_CACHE_ROOT', '~/.flair'))
instead of:
CACHE_ROOT = os.path.expanduser(os.path.join('~', '.flair'))
in file_utils.py
This would be extremely helpful, especially for those of us working in HPC cluster environments with tiny home directory quota sizes!
Yes, good point. I'll add a feature tag to this issue!
Hi @gknor and @stevenbedrick,
we just pushed a PR into master that addresses this. You can now specify the base cache directory, i.e. the directory into which all models, embeddings, and datasets are downloaded. It defaults to .flair in your home folder, but you can override this by setting
import flair
flair.cache_root = 'my/cache/directory'
before calling your code. Let us know if this works for you!
Awesome, thanks! I'll give it a try next time I fire up Flair. 👍
Thanks @alanakbik.
I would like to suggest extending this with os.getenv('FLAIR_CACHE_ROOT', <defaultpath>) (suggested by @gknor). This won't conflict with existing usage, and it would allow configuration in packages that use flair without manually patching them.
In my own code I could set this global variable, but other packages would still default to the system location, and I would end up with duplicate model data. With an optional (normally unset) environment variable, users can specify globally where to store their models, e.g. on external disks.
For example, like the transformers package: https://github.com/huggingface/transformers/blob/master/src/transformers/file_utils.py#L75
Or NLTK: https://www.nltk.org/_modules/nltk/data.html
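To make the suggestion concrete, here is a minimal sketch of the environment-variable fallback, written as a standalone helper rather than as flair's actual code. The function name resolve_cache_root is hypothetical, and FLAIR_CACHE_ROOT is simply the variable name proposed in this thread; the default of ~/.flair is taken from the original issue.

```python
import os
from pathlib import Path


def resolve_cache_root(env_var: str = "FLAIR_CACHE_ROOT",
                       default: str = "~/.flair") -> Path:
    """Return the cache directory: the env var if set, else the default.

    Hypothetical helper for illustration; not part of flair's API.
    """
    # os.getenv returns the default when the variable is unset.
    raw = os.getenv(env_var, default)
    # expanduser resolves a leading "~" to the user's home directory.
    return Path(raw).expanduser()


if __name__ == "__main__":
    # Prints the resolved location for the current environment.
    print(resolve_cache_root())
```

Because the variable is read only when it is set, existing setups that rely on the default location would behave as before.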
Hi, I just tried the following code to set the cache folder to another location:
import flair
flair.cache_root = 'my/cache/directory' (with a real directory).
Unfortunately, I get this error:
/data/DataikuDSS/DATA_DIR/code-envs/python/catalys-nlp/lib/python3.6/site-packages/flair/models/sequence_tagger_model.py in _fetch_model(model_name)
1218 model_path = cached_download(url=url, library_name="flair",
1219 library_version=flair.__version__,
-> 1220 cache_dir=flair.cache_root / 'models' / model_folder)
1221 except HTTPError as e:
1222 # output information
TypeError: unsupported operand type(s) for /: 'str' and 'str'
I solved the problem and am posting the fix here in case someone else runs into the same issue. The path must be a Path object from the pathlib module, not a str.
from pathlib import Path
#flair.cache_root = "/my/path/.flair" # DOES NOT WORK
flair.cache_root = Path("/my/path/.flair") # WORKS
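The reason is that the traceback above joins path segments with the / operator, which pathlib.Path supports and str does not. A small sketch using only pathlib shows the difference; as_cache_path is a hypothetical helper, and "some-model" is a placeholder folder name, neither taken from flair.

```python
from pathlib import Path


def as_cache_path(value) -> Path:
    """Coerce a str or Path into a Path (hypothetical helper, not flair API)."""
    return Path(value).expanduser()


# A Path can be extended with "/", as the failing line in the traceback does.
cache_root = as_cache_path("/my/path/.flair")
model_dir = cache_root / "models" / "some-model"  # placeholder folder name
print(model_dir)

# A plain str cannot: this raises the same kind of TypeError as the traceback.
try:
    "/my/path/.flair" / "models"
except TypeError as err:
    print("TypeError:", err)
```

Wrapping the value in Path once, at the point where it is assigned, avoids the error regardless of whether the caller started from a str or a Path.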