Allennlp: Get descriptions of cached urls

Created on 14 Oct 2017  ·  7 Comments  ·  Source: allenai/allennlp

Currently it's hard to know what has been cached, say if I want to delete some old stuff, or I want to invalidate a cache because I know the content has changed:

$ ls -l ~/.allennlp/datasets/
total 180464
-rw-r--r--  1 tafjord  staff  46217074 Sep  6 10:33 aHR0cHM6Ly9zMy11cy13ZXN0LTIuYW1hem9uYXdzLmNvbS9hbGxlbm5scC9tb2RlbHMvYmlkYWYtbW9kZWwtMjAxNy4wOC4zMS50YXIuZ3o=."4688241e6d52191372d0d8dcdc97e2ae-6"
-rw-r--r--  1 tafjord  staff  46175392 Sep 20 17:58 aHR0cHM6Ly9zMy11cy13ZXN0LTIuYW1hem9uYXdzLmNvbS9hbGxlbm5scC9tb2RlbHMvYmlkYWYtbW9kZWwtMjAxNy4wOS4xNS1jaGFycGFkLnRhci5neg==."bc6cc08ed491c6ac831bf363f7312fc1-6"

I don't know if this can be made a little less obscure (e.g., by making these folders with a metadata file), but a workaround is to manually base64-decode the filenames:

$ for f in `ls -1 ~/.allennlp/datasets/`; do echo $f | base64 --decode; echo ''; done
https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.08.31.tar.gz
https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz

Labels: Good First Issue, Help Wanted

All 7 comments

We could easily write (say) scripts/inspect_cache.py that shows filename: decoded_url.

Or you can do what I do when I clean caches (such as my ivy cache): delete everything. It's just a cache.

I would like to take this issue, can someone guide me through it?

If you wanted to take the scripts/inspect_cache.py approach, most of the ingredients are in

https://github.com/allenai/allennlp/blob/master/allennlp/common/file_utils.py

You would just need to grab the value of DATASET_CACHE, call os.listdir() on it, and for each file print its name along with the result of filename_to_url. I'm not 100% sure what the ideal formatting is; you'd have to play with that.
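
For anyone following along, here is a rough, self-contained sketch of that idea. It does not import allennlp: decode_cache_filename and describe_cache are hypothetical helpers written for illustration, and the naming scheme (base64-encoded URL, then a '.', then the ETag) is assumed from the directory listing above rather than taken from file_utils.py. In a real script you would presumably use DATASET_CACHE and filename_to_url instead.

```python
import base64
import os
from typing import List, Optional, Tuple


def decode_cache_filename(filename: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split '<base64 url>.<etag>' and decode the URL part.

    Assumes the naming scheme seen in the listing above. The base64 alphabet
    never contains '.', so the first '.' separates the encoded URL from the ETag.
    """
    encoded_url, separator, etag = filename.partition(".")
    # validate=True rejects characters outside the base64 alphabet.
    url = base64.b64decode(encoded_url, validate=True).decode("utf-8")
    return url, (etag if separator else None)


def describe_cache(cache_dir: str) -> List[Tuple[str, str]]:
    """Return (filename, decoded_url) pairs for every decodable file in cache_dir."""
    entries = []
    for filename in sorted(os.listdir(cache_dir)):
        try:
            url, _etag = decode_cache_filename(filename)
        except ValueError:
            # binascii.Error and UnicodeDecodeError both subclass ValueError;
            # skip anything that does not look like a cache entry.
            continue
        entries.append((filename, url))
    return entries


if __name__ == "__main__":
    # Assumed cache location, taken from the listing earlier in this issue.
    cache_dir = os.path.expanduser("~/.allennlp/datasets")
    if os.path.isdir(cache_dir):
        for filename, url in describe_cache(cache_dir):
            print(f"Filename: {filename}")
            print(f"URL: {url}")
            print()
```

Files whose names do not decode cleanly are skipped rather than reported, which is a design choice you may want to revisit.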

Okay, I'm trying some things here, but the long filename makes it hard to print prettily.
I tried two approaches; which one do you think is better?

Printing filename: decoded_url as you suggested, using a colon as the separator:
[image]

Printing the filename and URL on separate lines, using a line break as the separator and labelling each one:
[image]

I significantly prefer the latter (separate lines).

Submitted a Pull Request, can someone review it please? 😄
