Currently it's hard to know what has been cached, e.g. if I want to delete some old stuff or invalidate a cache entry because I know the content has changed:
$ ls -l ~/.allennlp/datasets/
total 180464
-rw-r--r-- 1 tafjord staff 46217074 Sep 6 10:33 aHR0cHM6Ly9zMy11cy13ZXN0LTIuYW1hem9uYXdzLmNvbS9hbGxlbm5scC9tb2RlbHMvYmlkYWYtbW9kZWwtMjAxNy4wOC4zMS50YXIuZ3o=."4688241e6d52191372d0d8dcdc97e2ae-6"
-rw-r--r-- 1 tafjord staff 46175392 Sep 20 17:58 aHR0cHM6Ly9zMy11cy13ZXN0LTIuYW1hem9uYXdzLmNvbS9hbGxlbm5scC9tb2RlbHMvYmlkYWYtbW9kZWwtMjAxNy4wOS4xNS1jaGFycGFkLnRhci5neg==."bc6cc08ed491c6ac831bf363f7312fc1-6"
I don't know if this can be made a little less obscure (e.g., make these folders with a metadata file), but a workaround is to manually base64 decode them, e.g.:
$ for f in ~/.allennlp/datasets/*; do f=${f##*/}; echo "${f%%.*}" | base64 --decode; echo; done
https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.08.31.tar.gz
https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz
We could easily write (say) scripts/inspect_cache.py that shows filename: decoded_url.
Or you can do what I do when I clean caches (such as my ivy cache): delete everything. It's just a cache.
I would like to take this issue. Can someone guide me through it?
If you wanted to take the scripts/inspect_cache.py approach, most of the ingredients are in
https://github.com/allenai/allennlp/blob/master/allennlp/common/file_utils.py
You would just need to grab the value of DATASET_CACHE, call os.listdir() on it, and for each file print its name along with the result of filename_to_url. I'm not 100% sure what the ideal formatting is; you'd have to play with that.
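For reference, here is a rough, standalone sketch of those steps. It does not import allennlp: decode_cache_filename and inspect_cache are hypothetical names, and the decoding assumes the "<base64 of URL>.<etag>" naming visible in the listing above. The real script would more likely reuse DATASET_CACHE and filename_to_url from file_utils.py.

```python
import base64
import os
from typing import List, Optional, Tuple


def decode_cache_filename(filename: str) -> Tuple[str, Optional[str]]:
    """Decode a cache filename, assuming the layout '<base64 url>.<etag>'."""
    # The base64 alphabet contains no '.', so the first '.' ends the encoded URL.
    encoded, sep, etag = filename.partition(".")
    url = base64.b64decode(encoded).decode("utf-8")
    return url, (etag if sep else None)


def inspect_cache(cache_dir: str) -> List[Tuple[str, str]]:
    """Return (filename, decoded_url) pairs for every entry in cache_dir."""
    entries = []
    for filename in sorted(os.listdir(cache_dir)):
        try:
            url, _etag = decode_cache_filename(filename)
        except ValueError:
            # Covers bad base64 padding and undecodable bytes.
            url = "<could not decode>"
        entries.append((filename, url))
    return entries


if __name__ == "__main__":
    # Assumed cache location, taken from the listing in this issue.
    cache_dir = os.path.expanduser("~/.allennlp/datasets")
    if os.path.isdir(cache_dir):
        for filename, url in inspect_cache(cache_dir):
            print(f"{filename}: {url}")
```

Files that do not decode cleanly are reported rather than skipped, so stray files in the cache directory stay visible.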
Okay, I'm trying some things here, but the long file names make it hard to print prettily.
I tried two approaches. Which one do you think is better?
Printing filename: decoded_url as you suggested, using a colon as the separator:

Printing the filename and URL on separate lines, using a line break as the separator and labelling each one:

I significantly prefer the latter (separate lines).
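For anyone comparing the two layouts, they can be sketched roughly as follows. The function names and the "Filename:" / "URL:" labels are assumptions for illustration; the labels used in the actual pull request may differ.

```python
from typing import Iterable, Tuple

Entry = Tuple[str, str]  # (cache filename, decoded URL)


def format_one_line(entries: Iterable[Entry]) -> str:
    """First approach: 'filename: decoded_url' on a single line per entry."""
    return "\n".join(f"{filename}: {url}" for filename, url in entries)


def format_labelled(entries: Iterable[Entry]) -> str:
    """Second approach: labelled filename and URL on separate lines."""
    blocks = [f"Filename: {filename}\nURL: {url}" for filename, url in entries]
    # A blank line between entries keeps long names from running together.
    return "\n\n".join(blocks)
```

With filenames over a hundred characters long, the single-line form wraps in most terminals, which is the main argument for the labelled form.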
I submitted a pull request. Can someone review it, please? 😄