Currently, the default download directory for dataset caching appears to be ~/tensorflow_datasets. However, since it's not a folder that is meant to be accessed through a file manager, I'd suggest making it hidden by default, e.g. ~/.tensorflow_datasets.
Thanks for the suggestion! We'd be open to making it hidden, but it is sometimes useful to look in that directory. The manual_dir that some datasets need (e.g. ImageNet) also defaults to being in there, which makes us lean toward not hiding it. I'll leave this open though so that other users can chime in.
Also open to discussion, but I would have some concerns about making it hidden.
The download directory may contain hundreds of GB downloaded automatically. With a hidden directory, users may have more difficulty locating their data, or may not even be aware that a huge hidden folder is consuming their disk space.
Thanks for the proposal @frthjf
I'd second what @Conchylicultor wrote: this directory can be quite large, and making it hidden by default might be confusing to some users.
You can always select a hidden path as your data_dir if you prefer.
I'll close this bug.
I have been searching and struggling to find an option that lets me set a default directory other than ~/tensorflow_datasets. People often have personal conventions for organizing files and directories, so not having such an option is really surprising.
@tufei Can't you just set data_dir to whatever you want?
ds = tfds.load('mnist', data_dir='/my/custom/tensorflow_datasets')
Could this be put into a JSON config option? e.g. ~/.config/tensorflow_datasets.json:
{
  "data_dir": "/mnt/big_drive/.tensorflow_datasets"
}
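Until something like this exists, a small user-side helper could approximate it. This is only a sketch, not a TFDS feature: the config path, the "data_dir" key, and the resolve_data_dir name are all hypothetical, taken from the proposal above. It reads the JSON file if present and otherwise falls back to ~/tensorflow_datasets.

```python
import json
import os

# Hypothetical locations, taken from the proposal in this thread (not a TFDS API).
CONFIG_PATH = os.path.join("~", ".config", "tensorflow_datasets.json")
DEFAULT_DATA_DIR = os.path.join("~", "tensorflow_datasets")


def resolve_data_dir(config_path=CONFIG_PATH):
    """Return the "data_dir" from the JSON config, or the default directory."""
    path = os.path.expanduser(config_path)
    try:
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # A missing or malformed config file falls back to the default.
        config = {}
    if not isinstance(config, dict):
        config = {}
    return os.path.expanduser(config.get("data_dir", DEFAULT_DATA_DIR))
```

The result can then be passed explicitly, e.g. tfds.load('mnist', data_dir=resolve_data_dir()).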