Hello,
When I run cargo run --release on a freshly cloned repo, it creates a data.ms folder that is 200 GB in size. Why is this folder so big before I've indexed any documents? After I indexed the movies.json file (around 8 MB), the data.ms folder stayed the same size. Is this some kind of disk space pre-allocation? Can I reduce it with an environment variable, or could you point me to the code where this is hardcoded?
I'm running MeiliSearch on Windows.
Thanks
I found where the sizes are hardcoded: the open_or_create function of Database in meilisearch-core/src/database.rs. I'm still wondering how much disk space is actually required, though.
Hey @imor,
This is the first time I've seen the disk space used equal the LMDB::max_db_size parameter. This setting is normally an upper bound on the size of the database (data + updates, 100 GB + 100 GB). It seems that Windows does not handle this setting the same way Unix systems do: it reserves the disk space up front.
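A rough sketch of the arithmetic behind the 200 GB figure, assuming each of the two hardcoded limits is 100 GiB written as 100 * 1024^3 bytes (the constant names below are hypothetical, not the ones in database.rs). It is a simplified model: on Unix the files only grow to the pages actually used, while on Windows, as reported in this thread, both files are sized to their full limit immediately.

```rust
// Hypothetical constants mirroring the two hardcoded LMDB size limits
// (main data + updates); the real names in meilisearch-core may differ.
const GIB: u64 = 1024 * 1024 * 1024;
const MAIN_MAP_SIZE: u64 = 100 * GIB;
const UPDATE_MAP_SIZE: u64 = 100 * GIB;

/// Simplified model of the disk space taken by the two LMDB environments.
/// On Windows (as observed in this issue) each file is sized to its full
/// limit up front; elsewhere usage tracks the space actually used.
fn reserved_on_disk(windows: bool, bytes_used: u64) -> u64 {
    if windows {
        MAIN_MAP_SIZE + UPDATE_MAP_SIZE
    } else {
        bytes_used
    }
}

fn main() {
    let total = reserved_on_disk(true, 8 * 1024 * 1024);
    println!("{} bytes ({} GiB)", total, total / GIB);
}
```

With both limits at 100 GiB, the Windows case comes out to 214748364800 bytes (200 GiB) no matter how little data has been indexed, which matches the folder size reported above.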
This setting should have been configurable, but we haven't made it so yet. For now you will need to change the hardcoded value by hand, sorry about that.
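Until the limit is configurable, one way to avoid editing the constant on every build is to read it from the environment with a fallback. This is a minimal sketch under stated assumptions: the variable names MEILI_MAIN_MAP_SIZE and MEILI_UPDATE_MAP_SIZE are hypothetical (not real MeiliSearch settings), and it assumes the resulting byte count would be handed to whatever builder open_or_create uses, for example heed's EnvOpenOptions::map_size, in place of the hardcoded value.

```rust
use std::env;

const GIB: usize = 1024 * 1024 * 1024;

/// Reads a map size in bytes from `var`, falling back to `default`
/// when the variable is unset or does not parse as an integer.
fn map_size_from_env(var: &str, default: usize) -> usize {
    env::var(var)
        .ok()
        .and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(default)
}

fn main() {
    // Hypothetical variable names, used here for illustration only.
    let main_map_size = map_size_from_env("MEILI_MAIN_MAP_SIZE", 100 * GIB);
    let update_map_size = map_size_from_env("MEILI_UPDATE_MAP_SIZE", 100 * GIB);
    // These values would replace the hardcoded sizes when opening the
    // LMDB environments (assumed: via a map_size-style builder method).
    println!("main: {} bytes, updates: {} bytes", main_map_size, update_map_size);
}
```

Keeping the 100 GiB default preserves today's behaviour on Unix, while a Windows user could export, say, 5368709120 (5 GiB) to shrink the up-front reservation. The limit must still be large enough for the indexes, or writes will fail once the map is full.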
The required size for 8 MB of data is probably somewhere between 2 GB and 5 GB. Internal indexes are big!
Thank you for this report :)
This is related to https://github.com/mozilla/lmdb-rs/issues/40
Unfortunately, it is nuanced: there is a partial fix upstream, but it has never been enabled in a stable release for performance reasons.
Closed by #646