Originally reported by @thatguymike
Simple repro, after downloading and creating the MNIST model (see examples/mnist/readme.md):
$ valgrind ./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
[...]
F0501 18:17:36.543970 20545 db.hpp:109] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument
[...]
Note that valgrind reports no errors before this (excluding the proverbial CUDA driver errors).
The error was generated by the following line in db.cpp:
https://github.com/BVLC/caffe/blob/master/src/caffe/util/db.cpp#L33
Using strace, I found that lmdb tries to mmap(2) the existing database file (examples/mnist/mnist_train_lmdb/data.mdb) with a size of 1 TB (because of the LMDB_MAP_SIZE constant):
mmap(NULL, 1099511627776, PROT_READ, MAP_SHARED, 31, 0) = 0x7e2832583000
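For reference, a rough paraphrase of the relevant open path (names taken from db.cpp/db.hpp; the exact revision linked above may differ in details):

// Rough paraphrase of the LMDB open path in src/caffe/util/db.cpp -- not a
// verbatim copy of the linked revision.
const size_t LMDB_MAP_SIZE = 1099511627776;  // 1 TB

void LMDB::Open(const string& source, Mode mode) {
  MDB_CHECK(mdb_env_create(&mdb_env_));
  // This is what forces the 1 TB mmap(2) seen in the strace output above.
  MDB_CHECK(mdb_env_set_mapsize(mdb_env_, LMDB_MAP_SIZE));
  int flags = 0;
  if (mode == READ) {
    flags = MDB_RDONLY | MDB_NOTLS;
  }
  // The failing check (db.hpp:109) is the MDB_CHECK wrapper around LMDB calls
  // like this one; EINVAL (22) is what bubbles up under valgrind.
  MDB_CHECK(mdb_env_open(mdb_env_, source.c_str(), flags, 0664));
}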
This might be the issue: I don't fully understand how valgrind works, but for heap allocations it has to shadow ("duplicate") the memory in order to track uninitialized values with bit-level accuracy.
This also seems to hold for mmapped memory, since on my 16 GB system the mmap allocation limit was around ~7.2 GB (so ~14.4 GB with shadow memory).
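To sanity-check that this is about the mapping size rather than LMDB itself, here is a minimal probe that just issues the same mmap(2) call against whatever data.mdb you point it at; whether it fails under valgrind, and with which errno, depends on valgrind's address-space limit, so this is a sketch rather than a definitive test:

// Minimal sketch: mmap an existing file with a 1 TB length, like LMDB does.
// Run with and without valgrind to compare the outcome.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv) {
  if (argc < 2) { std::fprintf(stderr, "usage: %s <data.mdb>\n", argv[0]); return 1; }
  int fd = open(argv[1], O_RDONLY);
  if (fd < 0) { std::perror("open"); return 1; }
  const size_t map_size = 1099511627776UL;  // 1 TB, same as LMDB_MAP_SIZE
  void* p = mmap(NULL, map_size, PROT_READ, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) {
    std::perror("mmap");  // under valgrind, this is where the limit would show up
  } else {
    std::printf("mmap of 1 TB succeeded at %p\n", p);
    munmap(p, map_size);
  }
  close(fd);
  return 0;
}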
I don't think this is an LMDB bug, but rather a limitation of valgrind. So I only see two ways of tackling this:
1) Mention this limitation in the documentation, to prevent other people from staring blankly at this puzzling issue.
2) Try to find a workaround: do we really need to set the map size to 1 TB? Maybe in some cases we can avoid it; for instance, if we know the DB is read-only, we could set LMDB_MAP_SIZE to the size of the database file (rough sketch below). If the database can be modified by another writer at the same time, I suppose this workaround is not possible.
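A rough sketch of what (2) could look like, assuming the usual LMDB layout where source is a directory containing data.mdb; ReadOnlyMapSize is an illustrative helper name, not something in the Caffe tree:

// Illustrative sketch of option (2): size the map to the existing data file
// when opening read-only, instead of a fixed 1 TB.
#include <sys/stat.h>
#include <string>

size_t ReadOnlyMapSize(const std::string& source, size_t default_size) {
  struct stat st;
  const std::string data_file = source + "/data.mdb";
  if (stat(data_file.c_str(), &st) == 0 && st.st_size > 0) {
    return static_cast<size_t>(st.st_size);  // map exactly what is on disk
  }
  return default_size;  // fall back to the current 1 TB default
}

// Possible usage inside LMDB::Open (READ mode only):
//   MDB_CHECK(mdb_env_set_mapsize(mdb_env_,
//       mode == READ ? ReadOnlyMapSize(source, LMDB_MAP_SIZE) : LMDB_MAP_SIZE));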
Thoughts? Other ideas?
valgrind has a hardcoded limit on how much memory it will handle. The limit is arbitrary and can be changed by recompiling valgrind. I think 1TB may be a bit much for it, but we have certainly used valgrind with LMDB before.
http://stackoverflow.com/questions/8644234/why-is-valgrind-limited-to-32-gb-on-64-bit-architectures
> valgrind has a hardcoded limit on how much memory it will handle. The limit is arbitrary and can be changed by recompiling valgrind.

Sure, but if valgrind still needs to shadow the memory, that won't solve this issue since I don't have 2 TB of RAM.

> I think 1TB may be a bit much for it, but we have certainly used valgrind with LMDB before.

I don't argue with that; as I said, it works if I modify the map size to 7 GB (~half my RAM). I'm trying to find a solution that accommodates all reasonable database sizes while still allowing valgrind to be used.
@flx42 So what is the solution now? Thank you!
I don't think anything has changed on this side.
Is there a fix for this problem? I am seeing the same issue with valgrind.
Same issue when I try to create a new lmdb:
string db_type = "lmdb";
pDatabase = db::GetDB(db_type);
string path = dbPath.string();
cout << path << endl;
pDatabase->Open(path, db::NEW);
F0518 15:43:50.384389 21482 db_lmdb.hpp:15] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument
But I'm not using Valgrind.
I see the same error as "mrgloom" above while trying to run _convert_mnist_data_ on the dockerized caffe (hosted on OS X, if that matters):
$ docker run -ti --rm --volume=$(pwd):/workspace caffe:cpu bash mnist/create_mnist.sh
Creating lmdb...
libdc1394 error: Failed to initialize libdc1394
F0615 02:55:42.737716 7 db_lmdb.hpp:15] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument
*** Check failure stack trace: ***
@ 0x7f8718fcbdaa (unknown)
@ 0x7f8718fcbce4 (unknown)
@ 0x7f8718fcb6e6 (unknown)
@ 0x7f8718fce687 (unknown)
@ 0x7f8719342361 caffe::db::LMDB::Open()
@ 0x402b8f convert_dataset()
@ 0x40261d main
@ 0x7f87181dbf45 (unknown)
@ 0x402666 (unknown)
@ (nil) (unknown)
mnist/create_mnist.sh: line 17: 7 Aborted $BUILD/convert_mnist_data.bin $DATA/train-images-idx3-ubyte $DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}
I end up with an 8k file called "lock.mdb" in each of the train and test folders.
...and a little more searching indicates my particular issue may be a lack of support for memory-mapped files when mounting host folders in docker / boot2docker / VirtualBox. Maybe similar to https://github.com/docker-library/mongo/issues/30, and maybe only superficially similar to the other issues here.
Also, _convert_mnist_data_ works when writing inside the container's own filesystem rather than to a shared folder.
I found that the problem was that, for some reason, lmdb can't create its database in a shared folder (I was running caffe on Ubuntu 14.04 in VirtualBox).
So I think the problem, as you say, is the lack of support for memory-mapped files.
@cbare Same problem. How do you solve it? Thank you.
@guotong1988 Don't use a shared folder; using a local folder works for me.