Originally reported by @thatguymike
Simple repro, after downloading and creating the MNIST model (see examples/mnist/readme.md):
$ valgrind ./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
[...]
F0501 18:17:36.543970 20545 db.hpp:109] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument
[...]
Note that valgrind reports no errors before this (excluding the proverbial CUDA driver errors).
The error was generated by the following line in db.cpp:
https://github.com/BVLC/caffe/blob/master/src/caffe/util/db.cpp#L33
Using strace, I found that lmdb tries to mmap(2) the existing database file (examples/mnist/mnist_train_lmdb/data.mdb) with a size of 1 TB (because of the LMDB_MAP_SIZE constant):
mmap(NULL, 1099511627776, PROT_READ, MAP_SHARED, 31, 0) = 0x7e2832583000
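For reference, a rough paraphrase of the relevant open path (names taken from db.cpp/db.hpp; the exact revision linked above may differ in details):

// Rough paraphrase of the LMDB open path in src/caffe/util/db.cpp -- not a
// verbatim copy of the linked revision.
const size_t LMDB_MAP_SIZE = 1099511627776;  // 1 TB

void LMDB::Open(const string& source, Mode mode) {
  MDB_CHECK(mdb_env_create(&mdb_env_));
  // This is what forces the 1 TB mmap(2) seen in the strace output above.
  MDB_CHECK(mdb_env_set_mapsize(mdb_env_, LMDB_MAP_SIZE));
  int flags = 0;
  if (mode == READ) {
    flags = MDB_RDONLY | MDB_NOTLS;
  }
  // The failing check (db.hpp:109) is the MDB_CHECK wrapper around LMDB calls
  // like this one; EINVAL (22) is what bubbles up under valgrind.
  MDB_CHECK(mdb_env_open(mdb_env_, source.c_str(), flags, 0664));
}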
This might be the issue: I don't fully understand how valgrind works, but for heap allocations it has to shadow ("duplicate") the memory in order to track uninitialized values with bit-level accuracy.
This also seems to hold for mmapped memory, since on my 16 GB system the mmap allocation limit was around ~7.2 GB (so ~14.4 GB with shadow memory).
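To sanity-check that this is about the mapping size rather than LMDB itself, here is a minimal probe that just issues the same mmap(2) call against whatever data.mdb you point it at; whether it fails under valgrind, and with which errno, depends on valgrind's address-space limit, so this is a sketch rather than a definitive test:

// Minimal sketch: mmap an existing file with a 1 TB length, like LMDB does.
// Run with and without valgrind to compare the outcome.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv) {
  if (argc < 2) { std::fprintf(stderr, "usage: %s <data.mdb>\n", argv[0]); return 1; }
  int fd = open(argv[1], O_RDONLY);
  if (fd < 0) { std::perror("open"); return 1; }
  const size_t map_size = 1099511627776UL;  // 1 TB, same as LMDB_MAP_SIZE
  void* p = mmap(NULL, map_size, PROT_READ, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) {
    std::perror("mmap");  // under valgrind, this is where the limit would show up
  } else {
    std::printf("mmap of 1 TB succeeded at %p\n", p);
    munmap(p, map_size);
  }
  close(fd);
  return 0;
}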
I don't think this is an LMDB bug, but rather a limitation of valgrind. So I only see two ways of tackling this:
1) Mention this limitation in the documentation, to prevent other people from staring blankly at this puzzling issue.
2) Try to find a workaround: do we really need to set the map size to 1 TB? Maybe in some cases we can avoid it; for instance, if we know the DB is read-only, we could set LMDB_MAP_SIZE to the size of the database file (rough sketch below). If the database can be modified by another writer at the same time, I suppose this workaround is not possible.
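A rough sketch of what (2) could look like, assuming the usual LMDB layout where source is a directory containing data.mdb; ReadOnlyMapSize is an illustrative helper name, not something in the Caffe tree:

// Illustrative sketch of option (2): size the map to the existing data file
// when opening read-only, instead of a fixed 1 TB.
#include <sys/stat.h>
#include <string>

size_t ReadOnlyMapSize(const std::string& source, size_t default_size) {
  struct stat st;
  const std::string data_file = source + "/data.mdb";
  if (stat(data_file.c_str(), &st) == 0 && st.st_size > 0) {
    return static_cast<size_t>(st.st_size);  // map exactly what is on disk
  }
  return default_size;  // fall back to the current 1 TB default
}

// Possible usage inside LMDB::Open (READ mode only):
//   MDB_CHECK(mdb_env_set_mapsize(mdb_env_,
//       mode == READ ? ReadOnlyMapSize(source, LMDB_MAP_SIZE) : LMDB_MAP_SIZE));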
Thoughts? Other ideas?
valgrind has a hardcoded limit on how much memory it will handle. The limit is arbitrary and can be changed by recompiling valgrind. I think 1TB may be a bit much for it, but we have certainly used valgrind with LMDB before.
http://stackoverflow.com/questions/8644234/why-is-valgrind-limited-to-32-gb-on-64-bit-architectures
> valgrind has a hardcoded limit on how much memory it will handle. The limit is arbitrary and can be changed by recompiling valgrind.

Sure, but if valgrind still needs to shadow the memory, that won't solve this issue since I don't have 2 TB of RAM.

> I think 1TB may be a bit much for it, but we have certainly used valgrind with LMDB before.

I don't argue with that; as I said, it works if I modify the map size to 7 GB (~half my RAM). I'm trying to find a solution that accommodates all reasonable database sizes while still allowing valgrind to be used.
@flx42 So what is the solution now? Thank you!
I don't think anything has changed on this side.
Is there a fix for this problem? I am seeing the same issue with valgrind.
Same issue when I try to create a new lmdb:
string db_type = "lmdb";
pDatabase = db::GetDB(db_type);
string path = dbPath.string();
cout << path << endl;
pDatabase->Open(path, db::NEW);
F0518 15:43:50.384389 21482 db_lmdb.hpp:15] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument
But I'm not using Valgrind.
I see the same error as "mrgloom" above while trying to run _convert_mnist_data_ on the dockerized caffe (hosted on OS X, if that matters):
$ docker run -ti --rm --volume=$(pwd):/workspace caffe:cpu bash mnist/create_mnist.sh
Creating lmdb...
libdc1394 error: Failed to initialize libdc1394
F0615 02:55:42.737716 7 db_lmdb.hpp:15] Check failed: mdb_status == 0 (22 vs. 0) Invalid argument
*** Check failure stack trace: ***
@ 0x7f8718fcbdaa (unknown)
@ 0x7f8718fcbce4 (unknown)
@ 0x7f8718fcb6e6 (unknown)
@ 0x7f8718fce687 (unknown)
@ 0x7f8719342361 caffe::db::LMDB::Open()
@ 0x402b8f convert_dataset()
@ 0x40261d main
@ 0x7f87181dbf45 (unknown)
@ 0x402666 (unknown)
@ (nil) (unknown)
mnist/create_mnist.sh: line 17: 7 Aborted $BUILD/convert_mnist_data.bin $DATA/train-images-idx3-ubyte $DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}
I end up with an 8k file called "lock.mdb" in each of the train and test folders.
...and a little more searching indicates my particular issue may be a lack of support for memory-mapped files when mounting host folders in docker / boot2docker / VirtualBox. Maybe similar to https://github.com/docker-library/mongo/issues/30, and maybe only superficially similar to the other issues here.
Also, _convert_mnist_data_ works when writing inside the container's own filesystem rather than to a shared folder.
I found that the problem was that, for some reason, lmdb can't create its database in a shared folder (I was running caffe on Ubuntu 14.04 in VirtualBox).
So I think the problem, as you say, is the lack of support for memory-mapped files.
@cbare Same problem. How do you solve it? Thank you.
@guotong1988 Don't use a shared folder; using a local folder works for me.