Libelektra: mmapstorage: data corruption problem

Created on 12 Jun 2018  路  12Comments  路  Source: ElektraInitiative/libelektra

I finally found out what was causing problems with testing my mmapstorage plugin with ksAppend. The problem has nothing to do with ksAppend and unfortunately, the solution is outside of mmap.

The following scenario causes problems with mmap:

  1. kdbSet (write) some KeySet to file test.mmap
  2. kdbGet KeySet from file test.mmap. Now we have a KeySet which is mmap'ed.
  3. kdbSet (write) the same KeySet back to file test.mmap

In the scenario above, the problem that arises is that we hold a reference to a KeySet in a memory mapped region. When we write the file, we first destroy test.mmap and then try to read from the destroyed file during kdbSet in step 3.

One can come up with various other scenarios which lead to similar results. The problem is that holding a KeySet that is mmap'ed is "dangerous", because the underlying file could be corrupted at any time. mmap() makes no guarantees if the file changes.

The obvious solutions are:

  1. Using a copy of the file (some temp location). This does not solve the problem of file manipulation after mmap(), and could be potentially slow (slower than memcpy() usually).
  2. Use memcpy() to have a "safe" copy of the data. Changes to the file do not affect the copy and the copy is faster than a file copy.

While I think a plain memcpy() is fine, using such an approach we could avoid mmap completely and just use fread/fwrite. Thus the point of using mmap for a storage plugin somehow becomes moot.

@markus2330 what are your thoughts?

_Note: to make sure that this has nothing to do with other parts of Elektra, I extracted a ~100 LOC example code. The same problem occurs there too. If at any point the file is changed, data from an earlier mapping becomes corruped._

question

All 12 comments

Thank you for the detailed analysis.

I assume you use MAP_PRIVATE and are talking about issues that occur within one process. Please create a PR with the example code, then it is more clear what we are talking about.

The problem is that holding a KeySet that is mmap'ed is "dangerous", because the underlying file could be corrupted at any time.

I think we can assume that nobody edits (=destroys) this file. With MAP_PRIVATE we know we do not write, so we can assume that the file stays unchanged.

Using a copy of the file

The point is avoid copying everything. Instead you should only copy something if it is modified (COW=copy on write). Updating the pointers (allocating a KeySet which points to the region within mmap) might be necessary though.

Furthermore, you need to remember which Keys are mmaped, and which are allocated. Changes in mmaped keys lead to COW. If this is implemented correctly you could avoid any changes in the mmap region. But I think you actually do not need to avoid any changes (to improve performance) because MAP_PRIVATE implements COW already. (E.g. if you write a shorter value then the current one, you simply overwrite the size+value without reallocating anything.)

Of course performance is everything, that is the whole purpose of we are doing here :wink: So you should benchmark if an operation might be expensive.

kdbSet

And another important point: kdbSet() copies per design everything (see splitPrepare, which does a ksDeepDup). Maybe you simply need to copy earlier or reimplement ksDeepDup to consider mmaped Keys. This alone should fix the problem you have.

The error occurs e.g. when the unterlying file is truncated.

And another important point: kdbSet() copies per design everything [...]

I am just using kdbGet/kdbSet on the plugin in the unit-tests, so I was not talking about the KDB operations.

Since I don't see the point of a PR for the code I was talking about, I simply pushed it to a separate github repo. The problem is that, even with MAP_PRIVATE, if the file was fopen()ed with mode "w+" it is truncated and the old mappings are corrupted.

I can of course avoid this in my code. My concern are external file changes.

I think we can assume that nobody edits (=destroys) this file.

I have a bad feeling about this assumption. If it happens, we get a SIGBUS or segfault or ...

Ahh, I see. So the problem is that you are trying to write to the mmap file within kdbSet? You should not do that. Instead only kdbGet should write (of course only if needed) and mmap the file.

The problem is that, even with MAP_PRIVATE, if the file was fopen()ed with mode "w+" it is truncated and the old mappings are corrupted.

Yes, of course you are not allowed to change the underlying file while it is mmaped.

I can of course avoid this in my code. My concern are external file changes.

Before you start writing to the mmap file you of course need to unlink the old file. Otherwise you will destroy data of the processes which still have it mmaped.

I have a bad feeling about this assumption. If it happens, we get a SIGBUS or segfault or ...

Yes, thus we need to clearly state that nobody (except our plugins within kdbGet) is allowed to touch these files. mmap is a common technique, most databases etc. have the same assumption. Usually, people do not try to edit binary files.

Since I don't see the point of a PR for the code I was talking about

The point of PRs is not always to merge something but also to discuss and review code (or just to build something on the build server).

Ah I think we're not talking about the same plugin. I was talking about mmapstorage, which should be a "simple" storage plugin and not the global cache. It would be like a drop-in replacement for dump.
For the storage, I would have to write on kdbSet, like dump does.

Edit: the intention of the storage is to use the same binary format as the cache plugin will do.

For the storage, I would have to write on kdbSet, like dump does.

Ok, nevertheless unlink before writing to the file is the key :wink:

A technical document how everything should work would be very interesting.

I'm a bit busy until friday, but then I will try the unlinking approach. I didn't have that idea yet. With a bit of bookkeeping of old mappings it could work, but I want to make a POC to be sure. This could also solve part of the problem for the global cache. Thanks very much for the input!

@markus2330 thanks again for the input. I implemented the unlinking approach and it seems to work fine. Unlinking currently moves mappings from file backed mappings to MAP_ANONYMOUS, since this was faster to implement. MAP_ANONYMOUS works on my Linux and macOS, but it is not POSIX, so I might replace it later with normal mallocs.

During development of mmapstorage I have written many tests that could also be run on all storage plugins in general (i.e. they do not rely on mmap and test many Key and KeySet operations). Is there any interest to move the tests outside of my plugin? I intend to write tests for a large portion of the Key and KeySet APIs.

I also wrote a Map kind of structure with binary search and insert and auto-realloc. This could probably be used to speed up dump, or generalized as a small key/value map library. Is there interest for this or should I just leave it inside of my plugin?

seems to work fine

Good to hear these news.

replace MAP_ANONYMOUS

Yes, it would be great if it works on POSIX.

many tests that could also be run on all storage plugins in general

Yes, I would love to see all storage plugins tested with your unit tests. We currently only have tests/shell/check_kdb_internal_check.sh which calls the kdb check on some storage plugins.

Furthermore, we have a build job that uses ini as default. Maybe you can also add a build job that uses the mmap storage as default?

I intend to write tests for a large portion of the Key and KeySet APIs.

Thank you! :heart:

speed up dump

No, there is little need to speed up dump. If someone wants to speed up kdbGet your global mmap should be used.

generalized as a small key/value map library

I am not against it but beware: designing good library interfaces is very time-consuming.

Yes, it would be great if it works on POSIX.

Noted, will be done.

all storage plugins tested

Will be done.

Maybe you can also add a build job that uses the mmap storage as default?

I have it on a private jenkins. I will add a build job when I figure out the new build system.

very time-consuming.

I will not dive into that then. :laughing:

I have it on a private jenkins. I will add a build job when I figure out the new build system.

Now is a good time as long @ingwinlu is still around all the time. It is also a good time for adding further distros or similar. There is already docu (~e/doc/BUILDSERVER.md) but it obviously can always be improved.

Everything from here is clear now, I will track further mmapstorage progress in #2106. Thanks.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

markus2330 picture markus2330  路  3Comments

mpranj picture mpranj  路  3Comments

dominicjaeger picture dominicjaeger  路  3Comments

markus2330 picture markus2330  路  4Comments

markus2330 picture markus2330  路  3Comments