LiteDB: Concurrency issue while querying

Created on 25 Jan 2020 · 17 Comments · Source: mbdavid/LiteDB

In LiteDB 5, it appears you cannot pass the result of a query as an IEnumerable to Parallel.ForEach if you perform additional query operations on the database inside the Parallel.ForEach loop.

The error message received is "System.InvalidOperationException: 'Collection was modified; enumeration operation may not execute.'". It is thrown on the line var exists = col.Exists(x => x.IdentityHash == IdentityHash && x.RunId == RunId); inside WriteObjectExists.

I'm not performing any writes while this is being run. The code below reproduces the issue with a few thousand objects in the database. I've tried generating the IEnumerable in different ways (as shown below).

public static IEnumerable<WriteObject> GetMissingFromFirst2(string firstRunId, string secondRunId)
{
    var col = db.GetCollection<WriteObject>("WriteObjects");

    var list = new ConcurrentBag<WriteObject>();

    var wos = col.Find(x => x.RunId == secondRunId);

    Parallel.ForEach(wos, wo =>
    {
        if (!WriteObjectExists(firstRunId, wo.IdentityHash))
        {
            list.Add(wo);
        }
    });

    return list;
}

public static IEnumerable<WriteObject> GetMissingFromFirst(string firstRunId, string secondRunId)
{
    var bag = new ConcurrentBag<WriteObject>();

    var identityHashes = db.Execute($"SELECT IdentityHash FROM WriteObjects WHERE RunId = @0",
            new BsonDocument
            {
                ["0"] = secondRunId
            });

    Parallel.ForEach(identityHashes.ToEnumerable(), IdentityHash =>
    {
        if (!WriteObjectExists(firstRunId, IdentityHash["IdentityHash"].AsString))
        {
            bag.Add(GetWriteObject(secondRunId, IdentityHash["IdentityHash"].AsString));
        }
    });
    return bag;
}

private static bool WriteObjectExists(string RunId, string IdentityHash)
{
    var col = db.GetCollection<WriteObject>("WriteObjects");
    var exists = col.Exists(x => x.IdentityHash == IdentityHash && x.RunId == RunId);
    return exists;
}

private static WriteObject GetWriteObject(string runId, string IdentityHash)
{
    var col = db.GetCollection<WriteObject>("WriteObjects");
    return col.FindOne(x => x.IdentityHash == IdentityHash && x.RunId == runId);
}
bug

All 17 comments

This same code (GetMissingFromFirst2) works on LiteDB 4.1.4.

I also noticed that support for nested queries seems to have been dropped in v5. Is it planned to bring that back?

For example:

col.Find(x => x.RunId == secondRunId && !col.Exists(y => y.RunId == firstRunId && y.IdentityHash == x.IdentityHash));

is accepted as valid on LiteDB 4.1.4 but not on LiteDB 5-rc.

This is another query I'd like to be able to run:

col.Find(x => x.RunId == firstRunId && col.Exists(y => y.RunId == secondRunId && y.IdentityHash == x.IdentityHash && y.InstanceHash != x.InstanceHash));

This is to replace an SQL Query which runs relatively quickly on SQLite, which I'm trying to replicate in LiteDB.

Hi @gfs, I fixed some problems with disposing the query pipe and tried to simulate your example. Please try the current master branch. If it still does not work, please write a unit test so I can debug what's going on.

About nested queries: in v4 this feature is executed as LINQ to Objects, not by the engine. In v4, when you write a LINQ expression that cannot be parsed into an index-value query, LiteDB simply reads all records and runs your expression against them as a LINQ to Objects expression. There is no Query Planner or Query Engine.

In v5, all LINQ expressions are converted by the Query Planner (with optimizations), and the Query Executor executes the query pipe. There is no support yet for nested queries (maybe in future versions). If you need them, you can use a simple LINQ to Objects query, which is what v4 does:

col.Find(x => x.RunId == secondRunId).Where(x => col.Exists(y => y.RunId == firstRunId && y.IdentityHash == x.IdentityHash))

> Please, try with current master branch.

I tried to test with the master branch, but it seems to be tracking the 4.x series. Which branch should I use to get v5 with the patch?

> About nested queries

Thanks, that helps clarify (and explains why that query is pretty slow). Unfortunately, the Find().Where() syntax is significantly slower than the equivalent SQL query, so support for true nested queries would be amazing.
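One way to avoid a per-object Exists call, sketched here as an assumption rather than anything proposed in the thread, is to do the "missing from first run" comparison in memory with a hash set. The WriteObject class below is a hypothetical stand-in for the type in the original post, and RunComparison.MissingFromFirst is an illustrative name. The sketch takes plain sequences, so it does not depend on any LiteDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the WriteObject type used in this thread.
public class WriteObject
{
    public string RunId { get; set; } = "";
    public string IdentityHash { get; set; } = "";
    public string InstanceHash { get; set; } = "";
}

public static class RunComparison
{
    // Returns the objects of the second run whose IdentityHash
    // does not occur anywhere in the first run.
    public static List<WriteObject> MissingFromFirst(
        IEnumerable<WriteObject> firstRun,
        IEnumerable<WriteObject> secondRun)
    {
        // One pass over the first run builds the lookup set.
        var firstHashes = new HashSet<string>(firstRun.Select(wo => wo.IdentityHash));

        // One pass over the second run; each membership test is O(1) on average.
        return secondRun
            .Where(wo => !firstHashes.Contains(wo.IdentityHash))
            .ToList();
    }
}
```

In the code from the original post, the two inputs would presumably be col.Find(x => x.RunId == firstRunId) and col.Find(x => x.RunId == secondRunId), so the database is read twice in sequence instead of once per object from parallel threads. The trade-off is that all identity hashes of the first run are held in memory.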

@mbdavid I'll also try to reproduce the issue as it seems my benchmarks stumbled upon the same issue during the load tests over the weekend. I'll come back to this again in an hour (or 2).

The master branch contains only v5; v4 is in the v4 branch. How did you get v4 in master?

@gfs, nested queries will be on our roadmap!

> The master branch contains only v5; v4 is in the v4 branch. How did you get v4 in master?

Configuration error on my part. I'm testing it properly now.

> Please, try with current master branch

Current master branch appears to have fixed the issue. Thanks!

@mbdavid It seems there is still a threading issue (possibly a different one). I pushed my branch with the benchmark so you can see it for yourself (the command to run the benchmark is in the Program.cs file in the LiteDB.Benchmarks project). I'll also send you the log of the run I have right now.

LiteDB.Benchmarks.Benchmarks.Insertion.MultiThreadedInsertionBenchmark-20200126-184642.log

> Current master branch appears to have fixed the issue. Thanks!

False alarm. Now I get

System.InvalidOperationException: Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.

I'm not inserting anything into this collection as I'm reading from it.

@JensSchadron, the error in this log looks like concurrent access to a BsonDocument:

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.AggregateException: One or more errors occurred. (Collection was modified; enumeration operation may not execute.)
 ---> System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.Collections.Generic.Dictionary`2.Enumerator.MoveNext()
   at LiteDB.BsonDocument.GetBytesCount(Boolean recalc) in /Users/jens/Documents/Projects/LiteDB/LiteDB/Document/BsonDocument.cs:line 164
   at LiteDB.Engine.HeaderPage.GetAvaiableCollectionSpace() in /Users/jens/Documents/Projects/LiteDB/LiteDB/Engine/Pages/HeaderPage.cs:line 250
   at LiteDB.Engine.CollectionService.CheckName(String name, HeaderPage header) in /Users/jens/Documents/Projects/LiteDB/LiteDB/Engine/Services/CollectionService.cs:line 28
   at LiteDB.Engine.CollectionService.Add(String name, CollectionPage& collectionPage) in /Users/jens/Documents/Projects/LiteDB/LiteDB/Engine/Services/CollectionService.cs:line 60
   at LiteDB.Engine.CollectionService.Get(String name, Boolean addIfNotExists, CollectionPage& collectionPage) in /Users/jens/Documents/Projects/LiteDB/LiteDB/Engine/Services/CollectionService.cs:line 49
   at LiteDB.Engine.Snapshot..ctor(LockMode mode, String collectionName, HeaderPage header, UInt32 transactionID, TransactionPages transPages, LockService locker, WalIndexService walIndex, DiskReader reader, Boolean addIfNotExists) in /Users/jens/Documents/Projects/LiteDB/LiteDB/Engine/Services/SnapShot.cs:line 72
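For readers unfamiliar with this exception: the top frame is Dictionary`2.Enumerator.MoveNext(), which throws when the dictionary gained a key after the enumeration started. The sketch below reproduces that check on a single thread. It is a generic illustration, not LiteDB code, and EnumerationDemo is a hypothetical name. In the trace above, the modification presumably comes from another thread rather than from inside the loop.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationDemo
{
    // Returns true if adding a key while enumerating the dictionary
    // throws InvalidOperationException ("Collection was modified...").
    public static bool AddDuringEnumerationThrows()
    {
        var fields = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
        try
        {
            foreach (var pair in fields)
            {
                // Adding a new key changes the dictionary's internal version,
                // so the enumerator's next MoveNext() call fails.
                fields["new-" + pair.Key] = pair.Value;
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

Dictionary<TKey, TValue> is not safe for one thread to modify while another reads or enumerates it, so a shared instance needs external synchronization or a concurrent collection.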

@gfs, can you paste your stacktrace to check?

@gfs, about nested queries: I forgot that I implemented a $query function as an experimental subquery. You can pass another query as the collection input. It's experimental and will be replaced in the future by a proper syntax:

SELECT $ FROM $query('SELECT name, age FROM customer WHERE curstomerId > 100') WHERE age > 100

> @gfs, can you paste your stacktrace to check?

---> System.InvalidOperationException: Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
at System.Collections.Generic.Dictionary`2.FindEntry(TKey key)
at System.Collections.Generic.Dictionary`2.TryGetValue(TKey key, TValue& value)
at LiteDB.CacheService.GetPage(UInt32 pageID)
at LiteDB.PageService.GetPage[T](UInt32 pageID)
at LiteDB.IndexService.Find(CollectionIndex index, BsonValue value, Boolean sibling, Int32 order)
at LiteDB.QueryEquals.ExecuteIndex(IndexService indexer, CollectionIndex index)+MoveNext()
at LiteDB.LinqExtensions.<>c__DisplayClass2_0`2.<<DistinctBy>g___|0>d.MoveNext()
at LiteDB.QueryCursor.Fetch(TransactionService trans, DataService data, BsonReader bsonReader)
at LiteDB.LiteEngine.Find(String collection, Query query, Int32 skip, Int32 limit)+MoveNext()
at LiteDB.LiteEngine.Find(String collection, Query query, String[] includes, Int32 skip, Int32 limit)+MoveNext()
at LiteDB.LiteCollection`1.Find(Query query, Int32 skip, Int32 limit)+MoveNext()
at System.Linq.Enumerable.TryGetFirst[TSource](IEnumerable`1 source, Boolean& found)
at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
at LiteDB.LiteCollection`1.FindOne(Query query)
at AttackSurfaceAnalyzer.Utils.DatabaseManager.<>c__DisplayClass39_0.<GetModified>b__0(WriteObject WO) in C:\Users\Gstoc\Documents\GitHub\AttackSurfaceAnalyzer\Lib\Utils\DatabaseManager.cs:line 417
at System.Threading.Tasks.Parallel.<>c__DisplayClass33_0`2.<ForEachWorker>b__0(Int32 i)
at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)

We are currently experiencing similar issues (v4 and v5).
The cause might be related to this:
https://github.com/dotnet/runtime/issues/26868
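The "Operations that change non-concurrent collections must have exclusive access" message in the stack trace above is raised by a plain Dictionary that is used from several threads without synchronization. As a rough sketch of the general remedy, and not of how LiteDB fixed or should fix its CacheService, a cache shared between threads can be built on ConcurrentDictionary. PageCache, PageCacheDemo, and the loader delegate are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical page cache for illustration; this is not LiteDB's CacheService.
public sealed class PageCache
{
    private readonly ConcurrentDictionary<uint, byte[]> _pages =
        new ConcurrentDictionary<uint, byte[]>();
    private readonly Func<uint, byte[]> _loadPage;

    public PageCache(Func<uint, byte[]> loadPage) => _loadPage = loadPage;

    public int Count => _pages.Count;

    // Safe to call from many threads. Under contention the loader may run more
    // than once for the same page, but only one result is stored and returned.
    public byte[] GetPage(uint pageId) => _pages.GetOrAdd(pageId, _loadPage);
}

public static class PageCacheDemo
{
    // Performs 10,000 lookups over 100 distinct page IDs from multiple threads
    // and returns the number of cached pages.
    public static int Run()
    {
        var cache = new PageCache(id => BitConverter.GetBytes(id));
        Parallel.For(0, 10000, i => cache.GetPage((uint)(i % 100)));
        return cache.Count;
    }
}
```

A lock around a regular Dictionary would work as well. ConcurrentDictionary is used here only because it keeps the read path free of an explicit lock.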

We are hitting this issue as well. I made a PR, and we would really appreciate a bugfix release for v4.

We are hitting the issue as well :-|
