NullReferenceException on Insert with auto id and BsonDocument with _id=0
Version
LiteDB 5.0.2 and the latest from repo, Windows 10, net45
Describe the bug
LiteCollection<BsonDocument> with auto id Int32.BsonDocument with _id = 0.Code to Reproduce
using (var db = new LiteDatabase(":memory:"))
{
var test = db.GetCollection("Test", BsonAutoId.Int32);
var doc = new BsonDocument() { ["_id"] = 0, ["p1"] = 1 };
test.Insert(doc); // -> NullReferenceException
}
Expected behavior
This scenario works without exceptions and maybe sets the _id in the input document to the generated value.
But if for some reason this scenario is not valid then the exception should be different, some clear error.
Stack
> LiteDB.dll!LiteDB.LiteCollection<LiteDB.BsonDocument>.Insert(LiteDB.BsonDocument entity) Line 27 C#
MyLiteDB.exe!TryLiteDB.Program.Main(string[] args) Line 17 C#
Failed code due to null _id:
// checks if must update _id value in entity
if (removed)
{
_id.Setter(entity, id.RawValue);
}
Thanks again @nightroman, this 3th-party fine tests are great for LiteDB
I have tried the change. It does not fail anymore, thank you (some workaround in my code will go).
I have some doubts about the details... The test code does not change the _id in the input document. So that in the test we may add doc["_id"].AsInt32.Should().Be(0); to the end.
But is this "right"? This is not consistent with strongly typed cases where input IDs are updated in "documents" and set to values used by Insert. (Just a question, I have no opinion on my own..)
I assume that if you are using BsonDocument as T in GetCollection, you can omit _id in your BsonDocument entity (different when you use strong typed class with int or long id value).
Make sense to you?
This explanation makes sense, yes. But note that I cannot "omit _id in my
documents" in some scenarios because they are created from strongly typed
instances using the mapper Serialize. In theory, my Ldbc might want to
propagate these IDs back to input instances (I am not doing this now but...).
There is a bigger problem though, it looks like _id = 0 is inserted, i.e. this passes:
using (var db = new LiteDatabase(":memory:"))
{
var test = db.GetCollection("Test", BsonAutoId.Int32);
var doc = new BsonDocument() { ["_id"] = 0, ["p1"] = 1 };
test.Insert(doc); // -> NullReferenceException
doc["_id"].AsInt32.Should().Be(0); // _id not changed
var r1 = test.FindById(1);
Assert.Null(r1); //! but expected not null
var r2 = test.FindById(0);
Assert.NotNull(r2); //! but expected null
}
And another bigger problem.
Let's undo this change and create and run this new test for InsertBulk:
[Fact]
public void AutoId_Zero_Int_Bulk()
{
using (var db = new LiteDatabase(":memory:"))
{
var test = db.GetCollection("Test", BsonAutoId.Int32);
var doc = new BsonDocument() { ["_id"] = 0, ["p1"] = 1 };
test.InsertBulk(new BsonDocument[] { doc });
doc["_id"].AsInt32.Should().Be(1); // _id is updated
var r1 = test.FindById(1);
Assert.NotNull(r1); // _id = 1 is inserted
}
}
It passes without the fix and fails with the fix applied.
This means:
InsertBulk.InsertBulk (old) updates _id in input BsonDocument (so for consistency, at least, Insert should, too).It is labelled "fixed". I just want to be sure that the recent remarks are taken into account. The fix is not quite right and it also introduces a regression.
I will back into this issue before new release
@mbdavid v5.0.3 is pushed with this change not finished (I think). There may be problems. I see some in my tests in _Ldbc_.
Hi @nightroman, I needed release 5.0.3 because to track an important bug. But let's back to this issue:
About when engine should be use AutoId, we have 2 cases:
1) When you have a strong entity collection (that will be converted into BsonDocument before insert in database) you always have _id field with valid value (at lease if you are not using Nullable). In this case, engine needs remove _id in "empty" value (0 for Int32). Removing _id will use AutoId
2) When you have a BsonDocument collection you will not convert from any strong type. So, if user wants use AutoId they should leave _id missing (or with null).
So, IMO, when I insert this:
var col = db.GetCollection("col1", AutoId.Int32);
var doc = new BsonDocument() { ["_id"] = 0, ["p1"] = 1 };
col.Insert(doc)
The _id should be 0. If AutoId is used (and remove _id to auto increment), I will never can store _id = 0 in any case.
I have the case 3, a BsonDocument collection but I convert from a strong type
using BsonMapper.Serialize, then add these documents. So, documents have
existing _id values with 0 or nulls.
Before the change InsertBulk used to work like I expected. In this scenario it generated new _id,
inserted documents with new _id and updated _id in input documents. As if it were case 1.
Insert used to fail with NRE. It would be ideal if InsertBulk keeps the old behaviour
and Insert works the same way.
This worked (like expected) for InsertBulkbefore the change:
[Fact]
public void AutoId_Zero_Int_Bulk()
{
using (var db = new LiteDatabase(":memory:"))
{
var test = db.GetCollection("Test", BsonAutoId.Int32);
var doc = new BsonDocument() { ["_id"] = 0, ["p1"] = 1 };
test.InsertBulk(new BsonDocument[] { doc });
doc["_id"].AsInt32.Should().Be(1); // _id is updated
var r1 = test.FindById(1);
Assert.NotNull(r1); // _id = 1 is inserted
}
}
If you think the new behaviour is how it should be, then I should adapt to the scenario like this:
I have a BsonDocument collection (with auto id Int32, de facto) and a bunch of
documents with _id = 0. I have to remove these _id, something that LiteDB
can do easily (and probably used to do for InsertBulk). But I cannot because
I do not know if the collection is with auto id or not, this info is not public.
But in this scenario, why are you not using specific <T> to get collection? Or, in this case, can you remove _id before send to Insert?
All collection are AutoId. Each document are, or not, autoincrement if contains or not an _id.
why are you not using specific
<T>to get collection?
My context is Ldbc, the PowerShell module, helper for scripts. Long story
short, PowerShell and generics are not friends. Basically everything generic
is not usable in PowerShell. Workarounds are ugly and only advanced users
know them. That is why wrappers like Ldbc make sense.
This might be worked around if LiteDB provides a not generic method for getting
a typed collection with non-generic signature method, i.e. with a Type
parameter. There is not such a method.
Or, in this case, can you remove
_idbefore send toInsert?
I would be able to do so if LiteCollection exposes the auto id type.
I would need to write and use my own (copy/paste from LiteDB):
private bool RemoveDocId(BsonDocument doc)
{
if (_id != null && doc.TryGetValue("_id", out var id))
{
// check if exists _autoId and current id is "empty"
if ((_autoId == BsonAutoId.Int32 && (id.IsInt32 && id.AsInt32 == 0)) ||
(_autoId == BsonAutoId.ObjectId && (id.IsNull || (id.IsObjectId && id.AsObjectId == ObjectId.Empty))) ||
(_autoId == BsonAutoId.Guid && id.IsGuid && id.AsGuid == Guid.Empty) ||
(_autoId == BsonAutoId.Int64 && id.IsInt64 && id.AsInt64 == 0))
{
// in this cases, remove _id and set new value after
doc.Remove("_id");
return true;
}
}
return false;
}
But I cannot because collection does not expose its auto id type. That is,
given a collection, I do not know its _autoId in the above code.
Like I mentioned, in 5.0.2 InsertBulk worked in the expected and desired
way for untyped collections and untyped documents. I think this was right.
But Insert used to NRE. IMHO, Insert should have been corrected and
made working in the same way, but InsertBulk should not be changed.
Concrete example:
Here we get an untyped collection "Log" with auto id Int32:
and then create some class instances and send them to Add-LiteData $log.
Add-LiteData converts input objects to BsonDocument using BsonMapper.Serialize.
These documents have _id = 0. Then Add-LiteData calls insert on passed $log
with input documents. In 5.0.2 it failed due to NRE. In 5.0.3 it fails "duplicated id 0".
IMHO, this scenario should work right away:
Int32_id = 0 (the fact that the collection is untyped does not matter)_id, it has all the required information for doing thisThis looks natural. Moreover, InsertBulk already worked this way in 5.0.2.
I pushed Ldbc 0.6.3 with this workaround:
if (doc.TryGetValue("_id", out var id))
{
if (id.IsNull
|| id.IsInt32 && id.AsInt32 == 0
|| id.IsObjectId && id.AsObjectId == ObjectId.Empty
|| id.IsGuid && id.AsGuid == Guid.Empty
|| id.IsInt64 && id.AsInt64 == 0L)
{
doc.Remove("_id");
}
}
I call this before every Insert, InsertBulk, Upsert for each input
BsonDocument.
I do not think it's quite right for several reasons. In particular, I cannot
take into account collection auto id type, so I have to remove any "default"
id. And, as a result, in Ldbc, users cannot insert documents with default ids
at all (not a big deal for now, not OK in the long run).
But all in all, this workaround kind of works.
Feel free to close the issue, Insert does not NRE anymore.
Here is the old possibly related issue https://github.com/mbdavid/LiteDB/issues/1149
OP uses the scenario like mine and gets the same problem (id is not generated).