I haven't fully thought this through and I'm just logging an issue @JohnAD brought up in https://github.com/MongoEngine/mongoengine/issues/1106#issuecomment-269027633. We should think through all the use cases and discuss what's the best approach.
Right now, you can store None in any non-required field, e.g.
In [46]: class EmbeddedDoc(EmbeddedDocument):
text = StringField()
....:
In [47]: class Doc(Document):
name = StringField()
num = IntField()
embedded = EmbeddedDocumentField(EmbeddedDoc)
....:
In [48]: Doc.objects.create(name=None, num=None, embedded=None)
Out[48]: <Doc: Doc object>
Of course, that capability goes away once you define a field as required:
In [50]: Doc.objects.create(name=None, num=None, embedded=None)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-50-eb46633914b2> in <module>()
----> 1 Doc.objects.create(name=None, num=None, embedded=None)
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/queryset/base.pyc in create(self, **kwargs)
286 .. versionadded:: 0.4
287 """
--> 288 return self._document(**kwargs).save()
289
290 def first(self):
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/document.pyc in save(self, force_insert, validate, clean, write_concern, cascade, cascade_kwargs, _refs, save_condition, signal_kwargs, **kwargs)
318
319 if validate:
--> 320 self.validate(clean=clean)
321
322 if write_concern is None:
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/document.py in validate(self, clean)
400 pk = self._instance.pk
401 message = 'ValidationError (%s:%s) ' % (self._class_name, pk)
--> 402 raise ValidationError(message, errors=errors)
403
404 def to_json(self, *args, **kwargs):
ValidationError: ValidationError (Doc:None) (Field is required: ['num', 'name', 'embedded'])
Now, I can imagine that we'd expect the same behavior from particular items within the ListField, namely that lists of non-required strings/ints/embedded docs would allow Nones in the list. That is not the case right now:
In [55]: class Doc(Document):
names = ListField(StringField())
nums = ListField(IntField())
embedded_docs = ListField(EmbeddedDocumentField(EmbeddedDoc))
....:
In [56]: Doc.objects.create(names=[None], nums=[1], embedded_docs=[EmbeddedDoc(text='aaa')])
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-56-c349daa935c2> in <module>()
----> 1 Doc.objects.create(names=[None], nums=[1], embedded_docs=[EmbeddedDoc(text='aaa')])
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/queryset/base.pyc in create(self, **kwargs)
286 .. versionadded:: 0.4
287 """
--> 288 return self._document(**kwargs).save()
289
290 def first(self):
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/document.pyc in save(self, force_insert, validate, clean, write_concern, cascade, cascade_kwargs, _refs, save_condition, signal_kwargs, **kwargs)
318
319 if validate:
--> 320 self.validate(clean=clean)
321
322 if write_concern is None:
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/document.py in validate(self, clean)
400 pk = self._instance.pk
401 message = 'ValidationError (%s:%s) ' % (self._class_name, pk)
--> 402 raise ValidationError(message, errors=errors)
403
404 def to_json(self, *args, **kwargs):
ValidationError: ValidationError (Doc:None) (StringField only accepts string values: ['names'])
In [57]: Doc.objects.create(names=['name'], nums=[None], embedded_docs=[EmbeddedDoc(text='aaa')])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-57-d6ac608e67d9> in <module>()
----> 1 Doc.objects.create(names=['name'], nums=[None], embedded_docs=[EmbeddedDoc(text='aaa')])
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/queryset/base.pyc in create(self, **kwargs)
286 .. versionadded:: 0.4
287 """
--> 288 return self._document(**kwargs).save()
289
290 def first(self):
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/document.py in __init__(self, *args, **values)
113 field = self._fields.get(key)
114 if field and not isinstance(field, FileField):
--> 115 value = field.to_python(value)
116 setattr(self, key, value)
117 else:
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/fields.pyc in to_python(self, value)
304 self.field._auto_dereference = self._auto_dereference
305 value_dict = {key: self.field.to_python(item)
--> 306 for key, item in value.items()}
307 else:
308 Document = _import_class('Document')
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/fields.pyc in <dictcomp>((key, item))
304 self.field._auto_dereference = self._auto_dereference
305 value_dict = {key: self.field.to_python(item)
--> 306 for key, item in value.items()}
307 else:
308 Document = _import_class('Document')
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/fields.py in to_python(self, value)
179 def to_python(self, value):
180 try:
--> 181 value = int(value)
182 except ValueError:
183 pass
TypeError: int() argument must be a string or a number, not 'NoneType'
In [58]: Doc.objects.create(names=['name'], nums=[1], embedded_docs=[None])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-58-e79991fbb4fe> in <module>()
----> 1 Doc.objects.create(names=['name'], nums=[1], embedded_docs=[None])
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/queryset/base.pyc in create(self, **kwargs)
286 .. versionadded:: 0.4
287 """
--> 288 return self._document(**kwargs).save()
289
290 def first(self):
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/document.py in __init__(self, *args, **values)
113 field = self._fields.get(key)
114 if field and not isinstance(field, FileField):
--> 115 value = field.to_python(value)
116 setattr(self, key, value)
117 else:
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/fields.pyc in to_python(self, value)
304 self.field._auto_dereference = self._auto_dereference
305 value_dict = {key: self.field.to_python(item)
--> 306 for key, item in value.items()}
307 else:
308 Document = _import_class('Document')
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/fields.pyc in <dictcomp>((key, item))
304 self.field._auto_dereference = self._auto_dereference
305 value_dict = {key: self.field.to_python(item)
--> 306 for key, item in value.items()}
307 else:
308 Document = _import_class('Document')
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/fields.py in to_python(self, value)
544 def to_python(self, value):
545 if not isinstance(value, self.document_type):
--> 546 return self.document_type._from_son(value, _auto_dereference=self._auto_dereference)
547 return value
548
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/document.py in _from_son(cls, son, _auto_dereference, only_fields, created)
681 # Get the class name from the document, falling back to the given
682 # class if unavailable
--> 683 class_name = son.get('_cls', cls._class_name)
684
685 # Convert SON to a dict, making sure each key is a string
AttributeError: 'NoneType' object has no attribute 'get'
As you can see, embedded document is completely broken (trying to call _from_son w/ None), but even the non-required strings and ints don't validate None properly...
Whatever we do here, we should consider performance implications, backward compatibility, etc. before making the final decision.
I have been thinking about this for a while. Please forgive my wordiness:
In addition to handling None, we also need to deal with invalid data. For example, if a document called d in the Doc container was:
{
"_id": {"$oid": "586329e585f4820be722fe25"},
"names": ["a", 0, "c"],
"nums": [1, 2, 3]
}
We can see that any iterative string operations on the d.names attribute would generate errors on the second entry (0).
So, the bigger question could be: how should invalid data be handled?
I see 6 possible options:
Perhaps we could have an option to ignore any invalid data, including None. So, if the ListField was declared with ignoreInvalid=True:
class Doc(me.Document):
names = me.ListField(me.StringField(), ignoreInvalid=True)
nums = me.ListField(me.IntField())
embedded_docs = me.EmbeddedDocumentListField(EmbeddedDoc)
Then the document would be 'seen' as:
{
"_id": {"$oid": "586329e585f4820be722fe25"},
"names": ["a", "c"],
"nums": [1, 2, 3]
}
Any null entry would be, of course, simply another form of invalid data.
When read from MongoDB, the names field would be pre-marked as changed so that a subsequent d.save() would change d.names to a corrected list.
In fact, I suggest we make ignoreInvalid=True the default, since I suspect that is what most users would select in most circumstances.
Sometimes the ordering and placement of each item is critical in a list. In that case, removing an invalid element would cause damage. In this example, if "c" must be the third item in the list, then something must be placed into the second element.
We could support a simple 'defaultForInvalidEntry' option in the ListField. So, for example:
class Doc(me.Document):
names = me.ListField(me.StringField(), defaultForInvalidEntry="invalid")
nums = me.ListField(me.IntField())
embedded_docs = me.EmbeddedDocumentListField(EmbeddedDoc)
The document would be 'seen' as:
{
"_id": {"$oid": "586329e585f4820be722fe25"},
"names": ["a", "invalid", "c"],
"nums": [1, 2, 3]
}
This would be very similar to the 'default' option of other fields.
Again, null is simply another form of invalid data.
For EmbeddedDocs, the default could be a dict. Of course, if that dict has lists and those lists have dicts....
This option would be consistent with other fields. But... there is large part of me that is uncomfortable with this technique. It screams "duct-tape for error handling". I'm not a fan of this option; but that is just my opinion.
Another way to have an invalid field handled is by calling a function in the class for it's handling. This allows for a great deal of flexibility.
For example, having an 'onInvalidField' option:
class Doc(me.Document):
names = me.ListField(me.StringField(), onInvalidField=badNameHandler)
nums = me.ListField(me.IntField())
embedded_docs = me.EmbeddedDocumentListField(EmbeddedDoc)
def badNameHandler(self, item):
if item is None:
return ""
elif str(item)=="-1":
return "xyz"
else:
return str("error, was: {}".format(repr(item)))
In this example, I've written a function that handles invalid data such as null/None or anything else. Since it is a handler for a bad string, it returns a string.
In this scenario, the programmer has full control over the error handling. Handling a None or invalid embedded document could look this:
class Doc(me.Document):
names = me.ListField(me.StringField())
nums = me.ListField(me.IntField())
embedded_docs = me.EmbeddedDocumentListField(EmbeddedDoc, onInvalidField=badEDHandler)
def badEDHandler(self, item):
ed = EmbeddedDoc()
ed.text = ""
try:
ed.text = str(item["text"])
except Exception:
pass
return ed
For this option, a new meta field allows the document to be ignored as non-existent if it fails to load the ListField.
class Doc(me.Document):
names = me.ListField(me.StringField())
nums = me.ListField(me.IntField())
embedded_docs = me.EmbeddedDocumentListField(EmbeddedDoc)
meta = {ignoreInvalidDocs=True}
I would probably rarely ever use this option myself. But, it is much faster than the other options.
And, IMO, it certainly should not be a default since it only 'seems' like the document does not exist. In practice, that could cause invalid document counts and scenarios that cause 'upsert'-style loops.
Rather than have both 'defaultForInvalidEntry' and 'onInvalidField', perhaps we could have a combined field called something else such as 'handleInvalidEntry'. If set to a value, it is used as the default. If set to a function pointer, it calls that function.
class Doc(me.Document):
names = me.ListField(me.StringField(), handleInvalidEntry=badName)
nums = me.ListField(me.IntField(), handleInvalidEntry=0)
embedded_docs = me.EmbeddedDocumentListField(EmbeddedDoc)
def badName(self, item):
return str(item)[:1]
So, if a name entry is invalid, an attempt is made to return the first character of a str version of it, (N on a null None). If a nums entry is invalid, it is set to integer 0.
Simply:
Allow null fields to be imported as None. I believe this is how MongoEngine 0.11.0 behaved.
Allow other invalid fields to be read and automatically interpreted into other field types; hope the programmer corrects them prior to saving, otherwise an error is generated.
This is, I believe, the behavior of MongoEngine 0.11.0.
This option does not solve lists of embedded documents problem. But it 'solves' things for the other fields.
And, since this as behavior I've seen in the past, it is certainly the easier way out.
THOUGHTS SO FAR
So, those are my thoughts. Personally, I see quite a bit of value to options 1 and 5.
Option 6 could also work, but I'm not in favor of it. I realize that options 1 and 5 will have performance impacts, IMO such impacts are expected. If I was strongly oriented to performance, I'd simply be using PyMongo. But I want the oversight to prevent me, the programmer, from making and distributing data errors and I'm willing to live with performance hits.
Esp. if that impact is of a O(n) scale. If of O(n^2) impact, that might be a bigger issue. I'm not expert like you all on the internals of this library, so let me know if these options would scale badly.
Adding a bit to the handleInvalidEntry option. We could also pass the index in the array/list, was well as the item. So:
def badNameHandler(self, item, index):
...
BTW, if setup as an instance function, as seen above, it would have access to the other fields. And while that sounds great, that might introduce complications since the document is not fully parsed yet. Instead, we might make it strictly a class method:
@classmethod
def badNameHandler(item, index):
...
Reviewing the BaseField __init__ parameters, perhaps a better name for handleInvalidEntry is simply entry_default or invalid_entry_default and document that it can be a callable. The parameter would only apply to ComplexBaseField class; so it might be handled as an addition to __init__ for that class.
def __init__(self, *args, **kwargs):
super(ComplexBaseField, self).__init__(*args, **kwargs)
self.invalid_entry_default = kwargs.get("invalid_entry_default", None)
if self.invalid_entry_default is None:
self.ignore_invalid = kwargs.get("ignore_invalid", True)
else:
self.ignore_invalid = False
I should not be writing code during a discussion. Sorry 'bout that. :)
It has been 9 days since last comment. Okay, admittedly, most of them have been by me. :)
Would everyone be okay I attempted the following?:
Add an invalid_entry_default=None parameter to ComplexBaseField so that it applies to:
The effect of this option, if set, is that if a list item (or key/value) is pulled into the MongoDB document is not valid, the 'invalid_entry_default' is used as a replacement. Specifically:
if invalid_entry_default is a literal value or field instance, then that value/field is used as a replacement.
if invalid_entry_default is a callable, then that function is called to get the replacement value/field. A check is made to ensure that the function is _not_ an instance function. The parameters passed are (index, item) if a list and (key, value) if a dict.
if invalid_entry_default is None then the entry (or key/value) is skipped. This is the default behavior.
If the invalid_entry_default's behavior is invoked during the reading of a document, then the field is marked as changed. Otherwise, this parameter only effects the reading of the document.
In some ways, this parameter helps ensure the Robustness Principle .
Example:
class Doc(me.Document):
# default any None, strings, or another other kind of invalid number, to 0:
nums = me.ListField(me.IntField(), invalid_entry_default=0)
# default to a hand-written handler for bad strings:
names = me.ListField(me.StringField(), invalid_entry_default=badName)
# default to skipping bad names:
more_names = me.ListField(me.StringField(), invalid_entry_default=None)
# default to a field instance:
embedded_docs = me.EmbeddedDocumentListField(EmbeddedDoc, invalid_entry_default=EmbeddedDoc(text="hello"))
@classmethod
def badName(index, item):
return "bad name: {} at index {}".format(str(item), index)"
To finish all this:
And possibly, if there is time:
(BTW, I'm happy to support DictFields, but really recommend against using them. MapFields seem okay.)
Hi @JohnAD sorry to be stalling on this one, have gotten a bit swamped. I'll respond within the next few days though!
Hm, let me push back a bit on this one ;)
What you're essentially proposing is validation upon read. That could have disastrous effects on performance and IMHO, we're already doing way too much processing of the data coming in.
If you (meaning the user of this package) truly need to validate data after fetching it, you can always call the validate() method. If you need something more custom then that, I'd much rather have you write your case-specific logic than try to cover some of the possible scenarios in the library itself. I think the cost of such an approach in terms of complexity and future maintenance would be huge.
Finally, if we'd want to do that for lists and dicts, why wouldn't we do it for regular fields? For example, consider this:
In [15]: class Doc(Document):
...: txt = StringField()
...:
In [16]: Doc.drop_collection()
In [17]: Doc._get_collection().insert({ 'txt': 1 }) # invalid type - int instead of str
Out[17]: ObjectId('58781640c6e47b544835677b')
In [18]: doc = Doc.objects.first()
In [19]: doc.txt
Out[19]: 1
In [20]: type(doc.txt)
Out[20]: int
In [21]: doc.validate() # raises a validation error
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-21-35785ada099a> in <module>()
----> 1 doc.validate() # raises a validation error
/Users/wojcikstefan/Repos/temp/mongoengine/mongoengine/base/document.pyc in validate(self, clean)
400 pk = self._instance.pk
401 message = 'ValidationError (%s:%s) ' % (self._class_name, pk)
--> 402 raise ValidationError(message, errors=errors)
403
404 def to_json(self, *args, **kwargs):
ValidationError: ValidationError (Doc:58781640c6e47b544835677b) (StringField only accepts string values: ['txt'])
I understand the temptation to provide (somewhat) flexible validation upon reading, but I'd much rather have MongoEngine stick to the validate-upon-save (or on demand via validate()) approach.
And to bring it back to the original issue from the description, StringField() allows None or an instance of basestring as a valid value. It does not allow e.g. an int. Hence I believe ListField(StringField()) should allow a mix of Nones and basestrings (again, validated upon save).
@wojcikstefan I should have read your response prior to creating issue #1468 . Sorry about that. Essentially it is talking about this very thing on other fields.
I agree that validation does create considerable overhead, especially when pulling in a large list of documents and each document might, in turn, have a large group of lists (and lists within lists.)
As such, I can totally see wanting it to be strictly optional.
In a large production system involving multiple software agents (not all of them in python), it is, IMO, generally assumed that bad data (in documents) is common. That is never desired, of course, but handling bad data should be both resilient and as self-evident as possible. For me personally, having validation on reading would be very valuable as I only do it in a haphazard fashion now and do regret it sometimes.
I'd like to brainstorm some ideas:
Keep the read-validation optional on a per-field or per-document basis. If the parameter is never invoked, then it is never used. Put in the documentation that such validation has a high cost.
filter method:# without filter
x = SomeDoc.objects.first()
# with filter:
x = SomeDoc.objects.filter().first()
Essentially, create filteredmongoengine which uses mongoengine as a library.
.validate() raises error and does mention the failed condition and field name in the error message. But turning that around to invoking additional response code seems difficult. I might be missing something though. Perhaps a variant of that method make such response easier.
Just thinking out loud. Looking this over, perhaps "idea 2" has a certain appeal.
As it happens, when I started using mongoengine, I was bit confused by 'ListField' and it's naming.
In fact, I was also confused by some of MongoDB/JSON naming conventions. Technically, an array is an ordered sequence of homogeneous items. A list is an ordered sequence of items of any kind. In other words, a list can mix strings, numbers, nulls, booleans, whatever. Whereas an array forbids this. An array must have all elements be of the same type.
So, an array is always also a list. But the opposite is not always true.
Both Python and C/C++ get this correct.
But unfortunately, JSON (and thus MongoDB) get it wrong. They allow different types in what they call an array. But by doing so, it clearly violates the CompSci meaning of the word array. It's really a list.
So, when I saw the ListField I thought: "Oh, mongoengine has properly named this so-called-'array' as a list". So mongoengine will allow both StringFields and IntFields into the ListField. Except...no actually. mongoengine is requiring all of the elements of the list to be the same. So, it is treating it as an array. but calling it a list. I've been wondering why it wasn't being called an ArrayField to match it's array-like behavior.
TL:DR:
C/C++ has arrays that are arrays
Python has lists that are lists
JSON has arrays that are really lists
MongoEngine has lists that are really arrays
Way off-topic I suppose. I'm not suggesting we change anything.
(Also, I agree that ListField should forbid mixing types. It should be behaving as an array. All is good.)
@wojcikstefan wrote:
And to bring it back to the original issue from the description, StringField() allows None or an instance of basestring as a valid value. It does not allow e.g. an int.
We might want start a separate issue for this. But here is the challenge regarding None, IMO:
We could have two different Doc documents (using your code from earlier):
{
"_id": {
"$oid": "5877f4e29a9cb569cc6b4e49"
},
"txt": null
}
and
{
"_id": {
"$oid": "5877f4e29a9cb569cc6b4e49"
}
}
Generally speaking, we are using None to mean not there. But "not there" and null are really different things. In a standalone StringField it might be rare that we really need to see the difference, but I strongly suspect that is not the case in a ListField. This is somewhat akin to comparing "" and None in python. Most of the time one can safely ignore the difference, but when it is important, one can write:
if myvar is None:
something()
elif myvar=="":
somethingelse()
rather than the generic truthiness check:
if myvar:
I've not found a way in mongoengine to handle this. If we were to treat null as None, then that implies that the entry was never there; even though it is. This further complicates the picture.
Ideally, IMO, it would be good to have some equivalent of:
if doc.txt is None:
somethingmissing()
elif doc.txt=="":
somethingempty()
elif doc.txt==mongoengine.null:
somethingnull()
Or for a list of strings:
{
"_id": {
"$oid": "5877f4e29a9cb569cc6b4e49"
},
"strings": [
"a",
null,
"c",
]
}
would become:
["a", null, "b"]
But I suspect handling that is a very non-trivial change.
For a bug fix in 0.11.x, I recommend we insert None entries into lists. Since it does not allow null entries currently, we don't need to worry about legacy code. We could then add a better or different solution in later versions.
I do apologize for writing so much...I can get wordy sometimes.
Inspired by the idea of a separate refine filter function, I might write a quick proof-of-concept branch this weekend. That way any validation on reading is done explicity, separately, and with knowledge of its cost.
import mongoengine as me
class Post(me.Document):
title = me.StringField(max_length=120, required=True, correction=str)
tags = me.ListField(me.StringField(max_length=30), correction=me.skip)
author = me.StringField(required=True, correction="unknown", validation=my_routine)
post_list = Post.objects(tags='mongodb').refine
other_list = Post.objects(tags='other').refine(limit_check=['title'])
Generally speaking, we are using
Noneto mean not there. But "not there" andnullare really different things.
Generally speaking, you're right. However, that's not the case in MongoDB. As you can see in the following snippet, both "not there" and null are returned when you're looking for null:
> db.test.insert({ whatever: 1 })
> db.test.find({ aaa: null })
{
"_id": ObjectId("587c717499a6938456f321ae"),
"whatever": 1
}
Fetched 1 record(s) in 1ms
> db.test.insert({ aaa: null })
> db.test.find({ aaa: null })
{
"_id": ObjectId("587c717499a6938456f321ae"),
"whatever": 1
}
{
"_id": ObjectId("587c71f399a6938456f321af"),
"aaa": null
}
Fetched 2 record(s) in 3ms
Hi again @JohnAD, sorry for a delay! I'm afraid I'm still very skeptical of this built-in validate-on-read approach... There's a thousand different ways one might want to "refine" corrupted or outdated data depending on a use case, and exposing a few of the possible options to deal with it in the schema seems like a complex way to kinda solve some of the potential problems... And even if something is "completely optional", once it's part of the core library, the cost of supporting it in the future is IMHO insurmountable.
What you can do now:
@wojcikstefan, when writing the early code for refine, I discovered that pyMongo does the same. So, your pointing out that MongoDB itself treats null and missing as identical makes even more sense. So, officially, None is both null and an indication of missing.
That being case, allowing the ListField to contain either the Field type or None makes a lot of sense. I'll see what it takes to make that work.
As to validate-on-read, I'll assume for now that my needs are odd and I'll write a private library. Something possibly heritable akin to:
from mongoengine import Document, StringField, ListField
from mongoenginerefine import Refine, Skip
class Post(Document, Refine):
title = StringField(max_length=120, required=True, correction=str)
tags = ListField(StringField(max_length=30), allow_null=False, correction=Skip)
author = StringField(required=True, correction="unknown", validation=my_routine)
post_list = Post.objects(tags='mongodb').refine
other_list = Post.objects(tags='other').refine(limit_check=['title'])
Modified code for getting both EmbeddedDocumentField and EmbeddedDocumentListField to allow None. It looks like this will have little impact on performance.
Looks pretty straightforward. I'll get the remaining fields fixed and add test cases. I'm guessing it will be about a week before I'm ready with a pull request.
As a bonus, the required kwarg actually means something now for EmbeddedDocumentField.
Awesome @JohnAD! Thank you very much for contributing so much! Could you open a work-in-progress PR so that it's easy to track the development and potentially offer help while working out some kinks?
If you use an empty Object, then MongoEngine 0.13.0 will fill in the object with default values defined in the EmbeddedDocument. This could be a workaround until the pull request #1470 is accepted.
Hi all, I think I've just hit up against this in v0.22.1. It seems like the same issue as here but while there are a few other tickets discussing similar problems I can't see if there has been a resolution?
mylist = ListField(StringField(null=True))
Setting mydoc.mylist = [None, 'hello', None] gives the error StringField only accepts string values.
Am I doing something wrong here or is there a workaround to get this working? Obviously I could store '' and convert when loading, but thought there might be a neater solution?
Most helpful comment
As it happens, when I started using mongoengine, I was bit confused by 'ListField' and it's naming.
In fact, I was also confused by some of MongoDB/JSON naming conventions. Technically, an array is an ordered sequence of homogeneous items. A list is an ordered sequence of items of any kind. In other words, a list can mix strings, numbers, nulls, booleans, whatever. Whereas an array forbids this. An array must have all elements be of the same type.
So, an array is always also a list. But the opposite is not always true.
Both Python and C/C++ get this correct.
But unfortunately, JSON (and thus MongoDB) get it wrong. They allow different types in what they call an array. But by doing so, it clearly violates the CompSci meaning of the word array. It's really a list.
So, when I saw the ListField I thought: "Oh, mongoengine has properly named this so-called-'array' as a list". So mongoengine will allow both StringFields and IntFields into the ListField. Except...no actually. mongoengine is requiring all of the elements of the list to be the same. So, it is treating it as an array. but calling it a list. I've been wondering why it wasn't being called an ArrayField to match it's array-like behavior.
TL:DR:
C/C++ has arrays that are arrays
Python has lists that are lists
JSON has arrays that are really lists
MongoEngine has lists that are really arrays
Way off-topic I suppose. I'm not suggesting we change anything.
(Also, I agree that ListField should forbid mixing types. It should be behaving as an array. All is good.)