Hi @rozza!
Just wanted to pick your brain about the default index creation. Seems like non-background indexes are only useful at the very early stage of a project. When any of the collections grow fairly large (say above 1M documents), createIndex blocks the database for a significant amount of time.
It's a bit annoying to add {'index_background': True} in each Document's meta and sometimes you're just not able to control it that way. E.g. changing allow_inheritance will create a new non-background index (although I may be wrong whether _cls-related indexes respect meta's index_background or not).
I'd argue that { background: true } should be a default option in MongoEngine. What do you think?
Hi @wojcikstefan I'm going to ponder this one!
I like it conceptually, however I'm not sold on have auto index generation as you say it can take time and maybe undesirable in production. Maybe one to review for 0.9.
By "auto index generation" I meant the current status quo, i.e. MongoEngine ensures all of the defined indexes are actually created in the db at runtime. If you add something to meta = {'indexes': [...]}, it basically means the new index will be automatically created (and lock the whole db for a while, if the collection is fairly large).
+1
I have one more request related to indexes - there should be an easy way to globally disable automatic index creation for all documents. While it's a useful feature during development, it's dangerous for production environments... You want full control over when and what index is created (and how).
It's also better to decouple index definitions from application logic if you have a reasonably big engineering team, because it allows indexes to be managed by a separate DevOps team.
Wanted to bring this one back again as we also have just experienced pretty significant downtime due to automated foreground index creation on production. Just as @wojcikstefan said, at some point it is desirable to disable automated index creation completely and let devops control this.
Maybe add this as an option to a connection definition (Among with one for background indexes)?
Few requirements for the updated index management:
_cls is quite implicit. We should scrutinize that logic and see if there's room for more clarity (even if through better documentation).Anything else y'all might think of @lafrech @thedrow @mclate @gukoff ?
cc @jtharpla, too :)
Speaking from experience, I'd definitely prefer no automated index creation as a sane default intended for production with the option to do automated background index creation if desired. Automated foreground index creation should be clearly documented as intended for development environments only.
@wojcikstefan I agree with your suggestions. What about also removing field arguments related to indexes, so that all index info would be in one place -- _meta (not sure how to deal with primary)?
not sure how to deal with primary
I think the _id: 1 index is the the only one that can be created implicitly, i.e. we don't need to explicitly define it in meta['indexes']. That's because we can't control it in any way or stop its creation. MongoDB docs say: "MongoDB creates a unique index on the _id field during the creation of a collection. (...) You cannot drop this index on the _id field."
What about also removing field arguments related to indexes, so that all index info would be in one place --
_meta
On the one hand, I agree. The more scattered index creation is, the harder it is to grasp what indexes are defined in your code (which is particularly dangerous if indexes are auto-created).
On the other hand, people might be used to index-related arguments on fields. For example, latest Django still offers Field options like db_index and unique.
Is dropping field options like unique, unique_with, and sparse, and forcing users to instead define such indexes in the meta dict too drastic of a change? It seems to me like it would make index definitions clearer and more centralized, but lmk what you think @bagerard, @erdenezul, @thomasst, @mpessas, @tsx.
Is dropping field options like unique, unique_with, and sparse, and forcing users to instead define such indexes in the meta dict too drastic of a change? It seems to me like it would make index definitions clearer and more centralized
I agree it would make index definitions more explicit, but it is against what most people coming to mongoengine from other "ORM"s would expect. That said, as long as it is fine breaking backwards compatibility, dropping those field options sounds good to me.
The more I think about it, the more I like the idea of defining all of the indexes in meta['indexes']. This makes scanning for indexes in the code easier and follows The Zen of Python in that "There should be one-- and preferably only one --obvious way to do it."
This is similar to defining all of your indexes via Meta.indexes in Django or via the Index construct in SQLAlchemy (and forgoing index-related field/column kwargs).
That said, as long as it is fine breaking backwards compatibility, dropping those field options sounds good to me.
I don't mind introducing a breaking change if it's done alongside a major version bump. Perhaps we could get to MongoEngine v1.0.0 after this and some performance/functional/cleanup improvements...
@bagerard what do you think?
Most helpful comment
Few requirements for the updated index management:
_clsis quite implicit. We should scrutinize that logic and see if there's room for more clarity (even if through better documentation).Anything else y'all might think of @lafrech @thedrow @mclate @gukoff ?