mongoengine killed mongodb performance when used with pymongo 3.x

Created on 27 Dec 2016 · 8 Comments · Source: MongoEngine/mongoengine

The root problem is that mongoengine switched from pymongo's ensure_index to its create_index method inside mongoengine's ensure_indexes.
ensure_index had a simple caching mechanism, but create_index doesn't have one.

Because of this switch, and because ensure_indexes is called on each doc.save(), MongoDB has to handle a createIndex command on every save. That is quite a heavy operation, and in our case it results in a 50% drop in performance.

ensure_index was added here https://github.com/MongoEngine/mongoengine/issues/812 because there was a problem with tests, and IMHO it was a workaround. A better solution could be to add a drop_database proxy method, similar to drop_collection, which would reset cls._collection on the Document.

I will prepare a pull request with such a change later.

High Priority Performance

All 8 comments

This is a chart from monitoring

Thanks for a thorough report @anih! I'm looking forward to a PR. IMHO, the entire index-ensuring logic is troubling right now and - while convenient during development - it kills production systems. I touched on it briefly in the comments on https://github.com/MongoEngine/mongoengine/issues/357

https://github.com/MongoEngine/mongoengine/pull/1457 is a first draft of the changes. Unfortunately I went too far and also changed how switching DBs and collections works, but the previous approach wasn't thread safe and was quite dirty. The tests should pass, but I didn't write any new ones, as I'd prefer to get feedback first on whether the changes are going in the right direction.

Any news about this? Any workaround?

Full history of the issue: Previously, ensure_indexes could be called many times over because PyMongo maintained a local cache of the indexes with a TTL of 5 minutes (see their v2.8 docs). Then, that method was deprecated and instead you could use create_index with a cache_for param (v2.9 docs). Finally, in v3.0 the cache_for param was removed (https://jira.mongodb.org/browse/PYTHON-861). As that issue said:

The difference between ensure_index and create_index is that ensure_index consults an index "cache" before sending a create index operation to the server. This causes hard to debug race conditions when dropping and immediately re-creating an index, and provides no real benefits. To avoid these problems we're deprecating the method. Use create_index instead.
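To make the described behaviour concrete, here is a minimal sketch of a TTL-based index cache of the kind the quote refers to. It is not PyMongo's actual implementation: `FakeCollection`, `CachedIndexCreator` and `forget` are hypothetical names used only for illustration, and the fake collection just counts how many times a create-index call would have been sent to the server.

```python
import time


class FakeCollection:
    """Hypothetical stand-in for a collection; counts create_index round trips."""

    def __init__(self):
        self.calls = 0

    def create_index(self, keys, **kwargs):
        self.calls += 1
        # Build a name like "text_1" from [("text", 1)].
        return "_".join(f"{field}_{direction}" for field, direction in keys)


class CachedIndexCreator:
    """Sketch of an ensure_index-style wrapper with a local TTL cache."""

    def __init__(self, collection, ttl_seconds=300, clock=time.monotonic):
        self._collection = collection
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires = {}  # index key tuple -> expiry timestamp

    def ensure_index(self, keys, **kwargs):
        cache_key = tuple(keys)
        now = self._clock()
        expiry = self._expires.get(cache_key)
        if expiry is not None and expiry > now:
            return None  # still cached: nothing is sent to the server
        name = self._collection.create_index(keys, **kwargs)
        self._expires[cache_key] = now + self._ttl
        return name

    def forget(self):
        """Clear the cache, e.g. after dropping a collection or database."""
        self._expires.clear()
```

The sketch also shows where the race mentioned in the quote comes from: if an index is dropped without clearing the local cache, a later `ensure_index` call returns early and the index is never re-created until the TTL expires.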

Most likely the best way to fix this issue is to implement some of the ideas mentioned in #357.

I'm also facing the same issue (based on my understanding).

I'm using mongoengine==0.13.0. I have a collection that has 28 million documents, and every time a new document is created, MongoDB reindexes the entire data set.

db.currentOp() shows this message:

            "query" : {
                "createIndexes" : "keyword",
                "indexes" : [ 
                    {
                        "unique" : true,
                        "background" : false,
                        "sparse" : false,
                        "key" : {
                            "text" : 1
                        },
                        "name" : "text_1"
                    }
                ],
                "writeConcern" : {}
            },
            "msg" : "Index Build Index Build: 26834459/28427263 94%",
            "progress" : {
                "done" : 26834459,
                "total" : 28427263
            },

my model is

from mongoengine import Document, StringField

class Article(Document):
    text = StringField(required=True, unique=True)

# I'm not declaring any indexes in meta

How can I avoid re-indexing the entire data set on every save()? This is actually blocking me from reading the database. I don't want to remove unique=True, though. Any thoughts?

Issue is quite old but I'm trying to understand the impact. Mongo index documentation mentions that

Recreating an Existing Index
If you call db.collection.createIndex() for an index that already exists, MongoDB does not recreate the index.

So unless you modify the indexes, the subsequent calls to create_indexes aren't actually re-creating the indexes. That leaves us with the overhead of a few Python calls (involved in the ensure_indexes dance), the create_index operation, and the associated round trip to the database server, which should be a few milliseconds. A quick test with and without auto_create_index enabled gives me a factor of 2 to 3 performance boost when it is disabled (inserting 10,000 documents into an empty collection, ~4 vs 13 seconds).

Long story short, it is still valuable to improve this, but it looks like skipping the call to create_indexes on every .save() will only be noticeable if it gets called thousands of times.
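For reference, a sketch of what disabling auto_create_index could look like, using the Article model posted earlier in the thread. It assumes the `auto_create_index` meta option and the `ensure_indexes()` classmethod behave as documented in your mongoengine version; `create_indexes_once` is a hypothetical helper name, and it has to be called after `connect()`.

```python
from mongoengine import Document, StringField


class Article(Document):
    text = StringField(required=True, unique=True)

    # Skip the automatic index check that otherwise runs on save().
    meta = {"auto_create_index": False}


def create_indexes_once():
    """Hypothetical helper: run once at deploy or startup time, after connect()."""
    Article.ensure_indexes()
```

With the option off, the unique index on text is no longer created implicitly, so on a fresh collection uniqueness is only enforced once the indexes have been created explicitly.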

Any updates on this? I'm not seeing a big performance hit on my database, but I'm seeing at least 15ms added to every request that creates a document.
