Mongoengine: Memory leak on large iterations

Created on 9 Dec 2013 · 2 Comments · Source: MongoEngine/mongoengine

Hi!

I'm having trouble iterating over large mongoengine querysets. Here is what I see.

I have a Document like this:

import mongoengine as mongo  # assumed alias; the import is not shown in the report

class Contact(mongo.DynamicDocument):
    customer_id = mongo.IntField()
    name = mongo.StringField(max_length=100)
    email = mongo.EmailField(unique_with='customer_id')
    campaigns = mongo.ListField(mongo.DictField())
    tag_list = mongo.ListField(mongo.StringField())

I'm using the objgraph library [1] to investigate the memory leak, and this is what I see:

>>> cs = Contact.objects()[0:100000]
>>> objgraph.show_most_common_types()
function                   14454
dict                       7176
tuple                      6378
list                       3832
weakref                    2622
cell                       2068
type                       1848
getset_descriptor          1612
wrapper_descriptor         1447
builtin_function_or_method 1245
>>>
>>> for c in cs:
...    print c.email


>>> objgraph.show_most_common_types()
dict           607125
list           403833
instancemethod 200464
partial        200016
ObjectId       100000
Contact        100000
weakproxy      100000
DynamicField   100000
SON            100000
BaseList       100000

The only way to clean up these instances is to manually delete all references, including the document reference "Contact", and then run the garbage collector.
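The manual cleanup described above can be sketched with the standard library alone. This is a minimal illustration, not mongoengine code: `Node` and `count_instances` are hypothetical names, and the self-reference is an assumption standing in for whatever cycles the real documents take part in (the report does not confirm where the cycles are). It shows why deleting references is not enough on its own when cycles exist, and why an explicit `gc.collect()` is needed.

```python
import gc


class Node(object):
    """Hypothetical stand-in for a document that sits in a reference cycle."""

    def __init__(self):
        self.owner = self  # self-reference: reference counting alone cannot free it


def count_instances(cls):
    """Count live instances of cls among the objects the collector tracks."""
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


gc.disable()  # keep automatic collection from running in the middle of the example
nodes = [Node() for _ in range(1000)]
before = count_instances(Node)

del nodes  # drop the only external reference to the instances
after_del = count_instances(Node)  # the cycles still keep every Node alive

gc.collect()  # the cycle collector reclaims them
after_gc = count_instances(Node)
gc.enable()

print(before)
print(after_del)
print(after_gc)
```

The count only drops after `gc.collect()`, which matches the "delete all references, then run the GC" sequence in the report.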

I'm using:

  • mongoengine 0.8.6
  • python 2.7.4
  • mongoDB 2.4.6

[1] - http://mg.pov.lt/objgraph/

Most helpful comment

I think you are hitting the caching queryset, which is the default in 0.8.4 but
will be flipped to a non-caching one in 0.9.

Can you try with: Contact.objects.no_cache()[0:100000]
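To make the difference concrete, here is a toy model of a caching versus a non-caching iterable. It is only a sketch of the general idea and does not reproduce mongoengine's implementation: `Record`, `CachingResults`, `StreamingResults`, `count_survivors`, and the `_result_cache` attribute are illustrative names chosen for this example. It also assumes CPython, where an object is freed as soon as its last reference goes away.

```python
import weakref


class Record(object):
    """Hypothetical stand-in for a loaded document."""

    def __init__(self, n):
        self.n = n


class CachingResults(object):
    """Toy iterable that keeps every item it yields, like a result cache."""

    def __init__(self, size):
        self._size = size
        self._result_cache = []

    def __iter__(self):
        for n in range(self._size):
            record = Record(n)
            self._result_cache.append(record)  # retained for the iterable's lifetime
            yield record


class StreamingResults(object):
    """Toy iterable that hands each item over and keeps nothing."""

    def __init__(self, size):
        self._size = size

    def __iter__(self):
        for n in range(self._size):
            yield Record(n)


def count_survivors(results):
    """Iterate fully, then report how many yielded items are still alive."""
    refs = []
    record = None
    for record in results:
        refs.append(weakref.ref(record))  # a weak reference does not keep the item alive
    record = None  # drop the loop variable's reference to the last item
    return sum(1 for ref in refs if ref() is not None)


cached_alive = count_survivors(CachingResults(1000))
streamed_alive = count_survivors(StreamingResults(1000))
print(cached_alive)
print(streamed_alive)
```

In the caching case every item is still reachable after the loop, so memory grows with the size of the result set. In the streaming case each item can be released as soon as the loop moves on, which is the behaviour one would hope to get from no_cache().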

In reply to Rafael Novello's report of Mon, Dec 9, 2013, 5:16 PM (https://github.com/MongoEngine/mongoengine/issues/535).

That's right!

With no_cache I can iterate without memory problems.

Thanks for the help!
