Django-rest-framework: [Serializer] Cannot use QuerySet Iterator

Created on 30 Jul 2014 · 12 Comments · Source: encode/django-rest-framework

I am using DRF both to provide an API (mostly paginated resources) and to export large data files (non-paginated resources).

I am running into performance issues. I have made many optimizations (prefetch_related and others), and I found many articles about handling large databases efficiently with a queryset iterator (Model.objects.all().iterator()).

It seems that DRF doesn't use it, has no option for using it, and can't accept this iterator as a queryset (as it is a generator, and does not contain the same information as the QuerySet object).

Is this planned, or has it been considered, as a way to improve the performance of serializing large datasets?
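To make the question concrete: a Django QuerySet caches every row it loads the first time it is iterated, whereas `.iterator()` reads rows without filling that cache. The sketch below imitates that behaviour with a plain-Python stand-in so it runs without a database. `FakeQuerySet` is hypothetical and is not Django's class; it also shows that `.iterator()` hands back a one-shot generator rather than another QuerySet, which is why it cannot simply be assigned to a view's `queryset`.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a Django QuerySet (no database involved)."""

    def __init__(self, model, rows):
        self.model = model          # real QuerySets expose .model as well
        self._rows = rows           # pretend these come from the database
        self._result_cache = None   # filled only on normal evaluation

    def _fetch(self):
        # Simulate reading rows one at a time from a database cursor.
        for row in self._rows:
            yield row

    def __iter__(self):
        # Normal evaluation: load *all* rows into the cache first.
        if self._result_cache is None:
            self._result_cache = list(self._fetch())
        return iter(self._result_cache)

    def iterator(self):
        # Streaming evaluation: yield rows without caching them.
        return self._fetch()


qs = FakeQuerySet(model="Thing", rows=range(5))
streamed = qs.iterator()
print(type(streamed).__name__)           # generator
print(hasattr(streamed, "model"))        # False: no longer QuerySet-like
print(sum(streamed), qs._result_cache)   # 10 None: nothing was cached
print(sum(qs), len(qs._result_cache))    # 10 5: every row is now cached
```

With a real QuerySet the cached rows are full model instances, so on a large table the cache is what dominates memory use.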

> and can't accept this iterator as a queryset (as it is a generator, and does not contain the same information as the QuerySet object).

As I understand it, the interface for a queryset after iterator() has been called is _exactly_ the same, but the implementation details of how it's evaluated are different; there shouldn't be any issue with using querysets that have iterator() applied, although measuring the performance implications of doing so won't necessarily be super-easy.

> It seems that DRF doesn't use it

We'd need a more concrete example of an issue in order to investigate further (e.g. a failing test case).

Hi Tom,
Before I add iterator(), everything works fine. My view looks like this:

class ThingsViewSet(viewsets.ModelViewSet):
    queryset = Things.objects.all()

However, when I add an iterator:

class ThingsViewSet(viewsets.ModelViewSet):
    queryset = Things.objects.all().iterator()

I get the following error:

File "/home/fabrice/projects/app/testing/testing/urls.py" in <module>
  58. router.register(r"things", views.ThingsSerializerViewSet)
File "/home/fabrice/projects/app/myvenv/lib/python2.7/site-packages/rest_framework/routers.py" in register
  59.             base_name = self.get_default_base_name(viewset)
File "/home/fabrice/projects/app/myvenv/lib/python2.7/site-packages/rest_framework/routers.py" in get_default_base_name
  138.             model_cls = queryset.model

Exception Type: AttributeError at /testing/things/
Exception Value: 'generator' object has no attribute 'model'

I'm using the latest (or nearly latest) versions:

Django==1.7
django-filter==0.8
djangorestframework==2.4.3

Am I doing something wrong, and if so, how should I proceed?
Or does ArTiSTiX have a point, and this simply isn't supported yet?
DRF is definitely great and the right tool for the job, but I really can't afford to load all rows
into RAM for caching.

I'm currently doing this with djangorestframework==3.1.1:

from django.db import models
from django.db.models import query
from rest_framework import serializers
from rest_framework.viewsets import ModelViewSet

from myapp.models import MyModel  # hypothetical: substitute your own model


class GeneratorListSerializer(serializers.ListSerializer):
    """
    Return data as a generator instead of a list
    """

    def to_representation(self, data):
        """
        Iterable of object instances -> generator of dicts of primitive datatypes.
        """
        # Dealing with nested relationships, data can be a Manager,
        # so, get a queryset from the Manager if needed
        # Use an iterator on the queryset to allow large querysets to be
        # exported without excessive memory usage
        if isinstance(data, models.Manager):
            iterable = data.all().iterator()
        elif isinstance(data, query.QuerySet):
            iterable = data.iterator()
        else:
            iterable = data
        # Return a generator rather than a list so that streaming responses
        # can be used
        return (self.child.to_representation(item) for item in iterable)

    @property
    def data(self):
        # Note we deliberately return the super of ListSerializer to avoid
        # instantiating a ReturnList, which would force evaluating the generator
        return super(serializers.ListSerializer, self).data


class MyModelSerializer(serializers.ModelSerializer):

    class Meta:
        model = MyModel
        # Use the generator-based list serializer whenever many=True
        list_serializer_class = GeneratorListSerializer

class MyViewSet(ModelViewSet):
    # Keep a real QuerySet here (not .iterator()) so routers can still read
    # queryset.model; the iterator is applied inside to_representation()
    queryset = MyModel.objects.all()
    serializer_class = MyModelSerializer
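The `data` override relies on `super(serializers.ListSerializer, self)` starting the attribute lookup *after* `ListSerializer` in the method resolution order, so the base class's `data` runs and the list-building override is skipped. The sketch below shows that mechanism with simplified stand-in classes; `BaseSerializer` and `ListSerializer` here are hypothetical imitations, not DRF's real implementations.

```python
class BaseSerializer:
    """Stand-in for DRF's BaseSerializer: .data caches to_representation()."""

    def __init__(self, instance):
        self.instance = instance

    @property
    def data(self):
        if not hasattr(self, "_data"):
            self._data = self.to_representation(self.instance)
        return self._data


class ListSerializer(BaseSerializer):
    """Stand-in for DRF's ListSerializer: wraps the result in a list."""

    def to_representation(self, data):
        return [{"id": item} for item in data]

    @property
    def data(self):
        # Like wrapping in ReturnList(...): building a list evaluates everything.
        return list(super().data)


class GeneratorListSerializer(ListSerializer):
    def to_representation(self, data):
        return ({"id": item} for item in data)

    @property
    def data(self):
        # Skip ListSerializer.data and use BaseSerializer.data directly.
        return super(ListSerializer, self).data


result = GeneratorListSerializer(range(3)).data
print(type(result).__name__)   # generator
print(list(result))            # [{'id': 0}, {'id': 1}, {'id': 2}]
```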

In theory the real ListSerializer could also be updated to use the QuerySet.iterator() method, and/or to return a generator rather than a list. I don't know whether there would be any downside to doing that, but it works for my use case, which is largely XML exports using StreamingHttpResponse.
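One way to consume such a generator is to encode it chunk by chunk and hand those chunks to a streaming response. The comment above uses XML; the sketch below uses JSON instead to stay within the standard library, and `iter_json_array` is a hypothetical helper, not part of DRF or Django. In a Django view it might be passed as `StreamingHttpResponse(iter_json_array(serializer.data), content_type="application/json")`, assuming `serializer.data` is the generator produced by `GeneratorListSerializer` and that plain `json.dumps` can encode every field value (DRF's own `rest_framework.utils.encoders.JSONEncoder` could be passed via `cls=` otherwise).

```python
import json


def iter_json_array(items):
    """Yield a JSON array piece by piece from any iterable of dicts.

    Only one item is encoded at a time, so a generator of serialized rows
    is never materialised as a full list in memory.
    """
    yield "["
    for index, item in enumerate(items):
        # Separate items with commas, but not before the first one.
        yield ("," if index else "") + json.dumps(item)
    yield "]"


rows = ({"id": n, "name": "thing-%d" % n} for n in range(3))
print("".join(iter_json_array(rows)))
```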

Routers will try to determine automatically the name of the model to use in the URLs and view names. When that's not possible (as in this case, where the .model attribute is no longer accessible), you need to pass 'basename=' when registering the viewset with the router ('base_name=' in older releases such as the 2.4.3 used here, as the traceback shows).
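The failure in the traceback and the fix can be reproduced without Django. The sketch below is a simplified imitation of the router's name lookup (`default_basename`, `register`, `Thing` and `ListQuerySet` are hypothetical stand-ins, not DRF's code): the name is derived from `viewset.queryset.model`, which a bare generator lacks, and an explicit name skips the lookup entirely. With the real router on 2.4.x this would correspond to something like `router.register(r"things", views.ThingsViewSet, base_name="things")`.

```python
class Thing:
    """Hypothetical model stand-in; real Django models keep this in _meta."""
    class _meta:
        object_name = "Thing"


def default_basename(viewset):
    # Simplified imitation of the lookup in the traceback:
    # the name comes from viewset.queryset.model.
    return viewset.queryset.model._meta.object_name.lower()


def register(registry, prefix, viewset, basename=None):
    # Only fall back to the automatic lookup when no name is given.
    if basename is None:
        basename = default_basename(viewset)
    registry.append((prefix, viewset, basename))
    return basename


class ListQuerySet(list):
    """Stand-in for Things.objects.all(): iterable *and* has .model."""
    model = Thing


class ThingsViewSet:
    queryset = ListQuerySet()


class StreamingThingsViewSet:
    # Mirrors `Things.objects.all().iterator()`: a bare generator.
    queryset = (row for row in ListQuerySet())


routes = []
print(register(routes, r"things", ThingsViewSet))   # thing
print(register(routes, r"export", StreamingThingsViewSet, basename="thing-export"))
try:
    register(routes, r"broken", StreamingThingsViewSet)
except AttributeError as exc:
    print(exc)  # 'generator' object has no attribute 'model'
```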

But by changing to_representation to use the iterator and leaving the queryset attribute on the ViewSet unchanged, shouldn't the routers work as expected?

Yup that'd work too :)

I'd be cautious about switching to QuerySet.iterator() by default, but I can't see any downside to using a generator expression instead of a list comprehension, i.e.

return (self.child.to_representation(item) for item in iterable)

instead of the current:

return [
    self.child.to_representation(item) for item in iterable
]
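The practical difference between the two forms is when the work happens. In the sketch below, `to_representation` is a hypothetical stand-in for `self.child.to_representation()` that records each call: the list comprehension serializes every item up front, while the generator expression serializes nothing until a consumer asks for the next item. A trade-off worth checking is that a generator can be consumed only once and supports neither `len()` nor indexing, so code that expects a list would presumably need adjusting.

```python
calls = []


def to_representation(item):
    # Stand-in for self.child.to_representation(); records each call.
    calls.append(item)
    return {"id": item}


iterable = range(1000)

eager = [to_representation(item) for item in iterable]
print(len(calls))          # 1000: every item serialized up front

calls.clear()
lazy = (to_representation(item) for item in iterable)
print(len(calls))          # 0: nothing serialized yet
first = next(lazy)
print(len(calls), first)   # 1 {'id': 0}
```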

Assuming all the tests run, would you be interested in a PR?

Sure, yup.

Another issue: even if to_representation returns an iterator, the data property then wraps it in a ReturnList, which forces it to be evaluated. Ideally, ReturnList would accept any iterator rather than extending list.
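The sketch below illustrates why a list subclass forces evaluation and what a lazy alternative might look like. `ReturnListLike` is a simplified imitation of DRF's `ReturnList` (a `list` that keeps a `serializer` back-reference), and `ReturnIterator` is a hypothetical wrapper, not an existing DRF class: it keeps the same back-reference but produces items only when the consumer iterates.

```python
class ReturnListLike(list):
    """Simplified imitation of DRF's ReturnList: a list that remembers
    which serializer produced it."""

    def __init__(self, *args, serializer=None):
        self.serializer = serializer
        super().__init__(*args)   # list() consumes the whole iterable here


class ReturnIterator:
    """Hypothetical lazy alternative with the same .serializer attribute."""

    def __init__(self, iterable, serializer=None):
        self.serializer = serializer
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)


produced = []


def rows():
    for n in range(3):
        produced.append(n)   # record when each row is generated
        yield {"id": n}


eager = ReturnListLike(rows(), serializer="ThingSerializer")
print(len(produced))               # 3: wrapping forced full evaluation

produced.clear()
lazy = ReturnIterator(rows(), serializer="ThingSerializer")
print(len(produced))               # 0: still nothing generated
print(list(lazy), len(produced))   # [{'id': 0}, {'id': 1}, {'id': 2}] 3
```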

@rhunwicks or @tomchristie
I'd really like to have a solution for using generators for serializer data. Have you gotten any further in implementing this? If not, I might try to pick up where you left off.

This isn't on my list at the moment, no.

@rhunwicks I want to say thank you: you saved my production deployment on a 500 MB DigitalOcean VM.

Do you know what became of your proposal to return a generator expression, return (self.child.to_representation(item) for item in iterable)?
