Django-rest-framework: Missing/duplicate records when using `ordering` and `LimitOffsetPagination`

Created on 23 Aug 2019  路  6Comments  路  Source: encode/django-rest-framework

Checklist

  • [x] I have verified that that issue exists against the master branch of Django REST framework.
  • [x] I have searched for similar issues in both open and closed tickets and cannot find a duplicate.
  • [x] This is not a usage question. (Those should be directed to the discussion group instead.)
  • [x] This cannot be dealt with as a third party library. (We prefer new functionality to be in the form of third party libraries where possible.)
  • [x] I have reduced the issue to the simplest possible case.
  • [ ] I have included a failing test as a pull request. (If you are unable to do so we can still accept the issue.)

Steps to reproduce

  1. Create a simple model and order by date:
class Event(models.Model):
    event_date = models.DateTimeField(null=True, blank=True)

    class Meta:
        ordering = ['-event_date']
  1. Create simple serializer and viewset:
class EventSerializer(serializers.ModelSerializer):
    class Meta:
        model = Event
        fields = ('event_date',)
class EventViewSet(viewsets.ModelViewSet):
    queryset = Event.objects.all()
    serializer_class = EventSerializer



md5-16aa5c1270f75a88f4e75125966f792f



REST_FRAMEWORK = {
    ...,
    'DEFAULT_PAGINATION_CLASS': 'rest_framework.pagination.LimitOffsetPagination'
}



md5-1ec9fc53e219a86a59456a61746b2298



router = routers.DefaultRouter()
router.register(r'events', views.EventsViewSet)

urlpatterns = [
    path('', include(router.urls))
]
  1. Make paginated calls:
    /api/events/?limit=25&offset=0
    /api/events/?limit=25&offset=25
    /api/events/?limit=25&offset=50

Expected behavior

Since I have 61 records in my test database, each unique record should be displayed on one of the pages. E.g. the event with the event date should be visible on the first page.

Actual behavior

Some records are not visible on any of the pages, and some records are duplicated across pages. The total count on the response objects is 61 as expected. But some records are simply not displayed.

Setting ordering=['-event_date', 'id'] solves the problem.

NB: my event model actually subclasses an abstract model that has many more fields. I don't know if that has an actual impact but I tried to simplify the example for this issue.

Thanks.

Most helpful comment

For those also hitting this problem, you can workaround it by configuring this class in the DEFAULT_FILTER_BACKENDS configuration (instead of rest_framework.filters.OrderingFilter):

from rest_framework.filters import OrderingFilter


# Work around DRF issue #6886 by always adding the primary key as last order field.
class StableOrderingFilter(OrderingFilter):
    def get_ordering(self, request, queryset, view):
        ordering = super(StableOrderingFilter, self).get_ordering(request, queryset, view)
        pk = queryset.model._meta.pk.name

        if ordering is None:
            return ('-' + pk, )

        return list(ordering) + ['-' + pk]

All 6 comments

Hi @pierremonico - it sounds like something is causing your queryset to become randomly ordered (hence the duplicate items across pages). Looking at the code for LimitOffsetPagination, there isn't anything that modifies the queryset beyond the application fo the limit/offset, so my assumption is that issue is caused elsewhere.

https://github.com/encode/django-rest-framework/blob/ec1b14174f8717337fc4402c7cea43f647e2586e/rest_framework/pagination.py#L362

I would try logging the sql queries to see if the event query looks sensible.

Make sure your models have a default ordering field or pagination will return seemingly random ordering. This is documented more clearly for CursorPagination but applies to all pagination AFAIK.

class Something(models.Model):
    field_one = models.CharField(max_length=14)
    filed_two = models.TextField()

    class Meta:
        ordering = ['field_one']

If adding 'id' to the ordering then it likely means you have NULL event_date.
NULL values would not be considered identique by PostgresQL but might be returned in any order I think.
Unless we have a reproductible test case we can analyse, there isn't much we can do.

@rpkilby thanks for the logging tip, I'll try that.
@n2ygk thanks, but as stated in my example, I have ordering set. Adding id solves the problem (last paragraph of my description).
@xordoquy that is indeed correct and probably the cause of the problem!

I just also hit this issue, and it's not limited to NULL values. If you order by any set of fields that isn't unique for every record, it's not guaranteed that the results are always in the same order (see e.g. Django or MySQL documentation). This is particularly nasty when rows are moved across a page boundary, as this causes missing or duplicate records when moving between pages.

It's almost impossible to create a reproducible test case for this, as you need to hit exactly the right conditions for your database engine to shuffle rows around between queries. In my case the load on the server seems to influence this (probably related to whether the data remains in the cache between the queries), but there might be other factors as well.

Nevertheless, I think this is at least undesired behaviour and arguably a bug. Especially if you use OrderingFilter, there's no way to be sure that a unique set of fields. I can't (and don't want to) require all clients to always specify a unique field set, so this should be handled serverside.

Proposal to resolve this:

  • Document this properly. Currently only CursorPagination has the requirement for a unique ordering field documented, while the order schemes require it as well.
  • Add some way to add a "fallback" field(s) that is always added to order_by call (as last field), to ensure that the field set is unique. Most people will probably use the primary key for this, so maybe set that as default value as well.

For those also hitting this problem, you can workaround it by configuring this class in the DEFAULT_FILTER_BACKENDS configuration (instead of rest_framework.filters.OrderingFilter):

from rest_framework.filters import OrderingFilter


# Work around DRF issue #6886 by always adding the primary key as last order field.
class StableOrderingFilter(OrderingFilter):
    def get_ordering(self, request, queryset, view):
        ordering = super(StableOrderingFilter, self).get_ordering(request, queryset, view)
        pk = queryset.model._meta.pk.name

        if ordering is None:
            return ('-' + pk, )

        return list(ordering) + ['-' + pk]
Was this page helpful?
0 / 5 - 0 ratings