Django-rest-framework: Race condition in validators

Created on 22 Jan 2018  Â·  11Comments  Â·  Source: encode/django-rest-framework

Checklist

  • [x] I have verified that that issue exists against the master branch of Django REST framework.
  • [x] I have searched for similar issues in both open and closed tickets and cannot find a duplicate.
  • [x] This is not a usage question.
  • [x] This cannot be dealt with as a third party library. (We prefer new functionality to be in the form of third party libraries where possible.)
  • [x] I have reduced the issue to the simplest possible case.
  • [x] I have included a failing test as a pull request. encode/django-rest-framework#5761

Steps to reproduce

Preparation

Create these 2 models with the serializer and the view:

class Collection(Model):
    pass

class Document(Model):
    collection = models.ForeignKey(Collection, on_delete=models.CASCADE)
    uid = models.TextField()
    blob = postgres_fields.JSONField(default=dict)

    class Meta():
        unique_together = (('collection', 'uid'), )

class DocumentSerializer(serializers.ModelSerializer):
    class Meta:
        model = Document
        fields = '__all__'
        validators = [
            UniqueTogetherValidator(
                queryset=Document.objects.all(),
                fields=('collection', 'uid')
            )
        ]

class DocumentView(viewsets.ModelViewSet):
    serializer_class = DocumentSerializer

Exploit

Create two Documents (can be in the same collection) and send update requests for both documents in parallel:

# drf-race-condition.py
import sys
from datetime import datetime, timedelta

import requests

document_ids = [1, 2]  # TODO: adjust
base_url = 'http://example.org/'  # TODO: adjust

def send_request(index):
    document_update_response = requests.patch(
        url=f'{base_url}documents/{document_ids[index]}/',
        headers=headers,
        json={'blob': ''},
    )
    if document_update_response.status_code == 200:
        print(index, document_update_response.status_code)

if __name__ == '__main__':
    index = int(sys.argv[1])
    end = datetime.now() + timedelta(seconds=5)
    while end > datetime.now():
        send_request(index=index)

#!/bin/bash

python3 drf-race-condition.py 0 &
python3 drf-race-condition.py 1 &

Expected behavior

The documents are updated without any side effects.

Actual behavior

We run into 500 status codes. (ie. IntegrityError: duplicate key value violates unique constraint "document_document_collection_id_1234_uniq"; DETAIL: Key (collection_id, uid)=(1, 'foo') already exists.)

It's even worse, when the documents are in different collections. Then one of them gets the other one's uid assigned.

The problem

Serializers are instances of rest_framework.fields.Field and the following code (inside of run_validators) is responsible for running validators:
https://github.com/encode/django-rest-framework/blob/78367ba1024ec43349aba362758e826fdeafbb03/rest_framework/fields.py#L533-L538

validator.set_context() sets validator.instance:
https://github.com/encode/django-rest-framework/blob/78367ba1024ec43349aba362758e826fdeafbb03/rest_framework/validators.py#L106-L112

Then validator() uses this instance. validator is the same instance for every instance of the serializer: DocumentSerializer(instance=d).validators[0] is DocumentSerializer(instance=d2).validators[0]. If two serializers call validator.set_context(self) before any one of them calls validator(value), both use the same instance.

Changing uid

It's even worse, when the documents are in different collections. Then one of them gets the other one's uid assigned.

validator.filter_queryset() adds uid (not sure why collection did not behave as I expected) to attr, which reference the serializers validated_data. This way uid of the wrong instance ends up in serializer and serializer.save() uses it.
https://github.com/encode/django-rest-framework/blob/78367ba1024ec43349aba362758e826fdeafbb03/rest_framework/validators.py#L130-L139

Is this only theoretical?

  • The IntegrityError happened nearly 3000 times in the last 8 weeks on our production plattform. Every time a customer sent multiple PATCH requests to our plattform.
  • The IntegrityError and the changing uid happened reliably on every execution of the script against our staging plattform.

Possible fixes

  • Pass serializer/serializer_field to __call__ and drop set_context.

    Django's validators do not expect a second argument. According to drf's documentation one “can use any of Django's existing validators“.

    This can be solved by using inspect (which has differents APIs for Python 2 and 3) or by catching the TypeError (which could come from within the call). => Both solutions have their downsides.

    This would be a breaking change!

  • Clone the validator before calling set_context.

Most helpful comment

@dennypenta Here's the issue related to your case: https://github.com/encode/django-rest-framework/issues/3876

The issue was closed with this note:

If anyone is actually hitting this in production it'd be useful to know, but either way around I don't see it as likely that we'll do anything other than allowing the 5xx response by default - if you need something else, write some view code, and catch the exceptional case.

If this happens only for one of your endpoints, you can catch the IntegrityError in the serializer or view and return 4xx instead of 5xx.

If you want to solve it in a generic way, you can write a custom exception handler. Here's an example:

from django.db.utils import IntegrityError
from rest_framework.status import HTTP_400_BAD_REQUEST
from rest_framework.views import exception_handler


def my_exception_handler(exc, context):
    if isinstance(exc, IntegrityError) and \
            'duplicate key value violates unique constraint' in str(exc):
        return Response(
            'Unique constraint violated',
            status=HTTP_400_BAD_REQUEST,
        )

    return exception_handler(exc, context)

All 11 comments

Nice catch and very nice work.
Most of the time, this will be handled by the fact that field's are deepcopied, except that indeed, root's serializers don't got through it.

Sorry for the noise, but recently I thought that problem of race condition is that validation was working on while it didn't really make a query. So it looks like this, correct me pls if I will wrong.

Step 1 - retrieve the value with the unique constraint for a checking.
Step 2 - insert record - this we are thinking that we checked it.

But between step 1 and step 2 another record has been inserted.

@dennypenta your use case is different from the OP's.
Yours may happen and is hard to prevent although not impossible.

@dennypenta Here's the issue related to your case: https://github.com/encode/django-rest-framework/issues/3876

The issue was closed with this note:

If anyone is actually hitting this in production it'd be useful to know, but either way around I don't see it as likely that we'll do anything other than allowing the 5xx response by default - if you need something else, write some view code, and catch the exceptional case.

If this happens only for one of your endpoints, you can catch the IntegrityError in the serializer or view and return 4xx instead of 5xx.

If you want to solve it in a generic way, you can write a custom exception handler. Here's an example:

from django.db.utils import IntegrityError
from rest_framework.status import HTTP_400_BAD_REQUEST
from rest_framework.views import exception_handler


def my_exception_handler(exc, context):
    if isinstance(exc, IntegrityError) and \
            'duplicate key value violates unique constraint' in str(exc):
        return Response(
            'Unique constraint violated',
            status=HTTP_400_BAD_REQUEST,
        )

    return exception_handler(exc, context)

Is there anything I can do to move my PR #5762 forward?

@michael-k: Not yet. It's milestoned for 3.8, which means we'll resolve it one way or the other before then. Most likely it's fine and we'll take it as-is — it looks great at first glance — but I just need to sit down with it to review it properly.

Good job catching this issue! I guess we need some more eyes and opinions to look over the changes in PR #5762 ?

Presumably ATOMIC_REQUESTS = True would resolve this issue, right?

I've tried ATOMIC_REQUESTS = True with rest-framework 3.9.0 and Django 2.1, but it did not solve the issue.

@michael-k You are running DRF in multiple threads in the same process, right? And the bug happens when Python decides to release the GIL just between the set_context and the call?

This is a really bad bug, potentially causing silent data corruption. @michael-k's PR #6172 fixes it, and makes the code cleaner and safer in the process by removing the unneeded set_context mutability. I hope it gets merged for the next release.

Removing from the milestone, as we're tracking through the PR.

Was this page helpful?
0 / 5 - 0 ratings