Django-rest-framework: Caching of Field.root causes incorrect values to be returned

Created on 20 Apr 2017 · 7 Comments · Source: encode/django-rest-framework

Checklist

  • [x] I have verified that this issue exists against the master branch of Django REST framework.
  • [x] I have searched for similar issues in both open and closed tickets and cannot find a duplicate.
  • [x] This is not a usage question. (Those should be directed to the discussion group instead.)
  • [x] This cannot be dealt with as a third party library. (We prefer new functionality to be in the form of third party libraries where possible.)
  • [x] I have reduced the issue to the simplest possible case.
  • [ ] I have included a failing test as a pull request. (If you are unable to do so we can still accept the issue.)

Steps to reproduce

I am hitting a bug caused by the caching of field.root: once the value has been cached, there is no way to clear it when the field/serializer is bound. The caching was introduced by #3288.

Minimal example:

field = MyField()
tmp = field.root  # or field.context (as it uses self.root)
field.bind('name', parent)

Expected behavior

After the minimal example runs, field.context and field.root should return correct values, based on the parent the field was just bound to. So either there should be no caching (did someone measure this and find it to be a performance bottleneck?) or the cache should be invalidated after the field is bound.

Actual behavior

After the minimal example runs, both field.context and field.root still return the values that were computed and cached before the field was bound, so they point to the wrong objects.
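
The stale value can be reproduced without Django installed. The following is a minimal sketch using simplified stand-in classes, not the library's actual code: the `Field` and `Serializer` classes, the `_context` attribute, and the use of `functools.cached_property` are assumptions chosen to mimic the caching described above.

```python
from functools import cached_property


class Field:
    """Simplified stand-in for a field; not the library's actual code."""

    def __init__(self):
        self.parent = None
        self.field_name = None

    def bind(self, field_name, parent):
        # Attach the field to its parent. Nothing here clears cached values.
        self.field_name = field_name
        self.parent = parent

    @cached_property
    def root(self):
        # Walk up the parent chain; the result is cached on first access.
        root = self
        while root.parent is not None:
            root = root.parent
        return root

    @cached_property
    def context(self):
        # Resolved through self.root, so it inherits any stale root.
        return getattr(self.root, "_context", {})


class Serializer(Field):
    def __init__(self, context=None):
        super().__init__()
        self._context = context or {}


parent = Serializer(context={"request": "fake-request"})
field = Field()

tmp = field.root              # accessed before binding: caches the field itself
field.bind("name", parent)

print(field.root is parent)   # False: the cached value is still the unbound field
print(field.context)          # {}: resolved through the stale root
```

Removing the `tmp = field.root` line makes both lookups resolve to `parent`, which is why the bug only appears when the attribute is touched too early.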

All 7 comments

> the cache should be invalidated after the field is bound.

That could be a solution.
However, note that field.root and field.context are not supposed to be accessed before the fields are bound.
I'd be interested to hear more about the use case behind this.

The use case is a serializer mixin that enables limiting the fields which are serialized, with support for nested serializers (e.g., so you can say fields=foo__bar__moo in query arguments and only get that field in the output).

The mixin overrides fields and, inside that override, accesses self.context to get the request. The problem arises with serializers that do something like self.fields['foo'] = MyField() in the constructor. This causes fields to be evaluated before the serializer is bound, and the override then reads self.context to check whether the request is already available.

A workaround is to just discover the root and context manually, without invoking self.root and triggering an incorrect value to be cached.
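
A minimal sketch of that workaround follows. The helper names `find_root` and `find_context` are hypothetical, and the sketch assumes that each object exposes its parent as `.parent` and that the top-level serializer stores its context in `_context`. The stand-in objects at the bottom exist only to exercise the helpers.

```python
from types import SimpleNamespace


def find_root(field):
    """Walk up the .parent links without touching the cached root attribute."""
    node = field
    while getattr(node, "parent", None) is not None:
        node = node.parent
    return node


def find_context(field):
    """Read the context from the top-level object, or {} if it has none."""
    return getattr(find_root(field), "_context", {})


# Stand-in objects (assumptions, not real serializers or fields).
top = SimpleNamespace(parent=None, _context={"request": "fake-request"})
nested = SimpleNamespace(parent=top)
leaf = SimpleNamespace(parent=nested)
unbound = SimpleNamespace(parent=None)

print(find_root(leaf) is top)    # True
print(find_context(unbound))     # {}
```

Because the helpers recompute the answer on every call, they can be used safely before and after binding, at the cost of repeating the short parent walk each time.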

> The problem is with serializers which happen to use something like self.fields['foo'] = MyField() in the constructor.

It seems like the serializer should override get_fields() instead of modifying fields in the constructor. This should prevent the premature caching/binding issue.
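
The shape of that suggestion can be sketched as follows. To keep the example runnable without Django, `BaseSerializer` and `Field` are hypothetical stand-ins that build and bind fields lazily on first access to `fields`; in a real project the base class would be the framework's serializer class, and only the `get_fields()` override in `MySerializer` is the part being illustrated.

```python
class Field:
    """Stand-in field (assumption, not the library's class)."""

    def __init__(self):
        self.parent = None
        self.field_name = None

    def bind(self, field_name, parent):
        self.field_name = field_name
        self.parent = parent


class BaseSerializer(Field):
    """Hypothetical stand-in: builds and binds fields on first access."""

    def __init__(self, context=None):
        super().__init__()
        self._context = context or {}
        self._fields = None

    def get_fields(self):
        return {}

    @property
    def fields(self):
        if self._fields is None:
            self._fields = {}
            for name, field in self.get_fields().items():
                field.bind(name, self)
                self._fields[name] = field
        return self._fields


class MySerializer(BaseSerializer):
    def get_fields(self):
        # Declare the extra field here instead of assigning to
        # self.fields['foo'] in __init__.
        fields = super().get_fields()
        fields["foo"] = Field()
        return fields


serializer = MySerializer(context={"request": "fake-request"})
built_early = serializer._fields is not None   # nothing built in the constructor
foo = serializer.fields["foo"]                 # fields are built and bound here
```

With this layout nothing touches `fields` during construction, so an override that reads `self.context` is not triggered until the serializer is actually used.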

> It seems like the serializer should override get_fields() instead of modifying fields in the constructor. This should prevent the premature caching/binding issue.

Yes, I agree on this point (in my case, the serializers are outside my control, as this is a general-purpose mixin).

But I still think that something should be done about correct cache invalidation as this behavior is really unexpected. The most problematic thing, API-wise, is that simply accessing an attribute at the wrong time (directly or indirectly) can cause the state to be corrupted (for later unrelated access), without any warning.

I think this should not happen and should be avoided in a good API. I believe handling cache invalidation correctly would be best, but at worst, it would be much better if, for example, accessing such an attribute at the wrong time raised an error instead of silently corrupting state.
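
Both options can be sketched with stand-in classes. This is an illustration under stated assumptions, not a patch against the library: `InvalidatingField`, `StrictField`, and `Serializer` are hypothetical names, and the sketch relies on `functools.cached_property` storing its value in the instance `__dict__`, which is what makes the `pop` calls in `bind()` sufficient to invalidate it.

```python
from functools import cached_property


class InvalidatingField:
    """Stand-in field whose bind() drops any cached root/context."""

    def __init__(self):
        self.parent = None
        self.field_name = None

    def bind(self, field_name, parent):
        self.field_name = field_name
        self.parent = parent
        # Remove cached values so the next access recomputes them.
        self.__dict__.pop("root", None)
        self.__dict__.pop("context", None)

    @cached_property
    def root(self):
        root = self
        while root.parent is not None:
            root = root.parent
        return root

    @cached_property
    def context(self):
        return getattr(self.root, "_context", {})


class StrictField(InvalidatingField):
    """Variant that refuses to resolve root before the field is bound."""

    @property
    def root(self):
        if self.parent is None:
            raise RuntimeError("root accessed before the field was bound")
        root = self
        while root.parent is not None:
            root = root.parent
        return root


class Serializer(InvalidatingField):
    def __init__(self, context=None):
        super().__init__()
        self._context = context or {}


parent = Serializer(context={"request": "fake-request"})

field = InvalidatingField()
early_root = field.root          # premature access caches the field itself
field.bind("name", parent)
print(field.root is parent)      # True: bind() cleared the stale entry

strict = StrictField()
try:
    strict.root
except RuntimeError as exc:
    print(exc)                   # root accessed before the field was bound
```

The first variant keeps premature access harmless, while the second turns it into an immediate, visible failure. Which is preferable depends on whether early access is considered a supported use.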

A couple of ideas:

  • Disable __setitem__ on the binding dict (item assignment such as self.fields['foo'] = ... goes through __setitem__, not __setattr__), encouraging users to override .get_fields() instead of directly modifying .fields. I have a branch here with this change. The downsides are:

    • This is a breaking change.

    • The docs don't currently cover .get_fields(), but do demonstrate altering .fields.

  • Revert #3288 and add a test demonstrating this bug.

    • Seems reasonable - the caching is just a couple of attribute accesses. I don't think there will be a tangible performance difference.

    • Users can alter .fields safely.

My feeling is that this should be closed as won't fix.
In my opinion, the case is too specific to justify adding extra cache invalidation.
I need to think further about reverting #3288.
Altering the documentation would mean moving get_fields() into the public API, which means it would have to go through the deprecation process if it ever changed.

@xordoquy. Agreed, yup.
