Django-filter: Problem with chained filters and Django ORM

Created on 4 Jul 2017 · 20Comments · Source: carltongibson/django-filter

As related in the issue #537, the Django ORM generate duplicated INNER JOINS for each .filter() .

For example:
Model.objecs.filter(table1__attr1=value1).filter(table1__attr2=value2)

The built query is:

SELECT *
FROM model_table
INNER JOIN table1 ON (model_table.id = table1.model_table_id)
INNER JOIN table1 T5 ON (model_table.id = T5.model_table_id)
WHERE (table1.attr1=value1 AND T5.attr2=value2)

But the correct query should be:

SELECT *
FROM model_table
INNER JOIN table1 ON (model_table.id = table1.model_table_id)
WHERE (table1.attr1=value1 AND table1.attr2=value2)

The first query may return unwanted or unexpected results.

Reading the code I found the method def qs(self) in class BaseFilterSet from the file filterset.py and I realized the following workflow:

Get all the model objects using qs = self.queryset.all()
Iterates in the list of declared filters (fields)
In other words, for each field is done the same as qs = self.queryset.filter(field_n=value_n), what causes the described problem.
When the iteration finishes, the queryset (qs) is returned

The solution that I've developed for the chained filters problem is:

    def qs(self):
        kwargs = {}
        if not hasattr(self, '_qs'):
            if not self.is_bound:
                self._qs = self.queryset.all()
                return self._qs

            if not self.form.is_valid():
                if self.strict == STRICTNESS.RAISE_VALIDATION_ERROR:
                    raise forms.ValidationError(self.form.errors)
                elif self.strict == STRICTNESS.RETURN_NO_RESULTS:
                    self._qs = self.queryset.none()
                    return self._qs
                # else STRICTNESS.IGNORE...  ignoring

            # start with all the results and filter from there
            qs = self.queryset.all()
            for name, filter_ in six.iteritems(self.filters):
                value = self.form.cleaned_data.get(name)

                if value is not None and value:  # valid & clean data
                    # If all o these conditions are true, it means that
                    # field (filter_) is filtering the queryset
                    # So, it must be added to the filtering dict
                    kwargs[filter_.name] = value

            # The same as qs.filter(filter_.name1=value1, filter_.name2=value2, ...)
            self._qs = qs.filter(**kwargs)

        return self._qs

Instead of making queries for each field, all of them are stored in a dictionary and the queryset is filtered only one time using this dict.

What do you think about this solution? Would it break other parts of the code?

Source

eduardocalil

👍1

Most helpful comment

If I'm not mistaken, this specifically refers to filtering across to-many relationships, which returns different results for .filter(related__a=1).filter(related__b=2) and .filter(related__a=1, related__b=2). There are no issues for to-one relationships.

If I make a double join, is that not a no-op?

Nope - the second join is aliased as T5, which is what the second where clause applies to. This is different from the latter example, where both where clauses operate on table1. The docs topic give a good example of the effective differences between the two queries. While both are technically valid, 90% of the time the latter query is what's intended, while the former is what django-filter constructs.

why is this a DF issue?

There really isn't a trivial way to implement the desired behavior at the moment.

rpkilby on 4 Jul 2017

👍3

All 20 comments

What do you think about this solution?

Still not yet convinced this is a DF issue, rather than an ORM one — what do the Django people say about this? This behaviour is as old as the ORM itself, so it's come up before.

Would it break other parts of the code?

There's an easy way to check that. 🙂 — Open an PR and see what breaks.

carltongibson on 4 Jul 2017

what do the Django people say

Also, what do the DB people say? If I make a double join, is that not a no-op? (Maybe it isn't — but, if not, why not? And, why is this a DF issue?)

carltongibson on 4 Jul 2017

If I make a double join, is that not a no-op?

why is this a DF issue?

There really isn't a trivial way to implement the desired behavior at the moment.

rpkilby on 4 Jul 2017

👍3

Couple of side notes:

I thought about changing filters to return Q objects, however this would prevent filters from using other parts of the queryset API. Filters like OrderingFilter would not be possible.
Another solution is to use a subquery, but you need something to collect the related filters and create the nested query that they operated on. This is something I've started working on in philipn/django-rest-framework-filters#99, but progress is slow.

rpkilby on 5 Jul 2017

I think that I've finally found the solution for this issue. Inside the model QuerySet exists a method called _next_is_sticky(). It prevents the duplicated INNER JOINS. You will undestand it better after reading a very short documentation about this method in the following link:

https://kite.com/docs/python/django.db.models.query.QuerySet._next_is_sticky

Here is my Pull Request: https://github.com/carltongibson/django-filter/pull/753

eduardocalil on 1 Aug 2017

But it has not passed in django-rest tests. Can someone who understands how it works help me?

eduardocalil on 1 Aug 2017

As a quick win it would be really great to adapt this behavior until a more consistent feature was available (as a config parameter to determine if it is desired to work with stiky or not in the current filter).

Current behavior breaks the possibility to use this magnificient plugin in project where is neccessary to filter between relations as these, a real shame : (

josert on 3 Oct 2017

I agree that the _next_is_sticky() approach looks promising; however, I found the implementation in PR #753 brakes the desired behavior of ModelMultipleChoiceFilter where conjoined=True. As you might imagine, it executed an SQL query along these lines and returns the empty set.

SELECT *
FROM model_table
INNER JOIN table1 ON (model_table.id = table1.model_table_id)
WHERE (table1.attrID=value1 AND table1.attrID=value2)

Having a solution to this to-many chaining issue would greatly simplify my REST API that uses django-filter to filter results across many-to-many relationships. Please let me know how I can lend a hand.

swkasica on 5 Oct 2017

I'm essentially -1 on making a change here.

First off, I'm not going to add anything on the basis of it being "a quick win" — that way lays trouble later.

Second, more to the point, it seems to me you should be handling this with a custom filter, custom field using a multi-widget. i.e. a single field expecting multiple values which is then used in a single filter call on the queryset.

I'd be interested in seeing what people come up with that way. No doubt there's something we can add, which doesn't involve setting state flags on the core filtering behaviour.

carltongibson on 6 Oct 2017

For what it's worth, I've been making a lot of headway on the nested filtering issue. Basically, queries will go from

Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)

to something like

Blog.objects.filter(
    entry=Entry.objects.filter(
        headline__contains='Lennon',
        pub_date__year=2008,
    ),
)

In addition to enabling the correct behavior for to-many relationships, this will allow us to enable request-based filtering on related queries. eg, those entries that are currently published or are in the process of being authored by the user.

def visible_entries(request):
    return Entry.objects.filter(Q(is_published=True) | Q(author=request.user))

Blog.objects.filter(
    entry=visible_entries(request).filter(
        headline__contains='Lennon',
        pub_date__year=2008,
    ),
)

That said, I'm beating my head against the wall in regards to some potential performance issues. Unsurprisingly, instantiating n FilterSets is n times as expensive.

rpkilby on 9 Oct 2017

@rpkilby Push your branch. Let's see what you've got. 🙂

I don't want anything too complex. The point of DF is to _avoid common boilerplate_. There's nothing wrong in writing the filtering logic by hand in non-core cases. I much prefer that to _magic_ no-one can debug.

carltongibson on 10 Oct 2017

Closing this as per #753

carltongibson on 17 Oct 2017

@carltongibson, I'm fairly close to pushing out a PR, but it's for the drf-filters project.

rpkilby on 18 Oct 2017

OK. Cool. Drop a note here when it’s ready. I’ll have a look.

carltongibson on 18 Oct 2017

Okay, a first pass is up (see: https://github.com/philipn/django-rest-framework-filters/pull/197).

rpkilby on 24 Oct 2017

I'm also encountering this issue. What was wrong with the dictionary approach? It seems much better than mucking around with next_is_sticky or nested querysets.

Edit: Ah, I think I understand the issue rpkilby is describing in the "couple of side notes" post. There are a lot of django ORM features like "distinct" that are locked off by sticking to a kwargs dictionary or even a series of Q objects.

jonathan-golorry on 30 Jan 2018

I'm also blocked by this issue. Including three or more queries of fields on a many-to-many relationship (e.g. /api/things?other__foo=1&other__bar=2&other__qux=3) cases my application to time out due to the repeated INNER JOIN statements.

Applying @eduardocalil's fix the the current version of django-filter is still a one-line change: https://github.com/citizenlabsgr/django-filter/commit/7e228f52d300558c2c939a2ea09cf04f307a236e

jacebrowning on 3 Nov 2019

Hey, I also encountered some issues related to filtering on the relationships. When I want to use 2 filters on a related field I always expect the filters to be applied on the same table. I wasn't able to find a real world example, where django's approach would be justified and seems like this is problematic for many users.

As @jacebrowning mentioned sometimes it generates so many JOINs that it's not even possible to run the query (e.g. for multi-word search term). This is still an issue in django admin's search when it uses the relationships. There was even an attempt to change it quite a long time ago, but unfortunately it was reverted.

Would it be possible to add this _next_is_sticky one-line fix which @jacebrowning mentioned to django-filter? It looks promising. Maybe we could check if there is a FILTERS_USE_STICKY = True option in django settings and then apply the _next_is_sticky to filters? This would preserve the current behavior, but would also provide a convenient solution for users, for whom the default behavior is not suitable. This shouldn't harm anyone. @carltongibson what do you think?

If this sounds reasonable I could prepare a PR with the adjustment + docs update and some tests.

pawelzar on 20 Nov 2019

👍1

Ok, so, the approach that would be worth looking at here is a multi-filter that wrapped up the whole multi-widget and single filter call business. That’s the solve it, but folks don’t know how to configure that themselves. It would be a nice addition.

carltongibson on 28 Nov 2019

@carltongibson It's currently pretty difficult to set up because you need to subclass MultiValueField and either subclass MultiWidget or pass it widgets through the filter.

A MultiWidget by itself doesn't give you type conversion. If I run a widgets=[forms.TextInput, forms.NumberInput] widget through a CharFilter, both values are strings.

If I create a filter with field_class = forms.MultiValueField and pass it fields and a widget, then I still run into a NotImplementedError on compress.

I think you basically need to implement all of the following:

class GroupedWidget(SuffixedMultiWidget):
    suffixes = ["my_string", "my_number"]

    def __init__(self):
        super().__init__(widgets=[forms.TextInput, forms.NumberInput])


class GroupedField(forms.MultiValueField):
    widget = GroupedWidget

    def __init__(self, *args, **kwargs):
        fields = [forms.CharField(), forms.DecimalField()]
        super().__init__(fields=fields, *args, **kwargs)

    def compress(self, data_list):
        return data_list


class GroupedFilter(filters.Filter):
    field_class = GroupedField

I've written some helpers to automatically put the nested field's widgets onto a SuffixedMultiWidget and filter on multiple related values in a single call:

class SuffixedMultiField(forms.MultiValueField):
    widget_parent = SuffixedMultiWidget

    def __init__(self, suffixes, fields, *args, **kwargs):
        if "widget" not in kwargs:
            widgets = [f.widget for f in fields]

            class ChildWidget(self.widget_parent):
                def __init__(self, *args, **kwargs):
                    super().__init__(widgets=widgets, *args, **kwargs)

            ChildWidget.suffixes = suffixes

            kwargs["widget"] = ChildWidget
        super().__init__(fields=fields, *args, **kwargs)

    def compress(self, data_list):
        return data_list


class SuffixedMultiFilter(filters.Filter):
    field_class = SuffixedMultiField

    def filter(self, qs, value):
        lookups = {}
        widget = self.field.widget
        name = self.field_name
        for suffix, v in zip(widget.suffixes, value):
            if v not in EMPTY_VALUES:
                # lookup = widget.suffixed(name, suffix)
                lookup = "__".join([name, suffix]) if suffix else name
                lookups[lookup] = v

        if not lookups:
            return qs
        if self.distinct:
            qs = qs.distinct()

        qs = self.get_method(qs)(**lookups)
        return qs

This simplifies usage to:

    my_group = SuffixedMultiFilter(
        suffixes=["my_string", "my_number"],
        fields=[forms.CharField(), forms.DecimalField()],
        method="my_method",  # optional
    )

It's not perfect, but it's a lot more usable. Note that SuffixedMultiFilter.filter by default groups the lookups into a single filter/exclude, solving the discussed issue.

I actually subclass SuffixedMultiWidget to put 2 underscores between the name and suffix so that url parameters follow the same exact naming as django filters, but I left that configurable via parent_widget. Speaking of SuffixedMultiWidget, shouldn't the empty decompress return [None]*len(self.suffixes) instead of assuming a length of 2?

I also considered writing a method that could transform a filterset into a single grouped filter. This approach is cool, but we'd need to split the filter kwarg generation logic out into a separate function on all the existing filters for it to really be worth it.

jonathan-golorry on 16 Dec 2019

Was this page helpful?

0 / 5 - 0 ratings