As related in the issue #537, the Django ORM generate duplicated INNER JOINS for each .filter()
.
For example:
Model.objecs.filter(table1__attr1=value1).filter(table1__attr2=value2)
The built query is:
SELECT *
FROM model_table
INNER JOIN table1 ON (model_table.id = table1.model_table_id)
INNER JOIN table1 T5 ON (model_table.id = T5.model_table_id)
WHERE (table1.attr1=value1 AND T5.attr2=value2)
But the correct query should be:
SELECT *
FROM model_table
INNER JOIN table1 ON (model_table.id = table1.model_table_id)
WHERE (table1.attr1=value1 AND table1.attr2=value2)
The first query may return unwanted or unexpected results.
Reading the code I found the method def qs(self)
in class BaseFilterSet
from the file filterset.py and I realized the following workflow:
qs = self.queryset.all()
qs = self.queryset.filter(field_n=value_n)
, what causes the described problem.The solution that I've developed for the chained filters problem is:
def qs(self):
kwargs = {}
if not hasattr(self, '_qs'):
if not self.is_bound:
self._qs = self.queryset.all()
return self._qs
if not self.form.is_valid():
if self.strict == STRICTNESS.RAISE_VALIDATION_ERROR:
raise forms.ValidationError(self.form.errors)
elif self.strict == STRICTNESS.RETURN_NO_RESULTS:
self._qs = self.queryset.none()
return self._qs
# else STRICTNESS.IGNORE... ignoring
# start with all the results and filter from there
qs = self.queryset.all()
for name, filter_ in six.iteritems(self.filters):
value = self.form.cleaned_data.get(name)
if value is not None and value: # valid & clean data
# If all o these conditions are true, it means that
# field (filter_) is filtering the queryset
# So, it must be added to the filtering dict
kwargs[filter_.name] = value
# The same as qs.filter(filter_.name1=value1, filter_.name2=value2, ...)
self._qs = qs.filter(**kwargs)
return self._qs
Instead of making queries for each field, all of them are stored in a dictionary and the queryset is filtered only one time using this dict.
What do you think about this solution? Would it break other parts of the code?
What do you think about this solution?
Still not yet convinced this is a DF issue, rather than an ORM one — what do the Django people say about this? This behaviour is as old as the ORM itself, so it's come up before.
Would it break other parts of the code?
There's an easy way to check that. 🙂 — Open an PR and see what breaks.
what do the Django people say
Also, what do the DB people say? If I make a double join, is that not a no-op? (Maybe it isn't — but, if not, why not? And, why is this a DF issue?)
If I'm not mistaken, this specifically refers to filtering across to-many relationships, which returns different results for .filter(related__a=1).filter(related__b=2)
and .filter(related__a=1, related__b=2)
. There are no issues for to-one relationships.
If I make a double join, is that not a no-op?
Nope - the second join is aliased as T5
, which is what the second where
clause applies to. This is different from the latter example, where both where
clauses operate on table1
. The docs topic give a good example of the effective differences between the two queries. While both are technically valid, 90% of the time the latter query is what's intended, while the former is what django-filter constructs.
why is this a DF issue?
There really isn't a trivial way to implement the desired behavior at the moment.
Couple of side notes:
Q
objects, however this would prevent filters from using other parts of the queryset API. Filters like OrderingFilter
would not be possible.I think that I've finally found the solution for this issue. Inside the model QuerySet exists a method called _next_is_sticky()
. It prevents the duplicated INNER JOINS. You will undestand it better after reading a very short documentation about this method in the following link:
https://kite.com/docs/python/django.db.models.query.QuerySet._next_is_sticky
Here is my Pull Request: https://github.com/carltongibson/django-filter/pull/753
But it has not passed in django-rest tests. Can someone who understands how it works help me?
As a quick win it would be really great to adapt this behavior until a more consistent feature was available (as a config parameter to determine if it is desired to work with stiky
or not in the current filter).
Current behavior breaks the possibility to use this magnificient plugin in project where is neccessary to filter between relations as these, a real shame : (
I agree that the _next_is_sticky()
approach looks promising; however, I found the implementation in PR #753 brakes the desired behavior of ModelMultipleChoiceFilter
where conjoined=True
. As you might imagine, it executed an SQL query along these lines and returns the empty set.
SELECT *
FROM model_table
INNER JOIN table1 ON (model_table.id = table1.model_table_id)
WHERE (table1.attrID=value1 AND table1.attrID=value2)
Having a solution to this to-many chaining issue would greatly simplify my REST API that uses django-filter to filter results across many-to-many relationships. Please let me know how I can lend a hand.
I'm essentially -1 on making a change here.
First off, I'm not going to add anything on the basis of it being "a quick win" — that way lays trouble later.
Second, more to the point, it seems to me you should be handling this with a custom filter, custom field using a multi-widget. i.e. a single field expecting multiple values which is then used in a single filter
call on the queryset.
I'd be interested in seeing what people come up with that way. No doubt there's something we can add, which doesn't involve setting state flags on the core filtering behaviour.
For what it's worth, I've been making a lot of headway on the nested filtering issue. Basically, queries will go from
Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)
to something like
Blog.objects.filter(
entry=Entry.objects.filter(
headline__contains='Lennon',
pub_date__year=2008,
),
)
In addition to enabling the correct behavior for to-many relationships, this will allow us to enable request-based filtering on related queries. eg, those entries that are currently published or are in the process of being authored by the user.
def visible_entries(request):
return Entry.objects.filter(Q(is_published=True) | Q(author=request.user))
Blog.objects.filter(
entry=visible_entries(request).filter(
headline__contains='Lennon',
pub_date__year=2008,
),
)
That said, I'm beating my head against the wall in regards to some potential performance issues. Unsurprisingly, instantiating n FilterSet
s is n times as expensive.
@rpkilby Push your branch. Let's see what you've got. 🙂
I don't want anything too complex. The point of DF is to _avoid common boilerplate_. There's nothing wrong in writing the filtering logic by hand in non-core cases. I much prefer that to _magic_ no-one can debug.
Closing this as per #753
@carltongibson, I'm fairly close to pushing out a PR, but it's for the drf-filters project.
OK. Cool. Drop a note here when it’s ready. I’ll have a look.
Okay, a first pass is up (see: https://github.com/philipn/django-rest-framework-filters/pull/197).
I'm also encountering this issue. What was wrong with the dictionary approach? It seems much better than mucking around with next_is_sticky or nested querysets.
Edit: Ah, I think I understand the issue rpkilby is describing in the "couple of side notes" post. There are a lot of django ORM features like "distinct" that are locked off by sticking to a kwargs dictionary or even a series of Q objects.
I'm also blocked by this issue. Including three or more queries of fields on a many-to-many relationship (e.g. /api/things?other__foo=1&other__bar=2&other__qux=3
) cases my application to time out due to the repeated INNER JOIN
statements.
Applying @eduardocalil's fix the the current version of django-filter
is still a one-line change: https://github.com/citizenlabsgr/django-filter/commit/7e228f52d300558c2c939a2ea09cf04f307a236e
Hey, I also encountered some issues related to filtering on the relationships. When I want to use 2 filters on a related field I always expect the filters to be applied on the same table. I wasn't able to find a real world example, where django's approach would be justified and seems like this is problematic for many users.
As @jacebrowning mentioned sometimes it generates so many JOINs that it's not even possible to run the query (e.g. for multi-word search term). This is still an issue in django admin's search when it uses the relationships. There was even an attempt to change it quite a long time ago, but unfortunately it was reverted.
Would it be possible to add this _next_is_sticky
one-line fix which @jacebrowning mentioned to django-filter
? It looks promising. Maybe we could check if there is a FILTERS_USE_STICKY = True
option in django settings and then apply the _next_is_sticky
to filters? This would preserve the current behavior, but would also provide a convenient solution for users, for whom the default behavior is not suitable. This shouldn't harm anyone. @carltongibson what do you think?
If this sounds reasonable I could prepare a PR with the adjustment + docs update and some tests.
Ok, so, the approach that would be worth looking at here is a multi-filter that wrapped up the whole multi-widget and single filter call business. That’s the solve it, but folks don’t know how to configure that themselves. It would be a nice addition.
@carltongibson It's currently pretty difficult to set up because you need to subclass MultiValueField
and either subclass MultiWidget
or pass it widgets through the filter.
A MultiWidget
by itself doesn't give you type conversion. If I run a widgets=[forms.TextInput, forms.NumberInput]
widget through a CharFilter
, both values are strings.
If I create a filter with field_class = forms.MultiValueField
and pass it fields and a widget, then I still run into a NotImplementedError
on compress
.
I think you basically need to implement all of the following:
class GroupedWidget(SuffixedMultiWidget):
suffixes = ["my_string", "my_number"]
def __init__(self):
super().__init__(widgets=[forms.TextInput, forms.NumberInput])
class GroupedField(forms.MultiValueField):
widget = GroupedWidget
def __init__(self, *args, **kwargs):
fields = [forms.CharField(), forms.DecimalField()]
super().__init__(fields=fields, *args, **kwargs)
def compress(self, data_list):
return data_list
class GroupedFilter(filters.Filter):
field_class = GroupedField
I've written some helpers to automatically put the nested field's widgets onto a SuffixedMultiWidget
and filter on multiple related values in a single call:
class SuffixedMultiField(forms.MultiValueField):
widget_parent = SuffixedMultiWidget
def __init__(self, suffixes, fields, *args, **kwargs):
if "widget" not in kwargs:
widgets = [f.widget for f in fields]
class ChildWidget(self.widget_parent):
def __init__(self, *args, **kwargs):
super().__init__(widgets=widgets, *args, **kwargs)
ChildWidget.suffixes = suffixes
kwargs["widget"] = ChildWidget
super().__init__(fields=fields, *args, **kwargs)
def compress(self, data_list):
return data_list
class SuffixedMultiFilter(filters.Filter):
field_class = SuffixedMultiField
def filter(self, qs, value):
lookups = {}
widget = self.field.widget
name = self.field_name
for suffix, v in zip(widget.suffixes, value):
if v not in EMPTY_VALUES:
# lookup = widget.suffixed(name, suffix)
lookup = "__".join([name, suffix]) if suffix else name
lookups[lookup] = v
if not lookups:
return qs
if self.distinct:
qs = qs.distinct()
qs = self.get_method(qs)(**lookups)
return qs
This simplifies usage to:
my_group = SuffixedMultiFilter(
suffixes=["my_string", "my_number"],
fields=[forms.CharField(), forms.DecimalField()],
method="my_method", # optional
)
It's not perfect, but it's a lot more usable. Note that SuffixedMultiFilter.filter
by default groups the lookups into a single filter/exclude, solving the discussed issue.
I actually subclass SuffixedMultiWidget
to put 2 underscores between the name and suffix so that url parameters follow the same exact naming as django filters, but I left that configurable via parent_widget
. Speaking of SuffixedMultiWidget
, shouldn't the empty decompress return [None]*len(self.suffixes)
instead of assuming a length of 2?
I also considered writing a method that could transform a filterset into a single grouped filter. This approach is cool, but we'd need to split the filter kwarg generation logic out into a separate function on all the existing filters for it to really be worth it.
Most helpful comment
If I'm not mistaken, this specifically refers to filtering across to-many relationships, which returns different results for
.filter(related__a=1).filter(related__b=2)
and.filter(related__a=1, related__b=2)
. There are no issues for to-one relationships.Nope - the second join is aliased as
T5
, which is what the secondwhere
clause applies to. This is different from the latter example, where bothwhere
clauses operate ontable1
. The docs topic give a good example of the effective differences between the two queries. While both are technically valid, 90% of the time the latter query is what's intended, while the former is what django-filter constructs.There really isn't a trivial way to implement the desired behavior at the moment.