device_countdevice_countI expect the device_count to be correctly updated
When adding an existing device to the role, the role's device_count stays at null.
When a new device is created and added to the role directly, the device_count is still null.
I'm not able to reproduce this using the development server alone. It smells like a caching problem.
This likely stems from the use of subqueries to count the related devices and VMs. @lampwins you've done some work around this; can you take a look?
Ran into this issue with /dcim/regions. Is disabling caching on that count query an option or would you rather invalidate the count on updates to the children?
@jeremystretch this is related to caching and the use of subqueries. Basically cacheops just doesn't support them (see 7 here)
There are a couple of ways we can work around this but want to know your thoughts?
FYI I am looking into what it will take to solve this upstream in cacheops by supporting subqueries natively.
After looking into it, implementing support for subqueries upstream appears pretty involved and over my head, so I am now investigating doing this with annotations.
Actually, the comment on subqueries "one can just break queries into simpler ones, which cache better" made me think. Usually subqueries are used to make things faster by having fewer separate queries, but with cacheops having several separately cacheable queries may make more sense.
Something to test and play with…
I tried removing the subqueries
class DeviceRoleViewSet(ModelViewSet):
queryset = DeviceRole.objects.all()
...
And adding the counts as simple properties:
class DeviceRole(ChangeLoggedModel):
...
@property
def device_count(self):
return self.devices.count()
@property
def virtualmachine_count(self):
return self.virtual_machines.count()
And that actually works well. The count queries are cached, and refreshed when necessary. The upside is that they are even available outside the API code.
@lampwins Shall I implement this for all subqueries? I'd be happy to take this issue.
The alternative is to make one query that returns a {role_id: count} mapping, and use that in the property. That would be one slightly bigger query that caches for all the devices roles, instead of cacheing each count by itself.
@steffann this is actually the way we used to do it a long time ago. I am not apposed to it, in light of a unified internal API and clear operability with caching. A downside though is it really clutters the models.
@jeremystretch's call though.
…and the best way to get a {role_id: count} mapping is to use annotations, which brings us back to @lampwins' solution :)
Tested it, works!
class DeviceRoleViewSet(ModelViewSet):
queryset = DeviceRole.objects.annotate(
device_count=Count('devices'),
virtualmachine_count=Count('virtual_machines')
)
Which makes me wonder why we didn't use a simple Count() in the first place. @jeremystretch do you remember?
For a more unified internal API we might even move those annotations to the Model's Manager so they are available everywhere. It may also help with the cacheing to use the same query everywhere.
Another possible implementation:
@property
def device_count(self):
device_count_map = dict(Device.objects
.order_by('device_role')
.values_list('device_role')
.annotate(Count('pk')))
return device_count_map.get(self.pk)
@property
def virtualmachine_count(self):
from virtualization.models import VirtualMachine
virtualmachine_count_map = dict(VirtualMachine.objects
.order_by('role')
.values_list('role')
.annotate(Count('pk')))
return virtualmachine_count_map.get(self.pk)
It looks very inefficient (and would be without cacheops) but because the query is constant it is cached efficiently.
I think it will be up to @jeremystretch to choose :)
@property
def device_count(self):
return self.devices.count()
This approach doesn't scale. Each call to device_count() on an object in the list results in a new SQL query. So if you have a list of 1000 objects, that's 1000 queries. If you have two such fields per object, that's 2000 queries.
class DeviceRoleViewSet(ModelViewSet):
queryset = DeviceRole.objects.annotate(
device_count=Count('devices'),
virtualmachine_count=Count('virtual_machines')
)
Using Count() is more efficient, however due to its use of GROUP BY it does not work when we need to count multiple relations: The reported count for both relations is the product of both. For example, if you have a DeviceRole assigned to ten devices and ten virtual machines, the count for each will be reported as 100.
We can work around this by setting distinct=True, but that absolutely destroys performance to the point where the first proposal above may actually be preferable.
That leaves us with subqueries. Subqueries work great, but as this issue points out, django-cacheops does not support invalidation for them. We actually use subqueries in many places today, so the scope of this bug is much greater than just device roles.
Blocked by Suor/django-cacheops#365
django-cacheops 5.1 has been released
Most helpful comment
FYI I am looking into what it will take to solve this upstream in cacheops by supporting subqueries natively.