I know I can get the number of keys in a directory by listing it, but I think that might take more time and more bandwidth if the directory holds a lot of keys.
Knowing how many keys there are in a directory lets backend services work out how many tasks each of them can execute, so the load is balanced between them.
For example, if I know that I have 10 nodes and 1000 tasks, I can distribute 100 tasks to each node.
I think that a fuzzy count would also be acceptable (as long as it does not return 0 when there are nodes).
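The arithmetic in the example above (10 nodes, 1000 tasks, 100 tasks each) can be sketched as follows. This is only an illustration: `split_tasks` is a hypothetical helper, not part of etcd, and it assumes the node count has already been obtained somehow, for instance by counting keys under a prefix.

```python
def split_tasks(total_tasks, node_count):
    """Return a list with the number of tasks assigned to each node.

    Hypothetical helper: any remainder is spread one task at a time
    over the first nodes so that shares differ by at most one.
    """
    if node_count <= 0:
        # A count of 0 while nodes exist would leave every task unassigned,
        # which is why a fuzzy count must never report 0 in that situation.
        raise ValueError("node_count must be positive")
    base, extra = divmod(total_tasks, node_count)
    # The first `extra` nodes take one additional task each.
    return [base + 1 if i < extra else base for i in range(node_count)]


if __name__ == "__main__":
    print(split_tasks(1000, 10))
```

If the count is slightly too low, the shares are a little larger than necessary but every task is still assigned, which is why an inexact count remains usable here.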
+1
/cc @heyitsanthony Shall we add another option to Range in v3 so that it returns only the number of keys?
We could change the (currently unused) More field to indicate the number of keys remaining.
@heyitsanthony I think the use case @thedrow mentioned is to provide a fast way to calculate the total number of keys under one prefix. The user can then allocate resources to consume these keys independently.
And yes, we can use the More field. Should we make it an int?
@heyitsanthony One concern is that the count operation is expensive. I am not sure if it is good to always return the remaining number of keys in More.
If it's slow to give a precise value, the More value can be a hint by default unless some precise flag is set.
@heyitsanthony As long as 0 is not provided if there is at least one key, that's also fine.
@thedrow Then probably a bool More is good enough for you? You can range a prefix and limit the number of keys to return. If more is true, you can do a consistent range for the next batch of keys you care about.
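To make the "limit plus bool More" idea concrete, here is a minimal sketch that pages through a prefix and counts keys batch by batch. It uses an in-memory sorted list as a stand-in for the key-value store; `range_prefix` and `count_by_paging` are hypothetical names and do not reflect etcd's actual API or its consistency guarantees.

```python
from bisect import bisect_left


def range_prefix(keys, prefix, start=None, limit=0):
    """Return (batch, more) for keys that begin with `prefix`.

    `keys` must be a sorted list of strings. `start`, if given, is the
    first key to consider (assumed to be >= prefix). `limit=0` means
    "no limit". `more` is True when matching keys remain past the batch.
    """
    lo = bisect_left(keys, prefix if start is None else start)
    batch = []
    more = False
    for key in keys[lo:]:
        if not key.startswith(prefix):
            break  # left the prefix range
        if limit and len(batch) == limit:
            more = True  # at least one more matching key exists
            break
        batch.append(key)
    return batch, more


def count_by_paging(keys, prefix, page_size):
    """Count keys under `prefix` by repeatedly ranging `page_size` keys."""
    total = 0
    start = None
    while True:
        batch, more = range_prefix(keys, prefix, start, page_size)
        total += len(batch)
        if not more:
            return total
        # Resume just after the last key of this batch.
        start = batch[-1] + "\0"


if __name__ == "__main__":
    store = sorted(["jobs/%03d" % i for i in range(25)] + ["nodes/a"])
    print(count_by_paging(store, "jobs/", 10))
```

The sketch also shows the cost being discussed: with only a boolean, obtaining a total means transferring every key, one batch at a time.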
There's one use case where knowing that there are more keys is enough, but that's not always the case.
I don't exactly recall the use case I had in mind when I opened this issue, but the ability to count the keys (even if the number is not exact) would be useful.
In that sense, the counter has to be eventually consistent and getting an inexact number is still acceptable.
@heyitsanthony I plan to work on this one.
I remember now what I had in mind. When you want to load balance between different servers, some algorithms, such as client-side random load balancing, require the implementor to know the number of available nodes ahead of time.
An estimate can also work, as long as it is equal to or below the real number of nodes.
Also, as mentioned above, we can use the number to evenly divide the number of jobs to each server we have in the cluster.
That's why a boolean won't be as useful as an actual number.
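A minimal sketch of why client-side random load balancing needs a number rather than a boolean, and why an underestimate is tolerable. `pick_node` is a hypothetical helper; it assumes nodes are addressed by index `0..n-1` and that the count comes from somewhere else, such as a key count under a prefix.

```python
import random


def pick_node(node_count_estimate, rng):
    """Pick a node index uniformly at random from [0, node_count_estimate).

    If the estimate is at or below the real node count, the index always
    refers to an existing node (some nodes are merely never chosen).
    An overestimate could yield an index with no node behind it, and an
    estimate of 0 leaves nothing to pick from.
    """
    if node_count_estimate <= 0:
        raise ValueError("need at least one node to pick from")
    return rng.randrange(node_count_estimate)


if __name__ == "__main__":
    rng = random.Random()
    real_nodes = 10
    estimate = 8  # an underestimate is still safe
    index = pick_node(estimate, rng)
    print(0 <= index < real_nodes)
```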
@thedrow We now provide a count with the exact number of keys in the range.