When kdbGet() has a cache hit but the supplied keyset was not empty, we need to append the cached keyset to the non-empty keyset. In cases where the appended keyset is much larger than the destination, it could be beneficial to append the smaller keyset to the larger one (fewer ksAppendKey operations).
Add a ksMerge() function, which returns a single resulting keyset and ksDel()etes the other keyset. This is for the cases where the other keyset would be thrown away anyway.
Thank you for writing the proposal!
> When kdbGet() has a cache hit but the supplied keyset was not empty
The same situation occurs when the command-line arguments are appended to proc/ in the global plugin done by @kodebach, see #2317
> In cases where the appended keyset is much larger than the destination
For this specific situation, I think an if that swaps the arguments of ksAppend in the user code is acceptable.
I would like to avoid having many API calls that appear to be highly optimized but actually only contain trivial (slow) operations. I would very much like to see optimizations for ksAppend. It was maybe a mistake to provide ksAppend in its current form.
I agree that improving ksAppend is more important than adding another API function.
> For this specific situation, I think an if that swaps the arguments of ksAppend in the user code is acceptable.
The problem is that the semantics of ksAppend do not allow for this optimization. ksAppend has to append source to destination. Swapping the arguments would not result in an equivalent output, because we must not append destination to source.
Therefore the idea only works for a new API function that merely promises to merge two keysets into one and throws away the other set.
> I would very much like to see optimizations for ksAppend
Two things that I think are quite simple to implement, but might make a huge difference:
ksAppendKey calls ksSearchInternal, which does a complete binary search. This can be optimized by first comparing against the last Key and immediately returning if the new Key has to be inserted at the end. This would make ksAppendKey calls much faster when Keys are inserted in the correct order. (Currently we do log(ksSize) Key comparisons to arrive at this result.)
ksAppend(ks1, ks2) should basically operate like this:
Create a new array, then iterate over ks1->array and ks2->array simultaneously, always appending the "smaller" (according to keyCmp) Key to the new array and advancing the corresponding iterator. Finally, replace the array inside ks1 with the new one. That way all ksAppendKey calls we make are in the correct order and ksSearchInternal will only do a single comparison. Doing it this way would also eliminate any concerns about argument order, because the array inside ks1 would be completely recreated, instead of inserting into the existing array. It would also keep the current semantics. The only guarantees that we make are: 1) ks2 is unchanged (true, if we restore the cursor correctly), 2) ks1 contains all the keys.
@kodebach thank you for weighing in! I think your approach sounds quite nice.
I would add one small improvement: when merging two sorted lists, it suffices to simply always append the smaller element. In that case we need no ksSearchInternal at all; the key will always be inserted at the last position.
My approach would still be calling ksAppendKey, so ksSearchInternal is basically unavoidable, but of course you could do the appending directly in ksAppend. However, then you have to make sure that anything else (Opmphm, keyRef, etc.) that gets updated in ksAppendKey is still updated correctly.
I like both approaches. I think we should first try @kodebach's optimization (O(1) appending in ksAppendKey). If we see that this is not enough, we can also follow @mpranj's approach of swapping the KeySets (so that the smaller keyset always gets appended to the larger one). I am not sure if this actually needs an API change. Maybe we can simply duplicate the KeySet internally, append the small one to the large one within ksAppend, and then swap the internal array that ks holds with the duplicated KeySet (holding the merged keysets).
I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue.
Thank you for your contributions :sparkling_heart:
I closed this issue now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue.
Thank you for your contributions :sparkling_heart: