A common thing with Bigtable is the ability to scan based on a prefix, which is basically a range with start=prefix and end=someAlteration(prefix), but its tricky to do manually.
Any chance we can add a prefix filter that "just works"?
Go does this a bit trickily (see https://github.com/GoogleCloudPlatform/google-cloud-go/blob/master/bigtable/bigtable.go#L307), the gist seems to be (for those not familiar with go):
abcde can increment position 4, abc\xff\xff can increment position 2.start.substr(0, n-1) and append the incremented character (start[n]+1)I think -- in JS -- this would look something like....
var getPrefixEndRange = function(start) {
var maxChar = String.fromCharCode(0xff);
var position = start.length-1;
// Walk backwards until we get to a character we can increment.
while (start[position] == maxChar && position >= 0) position--;
// If the position is -1, there is no reasonable end range for the prefix.
if (position == -1) return '';
var nextChar = String.fromCharCode(start.charCodeAt(position)+1)
return start.substring(0, position) + nextChar;
}
Some test cases...
getPrefixEndRange('start'); // -> 'staru'
getPrefixEndRange('X' + String.fromCharCode(0xff)); // -> 'Y'
getPrefixEndRange('xoo' + String.fromCharCode(0xff)); // -> 'xop'
getPrefixEndRange('com.google.'); // -> 'com.google/'
getPrefixEndRange(String.fromCharCode(0xff)); // -> ''
getPrefixEndRange(''); // -> ''
I can dig a prefix option. I might misunderstand, but couldn't we just use a row key filter?
Using the current implementation, I believe it would look similar to this
table.getRows({
filter: {
key: /^start/
}
}, function(err, rows) {});
Implementing a prefix row scan via row key regex filter is inefficient, because it will scan the entire table, but return only the rows that match. A prefix row scan is typically implemented by computing the exact (start, end) rows, so it will only scan the subset of the table that will actually be returned back to the caller.
Gotcha, ok, I'll get started on this then :)
Yes -- what @mbrukman said. We want to look at keys only, and stop when we get to "the next one". The little snippet I jotted down gets us "where to stop" (ie, the first non-matching lexicographic key prefix)
Most helpful comment
Gotcha, ok, I'll get started on this then :)