Google-cloud-node: Bigtable doesn't have a magical prefix range helper?

Created on 14 Nov 2016 · 4Comments · Source: googleapis/google-cloud-node

A common thing with Bigtable is the ability to scan based on a prefix, which is basically a range with start=prefix and end=someAlteration(prefix), but its tricky to do manually.

Any chance we can add a prefix filter that "just works"?

Go does this a bit trickily (see https://github.com/GoogleCloudPlatform/google-cloud-go/blob/master/bigtable/bigtable.go#L307), the gist seems to be (for those not familiar with go):

if the string is empty, the end range marker should just be empty string
otherwise, find the last character in the string that can be incremented (ie, abcde can increment position 4, abc\xff\xff can increment position 2.
using that pivot, take start.substr(0, n-1) and append the incremented character (start[n]+1)

I think -- in JS -- this would look something like....

var getPrefixEndRange = function(start) {
  var maxChar = String.fromCharCode(0xff);
  var position = start.length-1;

  // Walk backwards until we get to a character we can increment.
  while (start[position] == maxChar && position >= 0) position--;

  // If the position is -1, there is no reasonable end range for the prefix.
  if (position == -1) return '';

  var nextChar = String.fromCharCode(start.charCodeAt(position)+1)
  return start.substring(0, position) + nextChar;
}

Some test cases...

getPrefixEndRange('start'); // -> 'staru'
getPrefixEndRange('X' + String.fromCharCode(0xff)); // -> 'Y'
getPrefixEndRange('xoo' + String.fromCharCode(0xff)); // -> 'xop'
getPrefixEndRange('com.google.'); // -> 'com.google/'
getPrefixEndRange(String.fromCharCode(0xff)); // -> ''
getPrefixEndRange(''); // -> ''

bigtable

Source

jgeewax

Most helpful comment

Gotcha, ok, I'll get started on this then :)

callmehiphop on 14 Nov 2016

🎉1 👍1

All 4 comments

I can dig a prefix option. I might misunderstand, but couldn't we just use a row key filter?

Using the current implementation, I believe it would look similar to this

table.getRows({
  filter: {
    key: /^start/
  }
}, function(err, rows) {});

callmehiphop on 14 Nov 2016

Implementing a prefix row scan via row key regex filter is inefficient, because it will scan the entire table, but return only the rows that match. A prefix row scan is typically implemented by computing the exact (start, end) rows, so it will only scan the subset of the table that will actually be returned back to the caller.

mbrukman on 14 Nov 2016

👍2

Gotcha, ok, I'll get started on this then :)

callmehiphop on 14 Nov 2016

🎉1 👍1

Yes -- what @mbrukman said. We want to look at keys only, and stop when we get to "the next one". The little snippet I jotted down gets us "where to stop" (ie, the first non-matching lexicographic key prefix)

jgeewax on 15 Nov 2016

Was this page helpful?

0 / 5 - 0 ratings