Google-cloud-node: Bigtable doesn't have a magical prefix range helper?

Created on 14 Nov 2016  路  4Comments  路  Source: googleapis/google-cloud-node

A common thing with Bigtable is the ability to scan based on a prefix, which is basically a range with start=prefix and end=someAlteration(prefix), but its tricky to do manually.

Any chance we can add a prefix filter that "just works"?


Go does this a bit trickily (see https://github.com/GoogleCloudPlatform/google-cloud-go/blob/master/bigtable/bigtable.go#L307), the gist seems to be (for those not familiar with go):

  • if the string is empty, the end range marker should just be empty string
  • otherwise, find the last character in the string that can be incremented (ie, abcde can increment position 4, abc\xff\xff can increment position 2.
  • using that pivot, take start.substr(0, n-1) and append the incremented character (start[n]+1)

I think -- in JS -- this would look something like....

var getPrefixEndRange = function(start) {
  var maxChar = String.fromCharCode(0xff);
  var position = start.length-1;

  // Walk backwards until we get to a character we can increment.
  while (start[position] == maxChar && position >= 0) position--;

  // If the position is -1, there is no reasonable end range for the prefix.
  if (position == -1) return '';

  var nextChar = String.fromCharCode(start.charCodeAt(position)+1)
  return start.substring(0, position) + nextChar;
}

Some test cases...

getPrefixEndRange('start'); // -> 'staru'
getPrefixEndRange('X' + String.fromCharCode(0xff)); // -> 'Y'
getPrefixEndRange('xoo' + String.fromCharCode(0xff)); // -> 'xop'
getPrefixEndRange('com.google.'); // -> 'com.google/'
getPrefixEndRange(String.fromCharCode(0xff)); // -> ''
getPrefixEndRange(''); // -> ''
bigtable

Most helpful comment

Gotcha, ok, I'll get started on this then :)

All 4 comments

I can dig a prefix option. I might misunderstand, but couldn't we just use a row key filter?

Using the current implementation, I believe it would look similar to this

table.getRows({
  filter: {
    key: /^start/
  }
}, function(err, rows) {});

Implementing a prefix row scan via row key regex filter is inefficient, because it will scan the entire table, but return only the rows that match. A prefix row scan is typically implemented by computing the exact (start, end) rows, so it will only scan the subset of the table that will actually be returned back to the caller.

Gotcha, ok, I'll get started on this then :)

Yes -- what @mbrukman said. We want to look at keys only, and stop when we get to "the next one". The little snippet I jotted down gets us "where to stop" (ie, the first non-matching lexicographic key prefix)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

hvolschenk picture hvolschenk  路  4Comments

positlabs picture positlabs  路  3Comments

chadbr picture chadbr  路  3Comments

stephenplusplus picture stephenplusplus  路  3Comments

charly37 picture charly37  路  3Comments