Google-cloud-go: bigtable: support the use of '\C' to match binary characters in column_qualifier_regex in the emulator

Created on 25 Apr 2019 · 11 Comments · Source: googleapis/google-cloud-go

Hi guys, in my project we use '\C' to match binary characters in the column_qualifier_regex filter. The documentation states that this is not supported by the Bigtable emulator (https://cloud.google.com/bigtable/docs/emulator): "The use of '\C' to match binary characters is not supported." However, the Bigtable emulator is a very important part of the test environment we are building for permanent use, so we will need this support.

Could you please tell me whether you plan to support it and, if so, a possible target date? I would appreciate any info on that!

Thanks in advance!

bigtable feature request

All 11 comments

Thank you for your patience @Valentin-Nikolov and welcome to the Go cloud project!

So I think the reason \C is not permitted is that Go doesn't understand that escape sequence, nor does the Go regexp library https://golang.org/pkg/regexp/syntax/ seem to permit it (please feel free to correct me if I am mistaken, @rsc @dsymonds)

I think perhaps the way to fix this would be to replace "\C" with "." right before we perform the regexp compilation i.e.

diff --git a/bigtable/bttest/inmem.go b/bigtable/bttest/inmem.go
index 97d320cb..b5c59947 100644
--- a/bigtable/bttest/inmem.go
+++ b/bigtable/bttest/inmem.go
@@ -741,6 +741,11 @@ func toUTF8(bs []byte) string {

 func newRegexp(patBytes []byte) (*regexp.Regexp, error) {
    pat := toUTF8(patBytes)
+   // Step 1. We need to filter out and translate any meta-characters
+   // Such as:
+   //      \C to . -- in order to match any arbitrary byte
+   // See Issue https://github.com/googleapis/google-cloud-go/issues/1405
+   pat = strings.Replace(pat, "\\C", ".", -1)
    re, err := regexp.Compile("^" + pat + "$") // match entire target
    if err != nil {
        log.Printf("Bad pattern %q: %v", pat, err)

and perhaps this repro can guide our test

package main

import (
    "context"
    "log"

    "cloud.google.com/go/bigtable"
)

func main() {
    ctx := context.Background()
    projectID, instanceID := "odeke-sandbox", "issue-1399"
    adminClient, err := bigtable.NewAdminClient(ctx, projectID, instanceID)
    if err != nil {
        log.Fatalf("Failed to create NewAdminClient: %v", err)
    }
    defer adminClient.Close()

    client, err := bigtable.NewClient(ctx, projectID, instanceID)
    if err != nil {
        log.Fatalf("Failed to create NewClient: %v", err)
    }
    defer client.Close()

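    tblName := "table-1405"
    // NOTE: "&& false" disables this error check, so a CreateTable failure
    // (for example, the table already existing on a re-run) does not abort.
    err = adminClient.CreateTable(ctx, tblName)
    if err != nil && false {
        log.Fatalf("Creating table: %v", err)
    }

    // Create the column family; the error check is disabled the same way.
    if err := adminClient.CreateColumnFamily(ctx, tblName, "cf0"); err != nil && false {
        log.Fatalf("Failed to CreateColumnFamily: %v", err)
    }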

    // Populate the table
    tbl := client.Open(tblName)
    mut := bigtable.NewMutation()
    mut.Set("cf0", "col", 1000, []byte("any"))
    if err := tbl.Apply(ctx, "row", mut); err != nil {
        log.Fatalf("Populating table: %v", err)
    }

    row, err := tbl.ReadRow(ctx, "row", bigtable.RowFilter(bigtable.ColumnFilter("co\\C")))
    if err != nil {
        log.Fatalf("ReadRow error: %v", err)
    }
    if len(row) == 0 {
        log.Fatalf("Did not return a result!")
    }
    log.Printf("Row: %+v\n", row)
}

which produces

go run main.go 
2019/05/21 18:29:16 Row: map[cf0:[{Row:row Column:cf0:col Timestamp:1000 Value:[97 110 121]}]]

@igorbernstein2 @sduskis and the Bigtable team, could you please examine my response/repro and help guide/update me, so that I can send a CL to fix this? Thank you!
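
One thing worth checking about the textual substitution proposed in the diff above: it operates on characters, not on parsed regexp tokens. The following is only a sketch under that reading; the helper name rewriteBinaryAny and the sample patterns are made up for illustration and are not part of bttest. It shows that a pattern containing an escaped backslash followed by C changes meaning after the rewrite.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rewriteBinaryAny mirrors the substitution from the proposed diff:
// every occurrence of the two characters `\C` becomes `.`.
// The name is hypothetical and exists only for this sketch.
func rewriteBinaryAny(pat string) string {
	return strings.Replace(pat, `\C`, ".", -1)
}

func main() {
	// Intended case: `co\C` becomes `co.` and matches the qualifier "col".
	simple := rewriteBinaryAny(`co\C`)
	reSimple := regexp.MustCompile("^" + simple + "$")
	fmt.Printf("%q matches col: %v\n", simple, reSimple.MatchString("col"))

	// Edge case: `a\\C` means "a, a literal backslash, then C".
	// The textual rewrite turns it into `a\.`, i.e. "a, then a literal dot".
	escaped := rewriteBinaryAny(`a\\C`)
	reEscaped := regexp.MustCompile("^" + escaped + "$")
	fmt.Printf("%q matches a\\C: %v\n", escaped, reEscaped.MatchString(`a\C`))
	fmt.Printf("%q matches a.: %v\n", escaped, reEscaped.MatchString("a."))
}
```

Handling this correctly would need a rewrite that understands escapes rather than a plain string replacement.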

Cloud Bigtable documents [1] that it uses the RE2 syntax [2], which does include \C. It is true that Go's regexp package supports the RE2 syntax except for \C [3], so there is a feature gap between the emulator and production.

However, this is mostly because Cloud Bigtable runs RE2 in raw byte mode (Latin1), whereas the Go regexp package runs in UTF-8 mode (so . will match a single byte for RE2, but will match a single rune in Go), so \C is a long way from the only difference. I'm not sure hacking the emulator to turn \C into . is sufficient.

[1] https://cloud.google.com/bigtable/docs/reference/data/rpc/google.bigtable.v2#google.bigtable.v2.RowFilter
[2] https://github.com/google/re2/wiki/Syntax
[3] https://golang.org/pkg/regexp/
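
A small sketch of the gap described above, using only Go's standard regexp package. The helper name matchWhole and the sample qualifier are made up for illustration; this is not the emulator's code. It shows that \C is rejected at compile time and that "." consumes a whole UTF-8 rune rather than a single byte.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchWhole reports whether pat, anchored at both ends, matches b
// using Go's standard (UTF-8 mode) regexp package.
// The helper is hypothetical and exists only for this sketch.
func matchWhole(pat string, b []byte) (bool, error) {
	re, err := regexp.Compile("^" + pat + "$")
	if err != nil {
		return false, err
	}
	return re.Match(b), nil
}

func main() {
	// Go's regexp package does not accept the \C escape.
	_, err := matchWhole(`co\C`, []byte("col"))
	fmt.Println("\\C rejected:", err != nil)

	// "é" is one rune but two bytes (0xC3 0xA9) in UTF-8.
	qualifier := []byte("co\xc3\xa9")

	// In UTF-8 mode a single "." consumes the whole two-byte rune.
	oneDot, _ := matchWhole(`co.`, qualifier)
	// A byte-oriented engine would need two dots here; in Go the second
	// dot has nothing left to consume, so the match fails.
	twoDots, _ := matchWhole(`co..`, qualifier)
	fmt.Println("co.  matches:", oneDot)
	fmt.Println("co.. matches:", twoDots)
}
```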

Thank you @dsymonds for the updates and direction, and thank you @rsc for making binaryregexp and the CL!

Thank you very much, guys! @odeke-em, do you know the planned date of the next release that will include the fix? Also, could you confirm that, once the release is available, the way to update the emulator is simply to run gcloud components update? In addition, if you know exactly which component I need to update, that would be great. Thanks!

Hi guys! Unfortunately the initial problem persists and is not fixed. After updating to the new Google Cloud SDK release v.249.0.0 (31.05.2019) and v0.40.0 of https://github.com/googleapis/google-cloud-go/releases (bigtable: Fix Latin-1 regexp filters in bttest, allowing \C.), I still observe the same error:

"status": {
"code": "INVALID_ARGUMENT",
"description": "Error in field 'column_qualifier_regex_filter' : error parsing regexp: invalid escape sequence: \\C",
"cause": null,
"ok": false
}

I will open a new request to investigate further.

Hello @Valentin-Nikolov, @garye just confirmed that the update has been made in cloud sdk 251.0.0 that has just been released, it'll perhaps propagate outward and finally end up on https://cloud.google.com/sdk/docs/release-notes in a few hours or so.

Thank you for reporting and following up on this issue.

@Valentin-Nikolov to clarify, I misspoke, cloud sdk 251.0.0 will be released next week and NOT in a couple of hours. My apologies and thank you @garye for the clarification!

251.0.0 is released and the emulator documentation has been updated.

Thank you guys!

Hi, @odeke-em, @jadekler, @rsc! I'm sorry to bring bad news, but the issue is still there in a slightly different form (with both SDK v.251.0.0 and v.252.0.0):

"status": {
                        "code": "INVALID_ARGUMENT",
                        "description": "Error in field 'column_qualifier_regex_filter' : error parsing regexp: invalid UTF-8: `\ufffd槌禱ufffd\ufffd\\C*$`",
                        "cause": null,
                        "ok": false
                    }

Here is the output from the container running the Bigtable emulator when I check the installed Google Cloud SDK version:

bash-4.4# gcloud version
Google Cloud SDK 252.0.0
beta 2019.05.17
bigtable
bq 2.0.43
cbt
core 2019.06.21
gsutil 4.39

To update, I'm using: gcloud components update --version 252.0.0

Please let me know if I'm missing something or whether a new ticket needs to be opened.

Best Regards,
Valentin
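
For reference, the "invalid UTF-8" wording in the error quoted above is the same message Go's standard regexp package returns when a pattern contains bytes that are not valid UTF-8. The sketch below only reproduces that behaviour in isolation; the pattern values and the helper name compileErr are made up, and it makes no claim about which code path the emulator took.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// compileErr returns the error (if any) from compiling pat with Go's
// standard regexp package. The helper is hypothetical.
func compileErr(pat string) error {
	_, err := regexp.Compile(pat)
	return err
}

func main() {
	// Made-up pattern: one raw byte 0xFF followed by ASCII text.
	binaryPat := "\xffabc"
	fmt.Println("valid UTF-8:", utf8.ValidString(binaryPat))
	fmt.Println("compile error:", compileErr(binaryPat))

	// Writing the byte as a regexp escape keeps the pattern valid UTF-8
	// and compiles, but in UTF-8 mode \xff denotes the rune U+00FF
	// rather than the raw byte 0xFF.
	escapedPat := `\xffabc`
	fmt.Println("valid UTF-8:", utf8.ValidString(escapedPat))
	fmt.Println("compile error:", compileErr(escapedPat))
}
```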
