Yugabyte-db: [CDC] GetChangesResponse in Java Client getChanges callback are incorrect/malformed

Created on 7 Feb 2020 · 10 Comments · Source: yugabyte/yugabyte-db

When using the Java YB client to consume CDC events, the GetChangesResponse passed to the getChanges callback (https://github.com/yugabyte/yugabyte-db/blob/master/java/yb-client/src/main/java/org/yb/client/AsyncYBClient.java#L371) appears to contain changes from several rows merged into one record, with only a single primary key set.

E.g. given a table:

CREATE TABLE entity_attributes (id text, user_id text, type text, name text, value text, PRIMARY KEY (id, user_id)) WITH transactions = { 'enabled' : true }

I receive a change event:

2020-02-06 21:40:26,543 [INFO|org.yb.cdc.LogClient|LogClient] time: 6475948188414873600
operation: WRITE
key {
  key: "id"
  value {
    string_value: "Email"
  }
}
key {
  key: "user_id"
  value {
    string_value: "0036g000009n0I8AAI"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "b.levy@expressl&t.net"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "j.davis@expressl&t.net"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}
changes {
  key: "type"
  value {
    string_value: "string"
  }
}
changes {
  key: "name"
  value {
    string_value: "Email"
  }
}
changes {
  key: "value"
  value {
    string_value: "[email protected]"
  }
}

The above is a single event (I receive it both from a direct Java invocation of getChanges and from the Kafka connector .jar), but I expect 27 distinct events with 27 distinct compound primary key pairs.
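For reference, the event above carries 81 changes entries, i.e. 27 repetitions of the (type, name, value) triple, which matches the 27 rows in the table below. A minimal Java sketch of that arithmetic follows. It assumes a hypothetical, simplified ColumnChange type rather than the real org.yb.cdc protobuf classes, and it assumes every row contributed one change per non-key column in the same order. Splitting this way cannot recover which user_id each chunk belonged to, since the merged event carries only one key.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitMergedChanges {

    // Hypothetical stand-in for one `changes { key: ..., value { string_value: ... } }`
    // entry; not the real org.yb.cdc protobuf type.
    public record ColumnChange(String column, String value) {}

    // Splits a flat change list into consecutive chunks of columnsPerRow entries.
    // Assumes every row contributed exactly one change per non-key column, in the same order.
    public static List<List<ColumnChange>> splitIntoRows(List<ColumnChange> changes, int columnsPerRow) {
        if (columnsPerRow <= 0 || changes.size() % columnsPerRow != 0) {
            throw new IllegalArgumentException(
                    changes.size() + " changes cannot be split evenly into rows of " + columnsPerRow);
        }
        List<List<ColumnChange>> rows = new ArrayList<>();
        for (int i = 0; i < changes.size(); i += columnsPerRow) {
            rows.add(List.copyOf(changes.subList(i, i + columnsPerRow)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Rebuild the shape of the malformed event: 27 x (type, name, value).
        List<ColumnChange> merged = new ArrayList<>();
        for (int row = 0; row < 27; row++) {
            merged.add(new ColumnChange("type", "string"));
            merged.add(new ColumnChange("name", "Email"));
            merged.add(new ColumnChange("value", "placeholder-" + row)); // real values elided
        }
        List<List<ColumnChange>> rows = splitIntoRows(merged, 3);
        System.out.println(merged.size() + " changes -> " + rows.size() + " rows"); // 81 changes -> 27 rows
    }
}
```

Records require Java 16 or later.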

The table contents (with a single insert performed per row, generating the above event output):

cqlsh> select * from agc.consumer_entity_attributes WHERE id='Email';

 id    | user_id            | type   | name  | value
-------+--------------------+--------+-------+---------------------------------------
 Email | 0036g000009n0HoAAI | string | Email |                        [email protected]
 Email | 0036g000009n0HtAAI | string | Email |              [email protected]
 Email | 0036g000009n0HyAAI | string | Email |           [email protected]
 Email | 0036g000009n0I3AAI | string | Email |                [email protected]
 Email | 0036g000009n0I8AAI | string | Email |        [email protected]
 Email | 0036g000009n0IDAAY | string | Email | [email protected]
 Email | 0036g000009n0IIAAY | string | Email |                   [email protected]
 Email | 0036g000009n0INAAY | string | Email |               [email protected]
 Email | 0036g000009n0ISAAY | string | Email |     [email protected]
 Email | 0036g000009z3EPAAY | string | Email |                         [email protected]
 Email | 0036g000009z3EQAAY | string | Email |                         [email protected]
 Email | 0036g000009z3ERAAY | string | Email |                [email protected]
 Email | 0036g000009z3ESAAY | string | Email |                       [email protected]
 Email | 0036g000009z3ETAAY | string | Email |                 [email protected]
 Email | 0036g000009z3EUAAY | string | Email |              [email protected]
 Email | 0036g000009z3EVAAY | string | Email |             [email protected]
 Email | 0036g000009z3EWAAY | string | Email |                      [email protected]
 Email | 0036g000009z3EXAAY | string | Email |                        [email protected]
 Email | 0036g000009z3EYAAY | string | Email |                 b.levy@expressl&t.net
 Email | 0036g000009z3EZAAY | string | Email |                j.davis@expressl&t.net
 Email | 0036g000009z3EaAAI | string | Email |                     [email protected]
 Email | 0036g000009z3EbAAI | string | Email |                         [email protected]
 Email | 0036g000009z3EcAAI | string | Email |                        [email protected]
 Email | 0036g000009z3EdAAI | string | Email |                       [email protected]
 Email | 0036g000009z3EeAAI | string | Email |                        [email protected]
 Email | 0036g000009z3EfAAI | string | Email |                  [email protected]
 Email | 0036g000009z3EgAAI | string | Email |                        [email protected]
(27 rows)

In this problematic case, the events received for the id + user_id pairs in this table only ever carry a single clustering key (user_id) value, e.g. "0036g000009n0I8AAI" below, paired with the partition key (id) value.

[INFO|org.yb.cdc.LogClient|LogClient] time: 6475948188414873600
operation: WRITE
key {
  key: "id"
  value {
    string_value: "Email"
  }
}
key {
  key: "user_id"
  value {
    string_value: "0036g000009n0I8AAI"
  }
}

This occurs across tables for us. The issue arises most noticeably when we drop a table, recreate it, and then insert, with a delay of a couple of seconds after each step completes. We do not yet have enough mileage to know whether it also occurs with low-volume edits without a drop and recreate; so far that path has seemed to work (at least more) reliably.

It's critical for our use case that we reliably receive all events here (as I understand it is for your planned uses of CDC, e.g. 2DC).

Are there existing issues that characterize this malformed message? At a minimum, I'd expect only a single changes { key: <value here> } entry for a given column in each CDC event (certainly for each primary key). If there is such an issue, I'd be very interested in an ETA on a fix.
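As an interim consumer-side guard against records like the one above, a sketch along these lines could flag any event whose changes repeat a column name. It assumes the column names have already been pulled out of each record into a list of strings (extracting them from the real GetChangesResponse types is not shown); the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChangeRecordCheck {

    // Returns true if a column name appears more than once among one record's changes.
    // Per the expectation above, a single-row write should name each column at most once.
    public static boolean hasDuplicateChangeColumns(List<String> changedColumns) {
        Set<String> seen = new HashSet<>();
        for (String column : changedColumns) {
            if (!seen.add(column)) {
                return true; // second occurrence: changes from several rows were likely merged
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: column names already extracted from each record's changes.
        List<String> singleRow = List.of("type", "name", "value");
        List<String> mergedRows = List.of("type", "name", "value", "type", "name", "value");
        System.out.println(hasDuplicateChangeColumns(singleRow));  // false
        System.out.println(hasDuplicateChangeColumns(mergedRows)); // true
    }
}
```

Such a check only detects the problem; it cannot restore the lost primary keys.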

In the interim, is there any guidance around how to prevent this from occurring?

I'm happy to try to provide more info on how to reproduce this, but I struggled to reproduce it reliably in a small toy app.

Labels: area/cdc, community/request, kind/bug

All 10 comments

@rahuldesirazu This seems to be a bug in CDCProducer, likely in PopulateWriteRecord.

I was able to reproduce this in code I can share: https://github.com/Rkiouak/cdc-yb-issue. Sending the inserts in a batch is what finally reproduced it, so it could be batching alone that is sufficient, or some combination of drop and recreate, compound keys, and batching.

@Rkiouak It looks like the issue is related to compound keys: we batch all compound keys with the same partition key together in the same response, even if they have different range keys. It's a straightforward fix, so I should have a code change out soon.
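To illustrate that explanation, here is a small Java model contrasting grouping write operations by partition key only with grouping by the full primary key (partition key + range key). It is not the actual CDCProducer code, which lives in the C++ server; WriteOp, Emitted, and both methods are hypothetical. The model keeps the range key of the first op it sees for a partition, which is an assumption of the sketch, not a claim about which key the real producer reports.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CdcGrouping {

    // Hypothetical simplified write: partition key, range (clustering) key, changed columns.
    public record WriteOp(String partitionKey, String rangeKey, List<String> changes) {}

    // Hypothetical emitted CDC record: one primary key plus the changes attributed to it.
    public record Emitted(String partitionKey, String rangeKey, List<String> changes) {}

    // Models the described bug: ops are grouped by partition key only, so one range key
    // is reported and every op's changes for that partition are appended to it.
    public static List<Emitted> groupByPartitionKeyOnly(List<WriteOp> ops) {
        Map<String, Emitted> byPartition = new LinkedHashMap<>();
        for (WriteOp op : ops) {
            Emitted e = byPartition.computeIfAbsent(op.partitionKey(),
                    pk -> new Emitted(pk, op.rangeKey(), new ArrayList<>()));
            e.changes().addAll(op.changes());
        }
        return new ArrayList<>(byPartition.values());
    }

    // Groups by the full primary key, yielding one record per distinct row.
    public static List<Emitted> groupByFullPrimaryKey(List<WriteOp> ops) {
        Map<List<String>, Emitted> byPrimaryKey = new LinkedHashMap<>();
        for (WriteOp op : ops) {
            Emitted e = byPrimaryKey.computeIfAbsent(List.of(op.partitionKey(), op.rangeKey()),
                    k -> new Emitted(op.partitionKey(), op.rangeKey(), new ArrayList<>()));
            e.changes().addAll(op.changes());
        }
        return new ArrayList<>(byPrimaryKey.values());
    }

    public static void main(String[] args) {
        // Three rows sharing partition key "Email" with distinct user_id range keys.
        List<WriteOp> batch = List.of(
                new WriteOp("Email", "0036g000009n0HoAAI", List.of("type", "name", "value")),
                new WriteOp("Email", "0036g000009n0HtAAI", List.of("type", "name", "value")),
                new WriteOp("Email", "0036g000009n0HyAAI", List.of("type", "name", "value")));
        List<Emitted> buggy = groupByPartitionKeyOnly(batch);
        List<Emitted> fixed = groupByFullPrimaryKey(batch);
        System.out.println(buggy.size() + " record(s), " + buggy.get(0).changes().size() + " changes"); // 1 record(s), 9 changes
        System.out.println(fixed.size() + " record(s)"); // 3 record(s)
    }
}
```

The partition-key-only grouping reproduces the symptom from the original report: one key and a repeated run of (type, name, value) changes.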

Great news @rahuldesirazu, I appreciate the follow-up, and I look forward to trying out the change.

Good find @rahuldesirazu!

Good find @rahuldesirazu!

Any ETA on a fix, update, or related changeset/PR/MR I can monitor?

I'm not sure what the standard YB process is here, so I'm also happy to review any documentation on the bug report and resolution process.

Hi @Rkiouak, I have a code change in review right now and should be able to land by today or early tomorrow.

Hi @rahuldesirazu, I see https://docs.yugabyte.com/latest/contribute/core-database/build-from-src/, but is there a Dockerfile somewhere I can use to bundle the build output into an image for use with the Yugabyte Helm chart? That's my preferred deployment method, and it will be the easiest way for me to test this change.

Thanks again for the help.

Hi @Rkiouak, we are releasing YugabyteDB version 2.1 this week, which will include @rahuldesirazu's fix. It might be easiest for you to wait for the release.
If you need the fix more urgently, @bmatican or @Arnav15 might be able to help with your question about the Dockerfile and Helm.
