Amplify-cli: Streaming function and ddb_to_es.py generate different _id in elastic search.

Created on 27 Mar 2020 · 9 Comments · Source: aws-amplify/amplify-cli

Describe the bug

This conversation was started on https://github.com/aws-amplify/amplify-cli/issues/3705.

Relevant comments are:
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605118467
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605125513
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605128083
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605151484
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605160134
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605202981
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605206802
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605209743

The ddb_to_es.py script (from https://github.com/aws-amplify/amplify-cli/pull/3712) causes the IDs for Elasticsearch to differ from what the autogenerated streaming Lambda creates.

When I ran the ddb_to_es.py script to delete duplicates, it did indeed delete them. However, it formatted the "_id" of the items as ID|SLUG.

The streaming Lambdas, on the other hand, create the "_id" as just ID. This happens after updating, even with the Lambda that checks for the key.

So if I use ddb_to_es.py to migrate up, then any time I update an item it creates a new Elasticsearch record with the _id being just ID.

Example difference:
For an item with the ID of D-436-36 and slug of d-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36.

  • ddb_to_es.py ID - D-436-36|d-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36
  • Streaming Lambda ID - D-436-36

I am using a @key for searching by slug, but it should not be the primary key. The ID is still a valid primary key.
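The mismatch above can be illustrated with a short sketch. This is a hypothetical reconstruction, not the actual Amplify code: it assumes both sides build the Elasticsearch _id by joining key attribute values with `|`, but disagree on which attributes count as keys.

```python
# Hypothetical illustration of the mismatch (not the actual Amplify code):
# both sides join key values with '|', but they disagree on WHICH
# attributes are treated as keys.

def compute_id(record, key_names):
    """Join the values of the given key attributes with '|'."""
    return '|'.join(str(record[k]) for k in key_names)

item = {
    'id': 'D-436-36',
    'slug': 'd-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36',
}

# ddb_to_es.py derives key names from table.attribute_definitions,
# which also lists the @key (GSI) attribute 'slug':
script_id = compute_id(item, ['id', 'slug'])

# the streaming Lambda only sees the table's primary key in the stream record:
lambda_id = compute_id(item, ['id'])

print(script_id)  # D-436-36|d-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36
print(lambda_id)  # D-436-36
```

Since the two _id values never match, an update indexed by the Lambda can never overwrite a document migrated by the script, which is exactly the duplication seen here.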

Amplify CLI Version

Amplify v4.17.1

To Reproduce

Have a key for the slug, but this is just to create a query by slug @key(name: "BySlug", fields: ["slug"], queryField: "productsBySlug")

Run the ddb_to_es Python script to remove duplicates. This recreates all items with an _id of ID|SLUG.

Update one of the items. This creates a new item with the _id being just ID.

Expected behavior

The _id generated by the ddb_to_es.py script should be the same as the ones created by the streaming function.

Desktop (please complete the following information):

  • OS: macOS Catalina 10.15.3
  • Node Version: 10.15.3 (Yup, happens to be the same as the OS.)

An email was sent to the amplify team with the pertinent code.

@searchable bug graphql-transformer

All 9 comments

@SwaySway

@simeon-smith I am experiencing the same issue as you.

My DdbToEs Lambda function is the same as v4.17.2's.

The _id values generated by ddb_to_es.py are not aligned with those produced by the DdbToEsFn Lambda trigger.

For now, I am hardcoding my ddb_to_es.py and DdbToEsFn to my table's primary key.

For those who need a quick fix:

In ddb_to_es.py, change:

```python
ddb_keys_name = [a['AttributeName'] for a in table.attribute_definitions]
```

to

```python
ddb_keys_name = [a['AttributeName'] for a in [{'AttributeName': "<YOUR TABLE PRIMARY KEY HERE>"}]]
```

e.g.

```python
ddb_keys_name = [a['AttributeName'] for a in [{'AttributeName': "id"}]]
```

What this does is hardcode the 'keys' that are sent to DdbToEsFn. The output _id format will then match @SwaySway's PR.
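A minimal sketch of what the workaround does, under the assumption that the table's primary key is the single attribute "id": the list comprehension now iterates over a hardcoded one-element list instead of table.attribute_definitions, so only the primary key reaches DdbToEsFn.

```python
# Sketch of the workaround's effect (assumes the primary key is 'id'):
# instead of deriving key names from table.attribute_definitions (which may
# also list GSI/LSI key attributes), iterate over a hardcoded list.
ddb_keys_name = [a['AttributeName'] for a in [{'AttributeName': 'id'}]]

print(ddb_keys_name)  # ['id']
```

Keeping the comprehension (rather than writing `['id']` directly) preserves the shape of the original line, which makes the edit a one-token change to the script.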

This solution will only work if:

  • You are using v4.16.1 and up
  • Your DdbToEsFn is updated

Any updates on this?

Hello @simeon-smith
Looking at the other linked issue, it is possible that you saw different behavior due to a change in the stream ARN. Was that the case in the other issue you listed?

No, this issue caused the troubleshooting that ended up with the stream ARN getting changed.

@SwaySway Having the same issue on version 4.26.0

_id generated by the script:
2020-07-21T18:57:30.424Z|P-ABC-123123123123123|91a8d41b-81c0-4287-a1b4-6a0d10e00a27

If I update an item in DynamoDB, it generates a new entry with _id:
2020-07-21T18:57:30.424Z|91a8d41b-81c0-4287-a1b4-6a0d10e00a27

This appears to be due to a secondary index @key(name: "bySomeKey", fields: ["someUniqueID", "createdAt"]) on this model.

If I log the ddbToES function, my ddb['Keys'] (that are used in compute_doc_index) are:

{'createdAt': {'S': '2020-07-21T18:57:30.424Z'}, 'someID': {'S': '91a8d41b-81c0-4287-a1b4-6a0d10e00a27'}}

if I log ddb_keys_name in the script I get:

['createdAt', 'someUniqueID', 'someID']

Looking at the Keys portion, which is what the script determines as the primary ID for the record: it includes keys from any GSI/LSI because it reads the table's attribute schema. Labeling this as a bug to prioritize.
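The diagnosis above can be sketched in a few lines. This is a hedged sketch of the fix direction, not the merged patch itself, using assumed shapes that mirror the example in the previous comment (key types are hypothetical): key names should come from the table's key_schema, which lists only the primary key attributes, rather than from attribute_definitions, which also lists GSI/LSI key attributes.

```python
# Hedged sketch of the fix direction (not the merged patch). Shapes below
# mirror boto3's Table resource attributes; values mirror the example in
# the comment above, with key types assumed.
attribute_definitions = [
    {'AttributeName': 'createdAt', 'AttributeType': 'S'},
    {'AttributeName': 'someUniqueID', 'AttributeType': 'S'},  # GSI key only
    {'AttributeName': 'someID', 'AttributeType': 'S'},
]
key_schema = [
    {'AttributeName': 'createdAt', 'KeyType': 'HASH'},   # hypothetical
    {'AttributeName': 'someID', 'KeyType': 'RANGE'},     # hypothetical
]

# buggy: picks up the GSI-only attribute 'someUniqueID'
buggy_keys = [a['AttributeName'] for a in attribute_definitions]

# fixed: only the table's own primary key attributes
fixed_keys = [k['AttributeName'] for k in key_schema]

print(buggy_keys)  # ['createdAt', 'someUniqueID', 'someID']
print(fixed_keys)  # ['createdAt', 'someID']
```

With the key list restricted this way, the script's computed _id matches the one the streaming Lambda derives from the stream record's Keys.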

Closing this issue as the fix is merged. The latest migration script will work on tables with LSI/GSIs.
