This conversation was started on https://github.com/aws-amplify/amplify-cli/issues/3705.
Relevant comments are:
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605118467
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605125513
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605128083
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605151484
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605160134
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605202981
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605206802
https://github.com/aws-amplify/amplify-cli/issues/3705#issuecomment-605209743
The ddb_to_es.py script (from https://github.com/aws-amplify/amplify-cli/pull/3712) causes the Elasticsearch _ids to differ from what the autogenerated streaming Lambda creates.
When I ran the ddb_to_es.py script to delete duplicates, it did indeed delete them. However, it formatted the _id for the items as ID|SLUG.
The streaming Lambdas, however, create the _id as just ID. This happens after updating, even with the Lambda that checks for the key.
So if I use ddb_to_es.py to migrate up, then any time I update an item, a new Elasticsearch record is created with the _id being just ID.
Example difference:
For an item with the ID D-436-36 and the slug d-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36, the script writes the _id as D-436-36|d-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36, while the streaming Lambda writes it as just D-436-36.
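To make the mismatch concrete, here is a minimal sketch of the compute_doc_index logic both code paths share (the '|' join and the exact shape are inferred from the _id values above, not copied from the Amplify source):

```python
from boto3.dynamodb.types import TypeDeserializer

def compute_doc_index(keys_raw, deserializer=TypeDeserializer()):
    # Joins every key value it receives with '|'; whichever keys are
    # passed in determine the resulting Elasticsearch _id.
    return '|'.join(str(deserializer.deserialize(v)) for v in keys_raw.values())

# The streaming Lambda only sees the table's primary key in the stream record:
compute_doc_index({'id': {'S': 'D-436-36'}})
# -> 'D-436-36'

# ddb_to_es.py also forwards the @key field, producing the ID|SLUG form:
compute_doc_index({'id': {'S': 'D-436-36'},
                   'slug': {'S': 'd-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36'}})
# -> 'D-436-36|d-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36'
```

The same join applied to a different set of keys is exactly why updates create a second document instead of overwriting the migrated one.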
I am using a @key for a search by slug, but it should not become part of the primary key; the ID alone is still a valid primary key.
Amplify v4.17.1
Have a @key for the slug, but only to create a query by slug: @key(name: "BySlug", fields: ["slug"], queryField: "productsBySlug")
Run the ddb_to_es Python script to remove duplicates. This makes all items get created with an _id of ID|SLUG.
Update one of the items. This creates a new item with the _id being just ID.
The _id generated by the ddb_to_es.py script should be the same as the ones created by the streaming function.
An email was sent to the amplify team with the pertinent code.
@SwaySway
@simeon-smith I am experiencing the same issue as you.
My DdbToEs Lambda function is the same as v4.17.2's.
The _id values generated by DdbToEsFn together with ddb_to_es.py are not aligned with those from the Lambda trigger.
For now, I am hardcoding my ddb_to_es.py and DdbToEsFn to my table's primary key.
For those who need a quick fix:
in ddb_to_es.py change:
ddb_keys_name = [a['AttributeName'] for a in table.attribute_definitions]
to
ddb_keys_name = [a['AttributeName'] for a in [{ 'AttributeName': "<YOUR TABLE PRIMARY KEY HERE" }]]
e.g.
ddb_keys_name = [a['AttributeName'] for a in [{ 'AttributeName': "id" }]]
What this does is hardcode the keys that are sent to DdbToEsFn. The resulting _id format will be the same as the one from @SwaySway's PR.
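A quick check of what the hardcoded comprehension evaluates to (it collapses to a one-element list, so only the table's primary key is forwarded to DdbToEsFn):

```python
# The comprehension now iterates over a hardcoded one-element list,
# so the keys sent to DdbToEsFn reduce to the primary key alone.
ddb_keys_name = [a['AttributeName'] for a in [{'AttributeName': 'id'}]]
assert ddb_keys_name == ['id']
```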
This solution will only work if DdbToEsFn is updated.
Any updates on this?
Hello @simeon-smith
Looking at the other linked issue, it is possible that you saw different behavior due to a change in the stream ARN. Was that the case in the other issue you listed?
No, this issue caused the troubleshooting that ended with the stream ARN getting changed.
@SwaySway Having the same issue on version 4.26.0
_id generated by the script:
2020-07-21T18:57:30.424Z|P-ABC-123123123123123|91a8d41b-81c0-4287-a1b4-6a0d10e00a27
If I update an item in DynamoDB, it generates a new entry with the _id:
2020-07-21T18:57:30.424Z|91a8d41b-81c0-4287-a1b4-6a0d10e00a27
This appears to be due to a secondary index @key(name: "bySomeKey", fields: ["someUniqueID", "createdAt"]) on this model.
If I add logging to the DdbToEsFn function, my ddb['Keys'] (which are used in compute_doc_index) are:
{'createdAt': {'S': '2020-07-21T18:57:30.424Z'}, 'someID': {'S': '91a8d41b-81c0-4287-a1b4-6a0d10e00a27'}}
If I log ddb_keys_name in the script, I get:
['createdAt', 'someUniqueID', 'someID']
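The extra someUniqueID entry comes from where the script builds ddb_keys_name: boto3's table.attribute_definitions lists every attribute used by any key schema (the table's keys plus GSI/LSI keys), while the stream record's Keys carry only the table's own key schema. A small sketch against the boto3 DynamoDB Table resource (the table name is hypothetical):

```python
import boto3

table = boto3.resource('dynamodb').Table('Product-xxxx-dev')  # hypothetical name

# Every attribute that appears in ANY key schema (table, GSI, or LSI):
print([a['AttributeName'] for a in table.attribute_definitions])
# e.g. ['createdAt', 'someUniqueID', 'someID']  -- what the script logs

# Only the table's own partition/sort key:
print([k['AttributeName'] for k in table.key_schema])
# e.g. ['createdAt', 'someID']  -- what the stream record's Keys contain
```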
Looking into this: the Keys portion, which is what the script uses as the primary id for the record, includes keys from any GSI/LSI because the script checks the full attribute schema. Labeling this as a bug to prioritize.
Closing this issue as the fix is merged. The latest migration script will work on tables with LSI/GSIs.
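For anyone still on an older copy of the script, the fix direction amounts to deriving ddb_keys_name from the table's key_schema rather than attribute_definitions. A hedged sketch, which may differ from the merged code:

```python
# Before: picks up key attributes from GSIs/LSIs as well.
ddb_keys_name = [a['AttributeName'] for a in table.attribute_definitions]

# After (sketch): restrict to the table's own partition/sort key, which is
# exactly what DynamoDB Streams deliver in record['dynamodb']['Keys'].
ddb_keys_name = [k['AttributeName'] for k in table.key_schema]
```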