Amplify-cli: Elasticsearch / searchable updates broken for "old" Amplify installations

Created on 19 Mar 2020 · 17 comments · Source: aws-amplify/amplify-cli

@SwaySway I believe PR #3391, which was discussed in #3359, broke all our Elasticsearch indexes.

Before Amplify v4.16.1 (or a few versions earlier), a record would land in ES looking something like this:

{
       .... // more
        "_index": "member",
        "_type": "doc",
        "_id": "8f5ea23e-40e3-4bad-7cd9-35e36b4cda26",
        "_score": 32.273067,
       .... // more
}

After upgrading to 4.16.1 it will land like this:

{
       .... // more
        "_index": "member",
        "_type": "doc",
        "_id": "id=8f5ea23e-40e3-4bad-7cd9-35e36b4cda26",  <-------- note id= prepended in the value
        "_score": 32.273067,
       .... // more
}

_id is now prefixed with the key field name, where it used to be just the value.

The result is very damaging and completely broke our entire application, as ES creates a new document for any update to an existing item in the index. The result of that is "duplicate" items all over our UI listings: one for the old, outdated record and one for the new one.
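To illustrate the difference, here is a rough Python sketch of how I understand the document id is being built now; this is not the actual streaming Lambda code, just an illustration:

    # Rough illustration only -- not the actual Amplify streaming Lambda code.
    record_keys = {'id': '8f5ea23e-40e3-4bad-7cd9-35e36b4cda26'}

    # Before ~4.14.1: the ES document _id was just the key's value
    old_id = record_keys['id']              # '8f5ea23e-40e3-4bad-7cd9-35e36b4cda26'

    # After 4.16.1: the key *name* is baked into the _id as well
    new_id = 'id=' + record_keys['id']      # 'id=8f5ea23e-40e3-4bad-7cd9-35e36b4cda26'

    # Because the _id changed, an update to an existing item is indexed as a brand
    # new document instead of overwriting the old one, hence the duplicates.
    print(old_id != new_id)                 # True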

How can this be committed and shipped in Amplify without a HUGE warning AND a documented migration method? Do you think we are all still just playing around with non-production applications?

After updating to the latest CLI last night, I have spent many hours today pulling my hair out trying to figure out what is happening. Using Kibana I was able to take a closer look at the index and spot this "simple" change, which has huge effects on every existing user of @searchable.

Do you have any suggestions for me, @SwaySway? Deleting the entire index and then coding a migration job for ALL DynamoDB tables that use @searchable, so that every single item is updated and re-triggered and gets re-created in Elasticsearch, seems like a lot of work.

Labels: bug, graphql-transformer

All 17 comments

@SwaySway I see your PR, but I'm not sure exactly what it does for me. What do you suggest I do? My PRODUCTION app is currently broken and is breaking more and more by the hour.

Is my best option to delete the index and run a job on all items in all tables that use @searchable, so the Lambda re-adds them to the index?

Hello @houmark, I understand your frustration. We're working on documenting a migration workflow in our docs so customers can migrate existing data from DynamoDB into their Elasticsearch cluster under the change we introduced. At the same time, we're working on a smoother upgrade experience in the CLI to inform customers about this change and point them to the re-indexing script if they hit a scenario such as this.

This change in the CLI was made because @searchable did not work with @key in previous versions of the CLI (before we merged the fix for this), resulting in queries from the client not working with @key; that was a bug. This change was our attempt to fix that erroneous behavior.

This change, PR #3712, would effectively remove duplicates and add the correct format into the ES index.

To address the current issue in your production app, a potential workaround is to delete the index and run a script, such as this one, to re-send the data from the DynamoDB table to the ES cluster.

If you would like to re-index instead of deleting the index, to minimize downtime, below is a Gist outline of the script mentioned above, only slightly modified so it runs locally per table, along with the code that resides in the Lambda, and it performs the re-indexing as well. Once the current PR is merged, the new Lambda will handle this change accordingly.

Gist: ddb_to_es.py

An example of calling the gist outline script would be:

--rn = region
--tn = table name
--esd = Elasticsearch endpoint

py ddb_to_es.py --rn 'us-east-1' --tn 'Post-xxxxxxx-me' --esd 'https://search-post-elasti-xxxx-xxxx.us-east-1.es.amazonaws.com'
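If you do go the delete-and-re-send route mentioned above, below is a minimal sketch of deleting the old index before re-running the script. It assumes the ES domain accepts IAM-signed requests; the endpoint and index name are placeholders for your own values:

    # Minimal sketch; assumes IAM-signed access to the ES domain.
    # Endpoint and index name below are placeholders.
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    region = 'us-east-1'
    endpoint = 'https://search-post-elasti-xxxx-xxxx.us-east-1.es.amazonaws.com'
    index_name = 'member'   # the index created for your @searchable model

    creds = boto3.Session().get_credentials()
    auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es',
                    session_token=creds.token)

    # DELETE /<index> drops the index (including the duplicated documents);
    # the streaming Lambda / re-indexing script will repopulate it afterwards.
    resp = requests.delete(f'{endpoint}/{index_name}', auth=auth)
    print(resp.status_code, resp.text)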

Thanks for your feedback @SwaySway, and sorry for being direct, but you guys need to get a * grip and start respecting your users and customers, or Amplify will be abandoned by anyone who needs anything more than a playground/mockup app.

I have done web development for 22 years now, and I have never experienced more platform instability around new releases than I have on Amplify; in terms of quality, transparency, and documentation you largely operate like it's the early 2000s.

This is still wrong:

  • You have no policy on releasing breaking changes. You fix one broken feature, and in the process break every single app using one of the tentpole features of the platform, the @searchable directive. Changing the key under which ES stores the data should not surprise anyone with even a tiny bit of insight: this is a BREAKING CHANGE and should be treated as such. Spending one more day to add a warning and the 4 lines of code needed for backward compatibility is not something you should even consider skipping just to get a bugfix out.

  • You still have no reasonable changelog. It may have improved slightly, because it's no longer just repeating the same changes on new versions, but it still does not outline changes in a decent fashion. Elasticsearch changes are not mentioned in any of the most recent versions; if they had been, I might have reviewed that part, and if not I, then someone else would have spotted that this is a breaking change (though that should have been caught by your team; that is why you review code, right?).

  • There seems to be no project management on Amplify.

  • Issues in this repository are at an all-time high and keep increasing. It seems there are just not enough resources to keep up, and that is worrying for companies that rely on the stability of this platform.

I will now end up spending two long dev days to first find the reason for, and then correct, a regression that should never have happened. In addition to the development cost and the delays to other deadlines in our app, I will pay for the AWS resources used to run these updates. I'm glad I don't have millions of items in DynamoDB, but if I did, this would be very costly to correct and the options outlined might not be feasible.

@houmark Totally agree with you.
Indeed, I raised almost the same issue and experienced EXACTLY the same frustration: #3602

@houmark and @rarira - Can we jump on a call with you to walk you through the solution @SwaySway is proposing, to ensure a successful resolution? DM me on Twitter @renebrandel to find a good time to call.

This change wasn't intended as a breaking change but we recognize the impact in hindsight. In parallel, @SwaySway is readying a self-serve solution and amending our documentation to cover this issue moving forward.

@brene The Amplify team acknowledged this as a breaking change 9 days ago. It would reassure me to see the process for resolving breaking changes published, and to know whether that process was followed in this instance. If there is no current policy, I'd urge one to be created and published, or at least a public outline of one. If your process was not followed, I'd like to hear why and what kind of remediation is being put in place. If your process is being followed, I'd like to hear why you think the process is acceptable; please also consider that by default the Amplify Console configures builds to pull the latest version of the CLI, that the issue has been acknowledged for over a week, and that it was raised to the team two weeks ago.

This was a bug fix that resulted in an unintended change. For intended breaking changes, we update Amplify CLI's major versions.

@houmark - We have fixed this issue starting in CLI versions > 4.16.1 and prepared a Python script to remove duplicate records for anyone affected by the issue (CLI versions 4.14.1 - 4.16.1). We have documented the migration step here in our documentation.

In addition, we have implemented a warning in the CLI flow to make our customers aware of this potential issue. In CLI versions > 4.16.1, when we detect that customers are affected, we show them the following prompt:

The behavior for @searchable has changed after version 4.14.1.
Read here for details: https://aws-amplify.github.io/docs/cli-toolchain/graphql#migration-warning

Also if you would like to hop on a call and go through this manually with us, DM me on Twitter and we can find a suitable time.

From the SemVer spec:

As soon as you realize that you’ve broken the Semantic Versioning spec, fix the problem and release a new minor version that corrects the problem and restores backwards compatibility

Outstanding problems:

  • The Amplify docs for this issue have a broken link for the example of how to run the migration script.
  • The Amplify docs for this issue have duplicate links for the two Python scripts. I'm not sure how the first one is supposed to delete records, since it creates a synthetic MODIFY event, but that may be down to the example of how to call it returning a 404, or the script links being wrong.
  • I'd guess most people first experience new amplify-cli versions through the Amplify Console, as it defaults to pulling the latest version. The CLI warning won't be seen by people using git hooks to deploy. The Console should not auto-upgrade beyond 4.14.1 for any user who has not yet deployed since this release.
  • The Console should no longer default projects to always use the latest version of the CLI. Anyone with "latest" should be switched to the current major version.

What I would like to have seen:

  • Revert the change. Add a custom resource Lambda to check whether the Elasticsearch ids have already been updated (query for id values starting with "id="). If so, scan and re-stream those items back into Elasticsearch with the original id format while deleting the changed items (see the sketch after this list).
  • Re-apply the PR with the breaking changes in v5.
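To make the first bullet concrete, here is a hedged sketch of the repair I have in mind. It is not a supported Amplify script: the endpoint and index name are placeholders, it assumes IAM-signed access to the ES domain, and for brevity it only processes a single page of results (a real run would page through with the scroll API):

    # Hedged sketch, not a supported Amplify tool. Finds documents whose _id was
    # written in the new "id=<value>" format, re-indexes them under the original
    # plain id, and deletes the mis-keyed copy.
    import json
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    region, index = 'us-east-1', 'member'                 # placeholders
    endpoint = 'https://search-xxxx.us-east-1.es.amazonaws.com'

    creds = boto3.Session().get_credentials()
    auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es',
                    session_token=creds.token)

    # Fetch one page of documents (a real run would use the scroll API).
    resp = requests.post(f'{endpoint}/{index}/_search',
                         auth=auth, headers={'Content-Type': 'application/json'},
                         data=json.dumps({'size': 1000, 'query': {'match_all': {}}}))

    bulk_lines = []
    for hit in resp.json()['hits']['hits']:
        if hit['_id'].startswith('id='):
            original_id = hit['_id'][len('id='):]
            # Re-create the document under its original _id ...
            bulk_lines.append(json.dumps({'index': {'_index': index,
                                                    '_type': hit['_type'],
                                                    '_id': original_id}}))
            bulk_lines.append(json.dumps(hit['_source']))
            # ... and remove the duplicate written with the "id=" prefix.
            bulk_lines.append(json.dumps({'delete': {'_index': index,
                                                     '_type': hit['_type'],
                                                     '_id': hit['_id']}}))

    if bulk_lines:
        requests.post(f'{endpoint}/_bulk', auth=auth,
                      headers={'Content-Type': 'application/x-ndjson'},
                      data='\n'.join(bulk_lines) + '\n')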

The ddb_to_ess.py script (from https://github.com/aws-amplify/amplify-cli/pull/3712) causes the Elasticsearch IDs to be different from what the autogenerated streaming Lambda creates.

When I ran the ddb_to_ess.py script to delete duplicates, it did indeed delete them. However, it formatted the "_id" of the items as ID|SLUG.

However, the streaming lambdas create the "_id" as just ID.

Example difference:
For an item with an ID of D-436-36 and a slug of d-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36:

  • ddb_to_ess.py ID - D-436-36|d-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36
  • Streaming Lambda ID - D-436-36

So if I use ddb_to_ess.py to try to migrate up, then any time I update an item it creates a new Elasticsearch record.
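For what it's worth, here is how I understand the two competing _id formats come about (a hedged sketch; the helper names are mine, not from the generated Lambda):

    # Illustrative only; function names are mine, not from the generated Lambda.
    def old_doc_id(keys):
        # pre-@key-aware Lambda: only the hash key value
        return keys['id']                                   # -> 'D-436-36'

    def new_doc_id(keys):
        # @key-aware Lambda / ddb_to_ess.py: hash and sort key values joined by '|'
        return '|'.join(str(v) for v in keys.values())      # -> 'D-436-36|d-436-...'

    keys = {'id': 'D-436-36',
            'slug': 'd-436-miniseal-crimp-splice-26-20-awg-gauge-wire-d-436-36'}
    print(old_doc_id(keys))
    print(new_doc_id(keys))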

@simeon-smith Which version of the CLI did you last deploy your GraphQL schema with?

@kaustavghosh06 I used 4.17.1.

Hi @simeon-smith, the ddb_to_es.py script will grab the data from your DynamoDB table. Regarding the formatting, the script defers to the deployed streaming Lambda function that exists in your Amplify project. So what you are seeing from the script is correct in that it uses the hash key, ID, and the sort key, SLUG, delimited by |.

The case where it only contains ID was a byproduct of the old Lambda function, which did not support @key fully. This seems like an inconsistent result; does your deployed Lambda function check for keys (refer to L202)?

@SwaySway Yes, the function is checking for keys for the environment I'm working in. My master environment hasn't been updated yet, but I can confirm that the DynamoDB table is streaming to the staging function.

@simeon-smith
If you used the Lambda function prior to 4.14.1 with @key, using the id as the primary key, the following would happen.

If you had two records with the same ID:
id='123' & slug='slug0'
id='123' & slug='slug1'
the ES index would only contain one instance of that id:
doc_id='123' would have the info of slug1.

You can edit the streaming Lambda to remove id='123' so the index can hold the records for slug0 and slug1. You override it by including a zipped Lambda function, ElasticSearchStreamingLambdaFunction.zip, in a functions folder.
This will override the function inside the build folder.
This change will remove the old id format when inserting the new one.
View L218
Sample change below:

            # migration step: remove the document stored under the old-format id, if it exists
            # (queues a bulk 'delete' action for the plain id alongside the new write)
            if ('id' in doc_fields) and (event_name == 'MODIFY'):
                action = {'delete': {'_index': doc_es_index_name, '_type': doc_type,
                    '_id': doc_fields['id']}}
                es_actions.append(json.dumps(action))

@SwaySway, I think there is some confusion around the order of operations for my issue. I am using a @key for a search by slug, but it should not be the primary key. The ID is still a valid primary key.

The issue is that the streaming function with the check for keys is actually creating a new record with the _id being just the ID. So deleting records whose _id is just the ID would delete any previous updates.

Steps to run into this issue:
Have a key for the slug, but only to create a query by slug: @key(name: "BySlug", fields: ["slug"], queryField: "productsBySlug")

Run the ddb_to_es Python script to remove duplicates. This re-creates all items with an _id of ID|SLUG.

Update one of the items. This creates a new item with an _id of just ID.

In that case the issue here is that the Lambda is not being consistent, which I find strange since ddb_to_es.py should effectively invoke the same Lambda function. Could you open a separate issue for this? I'd also like to see if I can reproduce it. Could you email us your schema? [email protected]

@SwaySway, happy to. Issue is above and email coming shortly.
