Amplify-cli: A way to migrate data when updating app

Created on 7 May 2019 · 11 comments · Source: aws-amplify/amplify-cli

Is your feature request related to a problem? Please describe.
When developing our app, we use 3 environments: dev, preproduction, production. Often there is:
• a need to alter the schemas to add required fields => after push, existing data has this field set to null
• a need to add a new data schema that should be populated at first (i.e. app parameters) => after push the DynamoDB table is empty

Amplify CLI seems to be missing a feature to migrate the databases so we can achieve a seamless push to new environments.

Describe the solution you'd like
It would be great to be able to describe data migrations in the amplify folder so that they are executed upon push.
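Purely as an illustration of what I mean (nothing like this exists in Amplify today; the file layout and the injected client/env are invented for the sketch), a version-numbered migration file under the amplify folder could look like:

// amplify/migrations/20190507-seed-countries.js -- hypothetical layout, for illustration only
module.exports = {
  name: '20190507-seed-countries',
  // docClient and env would be whatever the CLI injects; both are assumptions of this sketch
  async up (docClient, env) {
    await docClient.put({
      TableName: 'Country-' + env,
      Item: { id: 'FR', name: 'France' }
    }).promise();
  },
  async down (docClient, env) {
    await docClient.delete({
      TableName: 'Country-' + env,
      Key: { id: 'FR' }
    }).promise();
  }
};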

Describe alternatives you've considered
• Using the DynamoDB interface to input the data by hand => difficult if there is a lot of data
• Using a custom external script that triggers mutations with the data to modify or insert => sometimes you want to disable mutations on a particular schema (e.g. a list of Countries), so you cannot do this easily. This also requires more boilerplate code.
• Using a custom script with the AWS JS SDK => seems the way to go for now (a minimal sketch follows this list)
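A minimal sketch of that last option, assuming aws-sdk v2; the region, table name and items are placeholders (the real AppSync-generated table name includes the API id and env suffix):

// seed-countries.js -- minimal sketch only
const AWS = require('aws-sdk');

AWS.config.update({ region: 'eu-west-1' });
const docClient = new AWS.DynamoDB.DocumentClient();

const countries = [
  { id: 'FR', name: 'France' },
  { id: 'DE', name: 'Germany' }
];

(async () => {
  for (const country of countries) {
    // Writing straight to the table bypasses AppSync, so it works even when mutations are disabled in the schema
    await docClient.put({ TableName: 'Country-dev', Item: country }).promise();
  }
  console.log('Seeded ' + countries.length + ' countries');
})();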

Additional context
Some great things exist in other frameworks; I will only link some I have used: for Meteor, for Laravel.
I think version numbering is a must-have for such functionality.

If you have other alternatives, please comment here; I would be happy to test other solutions.

feature-request pending-review storage

Most helpful comment

It's really a surprise that no Amplify team member provides any useful information for this request. The feature is a MUST-HAVE for a data-related solution.

It seems the data model evolution and data migration in amplify are completely forgotten.

All 11 comments

Is there any comment on this from the Amplify team? Or suggested steps for migrating DB information (are Data Pipeline or custom CSV functions our only options?)

A migrations mechanism could also help with GSI update issues.

Not sure if this helps anyone but I created a process for running migrations via an npm run command:

const common = require('./common.js');
const AWS = require('aws-sdk');
const migrations = [
// ensure migrations are in date order (oldest at the top)
require('./migrations/20200201-lea-180'),
require('./migrations/20200210-lea-184')
];
global.fetch = require('node-fetch');

/**
 * This file is used for data migrations and only data migrations. Schema changes are handled by Amplify.
 *
 * In order to run a migration:
 *   1. Add the file into the migrations folder (copy the template)
 *   2. Require the reference at the BOTTOM of the migrations array above
 *
 * Best practice: make no changes to the schema that are going to cause backwards compatibility issues,
 * e.g. no deleting columns/tables.
 * Yes, I realise this will create technical debt with rogue unused columns everywhere, but Amplify is
 * changing the schema itself. We can run a clean up at a later date when we know the data that you are
 * migrating has been changed.
 *
 * NOTE: The schema only changes in AppSync, not DynamoDB itself, so do not expect new columns to appear.
 */

const environmentName = common.getCurrentEnv();

(async () => {
  AWS.config.update({region: 'eu-west-2'});

  // if we have no CI vars then use the local creds
  if (process.argv.length === 2) {
    AWS.config.credentials = new AWS.SharedIniFileCredentials({profile: 'PROFILE NAME'});
  } else {
    // if CI then use env vars
    AWS.config.credentials = {
      accessKeyId: process.argv[ 2 ],
      secretAccessKey: process.argv[ 3 ]
    };
  }

  let dbConnection = new AWS.DynamoDB({apiVersion: '2012-08-10'});
  try {
    // Make sure there is a migrations table
    console.log('Getting migration table');
    let migrationTableName = await common.findTable(dbConnection, 'Migration-' + environmentName, null, true, true);

    // If it doesn't exist, create it
    if (!migrationTableName) {
      console.log('Migration table not found...creating');
      migrationTableName = await createMigrationTable(dbConnection, 'Migration-' + environmentName);
      console.log('Migration created');
    }

    // Get all migrations that have been run
    const previousMigrationsRaw = await common.getAllItems(dbConnection, migrationTableName);
    const previousMigrations = previousMigrationsRaw.map((migration) => migration.migrationName.S);
    const successfulMigrations = [];
    let rollBack = false;

    for (const migration of migrations) {
      // Do I run the migration?
      if (previousMigrations.some((m) => m === migration.name)) {
        console.log('Already ran migration: ' + migration.name);
      } else {
        console.log('Running migration: ' + migration.name);

        // Try to run migration
        try {
          await migration.up(dbConnection, environmentName);
          successfulMigrations.unshift(migration);
          console.log('Successfully ran: ', migration.name);
        } catch (e) {
          console.error('Up Error: ', migration.name, e);
          console.error('Breaking out of migration loop');
          // Push the failed migration so we can run the down
          successfulMigrations.unshift(migration);
          rollBack = true;
          break;
        }
      }
    }

    // Was there an error? If so, run all downs
    if (rollBack) {
      console.error('Attempting to revert ' + successfulMigrations.length + ' migrations');
      for (const migration of successfulMigrations) {
        console.error('Attempting to revert ' + migration.name);
        try {
          // Need to down all
          await migration.down(dbConnection, environmentName);
        } catch (e) {
          console.error('Down Error: ', migration.name, e);
        }
      }
    } else {
      // Save migration completion
      console.log('Saving migrations to server', successfulMigrations);
      for (const migration of successfulMigrations) {
        await common.putItem(dbConnection, migrationTableName, {
          'migrationName': {
            S: migration.name
          },
          'migrationDate': {
            S: new Date().toISOString()
          }
        });
      }
    }
  } catch (e) {
    throw (e);
  }
})();

async function createMigrationTable (dbConnection, tableName) {
  var params = {
    AttributeDefinitions: [
      {
        AttributeName: 'migrationName',
        AttributeType: 'S'
      },
      {
        AttributeName: 'migrationDate',
        AttributeType: 'S'
      }
    ],
    KeySchema: [
      {
        AttributeName: 'migrationName',
        KeyType: 'HASH'
      },
      {
        AttributeName: 'migrationDate',
        KeyType: 'RANGE'
      }
    ],
    TableName: tableName,
    BillingMode: 'PAY_PER_REQUEST'
  };

  // Call DynamoDB to create the table
  await dbConnection.createTable(params).promise();
  // Wait until the table is ACTIVE before returning, otherwise the scan/put that follows can fail
  await dbConnection.waitFor('tableExists', {TableName: tableName}).promise();
  return tableName;
}
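Roughly how it gets wired up as an npm run command (the migrate.js file name and the CI variable names below are placeholders, call yours whatever you like):

// package.json
{
  "scripts": {
    "migrate": "node migrate.js"
  }
}

# locally, falling back to the SharedIniFileCredentials profile (process.argv.length === 2)
npm run migrate

# in CI, passing the key pair through as process.argv[2] and process.argv[3]
npm run migrate -- "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"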

Not the cleanest code, but now I just have a folder of JS files that each export a name plus an up and a down function which talk to DynamoDB directly, as in the docs: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/GettingStarted.JavaScript.html
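Each migration file then just has this shape (the file name, table and update below are made up purely as an example):

// migrations/20200301-example.js -- shape of a migration file
module.exports = {
  name: '20200301-example',

  // receives the AWS.DynamoDB connection and env name from the runner above
  async up (dbConnection, environmentName) {
    await dbConnection.updateItem({
      TableName: 'Settings-' + environmentName,
      Key: { id: { S: 'default' } },
      UpdateExpression: 'SET newField = :v',
      ExpressionAttributeValues: { ':v': { S: 'initial value' } }
    }).promise();
  },

  async down (dbConnection, environmentName) {
    await dbConnection.updateItem({
      TableName: 'Settings-' + environmentName,
      Key: { id: { S: 'default' } },
      UpdateExpression: 'REMOVE newField'
    }).promise();
  }
};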

Really?? No comment on this? I don't understand how you're supposed to make any changes if you have an app in production, other than completely ejecting Amplify and managing your stacks et al. yourself once you have live data and users in your app - which isn't an unreasonable idea, but I have not seen any mention of this being a purely development-stage tool.

It's really a surprise that no Amplify team member provides any useful information for this request. The feature is a MUST-HAVE for a data-related solution.

It seems the data model evolution and data migration in amplify are completely forgotten.

I've switched to using Postgraphile w/ graphile-migrate for my backend; once you get the hang of writing your schema (playing around with graphile-starter helped a lot) it's really very nice. Forward-only migrations seem to be working well for me, and a real relational database means I can offload most of the work from the client to the server - a core premise of GraphQL is supposed to be eliminating client data processing, as the client gets the data in exactly the format it wants. I still use Amplify to manage my Auth and S3, and for that purpose it works very well.

No responses yet?

I have started to invest in the platform, but an 18-month-old issue like this, with no official comment, doesn't convince me that I would be able to manage a serious production application using Amplify/AppSync.

Not by any means a scalable/robust migration system for a team, but FWIW I have been using an AWS::CloudFormation::CustomResource with a setupVersion parameter and a setup lambda function.

        "Version": {
          "Ref": "setupVersion"
        },
        "ServiceToken": {
          "Ref": "function..."
        }

Then I've been making idempotent changes on version change via the lambda. It works OK for DynamoDB etc., since you can't make substantial changes anyway, but it wouldn't be great for SQL changes.

My approach has been the same as @cdunn. To elaborate a little, here are some more implementation details:

I have created a lambda called MigrationService. In the resources section of the template, I have the following custom resource:

"CustomMigrationService": {
      "DependsOn": [
        "AmplifyResourcesPolicy",
        ...
      ],
      "Type": "Custom::MigrationService",
      "Properties": {
        "ServiceToken": {
          "Fn::GetAtt": [
            "LambdaFunction",
            "Arn"
          ]
        },
        "TriggerVersion": 5
      }
    }

The most important thing in this custom resource is the TriggerVersion. If it is incremented, then the lambda will be executed upon deployment. So if you deployed with version 1, then made changes to your code and redeployed without incrementing the TriggerVersion, your lambda will not be executed.

Be sure to give the lambda the access it needs to perform all the necessary migrations. I have done that by editing the AmplifyResourcesPolicy section and adding statements under AmplifyResourcesPolicy > Properties > PolicyDocument > Statement. E.g.:

{
              "Effect": "Allow",
              "Action": [
                "cognito-idp:AddCustomAttributes",
                "cognito-idp:AdminAddUserToGroup",
                "cognito-idp:ListUsers"
              ],
              "Resource": [
                {
                  "Fn::Join": [
                    "",
                    [
                      "arn:aws:cognito-idp:",
                      {
                        "Ref": "AWS::Region"
                      },
                      ":",
                      {
                        "Ref": "AWS::AccountId"
                      },
                      ":userpool/",
                      {
                        "Ref": "authcognitoUserPoolId"
                      }
                    ]
                  ]
                }
              ]
            },

or

{
              "Effect": "Allow",
              "Action": [
                "dynamodb:Get*",
                "dynamodb:BatchGetItem",
                "dynamodb:List*",
                "dynamodb:Describe*",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:Update*",
                "dynamodb:RestoreTable*"
              ],
              "Resource": [
                {
                  "Ref": "storageddbBlogArn"
                },
                {
                  "Fn::Join": [
                    "/",
                    [
                      {
                        "Ref": "storageddbBlogArn"
                      },
                      "index/*"
                    ]
                  ]
                }
              ]
            }

Next up, the handler of the lambda needs to account for the creation of the custom resource. Here's the skeleton of my code:

exports.handler = async (event) => {
    const cfnCR = require('cfn-custom-resource');
    const physicalResourceId = "physicalResourceId-MigrationService-112233"
    const { sendSuccess, sendFailure } = cfnCR;

    if (event.RequestType === "Delete") {
        const result = await sendSuccess(physicalResourceId, {}, event);
        return result;
    }

    try {
       // your code here 

        const result = await sendSuccess(physicalResourceId, {}, event);
        return result;
    } catch (err) {
        // your code here 
        const result = sendFailure(err, event);
        return result;
    }
};

Probably the most important thing here is to handle the Delete event. Your lambda will be executed again when your stack is being rolled back, so if the stack is rolling back because the lambda errored out during deployment, calling it again during rollback without signalling success for the Delete event will end up hanging CloudFormation.

Lastly, I've implemented versioning so I do not rerun migration scripts. (Keeping scripts idempotent and re-runnable is always a great idea; however, it could get expensive if you have a long list of migration scripts, so skipping the ones that have already executed comes in handy. If you have few re-runnable scripts you can potentially skip this.)

In my case, I have 3 environments, so I store the latest deployed version number in a DynamoDB table. When the lambda is triggered it pulls the latest deployed version number for that environment and then loads and runs the migration scripts that have a higher version.

My migration scripts folder structure is:

migrationScripts
  component
    version.js

(I have separated the project into a few components that could be deployed independently but you might not need that)
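Stripped down, the version check in the lambda looks roughly like the following; the table name, the run() export and the exact file naming are illustrative, not my exact code:

// Rough shape of the lambda-side version check
const AWS = require('aws-sdk');
const fs = require('fs');
const path = require('path');

const docClient = new AWS.DynamoDB.DocumentClient();

async function runMigrations (component, env) {
  // latest version already deployed for this component/env, kept in a small DynamoDB table
  const { Item } = await docClient.get({
    TableName: 'MigrationVersion-' + env,
    Key: { component: component }
  }).promise();
  const deployedVersion = Item ? Item.version : 0;

  // load every migrationScripts/<component>/<version>.js with a higher version, lowest first
  const dir = path.join(__dirname, 'migrationScripts', component);
  const pending = fs.readdirSync(dir)
    .map((file) => parseInt(path.basename(file, '.js'), 10))
    .filter((version) => version > deployedVersion)
    .sort((a, b) => a - b);

  // run them in order and record the new high-water mark after each one
  for (const version of pending) {
    await require(path.join(dir, version + '.js')).run(env);
    await docClient.put({
      TableName: 'MigrationVersion-' + env,
      Item: { component: component, version: version }
    }).promise();
  }
}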

It would have been nice if there were a built-in feature to help with migrations, but the good news is that this approach works (given adequate access) for any AWS resource change, not only data.
