Amplify-cli: Best practices for seeding DynamoDB entries?

Created on 11 Oct 2019 · 7 comments · Source: aws-amplify/amplify-cli

* Which Category is your question related to? *
Amplify

* What AWS Services are you utilizing? *
DynamoDB
AppSync

* Provide additional details e.g. code snippets *
I am currently thinking about how I should seed my DynamoDB tables with entries and how to best structure this code.

I essentially want to seed a couple of hundred entries and don't want other devs to manually add entries one by one if we can structure this with code. How are others doing it?

What I am thinking about right now is just to write a seed.js Node function and load up the tables directly in DynamoDB by iterating through my entries. I wonder if there is a better, more "Amplify"-native way to do this.

Thanks

feature-request help-wanted pending-review storage

Most helpful comment

I am marking this as a feature request for Amplify CLI to support seeding data natively as we don't support this right now.

All 7 comments

I use an S3 bucket with a trigger to process CSV files. There is a specific folder structure to keep all updates in order. This gives me a crude way to audit updates; it's just a new file that appends / removes more information. The same Lambda is hooked up as a custom resource so that when a dev creates a new environment it will take our base set of CSV files.
I also pump all information through AppSync and not into DynamoDB directly. The Lambda uses the generated types to prevent accidental breaking changes.
This is only done with data that changes once a year (political boundary information).
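
For reference, a minimal sketch of the S3-trigger side of that setup, assuming aws-sdk v2 and a simple comma-separated file with a header row; the per-row write (writeRow) is a hypothetical stand-in for the AppSync mutation call described above, not the commenter's actual code:

const AWS = require("aws-sdk");
const s3 = new AWS.S3();

// Hypothetical helper standing in for the AppSync mutation call described above
async function writeRow(row) { /* ... */ }

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // S3 event keys arrive URL-encoded
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    const { Body } = await s3.getObject({ Bucket: bucket, Key: key }).promise();

    // Naive CSV parse: first line is the header, quoted fields are not handled
    const [header, ...lines] = Body.toString("utf-8").trim().split("\n");
    const columns = header.split(",").map((c) => c.trim());

    for (const line of lines) {
      const values = line.split(",");
      const row = Object.fromEntries(columns.map((c, i) => [c, values[i]]));
      await writeRow(row);
    }
  }
};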

Interesting @RossWilliams. I assume that in that case you have one main dev environment rather than an environment per developer?

Yes, thinking about it, I probably want to use AppSync directly. How do you handle authentication to AppSync in that Lambda function? Username and password, or roles? I do not want to give IAM and the Lambda full write access to everything in AppSync. I guess I could pass a Cognito username and password when triggering the seed.js file from the command line, but that would not work in a Lambda function without hardcoding credentials or storing them in environment variables...

We have an environment per developer and per backend feature. We store the same CSV files in source control so that new environments get bootstrapped with proper boundary data via a custom resource. We also use this to bootstrap test data.

We use SSM to store service account credentials, so no hard-coded passwords are needed. Another custom resource sets up the service account and stores the password in SSM. My models include auth rules for the service account. This was set up before the new auth rules existed.
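
For anyone wondering what the SSM piece looks like, here is a minimal sketch of fetching a service-account password from SSM Parameter Store before signing in; the parameter name, username, region, and aws-exports path are all placeholders, not values from the comment above:

const AWS = require("aws-sdk");
const { Amplify, Auth } = require("aws-amplify");
const awsconfig = require("./src/aws-exports"); // placeholder path

const ssm = new AWS.SSM({ region: "us-west-2" }); // placeholder region

async function signInServiceAccount() {
  // Fetch and decrypt the SecureString parameter; the name is a placeholder
  const { Parameter } = await ssm
    .getParameter({ Name: "/myapp/seed-user-password", WithDecryption: true })
    .promise();

  Amplify.configure(awsconfig);
  // "seed-service-account" is a placeholder Cognito username
  return Auth.signIn("seed-service-account", Parameter.Value);
}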

I dug into this topic a little a few months ago and didn't find anything exciting, so I'm seeding my DynamoDB tables using a custom CLI and JSON files.
If you are interested, I detailed that in a post: https://medium.com/@christophe.bougere/aws-amplify-beyond-the-quickstart-c389f8e44c92#45d8
However, I would love to have this capability natively in the Amplify CLI, with a data model closer to the GraphQL schema than to DynamoDB tables (in my solution, 1 JSON file = 1 DynamoDB table), because it can be cumbersome to create all the relationships across many files on large datasets.
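
For context, in that kind of setup one seed file per table might look something like this (the field names here are purely illustrative):

[
  { "id": "1", "name": "First entry" },
  { "id": "2", "name": "Second entry" }
]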

I am marking this as a feature request for Amplify CLI to support seeding data natively as we don't support this right now.

I did this with a function I wrote in Node. I run it locally. It's a BatchWrite to DynamoDB.

const AWS = require("aws-sdk");

AWS.config.update({
  region: "us-west-2"
});

// pulls creds from my local AWS Creds
const credentials = new AWS.SharedIniFileCredentials({ profile: "MY_COOL_PROFILE" });
AWS.config.credentials = credentials;

const docClient = new AWS.DynamoDB.DocumentClient();
let itemsToPut = [];

// In my case I loop through a provided JSON doc (SOURCE below) and transform/normalize some data
// This is really stripped down for clarity.
for (let i = 0; i < SOURCE.length; i++) {
    itemsToPut.push({
      PutRequest: {
        Item: { title: SOURCE[i].title }
      }
    });

    // Delete items instead....
    // itemsToPut.push({
    //   DeleteRequest: {
    //     Key: { id: newProduct.id }
    //   }
    // });

}

// Dynamo has a limit on batch requests of 25 items...
// Chunk it out into an array with 25 items a piece
const almostReady = chunkArray(itemsToPut, 25);

almostReady.forEach((items, index) => {
  const batchWriteParams = {
    RequestItems: {
      "TABLE_TO_PUT_ITEMS_IN": items
    }
  };

  docClient.batchWrite(batchWriteParams, function(err, data) {
    if (err) console.log(err);
    else console.log(JSON.stringify(data), new Date().toISOString());
  });
});


// Break the array into chunks
function chunkArray(myArray, chunk_size) {
  const results = [];

  while (myArray.length) {
    results.push(myArray.splice(0, chunk_size));
  }

  return results;
}

It takes me under a minute to process my data (the JSON file is ~130 MB, so we use a stream) and import ~25,000 items. These items do not have connections. FYI: if you have your @model set up with the @searchable directive, it'll also stream into Elasticsearch. But that stream takes a while (~15-30 min) to fully catch up.
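
The streaming part isn't shown in the snippet above; one way to read a very large JSON array without loading it all into memory is a streaming parser. A minimal sketch assuming the third-party stream-json package (an assumption, not necessarily what the commenter used), feeding the same PutRequest/chunk logic shown above:

const fs = require("fs");
// stream-json is a third-party package (npm install stream-json)
const { parser } = require("stream-json");
const { streamArray } = require("stream-json/streamers/StreamArray");

const itemsToPut = [];

fs.createReadStream("seed-data.json") // placeholder file name
  .pipe(parser())
  .pipe(streamArray())
  .on("data", ({ value }) => {
    // Same transform as the loop above, one record at a time
    itemsToPut.push({ PutRequest: { Item: { title: value.title } } });
  })
  .on("end", () => {
    // Hand the accumulated requests to the chunk + batchWrite logic above
    console.log(`Prepared ${itemsToPut.length} put requests`);
  });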

I'm currently using a Node.js script that calls the API class of the aws-amplify SDK. The script reads my JSON file and seeds DynamoDB through the AppSync API. Code example:

// AWS Amplify required imports
const {
  Amplify,
  Auth,
  API,
  graphqlOperation
} = require('aws-amplify');
// Local credentials used to authenticate to the API with Cognito (if you're using Cognito)
const authconfig = require('./auth_config.json')
// AWS configuration - I recommend that you use the aws-exports file
const awsconfig = require('../../src/aws-exports');
// My GraphQL mutations
const {
  createTag
} = require('../../src/graphql/mutations')

// Additional lib to generate UUIDs
const uuid = require('uuid')

// My JSON with the local data that will be seeded into DynamoDB
const Tags = require('../../src/data/Tag.json')

// Repeat this function for each table, or add extra logic to make this more dynamic (in my case, I don't need to)
async function getTags() {
  console.log('Inserting records into the Tags table...')
  // Promise.all lets the caller await until every mutation has finished
  await Promise.all(Tags.map(async (item) => {
    const id = uuid.v4();
    const inputData = {
      id: id,
      name: item.name
    }
    console.log(`- Inserting tag: ${item.name}`);
    try {
      // This is the core of the function - here the Amplify SDK does the "magic"
      await API.graphql(graphqlOperation(createTag, {
        input: inputData,
      }));
    } catch (error) {
      console.error(`${JSON.stringify(inputData)}: \n${error}`);
    }
  }))
}

// Initialize the table seeding
async function initialize() {
  try {
    // Configure AWS and authenticate with Cognito (if required)
    Amplify.configure(awsconfig)
    await Auth.signIn(authconfig.username, authconfig.password);
    console.log('Seeding the database...');
  } catch (error) {
    console.error('Authentication failed!', error);
    return;
  }
  await getTags();
  console.info('Data inserted successfully!');
}

initialize();

If you interact with modules that use import and export, you must install babel-node (@babel/node) as a dev dependency and edit your babel.config.js to:

module.exports = {
  presets: ['@babel/preset-env']
};

Then you can execute the script with the command babel-node --presets env scripts/seeder/seed.js, and configure a script inside package.json, for example: "db:seed": "babel-node --presets env scripts/seeder/seed.js" (run with npm run db:seed).
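
For completeness, the install command and package.json wiring might look roughly like this (the babel-node binary comes from the @babel/node package):

npm install --save-dev @babel/core @babel/node @babel/preset-env

{
  "scripts": {
    "db:seed": "babel-node --presets env scripts/seeder/seed.js"
  }
}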
