Amplify-cli: Total counts in GraphQL queries

Created on 19 Jul 2019 · 64 comments · Source: aws-amplify/amplify-cli

Is your feature request related to a problem? Please describe.
I think it's way past due that Amplify supports total counts in GraphQL queries. Other GraphQL-based platforms have this built in: by simply adding totalCount (or similar) to the query, no matter the limit, you get back the total count in addition to the (filtered) data.

Describe the solution you'd like
This should at least work for DynamoDB-backed models, and of course also for search-based queries that go through Elasticsearch.

Describe alternatives you've considered
Making a Lambda function that backs a field in each model using the @function directive, but since we use both listItems and searchItems with filters applied, the implementation is not simple, as we would have to reapply those filters in the Lambda function to get the correct count.

Writing custom resolvers seems like another not very fun route, and not very scalable or maintainable; once again, this should be an "out of the box one-liner" available to a developer. With either a Lambda or some other custom resolver I'm looking at hours or days of development.

Additional context
This is a must-have feature and there's not really any workaround for displaying total counts for systems with many items — at least none that I know of. I've read several bug reports, but none of them seem to have a simple solution. That it has not yet been developed by AWS is beyond my understanding, as pulling out counts is one of the most common things to do when developing web apps.

feature-request graphql-transformer pending-review

Most helpful comment

Associating the word "impossible" for a query count in 2019 makes me cringe a bit. And more than that, it makes me wonder if selecting Amplify and all its (current) dependents was a very wrong choice.

The fact that DynamoDB does not do counts for its queries (besides a full table estimated count every ~6 hours) is simply a limitation the team working on DynamoDB should solve. Every single competitor to DynamoDB handles this without issues, so I'm sure those smart people can also come up with a solution that does not just benefit AppSync and Amplify users, but also people using DynamoDB directly. Maybe it will be near-correct counts for millions of rows and more precise for thousands, like MySQL / InnoDB, and that would be way, way better than having no clue whatsoever.

I am aware that using the nextToken I can do pagination, but that paginator is somewhat less cool to look at from a UX perspective, as I won't be able to show 1, 2, 3, 4....12 because I don't know how many pages I have. When someone wants to know how many items we have fitting this filter, it cannot be that I have to pull them all out (only the id field) in the leanest way, and then count the array client-side?

I'm sure AWS compares themselves in some ways with other GraphQL services like Prisma etc. and they don't seem to have a problem supporting this.

This is a DynamoDB limitation. Attacking a solution on top of that for AppSync is the wrong angle, this needs to end on the table of AWS DynamoDB developers so they can come up with a sensible solution nearer to the root of the problem — everything else is a hack. Asking me to keep counts in a model/table myself when things update is even worse and not what you'd expect of a platform with an otherwise impressive feature set.

And if it's not possible for DynamoDB to solve this, then the Amplify / AppSync team should start considering built-in support for other major database players such as MongoDB, MySQL, Postgres, etc. so they are not being held down by a half-baked database that is backing the entire thing, but when that is considered, I am sure it looks way more interesting to just figure out a solution to counts and other minor limitations DynamoDB currently has.

All 64 comments

There is a workaround for this and a reason why it is not automatically included as part of amplify - DynamoDB does not have an efficient way to count items, and can't in general perform aggregate operations. If you want to count items, write a custom lambda to read from the dynamo table stream and conditionally increment or decrement a counter on the table. Add a lambda resolver to read this counter value from the table, and add it to your schema.
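As a minimal sketch of that counter approach (table, field and environment variable names here are assumptions, not something the CLI generates):

// Stream handler: keeps a running item count for the source table in a
// separate counter table. Attach it to the table's DynamoDB stream.
import { DynamoDBStreamHandler } from 'aws-lambda';
import { DynamoDB } from 'aws-sdk';

const docClient = new DynamoDB.DocumentClient();

export const handler: DynamoDBStreamHandler = async (event) => {
  // Net change for this batch: +1 per INSERT, -1 per REMOVE, MODIFY is neutral
  const delta = event.Records.reduce((acc, record) => {
    if (record.eventName === 'INSERT') return acc + 1;
    if (record.eventName === 'REMOVE') return acc - 1;
    return acc;
  }, 0);
  if (delta === 0) return;

  // ADD is atomic, so concurrent invocations cannot lose updates
  await docClient.update({
    TableName: process.env.COUNTER_TABLE!,   // hypothetical counter table
    Key: { id: 'Post' },                     // one counter row per model
    UpdateExpression: 'ADD itemCount :delta',
    ExpressionAttributeValues: { ':delta': delta },
  }).promise();
};

A small Lambda resolver on the schema can then read that single row back, as described above.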

Because dynamo can't perform aggregate operations, and you potentially want to get item counts after a filter expression is applied, having the cli provide this feature is a tall order.

Yeah, I'm aware of DynamoDB's limitations, but asking the developer to implement correct counts is also a tall order. As I can have any combination of filters, I would need to count based on those, and that seems like a huge implementation.

A count of items is something most apps use, so while I may be able to develop something that works decently, this feature request really is about either DynamoDB fixing this or Amplify creating some kind of sensible workaround that does not put all the workload on the developer.

Thanks for your feedback. The things you outline have been in my thoughts in relation to a workaround for this, but coming up with a sensible workaround that respects any type of filter on any table seems like something I could spend a week or two developing, and it may still be prone to wrong counts in edge cases.

Maybe Elasticsearch has counts of total results even when a limit is applied, and this could be included in results as a decent workaround? We use Elasticsearch for most queries, as the default filters are too limited anyway.

Elastic does provide this information, you can access it in the resolver template with: $ctx.result.hits.total.

There is a difficult disconnect between the GraphQL and DynamoDB development mindsets. DynamoDB requires planning of most access patterns in the design phase, while GraphQL makes an iterative approach seem straightforward. While Amplify backed by DynamoDB does offer some flexibility, it does require more ahead-of-time planning than other platforms, in this case with aggregates. If you know your aggregates now, and know they are stable, development of Lambdas is doable. Solving the general case sounds much more difficult as long as your data is backed by DynamoDB.

I agree that it would be helpful to have templates to speed up creation of lambda aggregate creators (including backfilling data).

Elastic does provide this information, you can access it in the resolver template with: $ctx.result.hits.total.

That sounds promising. Any hints on how I could create a @function directive that maps to a resolver returning the total hits, without having to overwrite my default resolvers? I have not yet written any custom resolvers or overwritten any in this project, and I'm trying to avoid that. I have done a few @function directives that run a Lambda for other things (and that works well).

Elasticsearch returns the total hit count by default, and by default it's accurate up to around 10,000. So if you make a REST API call in your Lambda you will get this back in the response. Just update your schema to include it as a property. You will likely need a new type to hold the items, nextToken, and total properties.
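Something like this, just as a sketch (type and field names here are assumptions, not what the transformer generates):

type SearchablePostConnection {
  items: [Post]
  nextToken: String
  total: Int
}

And in the search resolver's response mapping template, pass the Elasticsearch total through alongside the items (nextToken handling omitted here):

#set( $items = [] )
#foreach( $entry in $ctx.result.hits.hits )
  $util.qr($items.add($entry.get("_source")))
#end
$util.toJson({
  "items": $items,
  "total": $ctx.result.hits.total
})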

I do think that $ctx.result.hits.total should be exposed by default since it is readily available.

In terms of DynamoDB support, I'm not too sure. I mean it would be nice to have, but if you have a large dataset just getting the count on every request will be expensive. Because every time you go over 1 MB of data, you have to paginate through to get totals. In reality, you could scan through a table made up of millions of records before you can get a final result, which obviously makes no sense even in terms of performance.

If we were to build this into Amplify in the future with DynamoDB, could you give a bit more detail on some things:

  • Is it just total count of all items in the table?
  • Is it the total count for items matching a query? E.g. similar to SELECT COUNT(*) FROM ...
  • Would you be ok with Amplify attaching Lambda functions to your DynamoDB table, thus increasing your costs?

Any other specifics on your requirements or use cases would help us look into this in the future.

@undefobj

I would say it would have to be the total count of items matching a query. This would be primarily needed for pagination purposes, it's very hard to give a good user experience if we are supposedly showing X results at a time out of Y number of records. This would also need to work with custom indexes, as most likely that's the most efficient setup when listing tables unless we are using searchable.

I wouldn't mind if additional Lambda functions are added to the GraphQL query; however, I think it would be wise if we were able to choose whether to request/run this or not. If we are paginating across 1000 results with 10 results at a time, I don't think we necessarily need to run the total-count query 10 times; it could be that the first request is enough. Also, there could be instances where I wouldn't care much about the count and as a result wouldn't want to run the Lambda function for no reason. Potentially another way to go about this could be to have a 'cache'/Dynamo table storing counts, with rules for how long count results remain valid.

I hope that makes sense.

I agree with @jonmifsud — the main interest for us is to have a count of the filtered query, so we can do a proper pagination and show the total results of that query.

A full count would be nice, but if you do a normal query, shouldn't that result in a count of all items? Also, maybe a way to return just the count, instead of having the count as part of the result dataset, would be nice.

I also would like to be able to choose when to use this and when not to, to avoid extra processes running with load-time and cost overhead.

You should be able to do pagination using nextToken without needing to know the count. However, what you propose above to have a count on the query results would be difficult. While not impossible, this feature would be quite difficult to do. I'll keep the issue open so that we can get other community feedback and measure interest, but for transparency it would need a lot of design and thought and wouldn't be delivered anytime soon.

Associating the word "impossible" for a query count in 2019 makes me cringe a bit. And more than that, it makes me wonder if selecting Amplify and all its (current) dependents was a very wrong choice.

The fact that DynamoDB does not do counts for its queries (besides a full table estimated count every ~6 hours) is simply a limitation the team working on DynamoDB should solve. Every single competitor to DynamoDB handles this without issues, so I'm sure those smart people can also come up with a solution that does not just benefit AppSync and Amplify users, but also people using DynamoDB directly. Maybe it will be near-correct counts for millions of rows and more precise for thousands, like MySQL / InnoDB, and that would be way, way better than having no clue whatsoever.

I am aware that using the nextToken I can do pagination, but that paginator is somewhat less cool to look at from a UX perspective, as I won't be able to show 1, 2, 3, 4....12 because I don't know how many pages I have. When someone wants to know how many items we have fitting this filter, it cannot be that I have to pull them all out (only the id field) in the leanest way, and then count the array client-side?

I'm sure AWS compares themselves in some ways with other GraphQL services like Prisma etc. and they don't seem to have a problem supporting this.

This is a DynamoDB limitation. Attacking a solution on top of that for AppSync is the wrong angle, this needs to end on the table of AWS DynamoDB developers so they can come up with a sensible solution nearer to the root of the problem — everything else is a hack. Asking me to keep counts in a model/table myself when things update is even worse and not what you'd expect of a platform with an otherwise impressive feature set.

And if it's not possible for DynamoDB to solve this, then the Amplify / AppSync team should start considering built-in support for other major database players such as MongoDB, MySQL, Postgres, etc. so they are not being held down by a half-baked database that is backing the entire thing, but when that is considered, I am sure it looks way more interesting to just figure out a solution to counts and other minor limitations DynamoDB currently has.

@undefobj What I would like to see to support this type of feature is to utilise Kinesis Firehose and Glue to send data into Redshift or S3 Parquet. Then I could connect to Athena as a serverless analytics system, query Redshift for a non-serverless solution, or have another Lambda pick up the objects from S3 and send them into Postgres. Amplify is well placed here, as the api category can pick up on schema changes to re-run Glue, Amplify can make life easier by creating a new Firehose for each table, and it can set up the Lambdas to put Dynamo stream data into Firehose. I realise it's a big ask, but the question can be generalised into "how can Amplify better support analysis workloads?"

I see a smaller and faster win in just providing an aggregation template Lambda to deal with uncomplicated counters. I would not like aggregation counters enabled by default, as I don't want to pay for what I don't use.

@undefobj What I would like to see to support this type of feature is to utilise Kinesis Firehose and Glue to send data into Redshift or S3 Parquet. Then I could connect to Athena as a serverless analytics system, query Redshift for a non-serverless solution, or have another Lambda pick up the objects from S3 and send them into Postgres. Amplify is well placed here, as the api category can pick up on schema changes to re-run Glue, Amplify can make life easier by creating a new Firehose for each table, and it can set up the Lambdas to put Dynamo stream data into Firehose. I realise it's a big ask, but the question can be generalised into "how can Amplify better support analysis workloads?"

We do support Kinesis Streams in the Analytics category today: https://aws-amplify.github.io/docs/js/analytics#using-amazon-kinesis
I have to check on the PR for supporting Firehose but I know it's in progress.

This seems to be an ask independent of the issue in this thread, though, as it is related to ingest. Total counts in a decoupled system wouldn't be accurate for pagination against DynamoDB, as you'd run into consistency issues. For the analytics requests I think this would be a good ask in a separate feature request issue.

I wasn't as interested in strong consistency or using it for pagination, but rather in the total count calculation and getting aggregations in general. The ask seems to be about bringing other database systems' capabilities into Amplify, which is where DDB stream -> Firehose would come in handy as a building block.

This got sidetracked by different ideas, but I'd like to know if we can expect total counts for at least Elasticsearch based queries soon?

@undefobj $ctx.result.hits.total is in the resolver response template already, so there should be very little code needed to get this wired in all the way. This would be a decent workaround for overall total counts for now. It seems doing this for normal list queries against DynamoDB is not anything to expect soon.

It seems like #2600 made #2602 happen, so here's to hoping that PR gets accepted fast.

@houmark isn't that PR related to the @searchable directive? I want to know how many items are in a collection in total... I don't know if that PR solves this issue.

For example, a user has many books in their library, and I want to show on their profile page how many books are in their library. I don't want to load all of their books just to count them. There might be thousands.

Yeah, I first thought that PR would solve it due to the original code that was halfway baked in at the time. But that PR changed to just show the total of the returned items, which is more or less useless, because it's very easy to count the number of results returned client-side.

I don't think the Amplify team is working on a total results value due to the limitations and complications of passing through @auth.

@houmark I wouldn't call it "useless", since if you're getting paginated results, say 10 results at a time, you might want to get the total hits (if you have more than 10 results) to display information like the number of pages in your UI.

@kaustavghosh06 But how can I do that? I get 10 when I limit 10 in the query. If I limit 20, total is 20? If I am on the "last page" and limit 20 but there are only 5 results left I get 5, but all of that I can do with result.whatever.items.length client-side.

If I try to pull out 1000, then I hit another limitation, which is the result set being too large (I don't remember the exact error, but it errors). If I query a lot, but not enough to hit the limit, then it of course takes many seconds to get a result, which in a UI leads to racing situations, and I cannot cancel queries due to another limitation (which has a PR, but that PR has been stuck for a good long time now). Anyway, if I have 3k items, it will still give me 1000 in total, so that does not really help me with pagination, does it?

Am I missing something here?

@houmark Do you have @auth on @searchable?

I'll speak to @SwaySway about this behavior.
I would expect the total hits to be available to the client for pagination purposes.

That was what I expected also, but this changed in the PR because it would leak data about the total results if @auth is filtering out results due to "access control".

See his comment here: https://github.com/aws-amplify/amplify-cli/pull/2602#issuecomment-543844821

For us, first of all, we would not have a leak because we don't have advanced access control in our models, so we would get the right number of results. But even if we did, I would have liked to at least have the flexibility, as a developer, to allow "leaking" and get the result. This could come with a warning from Amplify, but as there's no alternative way to get totals in Amplify (besides maybe writing and maintaining resolvers ourselves, which seems like a bad path), it would be nice to have some indicator to show in the UI (for example "around 2,604 results").

Of course, the optimal solution would be for @auth to hook into the result and add in the total after filtering, but I understand that's a major undertaking and would potentially slow down all queries.

I'm flabbergasted. How can a database system have support for pagination, but give no information about the total results? How do you build the pagination UI then? Sure, you can show "Next" until there isn't a nextToken anymore, or do infinite scrolling, but those don't work for everyone. I almost can't believe that DynamoDB can't give you a simple count.

Too bad I wasn't aware of that limitation when the project started. Looks like I have to switch over to an RDBMS like Aurora and do all the hard work of custom resolvers.

@psporysz DynamoDB doesn't prevent this UI pattern, it only requires that you do additional work. You can set up a streams trigger to keep a record count yourself. You can then fetch the count alongside the data fetch to display a total count.

Or, you can sort the items with a monotonically increasing key and have a stream trigger insert a whole-number count on the item; this would then allow a skip-and-limit pattern on a secondary index.

Unless there are other significant reasons, DynamoDB can still handle many patterns with a bit more work.
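A rough sketch of that second pattern, with hypothetical table and field names (note that deletes would leave gaps in the sequence):

// Stream trigger: stamps each newly inserted item with the next sequence number.
// ADD is atomic and returns distinct values even for simultaneous inserts.
import { DynamoDBStreamHandler } from 'aws-lambda';
import { DynamoDB } from 'aws-sdk';

const docClient = new DynamoDB.DocumentClient();

export const handler: DynamoDBStreamHandler = async (event) => {
  for (const record of event.Records) {
    const id = record.dynamodb?.Keys?.id?.S;
    if (record.eventName !== 'INSERT' || !id) continue;

    // Atomically claim the next sequence number
    const counter = await docClient.update({
      TableName: process.env.COUNTER_TABLE!,     // hypothetical counter table
      Key: { id: 'Transaction' },
      UpdateExpression: 'ADD seq :one',
      ExpressionAttributeValues: { ':one': 1 },
      ReturnValues: 'UPDATED_NEW',
    }).promise();

    // Write the sequence number onto the new item; a GSI on seq then allows
    // skip/limit style paging and "item X of N" displays
    await docClient.update({
      TableName: process.env.DATA_TABLE!,        // the source table
      Key: { id },
      UpdateExpression: 'SET seq = :seq',
      ExpressionAttributeValues: { ':seq': counter.Attributes!.seq },
    }).promise();
  }
};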

Or, you can sort the items with a monotonically increasing key and have a stream trigger insert a whole-number count on the item; this would then allow a skip-and-limit pattern on a secondary index.

Let's say I have a secondary index on a date field and I would like to know the number of items within some date range. Can you explain how to apply the above technique in this case?

@psporysz DynamoDB doesn't prevent this UI pattern, it only requires that you do additional work. You can set up a streams trigger to keep a record count yourself. You can then fetch the count alongside the data fetch to display a total count.

Or, you can sort the items with a monotonically increasing key and have a stream trigger insert a whole-number count on the item; this would then allow a skip-and-limit pattern on a secondary index.

Unless there are other significant reasons, DynamoDB can still handle many patterns with a bit more work.

What happens with 2 simultaneous posts?
Is this safe under DynamoDB?

Can anyone just give an example of how to get the total hits from Elasticsearch?

You can't get totals from Elasticsearch; Amplify/AppSync strips that information, as the @auth directive may be filtering the final list of items if the user does not have access.

Effectively, at this time, there's no out-of-the-box way to get even close to a precise count of a query result.

Is there any update on this issue? I'm just trying to get the total count on Elasticsearch using @searchable, but the service is not returning the total count; it just returns the count of the paginated results. If I send no limit, then the total count becomes 10, since that seems to be the default limit... I'm just trying to count the number of items hit by some filter...

In _AppSync Console_, I've tried to add a total in the schema

type ModelUserConnection {
    items: [User]
    nextToken: String
    total: Int
}

And attach a Resolver to the ModelUserConnection -> total

In "Configure the request mapping template." section

{
    "version" : "2017-02-28",
    "operation" : "Scan",
    "filter" : {
        "expression" : "#return( $util.toJson($ctx.source.scannedCount) )"
    }
}

In "Configure the response mapping template." section

$util.toJson($ctx.result)

then I can query the total

query userList{
  listUsers{
    total
  }
}

result would be

{
  "data": {
    "listUsers": {
      "total": 6
    }
  }
}

I'm new to Amplify and AppSync, so not very sure whether this is a correct method to do this.

Reference:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.Count
https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-dynamodb.html#aws-appsync-resolver-mapping-template-reference-dynamodb-filter
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LegacyConditionalParameters.ScanFilter.html

I've tried to add a total in the schema

type User @model {
  id: ID!
  name: String!
  total: Int
}

In _AppSync Console_, attach a Resolver to the ModelUserConnection -> total

In "Configure the request mapping template." section

{
    "version" : "2017-02-28",
    "operation" : "Scan",
    "filter" : {
        "expression" : "#return( $util.toJson($ctx.source.scannedCount) )"
    }
}

In "Configure the response mapping template." section

$util.toJson($ctx.result)

then I can query the total

query userList{
  listUsers{
    total
  }
}

result would be

{
  "data": {
    "listUsers": {
      "total": 6
    }
  }
}

I'm new to Amplify and AppSync, so not very sure whether this is a correct method to do this.

Reference:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.Count
https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-dynamodb.html#aws-appsync-resolver-mapping-template-reference-dynamodb-filter
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LegacyConditionalParameters.ScanFilter.html

It is a solution, yes. But a scan operation is expensive and slow. If you have 6 items, it's okay. But if you have 100k, it's not good to use this option.

@davidbiller
Thanks for the reply. In your opinion, what is a good option? Thanks!

@danielcai1987 A scan is limited to reading 1MB of stored data at a time; if you ever expect this table to hold more data, the above solution will not work.

I can't believe this is still unresolved

This is a technical limitation. Don't think anything's gonna happen here anytime soon.

Anything on this?

Yeah.

Any kind of app will require this unless the solution is for playground only

It's possible to have scannedCount and count as result from the GraphQL query

read here
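Roughly, that approach (as I understand it) amounts to adding the fields to the generated connection type and echoing the DynamoDB result fields from the list query's response mapping template. A sketch only, not the exact generated template, and still subject to the 1 MB per-request scan limit:

type ModelPostConnection {
  items: [Post]
  nextToken: String
  scannedCount: Int
  count: Int
}

## list/query response mapping template (sketch)
$util.toJson({
  "items": $ctx.result.items,
  "nextToken": $ctx.result.nextToken,
  "scannedCount": $ctx.result.scannedCount
})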

Thanks. I wonder why there is no official documentation about this.

Thanks. I wonder why there is no official documentation about this.

Because of the 1 MB limit. So it's not really an option.

I hope that this technical limitation of DynamoDB gets solved, but modifying the GraphQL schema is not an option for me. In the meanwhile I've created a new DynamoDB table with owner and count fields; this table is updated via a Lambda trigger after each insert/delete in the other DynamoDB table. Then, when I need to know the total count by owner, I query the counter table and the data from the other DynamoDB table (limit of 10 results) in parallel. With this I have isolated the counts by owner without doing a full scan, and I have the total pages and total elements available for a good UX.

P.S.: You could use @function in the GraphQL schema to do that.

Maybe this solution could help you.
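For example, the query side of that could look roughly like this (a sketch only; the function name, counter table and field names are made up, and the counter table itself is kept up to date by the Lambda trigger described above):

type Query {
  totalBooksByOwner(owner: ID!): Int @function(name: "getOwnerCount-${env}")
}

And the Lambda behind it just reads the per-owner counter row:

// Hypothetical resolver Lambda behind the @function directive
import { DynamoDB } from 'aws-sdk';

const docClient = new DynamoDB.DocumentClient();

export const handler = async (event: { arguments: { owner: string } }) => {
  const res = await docClient.get({
    TableName: process.env.COUNTER_TABLE!,   // e.g. the "counts by owner" table
    Key: { owner: event.arguments.owner },
  }).promise();
  return res.Item?.count ?? 0;
};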

Just finished reading through this stream of comments. Many of you have done a ton of research, thank you! It really saved me a lot of time!

I'm gonna join the choir here and ask that this issue is recognised soon. It's really a pain not to have this functionality.

I've chosen to follow the path of a separate table where I save the aggregated (incremented) results based on the triggers. Not perfect, and it doesn't solve totalCount with filters, as others have mentioned.

I've tried the @ricsirigu suggestion here, but when limit = 10 I have:

{
  items: [10 items],
  nextToken: "asdasdasd",
  scannedCount: 10,
  count: null
}

The length of my result set (without filtering) is about 200 rows and the full table cardinality is >200k rows.

Any suggestion?

My query is based on a secondary index named "byMerchant" and the query is "transactionsByMerchant".

My schema.graphql

type ModelTransactionConnection {
  items: [Transaction]
  nextToken: String
  scannedCount: Int
  count: Int
}

type Transaction @model 
@key(name: "byMerchant", fields: ["merchant"], queryField: "transactionsByMerchant")
{
  id: ID!
  merchant: ID
  ...
}

@Hernanm0g limit is applied before filter in DynamoDB; this is the opposite of other databases. Using the blog post approach for pagination will not work. https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html

Some more information. DynamoDB will only ever read 1MB of data from disk from a query or scan. You cannot find the total count by reading the items from disk. If this were not the case, you could potentially ask DynamoDB to read 1PB of data and degrade performance for other users.

Counts are a type of database aggregation that you must manage manually with DynamoDB. Consider using a table stream lambda to manage the counts manually.

I hope that this technical limitation of DynamoDB gets solved, but modifying the GraphQL schema is not an option for me. In the meanwhile I've created a new DynamoDB table with owner and count fields; this table is updated via a Lambda trigger after each insert/delete in the other DynamoDB table. Then, when I need to know the total count by owner, I query the counter table and the data from the other DynamoDB table (limit of 10 results) in parallel. With this I have isolated the counts by owner without doing a full scan, and I have the total pages and total elements available for a good UX.

P.S.: You could use @function in the GraphQL schema to do that.

Maybe this solution could help you.

Glad to see a lot of research has already been done on this issue; I really hope to have it resolved soon. In the meantime, the solution you provided really seems ideal for my current use case; however, I am quite new to AWS. If it's not much trouble, could you further elaborate on the hint you provided here as to how to make a trigger out of the @function directive, which I presume will be updating the count table by listening to the insert & delete events in the source table?

Is this already being considered by any AWS team? Some words about it would give us hope regarding this limitation. Thanks!

I've been thinking about this one, as it's an area of our product development we are now getting to. I'd like to warm this up with a few options that hopefully might lead to something. Like a lot of others, I'd just assumed that this would be trivial, a fault of my not understanding document databases well enough, I guess. It's a bit of a trap to fall into when you're so far down the line of developing; try telling your client/manager that you need 5 days to get out the total records for something... it's not a good look.

One issue with this whole thread seems to be the assumption that, whatever the solution, it has to cope with zillions of records, when in fact a lot of apps might only be counting in the 100s or 1000s in some tables, so I don't think there needs to be a one-size-fits-all solution. I get that some of those solutions could cause some big issues if misused, but with some knowledge of the constraints or suggested working parameters that should be safely navigable.

I'll start off with the smallest (and some might say naive) solution first and build up to the bigger ones. Remember, there isn't a one-size-fits-all solution. My focus is on trying to make it easy to add this functionality with minimal hassle from the "developer building out the schema" point of view.

Using a scan to count records

If you don't have many records (< 1000 or so) you might be able to get away with using the scannedCount field as described here (https://www.riccardosirigu.com/blog/aws-amplify-dynamodb-total-count-graphql-queries/), although it can "top out" when it hits its 1MB result-set limit and so report a false total. Ideally it would be nice to know if there are more that could have been returned or if that is the total amount, i.e. was there another page of results possible? That would at least give you the option of saying "record 1 of at least 246+" or something if we can't get an exact amount back.

Another option here is to do a full paged scan, if I opt into it, so I can get a "true total". This would loop through all the pages of results for a scan and return the total. I keep hearing that scans are expensive, but again, if I've only got a relatively small number of records that might be a hit I'm happy taking for now; just swallow the cost and push off doing it "properly for epic scale" until the future when it becomes a happy problem (i.e. we've grown so much we need to change it).
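As a rough sketch of what such a "true total" helper could look like (names are illustrative; Select: 'COUNT' means no item data comes back, but DynamoDB still reads, and bills for, everything it scans):

import { DynamoDB } from 'aws-sdk';

const docClient = new DynamoDB.DocumentClient();

// Pages through a full scan, summing the per-page counts
export async function countAll(tableName: string): Promise<number> {
  let total = 0;
  let lastKey: DynamoDB.DocumentClient.Key | undefined;
  do {
    const page = await docClient.scan({
      TableName: tableName,
      Select: 'COUNT',
      ExclusiveStartKey: lastKey,
    }).promise();
    total += page.Count ?? 0;
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
  return total;
}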

I can see this being a new directive that could be added to a model, something like:

type Posts @model {
    title: String!
    content: String!
    author: String!
    published: Boolean!
    totalPosts: Int! @scannedCount
}

This new @scannedCount directive would automatically do a scan of all the records in the Post table, paging through all the results until it finishes (thus masking from us the issues with the 1MB returned-data limit). Used as shown above, it would just give the total of all records for that table.

Now, if we ignore the obvious performance traps that this directive sets up for us (we will discuss those next), I think we can all agree that from a user's point of view that looks like a simple and clean way of getting what we want; it's a super quick way to get a total value out, problem solved from the user's point of view initially. For fast prototyping or "good enough" features that might do me.

So now, on to the problems this will cause or that need solving. Luckily, now that we have a directive, we can add attributes to it to give us some finer-grained control, or indeed we could create multiple directives to give us the following functionality.

A quick note on Lambda functions: we are going to need some generating. Would it be worthwhile if these were generated and stored in the /backoffice/functions/ folder as user-generated functions are? That way, should a user want to extend the function with additional logic, they can. The functions this directive might generate should all be nicely stubbed out with pointers on where and how to add additional logic, and would need pushing to deploy like any other user-generated function. This directive shouldn't hide all the magic in the cloud and VTL templates; it should do the minimum to make the common case go away and then give the user an extension point to add additional logic as their needs demand. This might mean that removing the directive from the schema won't delete the resources from the cloud automagically if you've edited them; I'd be happy with a warning like "Some functions appear to have been edited, so you will need to manually delete the following, or (A) delete all anyway".

Too many paged scans

If left unchecked this could get out of hand, with the number of scanned pages growing over time, so as a failsafe we should have a default number of pages that this will loop over before returning. The default can be agreed later, but if the user wants to opt in to setting it then they could do so in the directive, like so:

totalPosts: Int! @scannedCount( maxPageDepth: 5 )

The maxPageDepth would be an optional param to set how many pages you are happy to scan to get that total, to stop it all grinding to a halt in the future (but hey, we opted into this solution; it's not ideal, it's pragmatic, and we know the trade-offs).

The trouble with the above is the same as with the scannedCount solution from the blog post: it doesn't tell you whether it reached the end of the pages or not, so you've no way of knowing if the reported number of results is the total amount or just the amount it got before hitting maxPageDepth. But it might be good enough for the right-sized app, and it sure is simple to use from a user's point of view. As long as the limitations are documented, users should be able to make an informed choice.

Additionally maybe a trigger in CloudWatch could be created as part of using this directive to ping a notification when a field is taking too long to rescan? A nice bit of Developer Experience and another way to introduce a new feature to those of us still learning our way around what AWS has to offer.

Total results for filters, sort keys or conditions.

I want something other than the total count; I want a count for some filter or other. This one's a bit trickier, but could we allow filters in the directive?

totalPosts: Int! @scannedCount
totalPublishedPosts: Int! @scannedCount( filter: "published=true") #something like that anyway

That would work, but it's not dynamic; I can't pass in a filter at runtime. Still, it would be good for various counters/stats that could be handy to have. I wonder if we could generate a GraphQL query that would allow you to pass a filter through to this directive/resolver (the lines become blurred as to which this is at this point; I'm guessing it's a resolver under the hood, but then I don't know enough about what a resolver does under the hood as yet, so my terms might be wrong here, feel free to enlighten me).

Sort keys could maybe be passed through too, to give you a total for a subset in a similar manner; it's not my strong point, so this part of the idea is hazy on purpose, but it's here to get you thinking.

Another issue that @houmark raises repeatedly is that the number of results that match a filter can only be worked out if we scan all the records, and also that it doesn't work if we have a limit. This @scannedCount would handle both, I think, in its own brute-force way.

Caching.

Do we really want or need to look this total up every time? If it's a small number of records then why not; it wouldn't be all that slow nor costly (again, we are talking about a relatively small number of records here, not eBay-products or Uber-rides sized apps), and we could cache it on the frontend if needed.

But if you want to opt in to caching, we could do it in one of two ways that could be set via the directive, at the expense of another table and a function.

You could do this with a simple boolean or enum (to allow for more options in the future):

totalPosts: Int! @scannedCount( cached: true )

I'm not 100% sure of the internals here, so I'm going to propose what I think it could do with my understanding of the options; there might be other, better ways, but this should give the gist. We need to store the value somewhere and then pull it back out.

One way would be to do this under the hood:

  1. Spin up a new DynamoDB table to store all the totals in, one row per requested @scannedCount directive in the schema. If another @scannedCount directive has already created this table we can just re-use it. Along with the total we would also store some other fields like model name, target field, createdAt and updatedAt.
  2. Then we automatically generate a Lambda function that would invalidate or recalculate the value for the cache. There are a few ways to do this that I'll discuss below.

Recalculate the cached value for any CRUD operation

This version uses a Lambda trigger function using DynamoDB Streams (one of the options when creating a function via the CLI, so a known thing) that hooks into this model's table updates and will rescan the table whenever a record is added/removed. This is a bit brute force, but it would work.

The trouble with this method is that it could cause needless thrashing of the tables when doing an import; we don't need that number regenerated on every entry if we are adding 100s or 1000s of records. It might be that you can live with the hit if the numbers are small, but an option to disable the caching while doing an import, yet still trigger a final count at the end, would be good. I sadly don't know enough about how to do that nicely. You could maybe add a "disabled" boolean attribute to the directive, but that would need another push to disable it, and then how do you get it to recalculate after enabling it again, without just "touching" an existing record afterwards? That might be enough?

Invalidate the cache on any CRUD operation, recalculate when requested if needed

totalPosts: Int! @scannedCount( invalidOnCRUD: true )   # open to suggestions on any of these names btw

Another option is similar to the above, except we don't do the rescan when the record changes; we just null the value in the cache table. Then our resolver that gets the total for that field will do the rescan for us if the value is null. Depending on the data model and workflow of the app, this might be a better fit.

My only worry here is that we separate the two bits of cache work into two separate areas: the function to clear the cache and the resolver to rebuild it.

Generate the totals on a recurring timespan

An alternative, if we are OK with the totals potentially not being 100% accurate at any given time, is to have a timed function do the rescan repeatedly after a set timespan. Again, this could be set via the directive:

totalPosts: Int! @scannedCount( refreshCountInSeconds: 300 ) # regenerate every 5 minutes

I don't know the best practice for passing timespans around, so that is up for debate, but hopefully you get the idea.

How to get the value from the cache into our model?

How we get that cached value out into the model, though, I'm hazy on; I guess we just have a resolver generated which would go and get the value from the "cache table", but better minds than mine will know how best to do that.

Rather than having these fields on the models themselves, do we need a PostTotals model generated instead that can store all these top-line numbers and possibly pull them from the cache table via a resolver, or when generated by amplify codegen, etc.? In that case, does the directive move up to the whole table rather than the field level?

type Posts 
    @model 
    @scannedCount( fieldName: "totalPosts" ) {
    title: String!
    content: String!
    author: String!
    published: Boolean!
}

Running totals for larger record sets

What if we outgrow the @scannedCount directive above? What options do we have then? Scanning is too slow due to the number of records we've got. This might be where a @runningTotal directive is the answer.

It has been suggested above that you can keep a rolling count of various totals by hooking into the "DynamoDB stream", which will fire an event every time you add/update/delete a record in DynamoDB. We could then use these events to increment or decrement a totals counter.

This solution, though, has some immediate problems:

  1. It's not all that user-friendly to tell developers they have to do all this additional work for something that should, in a lot of minds, be "baked in". That is not to say that Amazon should be expected to do all the work, and I'm sure it is doable in a morning or so to wire something together, but it just kills productivity, adds doubt ("have we made the right choice here?"), and Amplify is meant to be about speeding up the gluing together of these services to let you focus on building an amazing app rather than becoming an expert in yet more Amazon services and DevOps from day one. Just testing that setup can take a while... and then how much testing do you need before you can truly "trust it" and sleep at night, I wonder? Better, I think, that Amplify makes some of the pain go away for the obvious use cases and then gives users with edge cases a place to start from.
  2. It works for the top-line numbers but not so much for edge cases. Again, these could be handled by the user, but the boilerplate for this could be put in place and the functions for it stored in the solution so they can be edited and expanded on.

I'd like to see a directive that could wire all this up for me from the schema; this would spin up the functions needed (stored locally for editing) and store the values in a separate table, similar to the cache table mentioned earlier (if not the same table with different records, so we don't clash?). Lambda trigger functions hooked into DynamoDB streams would do the adding/subtracting. As these are built to run all the time, we won't have the same potential scaling/performance problems we had with the @scannedCount directive.

type Posts 
    @model 
    @runningTotal( fieldName: "totalPosts" ) {
    title: String!
    content: String!
    author: String!
    published: Boolean!
}

Now what about that idea to have multiple counts on a table? Could we do this:

type Posts 
    @model 
    @runningTotal( fieldName: "totalPosts" ) 
    @runningTotal( fieldName: "totalPublishedPosts", filter: "published=true" ) {
    title: String!
    content: String!
    author: String!
    published: Boolean!
}

I don't see why not. Need more power? Then you should be able to go edit the function code, or copy it, and now you've got a great starting point for adding in a few more lines to get more counters as needed. Ease of development continues, and I think we are still using all the tech that Amplify already has and uses, so it's not a massive stretch of knowledge or comfort zone.
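To make that concrete, here's a rough sketch of what a generated @runningTotal function might do for the filtered counter above. Everything here is hypothetical (directive, table and field names), and it assumes the table stream is configured with NEW_AND_OLD_IMAGES so publish/unpublish flips are visible:

import { DynamoDBStreamHandler } from 'aws-lambda';
import { DynamoDB } from 'aws-sdk';

const docClient = new DynamoDB.DocumentClient();
const isPublished = (img?: { [key: string]: { BOOL?: boolean } }) =>
  img?.published?.BOOL === true;

export const handler: DynamoDBStreamHandler = async (event) => {
  let totalPosts = 0;
  let totalPublishedPosts = 0;

  for (const record of event.Records) {
    if (record.eventName === 'INSERT') totalPosts += 1;
    if (record.eventName === 'REMOVE') totalPosts -= 1;
    // The filtered counter also has to react to MODIFY, e.g. a draft being published
    totalPublishedPosts +=
      (isPublished(record.dynamodb?.NewImage) ? 1 : 0) -
      (isPublished(record.dynamodb?.OldImage) ? 1 : 0);
  }

  if (totalPosts === 0 && totalPublishedPosts === 0) return;

  // ADD is atomic, so overlapping invocations can't drop increments
  await docClient.update({
    TableName: process.env.TOTALS_TABLE!,   // the separate totals table discussed above
    Key: { id: 'Posts' },
    UpdateExpression: 'ADD totalPosts :tp, totalPublishedPosts :tpp',
    ExpressionAttributeValues: { ':tp': totalPosts, ':tpp': totalPublishedPosts },
  }).promise();
};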

I want more POWER!

Getting into wish-list land with this last one; I'd suggest the above gets done before this one gets looked at, but...

The final option is something similar to what @RossWilliams suggested above: "going large" and streaming the table data from Dynamo into a relational database so you can do full-on SQL magic on it. This one is currently beyond my pay grade; I laughed when I first read Ross' comment as it sounded crazy complicated, but after a bit of reading I can see how it all might stitch together... but again... should someone have to do all that to get some stats or counters out of their data? The answer might simply be "you are using the wrong database in the first place!" and I should go with a relational database from the off... but that isn't how Amplify is marketed or sold to me!

The basic idea is that by hooking into a DynamoDB Stream we can push record changes to a relational database, and from there we can pull out whatever data SQL will allow. Sounds excessive, but if it's what we've got then let's use it. Could another directive handle this one?

type Posts 
    @model 
    @streamToSQL {  # Guess this would be RedShift?
    title: String!
    content: String!
    author: String!
    published: Boolean!
}

This would push any CRUD operations into a relational database; I guess you would need to add this database first as a resource via the CLI (amplify add sql). It would create a table named the same as the model (it could allow you to override this in the directive).

What to do about relations and @connections, though? In theory it could include those, and I guess that would be the dream.

This does raise the question: do we need a @sql resolver, similar to the @http resolver, so we could quickly get the results of a SQL query into our GraphQL? I'll leave that one for another issue, I think ;)

What about auth?

@houmark raises the tricky point about authentication earlier and, to be honest, I don't know. I'm not all that sure if this is an issue with passing @auth to Elasticsearch or just within the API. I'm hoping it's an Elasticsearch thing, which would mean that the above solutions should work, with the possible exception of the relational database idea; I can see that needing a lot more thought.

It should be possible to apply @auth to the model or the individual fields as normal, which should cover it for the @scannedCount and @runningTotal directives. Alternatively, if we use a separate table like PostTotals as discussed above, then @auth could maybe be applied to that? That should re-use the logic already in place, I think?

Summing up

Sorry this is so long; it wasn't meant to be. It's a tough problem and I've been giving it a lot of thought, as I hope you can tell. I'm hampered, though, by only knowing some of what is needed to pick the best route, but hopefully this adds to the conversation and brings a user's point of view on how I'd personally like it to roughly work and what constraints I might be happy living with to get some of the functionality I and others say we need.

I've tried really hard to leverage what Amplify already has or uses so that it shouldn't be a massive stretch of the imagination or skillset to get it to do what it needs to.

Let's keep the conversation going: what have the Amplify devs come up with in the time this has been open, where might it go, and how can we get there (and ideally, when)?

tl;dr - You should create a Lambda resolver for this field and share your code for others to start from. I don't see an automated solution fitting well into the project without significant effort and problems to overcome.

@PeteDuncanson you are an excellent writer and I thoroughly enjoyed reading this. My thoughts and responses are a lot more scattered, but let me try to explain my crazy ideas and somewhat address your proposals. I agree the proposals I have made in previous comments are not ideal, but still believe they are the best we have given the path Amplify has taken.

  1. I too have a love / hate relationship with DynamoDB. If I were designing Amplify I would not choose it, for many of the reasons mentioned by other commenters in the issue. It is designed as a replacement for multi-node systems, and therefore is missing convenience features developers have come to expect. My main issue is that the cost for developers to learn it and develop in-house patterns to use it is too high. I'm going to guess most Amplify projects plan to store under 100GB of data. At these levels the cost of an RDS solution is smaller than my developer costs. RDS will also have better performance at these lower utilisation levels. Alex DeBrie, author of The DynamoDB Book, has similar opinions. Upsides are that DynamoDB is in the free tier for developers to get started, and that has to help with adoption.

    1. Proponents of DynamoDB can be pretty condescending in their language. I've seen AWS employees talk about SQL users as being left behind in the past: "this is the future, you are going to have to learn it anyway", "when we teach customers they get it in under a week", "if you only store 100GB you aren't ambitious or you have a hobby rather than a business". If in my comments I suggested DynamoDB is in any way easy, I'm sorry.

    2. IMO the main weakness of Amplify as compared to Firebase is dealing with DynamoDB instead of Firestore.

  2. The limitations are not just from DynamoDB, but also from AppSync's templates. If you are going to rely on AppSync to query DynamoDB, you can't page through results. You get a single API call to DynamoDB and that's it. So any solution for counts (where tables are over 1MB) is going to need Lambda resolvers.
  3. Implementing the "simple" solution in Lambda is possible today. Attach a function resolver to a "tableCount" property, have the function read the context of the request, do an appropriate series of scans / queries, and return the result. But there are many reasons to not provide this "out-of-the-box".

    1. The support issues. Imagine a "cold" table with 20MB of data. Average cold scan time is 60ms. Suddenly people start asking why such a simple GraphQL request takes over a second. I not only think this is realistic, but likely something that is only exposed to developers once in production. Pre-release there might be only 1MB in a table, but there may be support issues of "my API is slowing down 20% a day".

    2. Your suggestion of a directive with a "max scanned count" is a good way to limit the downsides of a growing table, but gives me bad vibes from a development experience perspective. Especially since a filter could grossly undercount the true value depending on its distribution in the table. In the worst case, scanning the first 5MB might return zero results, and yet the true count could be greater than zero. Returning zero will get customer complaints if it is not accurate.

    3. Caching is a set of trade-offs where it's hard to please everyone. Again, also a support issue: "My users just added an item and now they can't see what they wrote because the paging is off". Caching also has a complexity explosion with authorisation: there might need to be per-user aggregate counts. Caching with filters causes another factorial explosion of combinations. The size of the undertaking to cover the edge cases makes me think that, if you are going to keep your data in DynamoDB, a per-app solution is best. I don't know if there is much boilerplate that can be written that doesn't make certain wrong assumptions for an individual app.

    4. The pain of learning to read a DynamoDB stream and manually update counts is largely a one-time cost. I agree that there could be better boilerplate code and cfn templates to assist here. I'd go even further and say that the Javascript / Typescript Document Client is severely lacking in developer ergonomics. But once learned, on a fresh project it can be re-created in under an hour.

  4. One of the reasons I suggested using another system for total counts with filters is that DynamoDB is really best used when you can minimise the variability in the requests. If you know the access patterns then creating aggregates is annoying, but not weeks of effort. But the earlier commenters were trying to solve a problem with high variability in requests: accepting arbitrary filters and getting aggregates from them.

    1. Side note - I don't think Amplify should have filters, or they should be opt-in and not enabled by default. There are too many issues created in GitHub where users treat filters as a SQL WHERE clause.

  5. Arbitrary filters and total counts can be done in Elasticsearch today, and there is a directive to get data streamed into Elasticsearch today. If there is a need for arbitrary filters and total counts, this is my default recommendation. There have been two issues pointed out.

    1. The counts may not be accurate because of authorisation rules. For my own work I'm not concerned about this because I don't have a project with 3rd party clients. I control the requests and I don't rely on authorisation rules to provide filtering, I only use authorisation rules to prevent bad actors.

      1. Auth-as-filter is a generally bad pattern that should be stated better in the documentation.

    2. Amplify doesn't expose the total hits count. First, users can write their own AppSync templates to expose this field. For offset pagination, I've also submitted a PR, #5098, to better support these users. And my memory is foggy here, but I believe the total count was added by swaysway a few months ago.

  6. Reporting and analytics within DynamoDB are extremely hard for "exploratory" work. With RDS I could have a secondary node to run my arbitrary SQL without impacting users, but DynamoDB doesn't have that nicety. Having Firehose, Glue, and Athena is a big help to me, but keeping it maintained is not fun. I use streams to aggregate data for known reporting metrics. But some days I get very interesting requests to look into patterns, or to see if a metric is useful to add into the reporting suite. Streams and Glue give me my data as Parquet files in S3 so I can then run arbitrary queries on data that is relatively fresh.
  7. If DynamoDB Streams -> Firehose -> Glue were a built-in pattern in Amplify, moving data to other data sources would be a lot easier. When users run into problems such as needing aggregates, timeseries formats, graphs, and reporting, they won't feel so stuck and frustrated. It's an overwhelming complexity of tools to manage and configure on its own, which is why I think Amplify needs to bring it into the CLI.
  8. At a higher level, I'd like to see Amplify documentation encourage users to branch out beyond the CLI tool and configure more with cfn, Lambdas, or SAM tools. A set of nice sample Lambdas showing how to handle common use cases, along with the cfn templates needed, would both solve these user issues and serve as a teaching tool for further custom work. Relying on third parties to publish blog posts has so far not yielded much fruit.

At an even higher level, imagine a developer (e.g. me, or a start-up – this is the target audience of AWS Amplify, right?) who creates an app which is basically a social network. There, people can follow each other, create posts, and like and comment on those posts. This is a pretty basic scenario. And for this kind of app, displaying follower, like and comment counts is quite crucial. Of course, at launch the numbers are not so big and fall under 1MB, but then with growth – the more users, the bigger those numbers get. The developer would like to choose AWS Amplify mainly because of the infrastructure, ease, integrity and velocity, which are so needed at the early stages of new projects (the Free Tier is also much appreciated). But suddenly he faces a dilemma: why doesn't such a robust tool support such a basic feature? Should he fight these limitations (and then maintain the workarounds), or just search for a tool which supports this, or at least where it could be done without hacking?

@ilyagru this scenario is easily solved with a pipeline resolver or a DynamoDB stream. After adding a like, update the like count on the post. Where it gets tricky is paginated results (even more difficult with filtered results).
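For the simple case, the increment step could be a DynamoDB UpdateItem function in the pipeline, roughly like this (a sketch only; the postID argument and likeCount attribute are assumed names):

{
  "version": "2018-05-29",
  "operation": "UpdateItem",
  "key": {
    "id": $util.dynamodb.toDynamoDBJson($ctx.args.input.postID)
  },
  "update": {
    "expression": "ADD likeCount :one",
    "expressionValues": { ":one": { "N": "1" } }
  }
}

Because ADD is atomic, two simultaneous likes both land; the same expression works from a stream-triggered Lambda instead of a pipeline function.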

Does anyone on the AWS team even read this? So far I am missing an official statement on this topic. We have currently spent a lot of money on development and we are still faced with such a trivial problem.

I think the Amplify team only has two options. Either solve the problem or find an alternative to DynamoDB. It's not ok that nothing has happened on this subject for 13 months. Not even a responsible person has been assigned to this issue.

@Amplify Team, @aws-amplify-ops Please give us a statement.

@andre-scheffer (and others) this is good feedback, and @kaustavghosh06 and I have responded on this thread several times. We are both members of the Amplify team. The problem is non-trivial, as others have noted in their descriptions of potential implementations above, all of which we are looking into for the future, as noted earlier. As such we'll keep this thread open to continue to measure interest as well as potential thoughts on solutions such as those outlined by @PeteDuncanson, @duwerq, and @RossWilliams. Further open dialogue here on requirements, DX options, and suggestions for implementation are welcome.

@undefobj, thank you for your answer.

The problem is non-trivial

I definitely believe you, as far as your part is concerned.

As such we'll keep this thread open to continue to measure interest as well as potential thoughts on solutions such as those outlined by @PeteDuncanson, @duwerq, and @RossWilliams.

  1. What time frame have you set yourselves to collect enough feedback? Don't you think 16 months is enough?
  2. For me, Amplify is a black box. I just want to use Amplify as a customer of AWS, and I'm not interested in the solutions inside the black box. Even if they are technically sophisticated, they are not my business focus. Anyway, counting, sorting, and other things like dynamic filtering are just regular requirements for a modern application. Don't you agree?

and suggestions for implementation are welcome.

Did I get something wrong, is Amplify an open source project? We as paying customers of AWS should only have to supply the requirements. Sorry if that sounds arrogant, but this is an Amazon project and not some little community project.

So I have to ask again: what is your timeline on this issue? What is the roadmap? Why is nobody assigned to this issue on GitHub?

Did I get something wrong, is Amplify an open source project? We as paying customers of AWS should only have to supply the requirements. Sorry if that sounds arrogant, but this is an Amazon project and not some little community project.

It is open source but it is also quite a small team. Each team in Amazon is different and runs their own way. Let's be grateful we have a team on this who are engaging and open to ideas and not tar them with the "big evil rich company" brush.

Yes, at the end of the day this tech exists to get us to sign up and spend money with them, but then so do the apps we build with it. In the meantime they are trying to build the best stack they can, and there are no doubt pressures on doing that which need some consideration.

For my part I'm slowly getting up to speed on the problem and seeing if I can't come up with a @runningCount directive of my own; it's a steep learning curve though. Hoping to get this up and running so I can dog food it a bit then offer it up here for discussion.

It is open source but it is also quite a small team

But someone at AWS needs to be responsible for this project? That someone needs to address issues like this. I don't understand why such fundamental problems have existed for 16 months. It's not good to see such a long feedback loop without a final result like 'Yes, we will fix that' or 'No, we won't implement it'.

open to ideas and not tar them with the "big evil rich company" brush

I fully agree, but it's still AWS, with tons of resources. But yeah, I'll stop complaining about it and try to find another solution, like using an extra mirror database to do the math.

Hoping to get this up and running so I can dog food it a bit then offer it up here for discussion.

That would be awesome.

Found #2901 referenced in the docs (https://docs.amplify.aws/lib/graphqlapi/query-data/q/platform/js#frequently-asked-questions) so thought I'd link them together for anyone following the breadcrumbs

The thing I don't understand here is that if you don't like DynamoDB or its lack of counts, you have the option to use another database. Nader is about to put out a video on using a Postgres DB. I highly recommend it for anyone not interested in DynamoDB: https://twitter.com/dabit3/status/1328710735619268608?s=21

+1 @duwerq, there are definitely other things I'd rather have prioritized on their roadmap over this. There are plenty of other good DB options.

@RossWilliams could you give an example of how to do a table stream lambda to implement the count? I am very new to Amplify but really need this feature. Thanks.

@snahammond Here are the steps. Send me an email, LAST_NAME.[email protected] if you get stuck.

At a high level, you are creating a lambda, connecting the lambda to the table's change stream, adding permissions to allow the lambda to connect to the stream, and writing the lambda to handle create, update, and delete events. You are also going to create environment variables so you have the table names you need inside the lambda.

  1. Create a new lambda function. You can use the CLI to create it. Have the function depend on your GraphQL API. This means the backend-config.json entry for this lambda should have the following in its "dependsOn":
```json
{
  "category": "api",
  "resourceName": "XXX_APP_NAME",
  "attributes": ["GraphQLAPIIdOutput", "GraphQLAPIEndpointOutput"]
},
```
  2. Open the function's audit-cloudformation-template.json file and add an item to "Resources" to attach the lambda to the DynamoDB table's stream:
```json
"DynamoTableXXX_TABLE_NAMEStream": {
    "Type": "AWS::Lambda::EventSourceMapping",
    "Properties": {
        "BatchSize": 20,
        "MaximumBatchingWindowInSeconds": 100,
        "BisectBatchOnFunctionError": true,
        "StartingPosition": "LATEST",
        "Enabled": true,
        "EventSourceArn": {
            "Fn::ImportValue": {
                "Fn::Join": [
                    ":",
                    [
                        {
                            "Ref": "apiXXX_APP_NAMEGraphQLAPIIdOutput"
                        },
                        "GetAtt",
                        "XXX_TABLENAME",
                        "StreamArn"
                    ]
                ]
            }
        },
        "FunctionName": {
            "Fn::GetAtt": [
                "LambdaFunction",
                "Arn"
            ]
        }
    },
    "DependsOn": [
        "LambdaExecutionRole"
    ]
},
```
  3. Your lambda function policy needs the following statement for permissions. You will need to add permissions for any additional operations you use. Keep the existing permissions for log groups that Amplify generates:
```json
{
    "Effect": "Allow",
    "Action": [
        "dynamodb:DescribeStream",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:ListStreams"
    ],
    "Resource": {
        "Fn::Sub": [
            "arn:aws:dynamodb:${region}:${account}:table/XXX_TABLENAME-${apiId}",
            {
                "region": {
                    "Ref": "AWS::Region"
                },
                "account": {
                    "Ref": "AWS::AccountId"
                },
                                   "apiId": {
                                           "Ref": "apiXXX_APP_NAMEGraphQLAPIIdOutput"
                                   }
            }
        ]
    }
}

You can adjust the batch size and batch window I provided above.

  4. Add an environment variable for any tables you need access to inside the Lambda:
```json
"LambdaFunction": {
    "Type": "AWS::Lambda::Function",
    "OtherItems": "existing items here",
    "Properties": {
        "OtherItems": "existing items here",
        "Environment": {
            "Variables": {
                "ENV": {
                    "Ref": "env"
                },
                "REGION": {
                    "Ref": "AWS::Region"
                },
                "XXX_DYNAMO_TABLE_FOR_UPDATES": {
                    "Fn::Join": [
                        "-",
                        [
                            "XXX_DYNAMO_TABLE_NAME",
                            {
                                "Ref": "apiXXX_APP_NAMEGraphQLAPIIdOutput"
                            },
                            {
                                "Ref": "env"
                            }
                        ]
                    ]
                }
            }
        }
    }
}
```
  5. Now you have a lambda that can connect to the table stream. Here is a very minimal lambda function (forgive any mistakes, I'm writing it in GitHub) that reads the stream records and updates an item (this requires adding "dynamodb:UpdateItem" to the permissions above):
```ts
import { DynamoDBStreamHandler, DynamoDBStreamEvent } from "aws-lambda";
import DynamoDB, { DocumentClient } from "aws-sdk/clients/dynamodb";

const documentClient = new DocumentClient();
const TABLE_NAME = process.env.XXX_DYNAMO_TABLE_FOR_UPDATES!;

export const handler: DynamoDBStreamHandler = async (event: DynamoDBStreamEvent) => {
  for (const record of event.Records) {
    if (record.dynamodb?.NewImage && !record.dynamodb.OldImage) {
      // New image only: the record was just created
      const newItem = DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);
      if (newItem.XXX_INTERESTING_FIELD) {
        // DocumentClient exposes update(), not updateItem().
        // XXX_A_PARTITION_KEY / XXX_AN_OPTIONAL_SORT_KEY are placeholders for your own key values.
        await documentClient.update({
          TableName: TABLE_NAME,
          Key: {
            table_pk_attribute: XXX_A_PARTITION_KEY,
            table_sk_attribute_if_exists: XXX_AN_OPTIONAL_SORT_KEY
          },
          UpdateExpression: "SET #xxx_my_counter_item = :xxx_my_counter_increase",
          ExpressionAttributeNames: { "#xxx_my_counter_item": "xxx_my_counter_item" },
          ExpressionAttributeValues: { ":xxx_my_counter_increase": newItem.XXX_INTERESTING_FIELD },
        }).promise();
        // TODO: handle failures
      }
    } else if (record.dynamodb?.NewImage && record.dynamodb.OldImage) {
      // Old and new image: the record was updated. Check whether a count needs to change.
    } else {
      // No new image: the record was deleted. You may need to decrement a count.
    }
  }
};
```