Google-cloud-go: spanner: Programmatically Calculating Spanner Max Mutation Set

Created on 8 Jan 2020  Â·  17Comments  Â·  Source: googleapis/google-cloud-go

I'm using the Spanner Go client in an application I'm building. I'm running this application on GKE using the alpine docker image.

Every now and then I get an error message like this

The transaction contains too many mutations. Insert and update operations count with the multiplicity of the number of columns they affect. For example, inserting values into one key column and four non-key columns count as five mutations total for the insert. Delete and delete range operations count as one mutation regardless of the number of columns affected. The total mutation count includes any changes to indexes that the transaction generates. Please reduce the number of writes, or use fewer indexes. (Maximum number: 20000)

I'm trying to make sense of this error message, and I can't find any documentation on it on Google's documentation pages.


I'd like to know how to programmatically calculate the maximum number of objects (e.g. rows) that I can insert given this mutation limit. For example, let's say I have a table with 10 columns and 2 of the columns compose the primary key. Additionally, there are 3 secondary indexes on the table.

As I understand from that error statement, if

  • R (number of rows/objects being inserted)
  • C (number of columns in the table)
  • I (number of secondary indexes on the table)
  • M (max number of mutations)

then

R <= floor((M - I) / C)

which means R <= floor((20000 - 3) / 10). So I should only be able to insert 1,999 objects into the table in one transaction. I know this formula isn't correct though, because I regularly still get this error even using this formula.

Can someone help me understand what the formula is? I don't understand how the row and column multiplicity comes into play with secondary indexes and primary keys and also interleaved tables. What's the formula??

spanner question

Most helpful comment

Someone on the Google team needs to document the effects of the new FK changes. As a client of Spanner, I need to be able to programmatically calculate the number of mutations I can insert, and it would be nice if that was a part of some official Google documentation (for example in the Spanner inserts/mutations doc). It's just too cryptic to try and figure out.

All 17 comments

After some testing, it seems to come out like this:

image

R (number of rows/objects being inserted)
C (number of columns in the table)
I_i (number of secondary indexes on the ith column)

Could someone confirm please?

@jon-whit This is is also how I understand this limit based on the documentation here: https://cloud.google.com/spanner/quotas#limits_for_creating_reading_updating_and_deleting_data and https://cloud.google.com/spanner/quotas#note2

I'll try to confirm it with someone from the backend team.

@olavloite I've tested it pretty thoroughly and I think we're right. It would be nice to get confirmation though.

Thanks!

I'm not on the backend team, but your formula is my understanding too. The mutation operation count of a single insertion is the number of columns, plus the sum of the columns in the secondary indexes.

I have also had an offline confirmation from someone on the backend team that the calculation is correct.

I am wondering about counting mutation of index caused by updating column value. It is not documented clearly.
I think that corresponding index is counted twice as delete and insert.

I am wondering about counting mutation of index caused by updating column value. It is not documented clearly.
I think that corresponding index is counted twice as delete and insert.

@apstndb
I'm not sure I understand your question/comment correctly, but if you update a column that is also part of one secondary index, that will count as two mutations:

  1. One mutation for the column itself
  2. One mutation for the secondary index

If the column you are updating is part of two secondary indexes, updating it will count as three mutations etc.

Ok, I have omitted 1 mutation for the column itself.
My previous hypothesis is that:
When a column value that is part of one secondary index is changed from 1 to 2, index is modified as two mutation(delete index entry from 1 and insert index entry from 2).
So, updating the column value should be three mutation(update column, insert index, delete index).

update column {val: 1} to {val: 2}
- 1 → {reference to actual row}
+ 2 → {reference to actual row}

But when I have examined, I got the inconsistent result.

CREATE TABLE MutationTest(pk INT64, cnt INT64) PRIMARY KEY(pk);
CREATE INDEX MutationTestCnt ON MutationTest(cnt)
INSERT INTO MutationTest(pk, cnt)
                       SELECT pk, 1 AS cnt
                       FROM UNNEST(GENERATE_ARRAY(1, 5000)) AS pk; // 5000 rows inserted
UPDATE MutationTest SET cnt = cnt + 1 WHERE TRUE; // success
DELETE FROM MutationTest WHERE TRUE;

INSERT INTO MutationTest(pk, cnt)
                       SELECT pk, 1 AS cnt
                       FROM UNNEST(GENERATE_ARRAY(1, 5001)) AS pk; // 5001 rows inserted
UPDATE MutationTest SET cnt = cnt + 1 WHERE TRUE;
// This DML statement exceeds the mutation limit for a single transaction (20000). To reduce the mutation count, try a transaction with fewer writes, or use fewer indexes. This can help because the mutation count for an operation is equal to the number of columns it affects. Reducing writes or indexes reduces the number of affected columns, bringing your mutation count under the limit. Alternatively, try a Partioned DML statement using the client libraries or gcloud command-line tool.

This result have shown the mutation is counted as 4 * effected rows.
It is not explainable by previous discussion.

(Note: Honestly, it is examined by DML on Cloud Console but google-cloud-go have feature which execute DML.)

I have examined same test using spanner.Mutation but it seems similar result.
When 5001 rows,

spanner: code = "InvalidArgument", desc = "The transaction contains too many mutations. Insert and update operations count with the multiplicity of the number of columns they affect. For example, inserting values into one key column and four non-key columns count as five mutations total for the insert. Delete and delete range operations count as one mutation regardless of the number of columns affected. The total mutation count includes any changes to indexes that the transaction generates. Please reduce the number of writes, or use fewer indexes. (Maximum number: 20000)"

func main() {
    type MutationTest struct {
        Pk int64 `spanner:"pk"`
        Cnt int64 `spanner:"cnt"`
    }
    ctx := context.Background()
    client, err := spanner.NewClient(ctx, fmt.Sprintf("projects/%s/instances/%s/databases/%s", project, instance, database))
    if err != nil {
        log.Fatal(err)
    }
    _, err = client.ReadWriteTransaction(ctx, func(tc context.Context, transaction *spanner.ReadWriteTransaction) error {
        it := transaction.Read(ctx, table, spanner.AllKeys(), []string{"pk", "cnt"})
        defer it.Stop()
        err := it.Do(func(r *spanner.Row) error {
            var mutationTest MutationTest
            err := r.ToStruct(&mutationTest)
            if err != nil {
                return err
            }

            mutationTest.Cnt += 1

            m, err := spanner.InsertOrUpdateStruct(table, &mutationTest)
            if err != nil {
                return err
            }

            return transaction.BufferWrite([]*spanner.Mutation{m})
        })
        return err
    })
    if err != nil {
        fmt.Println(err)
    }

}

I think that pk column is always treated as updated in both DML and Mutation.

@apstndb Thanks for the extensive testing. I'll have a closer look and see if I can figure out what's going on here. Your assumption regarding the primary key could be it.

@apstndb and @jon-whit
I've had some additional contact with someone from the backend team, and in addition to what @jon-whit wrote in https://github.com/googleapis/google-cloud-go/issues/1721#issuecomment-572273387, the following additionally applies to updates:

  • If a column that is part of a secondary index is updated, the update of that column counts 3 times: Once for the column itself, once for deleting the previous value in the index and once for the new value in the index.

It is also being looked into updating the documentation regarding this.

@olavloite
Thanks for formalizing mutation count for update!
IMO, the counting formula is not intuitive so example-based document is needed.
Moreover, if the commit response contains actual mutation count, it is nice.

I have tested another not-well-documented behaviors:

  • cascading delete is treated as range delete(count once)
  • NULL and NULL_FILTERED

Thanks for the updates everyone! I look forward to the documentation, that'll really help the Spanner community.

Recently, Foreign Keys feature is released.
FK backing index makes counting mutation more difficult!

Someone on the Google team needs to document the effects of the new FK changes. As a client of Spanner, I need to be able to programmatically calculate the number of mutations I can insert, and it would be nice if that was a part of some official Google documentation (for example in the Spanner inserts/mutations doc). It's just too cryptic to try and figure out.

FYI, I am testing about backing index of FK.
https://gist.github.com/apstndb/fb3405d9ed9878de6ce06965a2a6fbb8

Was this page helpful?
0 / 5 - 0 ratings