Scylla: Some data is lost when using scylla alternator put item

Created on 11 Sep 2020  ·  23Comments  ·  Source: scylladb/scylla

Installation details
Scylla version (or git commit hash): Scylla version 4.0.0-0.20200601.d95aa77b6
Cluster size: 9
OS (RHEL/CentOS/Ubuntu/AWS AMI): CentOS

Hardware details (for performance issues) Delete if unneeded
Platform (physical/VM/cloud instance type/docker):
Hardware: sockets= cores=10 hyperthreading= memory=50G
Disks: (SSD/HDD, count) 3.2TB NVMe SSD * 1

I use 10 concurrently to write data to scylla (using apisix load balancing to 9 nodes), these data have the same partition key, different clustering key. A total of about 10,000 pieces of data were uploaded, and then it was found that several pieces of data had missing fields.
I turned on the trace log of scylla alternator to track these alternate requests and ensure that the requests sent to the alternate are correct.

Sep 11 13:32:05 node13 scylla[471624]:  [shard 2] alternator-server - Request: PutItem {
Sep 11 13:32:05 node13 scylla[471624]:  [shard 2] alternator-server - Skipping authorization
Sep 11 13:32:05 node13 scylla[471624]:  [shard 2] alternator-executor - put_item {"TableName":"dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4","Item":{"bi":{"S":"dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4.0"},"head":{"B":"AQEqBwAACAAAAAwAAAB1c2VyLnJndy5hY2x7AAAAAgJ1AAAAAwIOAAAAAwAAAGxqeAMAAABsangEA1sAAAABAQAAAAMAAABsangPAAAAAQAAAAMAAABsangFAzIAAAACAgQAAAAAAAAAAwAAAGxqeAAAAAAAAAAAAgIEAAAADwAAAAMAAABsangAAAAAAAAAAAAAAAAAAAAAFQAAAHVzZXIucmd3LmNvbnRlbnRfdHlwZRkAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0ADQAAAHVzZXIucmd3LmV0YWcgAAAANzJhMGQ5OGVmNmM3ZWVmNjU0Nzg2YTVkNzE1MTAwNmURAAAAdXNlci5yZ3cubWFuaWZlc3TiBAAACAbcBAAAAAAgAAAAAAAAAAAAAAYGjwAAAAoKdQAAAAoAAAB6eC1zY3lsbGE5LQAAADdmMWJmOWM4LThlNGQtNGQ0Yy05MTAzLWM4MzM5MWNiODRiMS40MzQxNjcuNC0AAAA3ZjFiZjljOC04ZTRkLTRkNGMtOTEwMy1jODMzOTFjYjg0YjEuNDM0MTY3LjQAAAAAAAAAAAAIAAAAZmlsZTQwMDUAAAAAAAAAAAAAAAAAAAAAAAAAACEAAAAuaDVZc2VKMVZ5bVFuRkZuNGNEZFc1d3pTMk9LRHU2WF8BAAAAAAAAAAAAAAACASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAEQAAAGRlZmF1bHQtcGxhY2VtZW50EQAAAGRlZmF1bHQtcGxhY2VtZW50DwAAAAEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2wxAQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDIBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sMwEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w0AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDUBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sNgEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w3AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDgBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sOQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMgEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMwEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNQcAAAABAAAAawIAAAAxMAEAAABtAQAAADUKAAAAcGFja2V0c2l6ZQEAAAA4AgAAAHJ0AgAAADEwCwAAAHN0cmlwZV91bml0BAAAADQwOTYJAAAAdGVjaG5pcXVlBgAAAGNhdWNoeQIAAAB3dAIAAAAxMAQAAABDWlJTAAAAAAAAAAAUAAAAdXNlci5yZ3cuc291cmNlX3pvbmUEAAAAAAAAABYAAAB1c2VyLnJndy5zdG9yYWdlX2NsYXNzCAAAAFNUQU5EQVJEEwAAAHVzZXIucmd3LngtYW16LWRhdGUgAAAARnJpLCAxMSBTZXAgMjAyMCAwNTozNjo1OSArMDAwMAAfAAAAdXNlci5yZ3cueC1hbXotbWV0YS1zM2NtZC1hdHRyc4UAAAB1aWQ6MC9nbmFtZTpyb290L3VuYW1lOnJvb3QvZ2lkOjAvbW9kZTozMzE4OC9tdGltZToxNTk2NjEzODc5L2F0aW1lOjE1OTk3MjI4MjgvbWQ1OjcyYTBkOThlZjZjN2VlZjY1NDc4NmE1ZDcxNTEwMDZlL2N0aW1lOjE1OTY2MTM4NzkAAAAAAA=="},"obj":{"S":"file4005"},"val":{"B":"CAO+AAAACAAAAGZpbGU0MDA1AAAAAAAAAAAABwN4AAAAAAAAIAAAAAAA+wxbX1khdAogAAAANzJhMGQ5OGVmNmM3ZWVmNjU0Nzg2YTVkNzE1MTAwNmUDAAAAbGp4AwAAAGxqeBgAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0AACAAAAAAAAAAAAAIAAAAU1RBTkRBUkQAAAAAAAAAAAABAQoAAACI//////////8AAAAAAAAAAAAAAAAAAAAAAAAAAA=="}},"ConditionExpression":"attribute_not_exists(bi)"}
Sep 11 13:32:05 node13 scylla[471624]:  [shard 5] alternator-executor - put_item {"TableName":"dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4","Item":{"bi":{"S":"dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4.0"},"head":{"B":"AQEqBwAACAAAAAwAAAB1c2VyLnJndy5hY2x7AAAAAgJ1AAAAAwIOAAAAAwAAAGxqeAMAAABsangEA1sAAAABAQAAAAMAAABsangPAAAAAQAAAAMAAABsangFAzIAAAACAgQAAAAAAAAAAwAAAGxqeAAAAAAAAAAAAgIEAAAADwAAAAMAAABsangAAAAAAAAAAAAAAAAAAAAAFQAAAHVzZXIucmd3LmNvbnRlbnRfdHlwZRkAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0ADQAAAHVzZXIucmd3LmV0YWcgAAAANzJhMGQ5OGVmNmM3ZWVmNjU0Nzg2YTVkNzE1MTAwNmURAAAAdXNlci5yZ3cubWFuaWZlc3TiBAAACAbcBAAAAAAgAAAAAAAAAAAAAAYGjwAAAAoKdQAAAAoAAAB6eC1zY3lsbGE5LQAAADdmMWJmOWM4LThlNGQtNGQ0Yy05MTAzLWM4MzM5MWNiODRiMS40MzQxNjcuNC0AAAA3ZjFiZjljOC04ZTRkLTRkNGMtOTEwMy1jODMzOTFjYjg0YjEuNDM0MTY3LjQAAAAAAAAAAAAIAAAAZmlsZTQwMDUAAAAAAAAAAAAAAAAAAAAAAAAAACEAAAAuaDVZc2VKMVZ5bVFuRkZuNGNEZFc1d3pTMk9LRHU2WF8BAAAAAAAAAAAAAAACASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAEQAAAGRlZmF1bHQtcGxhY2VtZW50EQAAAGRlZmF1bHQtcGxhY2VtZW50DwAAAAEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2wxAQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDIBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sMwEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w0AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDUBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sNgEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w3AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDgBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sOQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMgEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMwEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNQcAAAABAAAAawIAAAAxMAEAAABtAQAAADUKAAAAcGFja2V0c2l6ZQEAAAA4AgAAAHJ0AgAAADEwCwAAAHN0cmlwZV91bml0BAAAADQwOTYJAAAAdGVjaG5pcXVlBgAAAGNhdWNoeQIAAAB3dAIAAAAxMAQAAABDWlJTAAAAAAAAAAAUAAAAdXNlci5yZ3cuc291cmNlX3pvbmUEAAAAAAAAABYAAAB1c2VyLnJndy5zdG9yYWdlX2NsYXNzCAAAAFNUQU5EQVJEEwAAAHVzZXIucmd3LngtYW16LWRhdGUgAAAARnJpLCAxMSBTZXAgMjAyMCAwNTozNjo1OSArMDAwMAAfAAAAdXNlci5yZ3cueC1hbXotbWV0YS1zM2NtZC1hdHRyc4UAAAB1aWQ6MC9nbmFtZTpyb290L3VuYW1lOnJvb3QvZ2lkOjAvbW9kZTozMzE4OC9tdGltZToxNTk2NjEzODc5L2F0aW1lOjE1OTk3MjI4MjgvbWQ1OjcyYTBkOThlZjZjN2VlZjY1NDc4NmE1ZDcxNTEwMDZlL2N0aW1lOjE1OTY2MTM4NzkAAAAAAA=="},"obj":{"S":"file4005"},"val":{"B":"CAO+AAAACAAAAGZpbGU0MDA1AAAAAAAAAAAABwN4AAAAAAAAIAAAAAAA+wxbX1khdAogAAAANzJhMGQ5OGVmNmM3ZWVmNjU0Nzg2YTVkNzE1MTAwNmUDAAAAbGp4AwAAAGxqeBgAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0AACAAAAAAAAAAAAAIAAAAU1RBTkRBUkQAAAAAAAAAAAABAQoAAACI//////////8AAAAAAAAAAAAAAAAAAAAAAAAAAA=="}},"ConditionExpression":"attribute_not_exists(bi)"}
Sep 11 13:32:05 node13 scylla[471624]:  [shard 2] alternator-server - api_handler success case

As you can see in the above log, my item contains four columns: bi, obj, head, and val. When the data is written to scylla, I go to query this data. It is found that the ‘val’ column has all been lost, and the ‘head’ column only has value, and the key is missing.

[root@kcpdeploy ~]# aws dynamodb get-item --table-name "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4" --key '{"bi": {"S": "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4.0"}, "obj":{"S":"file4005"}}' --endpoint-url http://10.10.10.92:9080
{
    "Item": {
        "": {
            "B": "AQEqBwAACAAAAAwAAAB1c2VyLnJndy5hY2x7AAAAAgJ1AAAAAwIOAAAAAwAAAGxqeAMAAABsangEA1sAAAABAQAAAAMAAABsangPAAAAAQAAAAMAAABsangFAzIAAAACAgQAAAAAAAAAAwAAAGxqeAAAAAAAAAAAAgIEAAAADwAAAAMAAABsangAAAAAAAAAAAAAAAAAAAAAFQAAAHVzZXIucmd3LmNvbnRlbnRfdHlwZRkAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0ADQAAAHVzZXIucmd3LmV0YWcgAAAANzJhMGQ5OGVmNmM3ZWVmNjU0Nzg2YTVkNzE1MTAwNmURAAAAdXNlci5yZ3cubWFuaWZlc3TiBAAACAbcBAAAAAAgAAAAAAAAAAAAAAYGjwAAAAoKdQAAAAoAAAB6eC1zY3lsbGE5LQAAADdmMWJmOWM4LThlNGQtNGQ0Yy05MTAzLWM4MzM5MWNiODRiMS40MzQxNjcuNC0AAAA3ZjFiZjljOC04ZTRkLTRkNGMtOTEwMy1jODMzOTFjYjg0YjEuNDM0MTY3LjQAAAAAAAAAAAAIAAAAZmlsZTQwMDUAAAAAAAAAAAAAAAAAAAAAAAAAACEAAAAuaDVZc2VKMVZ5bVFuRkZuNGNEZFc1d3pTMk9LRHU2WF8BAAAAAAAAAAAAAAACASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAEQAAAGRlZmF1bHQtcGxhY2VtZW50EQAAAGRlZmF1bHQtcGxhY2VtZW50DwAAAAEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2wxAQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDIBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sMwEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w0AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDUBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sNgEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w3AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDgBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sOQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMgEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMwEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNQcAAAABAAAAawIAAAAxMAEAAABtAQAAADUKAAAAcGFja2V0c2l6ZQEAAAA4AgAAAHJ0AgAAADEwCwAAAHN0cmlwZV91bml0BAAAADQwOTYJAAAAdGVjaG5pcXVlBgAAAGNhdWNoeQIAAAB3dAIAAAAxMAQAAABDWlJTAAAAAAAAAAAUAAAAdXNlci5yZ3cuc291cmNlX3pvbmUEAAAAAAAAABYAAAB1c2VyLnJndy5zdG9yYWdlX2NsYXNzCAAAAFNUQU5EQVJEEwAAAHVzZXIucmd3LngtYW16LWRhdGUgAAAARnJpLCAxMSBTZXAgMjAyMCAwNTozNjo1OSArMDAwMAAfAAAAdXNlci5yZ3cueC1hbXotbWV0YS1zM2NtZC1hdHRyc4UAAAB1aWQ6MC9nbmFtZTpyb290L3VuYW1lOnJvb3QvZ2lkOjAvbW9kZTozMzE4OC9tdGltZToxNTk2NjEzODc5L2F0aW1lOjE1OTk3MjI4MjgvbWQ1OjcyYTBkOThlZjZjN2VlZjY1NDc4NmE1ZDcxNTEwMDZlL2N0aW1lOjE1OTY2MTM4NzkAAAAAAA=="
        }, 
        "obj": {
            "S": "file4005"
        }, 
        "bi": {
            "S": "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4.0"
        }
    }
}

Other normally written data is like this:

[root@kcpdeploy ~]# aws dynamodb get-item --table-name "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4" --key '{"bi": {"S": "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4.0"}, "obj":{"S":"file1"}}' --endpoint-url http://10.10.10.92:9080
{
    "Item": {
        "head": {
            "B": "AQEnBwAACAAAAAwAAAB1c2VyLnJndy5hY2x7AAAAAgJ1AAAAAwIOAAAAAwAAAGxqeAMAAABsangEA1sAAAABAQAAAAMAAABsangPAAAAAQAAAAMAAABsangFAzIAAAACAgQAAAAAAAAAAwAAAGxqeAAAAAAAAAAAAgIEAAAADwAAAAMAAABsangAAAAAAAAAAAAAAAAAAAAAFQAAAHVzZXIucmd3LmNvbnRlbnRfdHlwZRkAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0ADQAAAHVzZXIucmd3LmV0YWcgAAAAN2I2Yzk1MmM3ZjBiMjNmMWRlMzg1MjI0N2ZkYjQ1ZjYRAAAAdXNlci5yZ3cubWFuaWZlc3TfBAAACAbZBAAAAAAgAAAAAAAAAAAAAAYGjAAAAAoKdQAAAAoAAAB6eC1zY3lsbGE5LQAAADdmMWJmOWM4LThlNGQtNGQ0Yy05MTAzLWM4MzM5MWNiODRiMS40MzQxNjcuNC0AAAA3ZjFiZjljOC04ZTRkLTRkNGMtOTEwMy1jODMzOTFjYjg0YjEuNDM0MTY3LjQAAAAAAAAAAAAFAAAAZmlsZTEAAAAAAAAAAAAAAAAAAAAAAAAAACEAAAAuYTVhREVRX2ZnV09iZU9CQnJPWWFkNTBUNHRDZDJnMl8BAAAAAAAAAAAAAAACASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAEQAAAGRlZmF1bHQtcGxhY2VtZW50EQAAAGRlZmF1bHQtcGxhY2VtZW50DwAAAAEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2wxAQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDIBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sMwEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w0AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDUBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sNgEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w3AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDgBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sOQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMgEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMwEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNQcAAAABAAAAawIAAAAxMAEAAABtAQAAADUKAAAAcGFja2V0c2l6ZQEAAAA4AgAAAHJ0AgAAADEwCwAAAHN0cmlwZV91bml0BAAAADQwOTYJAAAAdGVjaG5pcXVlBgAAAGNhdWNoeQIAAAB3dAIAAAAxMAQAAABDWlJTAAAAAAAAAAAUAAAAdXNlci5yZ3cuc291cmNlX3pvbmUEAAAAAAAAABYAAAB1c2VyLnJndy5zdG9yYWdlX2NsYXNzCAAAAFNUQU5EQVJEEwAAAHVzZXIucmd3LngtYW16LWRhdGUgAAAARnJpLCAxMSBTZXAgMjAyMCAwNTozNjo1NyArMDAwMAAfAAAAdXNlci5yZ3cueC1hbXotbWV0YS1zM2NtZC1hdHRyc4UAAAB1aWQ6MC9nbmFtZTpyb290L3VuYW1lOnJvb3QvZ2lkOjAvbW9kZTozMzE4OC9tdGltZToxNTk2NjEzMTg3L2F0aW1lOjE1OTk3MjI4MjYvbWQ1OjdiNmM5NTJjN2YwYjIzZjFkZTM4NTIyNDdmZGI0NWY2L2N0aW1lOjE1OTY2MTMxODcAAAAAAA=="
        }, 
        "obj": {
            "S": "file1"
        }, 
        "bi": {
            "S": "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.4.0"
        }, 
        "val": {
            "B": "CAO7AAAABQAAAGZpbGUxAAAAAAAAAAAABwN4AAAAAAAAIAAAAAAA+gxbX2xbBwUgAAAAN2I2Yzk1MmM3ZjBiMjNmMWRlMzg1MjI0N2ZkYjQ1ZjYDAAAAbGp4AwAAAGxqeBgAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0AACAAAAAAAAAAAAAIAAAAU1RBTkRBUkQAAAAAAAAAAAABAQoAAACI//////////8AAAAAAAAAAAAAAAAAAAAAAAAAAA=="
        }
    }
}

I did not modify the LWT policy (the default is always), so the write latency will be a bit high, but this should not be the cause of the data loss, because I can reproduce this failure when I set LWT to only_rmw_uses_lwt, which is a write latency It is ms level.
image

Alternator bug high

All 23 comments

Note that LWT is involved here, for both modes you used, because your request has a condition ("ConditionExpression":"attribute_not_exists(bi)"). If you want to try without LWT you'll need to also remove the condition expression (and may also use forbid or unsafe mode to make sure it isn't used in practice).

@Felix-zhoux are you saying that you are sure that in your test, all these writes go to different sort keys, there is never parallel writes to the same sort key?

CC: @gleb-cloudius @kostja

@Felix-zhoux I just realized you are using Scylla 4.0 (at least that's what your issue says). A lot has improved since in both Alternator and LWT. Can you please try to upgrade to 4.2rc4 and check if the issue is still present? Thanks.

@Felix-zhoux I just realized you are using Scylla 4.0 (at least that's what your issue says). A lot has improved since in both Alternator and LWT. Can you please try to upgrade to 4.2rc4 and check if the issue is still present? Thanks.

Okay, I will try to upgrade scylla. By the way, I also reproduced this problem in version 4.1.3.

@Felix-zhoux another question for clarification: The workload is entirely PutItem operations writing full items of the sort you showed above? Or do you have other concurrent write operations doing UpdateItem modifying parts of items?

Note that LWT _is_ involved here, for both modes you used, because your request has a condition ("ConditionExpression":"attribute_not_exists(bi)"). If you want to try without LWT you'll need to also remove the condition expression (and may also use forbid or unsafe mode to make sure it isn't used in practice).

I will do some more tests according to what you said, such as not adding conditions and setting LWT to forbid.

@Felix-zhoux are you saying that you are sure that in your test, _all_ these writes go to different sort keys, there is never parallel writes to the same sort key?

Yes, I am sure about that. I started 10 processes, each process put 1000 items, process 1 put 'obj' named file1-->file1000, process 2 put 'obj' named file1001-->file2000... total put 10000 items .

@Felix-zhoux another question for clarification: The workload is entirely PutItem operations writing full items of the sort you showed above? Or do you have other concurrent write operations doing UpdateItem modifying parts of items?

Also happening is the BatchWriteItem operation for other keyspaces. There is no UpdateItem operation. I uploaded the log during the put item test, you can check the log to confirm.
In this log file, you can find the put item request corresponding to obj ='file4005'.

test-lost-data-0911-112.log

@nyh Hi~!
I performed four more tests, all of which removed the condition expression.
A put item script is as follows(there are 9 others, and their i has different value ranges):

[root@kcpdeploy aws-put]# cat aws-put-1.sh 
#!/bin/sh
# created at 2020.09.11
# put 1k items
i=1
while [ $i -le 1000 ]
do
    aws dynamodb put-item --table-name dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.9 --item '{"bi":{"S": "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.9.0"}, "obj":{"S":'\"file${i}\"'}, "head":{"S":"AQEqBwAACAAAAAwAAAB1c2VyLnJndy5hY2x7AAAAAgJ1AAAAAwIOAAAAAwAAAGxqeAMAAABsangEA1sAAAABAQAAAAMAAABsangPAAAAAQAAAAMAAABsangFAzIAAAACAgQAAAAAAAAAAwAAAGxqeAAAAAAAAAAAAgIEAAAADwAAAAMAAABsangAAAAAAAAAAAAAAAAAAAAAFQAAAHVzZXIucmd3LmNvbnRlbnRfdHlwZRkAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0ADQAAAHVzZXIucmd3LmV0YWcgAAAANzJhMGQ5OGVmNmM3ZWVmNjU0Nzg2YTVkNzE1MTAwNmURAAAAdXNlci5yZ3cubWFuaWZlc3TiBAAACAbcBAAAAAAgAAAAAAAAAAAAAAYGjwAAAAoKdQAAAAoAAAB6eC1zY3lsbGE5LQAAADdmMWJmOWM4LThlNGQtNGQ0Yy05MTAzLWM4MzM5MWNiODRiMS40MzQxNjcuNC0AAAA3ZjFiZjljOC04ZTRkLTRkNGMtOTEwMy1jODMzOTFjYjg0YjEuNDM0MTY3LjQAAAAAAAAAAAAIAAAAZmlsZTQwMDUAAAAAAAAAAAAAAAAAAAAAAAAAACEAAAAuaDVZc2VKMVZ5bVFuRkZuNGNEZFc1d3pTMk9LRHU2WF8BAAAAAAAAAAAAAAACASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAEQAAAGRlZmF1bHQtcGxhY2VtZW50EQAAAGRlZmF1bHQtcGxhY2VtZW50DwAAAAEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2wxAQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDIBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sMwEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w0AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDUBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sNgEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w3AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDgBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sOQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMgEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMwEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNQcAAAABAAAAawIAAAAxMAEAAABtAQAAADUKAAAAcGFja2V0c2l6ZQEAAAA4AgAAAHJ0AgAAADEwCwAAAHN0cmlwZV91bml0BAAAADQwOTYJAAAAdGVjaG5pcXVlBgAAAGNhdWNoeQIAAAB3dAIAAAAxMAQAAABDWlJTAAAAAAAAAAAUAAAAdXNlci5yZ3cuc291cmNlX3pvbmUEAAAAAAAAABYAAAB1c2VyLnJndy5zdG9yYWdlX2NsYXNzCAAAAFNUQU5EQVJEEwAAAHVzZXIucmd3LngtYW16LWRhdGUgAAAARnJpLCAxMSBTZXAgMjAyMCAwNTozNjo1OSArMDAwMAAfAAAAdXNlci5yZ3cueC1hbXotbWV0YS1zM2NtZC1hdHRyc4UAAAB1aWQ6MC9nbmFtZTpyb290L3VuYW1lOnJvb3QvZ2lkOjAvbW9kZTozMzE4OC9tdGltZToxNTk2NjEzODc5L2F0aW1lOjE1OTk3MjI4MjgvbWQ1OjcyYTBkOThlZjZjN2VlZjY1NDc4NmE1ZDcxNTEwMDZlL2N0aW1lOjE1OTY2MTM4NzkAAAAAAA=="},"val":{"S":"CAO+AAAACAAAAGZpbGU0MDA1AAAAAAAAAAAABwN4AAAAAAAAIAAAAAAA+wxbX1khdAogAAAANzJhMGQ5OGVmNmM3ZWVmNjU0Nzg2YTVkNzE1MTAwNmUDAAAAbGp4AwAAAGxqeBgAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0AACAAAAAAAAAAAAAIAAAAU1RBTkRBUkQAAAAAAAAAAAABAQoAAACI//////////8AAAAAAAAAAAAAAAAAAAAAAAAAAA=="}}' --endpoint-url http://10.10.10.94:9080
    let i++
done

Among them, the first two tests were carried out in the scylla-4.0.0 version:
Test 1, does not set LWT (default is always), and the problem of data loss is reproduced.
Test 2, set LWT to forbid, there is no problem of data loss.

The last two tests were carried out in the upgraded version, scylla-4.2.rc4, and the LWT default policy was set to only_rmw_uses_lwt in the configuration file.
Test 3, set the LWT strategy of the tested table to ‘always’ by tagging. Reproduced the problem of missing data. In the total of 10,000 items, 3 items have lost data, in the same form (The ‘val’ column has all been lost, and the ‘head’ column only has value, and the key is missing).

[root@kcpdeploy result]# cat failed.out-4-2-rc4-always 
get file8402 failed!
get file4619 failed!
get file4970 failed!
[root@kcpdeploy result]# aws dynamodb get-item --table-name "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.8" --key '{"bi": {"S": "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.8.0"}, "obj":{"S":"file4619"}}' --endpoint-url http://10.10.10.92:9080
{
    "Item": {
        "": {
            "S": "AQEqBwAACAAAAAwAAAB1c2VyLnJndy5hY2x7AAAAAgJ1AAAAAwIOAAAAAwAAAGxqeAMAAABsangEA1sAAAABAQAAAAMAAABsangPAAAAAQAAAAMAAABsangFAzIAAAACAgQAAAAAAAAAAwAAAGxqeAAAAAAAAAAAAgIEAAAADwAAAAMAAABsangAAAAAAAAAAAAAAAAAAAAAFQAAAHVzZXIucmd3LmNvbnRlbnRfdHlwZRkAAABhcHBsaWNhdGlvbi9vY3RldC1zdHJlYW0ADQAAAHVzZXIucmd3LmV0YWcgAAAANzJhMGQ5OGVmNmM3ZWVmNjU0Nzg2YTVkNzE1MTAwNmURAAAAdXNlci5yZ3cubWFuaWZlc3TiBAAACAbcBAAAAAAgAAAAAAAAAAAAAAYGjwAAAAoKdQAAAAoAAAB6eC1zY3lsbGE5LQAAADdmMWJmOWM4LThlNGQtNGQ0Yy05MTAzLWM4MzM5MWNiODRiMS40MzQxNjcuNC0AAAA3ZjFiZjljOC04ZTRkLTRkNGMtOTEwMy1jODMzOTFjYjg0YjEuNDM0MTY3LjQAAAAAAAAAAAAIAAAAZmlsZTQwMDUAAAAAAAAAAAAAAAAAAAAAAAAAACEAAAAuaDVZc2VKMVZ5bVFuRkZuNGNEZFc1d3pTMk9LRHU2WF8BAAAAAAAAAAAAAAACASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAEQAAAGRlZmF1bHQtcGxhY2VtZW50EQAAAGRlZmF1bHQtcGxhY2VtZW50DwAAAAEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2wxAQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDIBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sMwEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w0AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDUBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sNgEBLwAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZgcAAABlY3Bvb2w3AQEvAAAAIAAAAC9sb2cvY2VwaC0xNC4yLjgvYnVpbGQvY2VwaC5jb25mBwAAAGVjcG9vbDgBAS8AAAAgAAAAL2xvZy9jZXBoLTE0LjIuOC9idWlsZC9jZXBoLmNvbmYHAAAAZWNwb29sOQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMQEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMgEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxMwEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNAEBMAAAACAAAAAvbG9nL2NlcGgtMTQuMi44L2J1aWxkL2NlcGguY29uZggAAABlY3Bvb2wxNQcAAAABAAAAawIAAAAxMAEAAABtAQAAADUKAAAAcGFja2V0c2l6ZQEAAAA4AgAAAHJ0AgAAADEwCwAAAHN0cmlwZV91bml0BAAAADQwOTYJAAAAdGVjaG5pcXVlBgAAAGNhdWNoeQIAAAB3dAIAAAAxMAQAAABDWlJTAAAAAAAAAAAUAAAAdXNlci5yZ3cuc291cmNlX3pvbmUEAAAAAAAAABYAAAB1c2VyLnJndy5zdG9yYWdlX2NsYXNzCAAAAFNUQU5EQVJEEwAAAHVzZXIucmd3LngtYW16LWRhdGUgAAAARnJpLCAxMSBTZXAgMjAyMCAwNTozNjo1OSArMDAwMAAfAAAAdXNlci5yZ3cueC1hbXotbWV0YS1zM2NtZC1hdHRyc4UAAAB1aWQ6MC9nbmFtZTpyb290L3VuYW1lOnJvb3QvZ2lkOjAvbW9kZTozMzE4OC9tdGltZToxNTk2NjEzODc5L2F0aW1lOjE1OTk3MjI4MjgvbWQ1OjcyYTBkOThlZjZjN2VlZjY1NDc4NmE1ZDcxNTEwMDZlL2N0aW1lOjE1OTY2MTM4NzkAAAAAAA=="
        }, 
        "obj": {
            "S": "file4619"
        }, 
        "bi": {
            "S": "dir.7f1bf9c8-8e4d-4d4c-9103-c83391cb84b1.434167.8.0"
        }
    }
}

Test 4, set the LWT strategy of the tested table to ‘forbid’ by tagging. No data loss problem.

I hope my test can give you some help in analyzing this problem.

Thanks @Felix-zhoux this informat is very helpful, thanks! This can be a bug in LWT, but can also be a bug in the way that Alternator uses LWT, I'll have to investigate where exactly do we lose pieces of the item.

Unfortunately (or rather, fortunately), I was able to reproduce this issue on my laptop, with 5 Scylla processes (as usual, RF=3), writes in parallel to all of them, each write going to a different row in the same partition. Exactly as @Felix-zhoux I can easily (a trivial Scan request!) find items which were broken, and they are broken in exactly the same way that @Felix-zhoux saw - one attribute is missing and one has an empty name (!). What is really odd is that it seems that as the writes are happening, a constant percentage, which is suspiciously close to 1/256 (!) of the items get messed up.
I'm now trying to simplify my test case (fewer server and client processes, fewer writes, etc.) and automate it, and will report back with more info.

Two more observations:

  1. The problem does not reproduce if we don't have clients doing writes in parallel. If there is just one client thread and it writes a sequence of items to random coordinators, I can't reproduce the bug. I need at least two clients threads in parallel, each writing different rows to random coordinators. Two client threads are enough.
  2. Not only does the problem reproduce for 5, 4 and 3 nodes, it also reproduced exactly the same problem with with just two Scylla nodes! Note that with two Scylla nodes, RF=1, so because we have just one partition, actually just one of the two nodes actually holds data. However, the problem does NOT reproduce on a cluster of just a single Scylla node!

Putting those two observations together, I think that for the bug to manifest we need to have two writes (to different rows in the same partition) happening on two different coordinators in parallel - and for that we need at least two Scylla nodes (no matter that just one holds any data) and two client threads and anything less does not reproduce the bug.

The following Python program, if run against Scylla running on two processes on 127.0.0.1 and 127.0.0.2, reproduces the bug every time, in a few seconds. I starts two client threads, each connecting to a different server, and each writing 1000 short items to this same server. At the end, we scan the table, and notice broken items. Here is an example output:

$ ./7218.py       
Bad item: {'p': 'hi', 'c': 'item1350', '': 'dog'}
Bad item: {'p': 'hi', 'c': 'item1361', '': 'dog'}
Bad item: {'p': 'hi', 'c': 'item1816', '': 'dog'}
Bad item: {'p': 'hi', 'c': 'item56', '': 'dog'}
Bad item: {'p': 'hi', 'c': 'item833', '': 'dog'}
Traceback (most recent call last):
  File "./7218.py", line 71, in <module>
    assert n_bad_items == 0
AssertionError

Each item is supposed to have two non-key attributes - v1=dog and v2=cat - but the bug is always the same - v2 is completely missing and v1 lost its name, for several items.

The good news is that: 1. I can easily re-run this short test many times while debugging, to figure out what went wrong, and 2. This test can be easily made into a dtest if we want it as a regression test..

This simplistic test, if run multiple times, reuses the same table without clearing it, but this is not a problem - the test overwrites the same items, and we get different garbage items every time...

The test:

#!/usr/bin/python3
import boto3
import botocore
from threading import Thread

# To reproduce issue 7218, need parallel connection to *two* nodes:
r1 = boto3.resource('dynamodb', endpoint_url='http://127.0.0.1:8000/',
        region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass')
r2 = boto3.resource('dynamodb', endpoint_url='http://127.0.0.2:8000/',
        region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass')

# Create the table, access it through the two connections, r1 and r2:
try:
    table_r1 = r1.create_table(
       TableName='tab',
       KeySchema=[
            {'AttributeName': "p", 'KeyType': "HASH"},
            {'AttributeName': "c", 'KeyType': "RANGE"}
       ],
       AttributeDefinitions=[
            {'AttributeName': "p", 'AttributeType': "S"},
            {'AttributeName': "c", 'AttributeType': "S"},
       ],
       BillingMode='PAY_PER_REQUEST')
except:
    table_r1 = r1.Table('tab')

table_r2 = r2.Table('tab')

def writes(tab, rang):
    for i in rang:
        tab.put_item(Item={
            'p': 'hi',
            'c': 'item{}'.format(i),
            'v1': 'dog',
            'v2': 'cat'})

# Create two writing threads, each writing 500 *different* rows to
# one different connection:
total_items = 2000
t1 = Thread(target=writes, args=(table_r1, range(0,1000)))
t2 = Thread(target=writes, args=(table_r2, range(1000, 2000)))
t1.start()
t2.start()
t1.join()
t2.join()

def check_items(items):
    for item in items:
        check_item(item)

n_items = 0
n_bad_items = 0
def check_item(item):
    global n_items
    global n_bad_items
    n_items = n_items + 1
    if not 'v1' in item or not 'v2' in item:
        n_bad_items = n_bad_items + 1
        print('Bad item: {}'.format(item))

# Scan the table, looking for broken items (issue #7218)
response = table_r1.scan(ConsistentRead=True)
check_items(response['Items'])
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'],
        ConsistentRead=True)
    check_items(response['Items'])

assert n_items == total_items
assert n_bad_items == 0

Ok, I found the cause for the bug...
The broken items happened because put_item_operation::apply() got called twice for the same operation to build the mutation, and I assumed - for performance - that apply() is only ever called once per operation, so the implementation std::moves() some strings saved in the operation, like the column name.

An obvious fix would be to not move anything, and not assume that apply() is only called once.

However, I'm still curious why cas_request::apply() is called twice. It sounds like a waste of time - building the same mutation twice, what's the point? @gleb-cloudius is this something that is supposed to happen - that cas_request::apply() gets called twice?

I can confirm that fixing Alternator's cas_request::apply() implementation to not assume it is only called once, so not "recycle" any strings saved in that object, fixes this bug. I already have a working patch that fixes my reproducing test.

However, before I submit my patch, @gleb-cloudius I would like you to confirm please if it makes sense that cas_request::apply() was called more than once for the same request. If it does makes sense and does happen under some rare circumstances (which?), I'll submit my patch for Alternator. But if it shouldn't have happened, maybe this exposed some problem in LWT and we better fix it instead (or in addition to) fixing Alternator?

Sure it can in case of a contention. This is how paxos works: a node
can get a promise but accept may fail if preempted by a higher ballot.
Apply() happens between promise and accept. Since after failed accept
an operation is retried apply() can be called more than once.

--
Gleb.

Sure it can in case of a contention. This is how paxos works: a node can get a promise but accept may fail if preempted by a higher ballot. Apply() happens between promise and accept. Since after failed accept an operation is retried apply() can be called more than once.

Thanks. I had this misconception that apply() would only be called once even in this case, but obviously you're right - it will need to be called again because the timestamp changed, and possibly (but not in this test) also the result of the read changed.
By the way, in this specific test case there is no real contention - the different clients write to different rows - but I guess the current implementation works at the partition level so it sees what it thinks is contention on the partition.

Very nice detective work here 👍

On Tue, Sep 15, 2020 at 09:58:16PM -0700, nyh wrote:

By the way, in this specific test case there is no real contention - the different clients write to different rows - but I guess the current implementation works at the partition level so it sees what it thinks is contention on the partition.

Yes, the lock/paxos ballot is per partition right now.

--
Gleb.

@Felix-zhoux I sent a patch for review to the mailing list, titled "[PATCH] alternator: fix corruption of PutItem operation in case of contention". Thanks for discovering this bug, and making it (relatively) easy to reproduce! If you are compiling Scylla from source you can already try this patch - or just wait for it to reach master.

@nyh Thank you very much for solving this problem in such a short time. Your high-quality and efficient work makes me feel at ease. I will continue to conduct in-depth testing of Scylla.

@nyh I backported to 4.2, but 4.1 requires more effort.

@nyh I backported to 4.2, but 4.1 requires more effort.

Thanks. I'll look into it.

Backported to next-4.1 (and tested with the new dtest that it fixes the bug, plus that the usual Alternator tests pass). f8c7c485d2d23ca1f9f8bfbf12fbaccad5df70a8

Backported to 4.0 too.

Was this page helpful?
0 / 5 - 0 ratings