Scylla: repair_joint_row_3nodes_1_diff_shard_count_test fails on unexpected rx_row_nr

Created on 23 Aug 2020 · 10 Comments · Source: scylladb/scylla

Since commit 3b1ff90a1a80c4b0d2b6784de052a376ad75e7ee, there have been a number of dtest regressions related to repair.
This one, seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/581/testReport/repair_additional_test/RepairAdditionalTest/repair_joint_row_3nodes_1_diff_shard_count_test/, fails as follows:

Traceback (most recent call last):
  File "/usr/lib64/python3.7/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib64/python3.7/unittest/case.py", line 645, in run
    testMethod()
  File "/jenkins/workspace/scylla-master/dtest-release/scylla-dtest/repair_additional_test.py", line 2778, in repair_joint_row_3nodes_1_diff_shard_count_test
    return RepairAdditionalBase._repair_joint_row_3nodes_same_key_same_value_test(self, same_shard_count=False)
  File "/jenkins/workspace/scylla-master/dtest-release/scylla-dtest/repair_additional_test.py", line 2216, in _repair_joint_row_3nodes_same_key_same_value_test
    self.check_repair_tx_rx_rows(node3, expected_tx_row_nr=20, expected_rx_row_nr=10)
  File "/jenkins/workspace/scylla-master/dtest-release/scylla-dtest/repair_additional_test.py", line 78, in check_repair_tx_rx_rows
    self.assertEqual(rx, expected_rx_row_nr)
  File "/usr/lib64/python3.7/unittest/case.py", line 869, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/usr/lib64/python3.7/unittest/case.py", line 862, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: 11 != 10

It is highly reproducible.
When I changed the asserts to non-fatal debug messages, the rest of the test, which just counts the number of rows on each node after repair, passed.
@asias, please look into this and determine whether we just need to adjust the test or whether there is an underlying problem in Scylla that we need to fix.

For reference, here's the change I tried in the test:

index 3295616c..fb3dd2c9 100644
--- a/repair_additional_test.py
+++ b/repair_additional_test.py
@@ -74,8 +74,10 @@ class RepairAdditionalBase(Tester):
             kv = re.findall("rx_row_nr=\d*", line)[0].split('=')
             debug(kv)
             rx += int(kv[1])
-        self.assertEqual(tx, expected_tx_row_nr)
-        self.assertEqual(rx, expected_rx_row_nr)
+        if tx != expected_tx_row_nr:
+            debug("Expected {} tx rows but saw {}".format(expected_tx_row_nr, tx))
+        if rx != expected_rx_row_nr:
+            debug("Expected {} rx rows but saw {}".format(expected_rx_row_nr, rx))

     def _stop_all_nodes_except_for(self, node):
         debug("Stopping all nodes except for: {}".format(node.name))
Labels: Backport candidate, repair

All 10 comments

I am looking at it.

I found the issue. The rows repaired during bootstrap are counted too, so the row count the test sees no longer matches the expected number. I am sending a patch.
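To illustrate the root cause, here is a minimal Python sketch that tallies tx/rx row counts per repair reason. It assumes each stats line carries a `repair_reason=` field (as the dtest change in this thread greps for); the `rows_by_reason` helper and the sample log lines are hypothetical and simplified, the real Scylla log format may differ, and the numbers were picked only to reproduce the 11 != 10 symptom from the traceback, not taken from a real run.

```python
import re
from collections import defaultdict

# Hypothetical, simplified stats lines chosen only to reproduce the 11 != 10 symptom.
SAMPLE_LOG = [
    "stats: repair_reason=bootstrap, ranges_nr=1, tx_row_nr=0, rx_row_nr=1",
    "stats: repair_reason=repair, ranges_nr=3, tx_row_nr=12, rx_row_nr=6",
    "stats: repair_reason=repair, ranges_nr=2, tx_row_nr=8, rx_row_nr=4",
]

REASON_RE = re.compile(r"repair_reason=(\w+)")
TX_RE = re.compile(r"tx_row_nr=(\d+)")
RX_RE = re.compile(r"rx_row_nr=(\d+)")


def rows_by_reason(lines):
    """Return {repair_reason: [tx_rows, rx_rows]} summed over stats lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        reason, tx, rx = REASON_RE.search(line), TX_RE.search(line), RX_RE.search(line)
        if not (reason and tx and rx):
            continue  # not a stats line carrying all three fields
        totals[reason.group(1)][0] += int(tx.group(1))
        totals[reason.group(1)][1] += int(rx.group(1))
    return dict(totals)


if __name__ == "__main__":
    totals = rows_by_reason(SAMPLE_LOG)
    print(totals)  # {'bootstrap': [0, 1], 'repair': [20, 10]}
    print(sum(rx for _, rx in totals.values()))  # 11, what an unfiltered sum reports
```

Summing every stats line, which is effectively what the original `stats: ranges_nr` grep did, reports rx = 11 in this example; only the `repair` entries add up to the expected 10.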

The patch for Scylla core has been sent: https://github.com/scylladb/scylla/pull/7101

I will send the dtest change once #7101 is merged into master.

--- a/repair_additional_test.py
+++ b/repair_additional_test.py
@@ -65,7 +65,7 @@ def check_rows_on_node(self, node_to_check, rows, found=None, missings=None, res
     def check_repair_tx_rx_rows(self, node_to_check, expected_tx_row_nr, expected_rx_row_nr):
         tx = 0
         rx = 0
-        for line in node_to_check.grep_log("stats: ranges_nr"):
+        for line in node_to_check.grep_log("stats: repair_reason=repair"):
             line = line[0]
             debug(line)
             kv = re.findall("tx_row_nr=\d*", line)[0].split('=')
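The filtering idea in the dtest change above can be approximated outside ccm/dtest with a plain substring check. This is a sketch, not the dtest code: `count_repair_rows` and the sample lines are hypothetical, and it assumes every matching stats line contains both `tx_row_nr=` and `rx_row_nr=` fields.

```python
import re

# Hypothetical, simplified stats lines; the real log format may differ.
SAMPLE_LOG = [
    "stats: repair_reason=bootstrap, ranges_nr=1, tx_row_nr=0, rx_row_nr=1",
    "stats: repair_reason=repair, ranges_nr=3, tx_row_nr=12, rx_row_nr=6",
    "stats: repair_reason=repair, ranges_nr=2, tx_row_nr=8, rx_row_nr=4",
]

# Raw strings avoid the invalid "\d" escape warning of a plain string literal.
TX_RE = re.compile(r"tx_row_nr=(\d+)")
RX_RE = re.compile(r"rx_row_nr=(\d+)")


def count_repair_rows(lines, marker="stats: repair_reason=repair"):
    """Sum tx/rx row counts over lines containing `marker` (hypothetical helper)."""
    tx = rx = 0
    for line in lines:
        if marker not in line:
            continue  # skips bootstrap and other non-repair stats lines
        tx += int(TX_RE.search(line).group(1))
        rx += int(RX_RE.search(line).group(1))
    return tx, rx


if __name__ == "__main__":
    print(count_repair_rows(SAMPLE_LOG))  # (20, 10)
```

Passing a broader marker such as `"stats:"` counts every stats line again and reproduces the inflated rx total of 11.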

@asias, please send a patch for the dtest.


https://github.com/scylladb/scylla-dtest/pull/1651

This looks like just a logging improvement; I don't see a reason to backport it. @asias, please confirm.

@asias ping


Yes. We can safely skip the backport.
