Cockroach: server: one unaccounted-for range

Created on 1 Nov 2018 · 3 Comments · Source: cockroachdb/cockroach

After @celiala's PR #31830, almost all the ranges are accounted for in the UI. However, there still seems to be a discrepancy: one range is missing.

See: https://github.com/cockroachdb/cockroach/pull/31830#discussion_r229870404

A-kv-client A-webui-general C-investigation T-observability

Most helpful comment

I did a little digging here because I was curious and figured it'd be something easy to fix, and it looks like the logic for scanning over the meta range to find the range descriptors is incorrect.

I added a couple of lines of debug logging [1] to (*adminServer).statsForSpan, ran the test repro that @celiala added, and found that the range descriptors it's iterating over for each given span are all wrong, e.g.:

I181112 20:25:29.858865 1567 server/admin.go:664  starting iteration over span: /System/ts{d-e}
I181112 20:25:29.858899 1567 server/admin.go:670  range: r4:/System/{NodeLivenessMax-tsd} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=1]
I181112 20:25:29.858924 1567 server/admin.go:675  done iterating over span: /System/ts{d-e}
I181112 20:25:29.859381 1567 server/admin.go:664  starting iteration over span: /{Min-System/tsd}
I181112 20:25:29.859399 1567 server/admin.go:670  range: r1:/{Min-System/} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=1]
I181112 20:25:29.859411 1567 server/admin.go:670  range: r1:/{Min-System/} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=1]
I181112 20:25:29.859434 1567 server/admin.go:670  range: r2:/System/{-NodeLiveness} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=1]
I181112 20:25:29.859445 1567 server/admin.go:670  range: r3:/System/NodeLiveness{-Max} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=1]
I181112 20:25:29.859456 1567 server/admin.go:675  done iterating over span: /{Min-System/tsd}
I181112 20:25:29.859869 1567 server/admin.go:664  starting iteration over span: /{System/tse-Table/SystemConfigSpan/Start}
I181112 20:25:29.859897 1567 server/admin.go:670  range: r5:/System/ts{d-e} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=1]
I181112 20:25:29.859917 1567 server/admin.go:675  done iterating over span: /{System/tse-Table/SystemConfigSpan/Start}
I181112 20:25:29.871391 1602 server/admin.go:664  starting iteration over span: /Table/{3-4}
I181112 20:25:29.871420 1602 server/admin.go:675  done iterating over span: /Table/{3-4}
I181112 20:25:29.876226 1517 server/admin.go:664  starting iteration over span: /Table/1{2-3}
I181112 20:25:29.876261 1517 server/admin.go:670  range: r8:/Table/1{1-2} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=1]
I181112 20:25:29.876271 1517 server/admin.go:675  done iterating over span: /Table/1{2-3}

This is presumably because the meta keys use the end keys of ranges, not the start keys. Rather than trying to rewrite something here, I'd suggest looking into using something like the existing RangeIterator.

Is that enough of a bread crumb for one of you two to follow up on it, or would you like me to?

[1] the diff in question:

diff --git a/pkg/server/admin.go b/pkg/server/admin.go
index 23d647b7de..87061c81d5 100644
--- a/pkg/server/admin.go
+++ b/pkg/server/admin.go
@@ -661,15 +661,18 @@ func (s *adminServer) statsForSpan(

        // This map will store the nodes we need to fan out to.
        nodeIDs := make(map[roachpb.NodeID]struct{})
+       log.Infof(ctx, "starting iteration over span: %v", span)
        for _, kv := range rangeDescKVs {
                var rng roachpb.RangeDescriptor
                if err := kv.Value.GetProto(&rng); err != nil {
                        return nil, s.serverError(err)
                }
+               log.Infof(ctx, "range: %v", rng)
                for _, repl := range rng.Replicas {
                        nodeIDs[repl.NodeID] = struct{}{}
                }
        }
+       log.Infof(ctx, "done iterating over span: %v", span)

        // Construct TableStatsResponse by sending an RPC to every node involved.

All 3 comments

Appreciate you digging into this, Alex! I think this bread crumb should get me to a solution, but I'll reach out if I run into trickiness. Thanks!

@celiala @dhartunian is this issue still current?
