Test-infra: Add StatefulSet tests to non-blocking and file flakes

Created on 28 Jan 2017 · 13 Comments · Source: kubernetes/test-infra

It seems like one of the tests here https://k8s-testgrid.appspot.com/google-gce#gci-gce-statefulset is completely broken and we never noticed.
We should add all of the StatefulSet tests to non-blocking, considering @kow3ns is doing a large refactor.


All 13 comments

I think there's now a way to set up notifications from testgrid; maybe @michelle192837 can set some up and we can add more?

(That's orthogonal, though; still, if those need to be added to sq, please add them.)

I think we should remove the CockroachDB test. The CockroachDB software is beta and is released frequently. If we don't have a maintainer who shows a lot of love for this test and has the expertise in CockroachDB to do it, we should remove the test and its manifest, IMO.

@a-robinson is the maintainer, I think.

I'm sorry, guys! I didn't even know a CockroachDB test existed. It looks like it got added in the StatefulSet changes leading up to 1.5.0 (https://github.com/kubernetes/kubernetes/pull/36793).

I have to run now, but I'll take a look at what's going on by Monday. Assuming it doesn't need any major changes, I should be able to show the test some love.

My test infra knowledge is a little out of date, so I have a couple quick questions.

  1. I love that the kubectl logs are available for failed runs. Are the logs for other containers running in the cluster available anywhere?

  2. How far back do test histories go in testgrid? I'm trying to see when it started failing and can't figure out how to look any further back than a couple weeks ago.

Also, it's unrelated to test-infra, but it looks like Events for the failed probes are failing to be reported to the apiserver due to the error 'object does not implement the List interfaces':

    E0210 19:12:45.046912    1353 event.go:259] Could not construct reference to: '&v1.ObjectReference{Kind:"Pod", Namespace:"e2e-tests-statefulset-vtc1w", Name:"cockroachdb-0", UID:"bd50bec4-efc4-11e6-8f42-42010af00002", APIVersion:"v1", ResourceVersion:"1521", FieldPath:"spec.containers{cockroachdb}"}' due to: 'object does not implement the List interfaces'. Will not report event: 'Warning' 'Unhealthy' 'Readiness probe failed: HTTP probe failed with statuscode: 404'

Is that expected? I don't see any open issues about it on the main repo, but don't want to clutter things up if it's well known.

Logs are generated by cluster/log-dump.sh. I think we only get logs for system components and not for the test containers themselves. You can look through what's on gcsweb for one of the failed runs.

Testgrid only looks back two weeks for e2e runs. I found that 775 was the first run to fail, and the only commit between it and 774 was this one.

Thanks Joe, I understand the problem now. It's something that may be of interest to the design of StatefulSets, but that we can also work around in the short term.

Effectively, the issue is that CockroachDB is a CP system rather than an AP one (consistent rather than available under partition, in CAP terms). That means a node isn't able to do anything unless it's part of a majority. Some time ago, CockroachDB's health checks were "improved" so that a node does not consider itself healthy if it's partitioned away from a majority of the cluster.

Unfortunately, the test takes down all three nodes and the StatefulSet infrastructure only brings them back up one at a time, waiting until the first is healthy before bringing back up the others. The first node thinks that it's in a minority partition when it comes back up and thus never considers itself healthy, so it never passes its readiness/liveness probes, which prevents the other two nodes from ever getting brought up.

We can fix the test by removing the readiness/liveness probes or switching them to hit a different endpoint on the node (albeit one that won't really do any useful liveness checking). I'd be happy to just remove them for now if we want to get the test green ASAP.
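
For illustration, here's a minimal sketch of what that probe change could look like in the container spec of the e2e StatefulSet manifest. The /health path, port numbers, and port names are assumptions for the sketch, not necessarily what the real CockroachDB manifest uses:

    # Hypothetical probe section for the cockroachdb container (sketch only;
    # the endpoint path and ports are assumptions). The readiness probe hits
    # an endpoint that only checks the local process, not quorum membership,
    # and the liveness probe is dropped so a node waiting for quorum is not
    # killed and restarted.
    containers:
    - name: cockroachdb
      image: cockroachdb/cockroach
      ports:
      - containerPort: 26257
        name: grpc
      - containerPort: 8080
        name: http
      readinessProbe:
        httpGet:
          path: /health   # assumed local-only health endpoint
          port: http
        initialDelaySeconds: 10
        periodSeconds: 5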

The question that may be worth thinking about (@kow3ns) is whether this sort of majority-based liveness probe should be supported. There are situations where a partition could make it worthwhile for a pod to ask to be killed and moved somewhere else, but it's possible that not enough systems would use that option for it to be worth the effort. One not-at-all-thought-through way to support it would be for StatefulSets to have an option to skip the one-by-one process of creating each replica and just bring them all up at once.

Generally, for the category of distributed systems that use quorum-based consensus protocols (e.g. Raft, Zab, Multi-Paxos) to replicate a state machine and achieve linearizable operations (e.g. ZooKeeper, etcd, Consul), the following semantics are what gets implemented:

liveness probe - is the distributed system's local process running in the Pod?
readiness probe - can the process in the Pod participate in forming a quorum?
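
As a concrete sketch of that split for a quorum-based store, liveness only asks whether the local process is up, while readiness asks whether the member can take part in quorum operations. The port and endpoint below are placeholders, not any specific project's API:

    # Placeholder probes for a quorum-based system (sketch only).
    livenessProbe:
      tcpSocket:
        port: 2379        # liveness: is the local process up and accepting connections?
      initialDelaySeconds: 15
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready      # readiness: can this member participate in forming a quorum?
        port: 2379
      initialDelaySeconds: 5
      periodSeconds: 5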

Liveness is probably an overloaded term here, and it should not be confused with liveness in the context of distributed systems/concurrent programming theory.

Whether the processes have bootstrapped a quorum, and whether they have elected a stable leader (which is a requirement for many of the more efficient systems in this category), is currently considered to be more of a cluster-wide health check. It would be useful to have a way to surface this telemetry, but the liveness and readiness probes probably shouldn't be used for it.

In practice, this has not been an issue. Clients of distributed systems in this category are generally designed to be resilient to intermittent cluster unavailability (e.g. when leader election occurs during a node failure, or when a majority component can't be reached due to network partition). Many will also automatically discover cluster membership when configured with a set of entry points.

I do have a feature request open to add what we are calling burst semantics, which you could use to allow the readiness probe to fail when a majority component of the StatefulSet is unavailable. However, its intention is more to prevent deadlock and optimize deployment for distributed systems in this category.
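
For reference, a sketch of what opting into that kind of bursting could look like on a StatefulSet spec. The podManagementPolicy: Parallel field shown here is a later Kubernetes feature that provides this behavior; it did not exist when this thread was written, and it is only offered as an illustration of the idea:

    # Sketch only: parallel pod management lets a StatefulSet create and
    # delete all pods at once instead of one at a time, without waiting for
    # each pod to become Ready first.
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: cockroachdb
    spec:
      serviceName: cockroachdb
      replicas: 3
      podManagementPolicy: Parallel   # default is OrderedReady (one-by-one)
      selector:
        matchLabels:
          app: cockroachdb
      template:
        metadata:
          labels:
            app: cockroachdb
        spec:
          containers:
          - name: cockroachdb
            image: cockroachdb/cockroach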

Thanks, it seems as though the Burst Semantics would work.

Although I think I'm a little confused about what you mean by "Can the process in the Pod participate in forming a quorum". You're saying that it isn't dependent on whether the pod is aware of an existing quorum, just on whether it hypothetically could form a quorum?

@a-robinson If a process is not ready, its DNS entry will not be added to the domain controlled by its headless service. Its peers will not be able to resolve its address. If the process is in a state where it cannot process requests from its peers (e.g. replicate log entries, update its current state with a snapshot, participate in leader election), it is not ready, and its readiness check should fail. Otherwise it should pass its readiness check so that it can either rejoin or bootstrap a cluster.
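
To make the DNS point concrete, here's a minimal sketch of the governing headless Service for such a StatefulSet (the names are illustrative): because clusterIP is None, each pod gets a DNS record under the service's domain, and that record only shows up once the pod's readiness probe passes.

    # Illustrative headless Service; pod DNS records take the form
    # <pod-name>.cockroachdb.<namespace>.svc.cluster.local and are only
    # published for pods that report Ready.
    apiVersion: v1
    kind: Service
    metadata:
      name: cockroachdb
    spec:
      clusterIP: None            # headless: no virtual IP, per-pod DNS instead
      selector:
        app: cockroachdb
      ports:
      - port: 26257
        name: grpc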

Looking at the CockroachDB architecture and clients, is your problem that you need a check to add each node to a service so that SQL drivers can be correctly load balanced against the SQL layer (i.e. do you need to be able to provide clients with a service that detects which nodes are up and load balances connections accordingly)?

Ok, so the other systems base readiness on whether they're ready to interact with other replicas, not whether they actually are interacting. Got it.

Load balancing only to functional replicas is certainly a nice benefit of proper liveness checks for CockroachDB. However, the odds of a partition where a client is capable of hitting nodes on both sides of the partition (and the client node's kube-proxy is getting updates on pod liveness for nodes on the minority side) seem pretty low, so I don't think that leaving out the checks is all that big of a problem for clients.

I appreciate the explanation, and will likely add a more idiomatic readiness endpoint to CockroachDB soon.

/close
