Seen in our nightlies, for example here https://teamcity.cockroachdb.com/viewLog.html?buildId=1680645&buildTypeId=Cockroach_Nightlies_WorkloadNightly&tab=buildLog&branch_Cockroach_Nightlies=%3Cdefault%3E&_focus=5217
It looks like we're not running most of the roachtests at the moment due to (hopefully just) this, so upping the quota is fairly high priority.
Actually, could this be fallout from https://github.com/cockroachdb/cockroach/pull/43748 somehow? A look at
seems to indicate that the problems started around that time, and the mention of europe above is suspicious. cc @jlinder
The SSD limits show that we currently have 25TB of quota in europe-west2. Digging in to see how much we'd actually be allocating.

I have requested that we get a bump to 100TB of local SSD across all of those clusters.
Each local SSD is a fixed 375GB in size, which means that we would hit a 25TB quota after allocating 66 nodes.
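The arithmetic behind that number can be checked with a small sketch. It assumes 1TB = 1000GB and one local SSD per node; the helper name `maxLocalSSDs` is made up for illustration and is not part of roachprod or roachtest.

```go
package main

import "fmt"

// localSSDSizeGB is the fixed size of a single GCE local SSD, per the comment above.
const localSSDSizeGB = 375

// maxLocalSSDs is a hypothetical helper: it returns how many whole local SSDs
// fit into a quota given in TB, assuming 1TB = 1000GB.
func maxLocalSSDs(quotaTB int) int {
	return quotaTB * 1000 / localSSDSizeGB
}

func main() {
	// With one local SSD per node (an assumption), this is also the node count.
	fmt.Println("25TB quota:", maxLocalSSDs(25), "local SSDs")
	fmt.Println("100TB quota:", maxLocalSSDs(100), "local SSDs")
}
```

Under those assumptions, the requested 100TB would allow roughly four times as many nodes as the current 25TB.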
Thanks @bobvawter! With a CPU quota of 1024, and four vCPUs per node, we could end up putting up to 256 nodes into one region at any given time, but I'm just going to hazard a guess that this isn't actually happening. What SSD quota do we have set in our main GCE region?
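For a rough sense of the worst case described here, the sketch below works out how much local SSD would be needed if the whole CPU quota landed in one region. It assumes one 375GB local SSD per node and 1TB = 1000GB; the function names are hypothetical and only serve the illustration.

```go
package main

import "fmt"

// maxNodesByCPU is a hypothetical helper: the number of nodes that fit under
// a vCPU quota when every node uses the same number of vCPUs.
func maxNodesByCPU(cpuQuota, vcpusPerNode int) int {
	return cpuQuota / vcpusPerNode
}

// ssdNeededGB is a hypothetical helper: the local SSD capacity consumed by
// the given number of nodes, assuming one 375GB local SSD per node.
func ssdNeededGB(nodes int) int {
	return nodes * 375
}

func main() {
	nodes := maxNodesByCPU(1024, 4)
	fmt.Println("nodes allowed by CPU quota:", nodes)
	fmt.Println("local SSD needed (GB):", ssdNeededGB(nodes))
}
```

With these numbers, 256 nodes would need 96,000GB of local SSD, which is well above a 25TB quota but below 100TB.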
We have 100TB of local ssd quota in us-east1.
From the logs of the build:
[03:01:52] : [Step 2/2] Worker 9 returned with error. Quiescing. Error: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod create teamcity-1680645-1578556249-05-n4cpu4 -n 4 --gce-machine-type=n1-standard-4 --lifetime=12h0m0s --gce-zones=us-central1-b,us-west1-b,europe-west2-b --local-ssd-no-ext4-barrier returned:
It looks like roachtest is calling roachprod with an explicit zones list, but without the --geo flag. This would then trigger the behavior added by #43748.
@jlinder can you do a review of roachtest to have it not populate the zones flags unless the calling test is actually requesting a geo-distributed cluster?
I still think there's something to look into here w.r.t #43748. None of the tests that were running in the failure logs that I looked at "intended" to create geo-replicated clusters, yet a fair amount of them seem to have ended up with at least a node in europe. We are starting roachtest with this command line:
[Step 2/2] + timeout -s INT 70200 bin/roachtest run --build-tag v20.1.0-alpha.20191118 --slack-token ******* --cluster-id 1680645 --zones us-central1-b,us-west1-b,europe-west2-b --cockroach /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 --roachprod /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod --workload /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/workload --artifacts /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200109-1680645 --parallelism 16 --cpu-quota=1024 --teamcity
which in turn creates nodes via
roachprod create teamcity-1680645-1578556249-05-n4cpu4 -n 4 --gce-machine-type=n1-standard-4 --lifetime=12h0m0s --gce-zones=us-central1-b,us-west1-b,europe-west2-b --local-ssd-no-ext4-barrier
That is, it passes --gce-zones through, but does not pass --geo. As of #43748, however, --geo is silently implied.
This does not appear to have been intentional. I'm pretty sure I can fix this in roachtest. Is that what I should do or should we revisit the flags PR?
@jlinder I think the change would be a simple one, just here:
Making sure that --zones is only emitted if --geo is, too.
Fixing / modifying roachtest is the right approach. The roachprod flags PR was intentional to always use geo when zones are specified (because, why else would one set zones?).
It looks like a slight tweak in the place you mentioned would change the command that is output and make it clearer in the roachtest code that using zones automatically implies geo in roachprod. But what about the code that configures the clusterSpec? Is there something about the roachtest commands themselves that doesn't specify --geo when specifying zones? And is there something that should correctly configure the clusterSpec before getting to the args method?
I think roachtest blindly passes --zones through when it's specified. I don't think it's worth adding extra logic that avoids doing that unless Geo is also specified, just because we'd still need to make sure that if such a spec got created for whatever reason, it would not result in an implicit Geo flag. I don't feel strongly, though.
Ah. I see that the roachtest --zones arg doesn't get put into clusterSpec and is blindly passed to roachprod here:
It logically follows that the way it's working now is the way it was _thought_ to work, but it was not clear that --geo might not be set and that the test then wouldn't run across the defined zones.
I don't feel strongly about ensuring a --geo flag is in there but do support it.
Quota increase is still pending, per email from GCP support.
After Slack conversations, I'm going to change roachtest to pass only the first zone in the --zones flag when geo _is not_ set for a test, and to pass all the zones when geo _is_ set for a given test.
The reasons for this route are that 1) when the roachtest nightlies are run, they need to receive multiple zones because some tests need to run geo-distributed and 2) we want to keep the new functionality for how roachprod handles the zones and geo flags.
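The selection rule described above could look roughly like the sketch below. It is only an illustration: `zonesArg` is a hypothetical helper, not the actual roachtest code, and it assumes the zones arrive as the comma-separated string seen in the build log.

```go
package main

import (
	"fmt"
	"strings"
)

// zonesArg is a hypothetical helper illustrating the proposed rule: when the
// test is geo-distributed, forward every zone; otherwise forward only the
// first zone so that roachprod does not spread the cluster across regions.
func zonesArg(zones string, geo bool) string {
	if zones == "" {
		return "" // no zones configured, so emit no flag at all
	}
	if !geo {
		zones = strings.Split(zones, ",")[0]
	}
	return "--gce-zones=" + zones
}

func main() {
	all := "us-central1-b,us-west1-b,europe-west2-b"
	fmt.Println(zonesArg(all, false))
	fmt.Println(zonesArg(all, true))
}
```

A non-geo test would then land entirely in the first listed zone, while geo tests keep the full list.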
Our local SSD quota has been increased by GCP support to 100TB across those zones.