It looks like the wrong URL is being used.
It is using: https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.zip
and it should be using: https://snapshots.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.zip
If I run these x-pack QA tests locally, the latter URL is used. The last good commit for the mentioned build is a740b542e586290abd9bf82fbb9fbf5fa5b46b00, and nothing after that commit changed these x-pack QA tests. I don't understand why a different URL is used locally to download x-pack, so I'm opening an issue for this.
:x-pack:qa:full-cluster-restart:with-system-key:v6.2.5-SNAPSHOT#oldClusterTestCluster#node0.copyBwcPlugins (Thread[Daemon worker,5,main]) completed. Took 0.375 secs.
10:16:46 FAILURE: Build failed with an exception.
10:16:46
10:16:46 * What went wrong:
10:16:46 Could not resolve all files for configuration ':x-pack:qa:full-cluster-restart:with-system-key:v6.2.5-SNAPSHOT#oldClusterTestCluster_elasticsearchBwcPlugins'.
10:16:46 > Could not find x-pack.zip (org.elasticsearch.plugin:x-pack:6.2.5-SNAPSHOT).
10:16:46 Searched in the following locations:
10:16:46 https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.zip
10:16:46
Pinging @elastic/es-core-infra
I ran into this too last week. I do not yet know what the problem is. We have both repositories registered in the build: https://github.com/elastic/elasticsearch/blob/a740b542e586290abd9bf82fbb9fbf5fa5b46b00/x-pack/qa/rolling-upgrade/build.gradle#L303-L309
I pointed to the wrong build.gradle; it should be: https://github.com/elastic/elasticsearch/blob/a740b542e586290abd9bf82fbb9fbf5fa5b46b00/x-pack/qa/full-cluster-restart/build.gradle#L272-L279
I think https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.3+periodic/39/console is also this:
09:43:56 FAILURE: Build failed with an exception.
09:43:56 Deprecated Gradle features were used in this build, making it incompatible with Gradle 5.0.
09:43:56 See https://docs.gradle.org/4.5/userguide/command_line_interface.html#sec:command_line_warnings
09:43:56
09:43:56 4264 actionable tasks: 4134 executed, 130 up-to-date
09:43:56 * What went wrong:
09:43:56 Could not resolve all files for configuration ':x-pack:qa:full-cluster-restart:with-system-key:v5.6.10-SNAPSHOT#oldClusterTestCluster_elasticsearchBwcPlugins'.
09:43:56 > Could not find x-pack.zip (org.elasticsearch.plugin:x-pack:5.6.10-SNAPSHOT).
09:43:56 Searched in the following locations:
09:43:56 https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/5.6.10-SNAPSHOT/x-pack-5.6.10-SNAPSHOT.zip
09:43:56
I opened a topic on the Gradle discussion forum to hopefully get some help figuring out what could be happening here.
https://discuss.gradle.org/t/not-all-maven-repos-searched/26796
Another instance of the issue: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.3+multijob-unix-compatibility/os=debian/51/console
So I saw this one again today. I tried it locally and got:
[manybubbles@localhost ~]$ wget https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/5.6.10-SNAPSHOT/x-pack-5.6.10-SNAPSHOT.zip
--2018-05-31 17:14:03-- https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/5.6.10-SNAPSHOT/x-pack-5.6.10-SNAPSHOT.zip
Resolving artifacts.elastic.co (artifacts.elastic.co)... 23.21.67.46, 107.21.253.15, 23.23.109.100, ...
Connecting to artifacts.elastic.co (artifacts.elastic.co)|23.21.67.46|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2018-05-31 17:14:03 ERROR 404: Not Found.
It looks to me like the last unified release build failed to start properly and the S3 bucket doesn't actually have the snapshot in there in the first place.
I see. The thing is supposed to be in snapshots.elastic.co but we aren't looking there. This is what I get for skimming the issue rather than reading the whole thing.
I think I found the issue here. What led me to this discovery is the following output in the build failure:
Resource missing. [HTTP GET: https://repo.maven.apache.org/maven2/org/elasticsearch/plugin/x-pack/5.6.10-SNAPSHOT/maven-metadata.xml]
Resource missing. [HTTP GET: https://repo.maven.apache.org/maven2/org/elasticsearch/plugin/x-pack/5.6.10-SNAPSHOT/x-pack-5.6.10-SNAPSHOT.pom]
Resource missing. [HTTP HEAD: https://repo.maven.apache.org/maven2/org/elasticsearch/plugin/x-pack/5.6.10-SNAPSHOT/x-pack-5.6.10-SNAPSHOT.zip]
Resource missing. [HTTP GET: https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/5.6.10-SNAPSHOT/maven-metadata.xml]
Download https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/5.6.10-SNAPSHOT/x-pack-5.6.10-SNAPSHOT.pom
Resource missing. [HTTP GET: https://repo.maven.apache.org/maven2/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/maven-metadata.xml]
Resource missing. [HTTP HEAD: https://repo.maven.apache.org/maven2/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.pom]
Resource missing. [HTTP HEAD: https://repo.maven.apache.org/maven2/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.zip]
Resource missing. [HTTP GET: https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/maven-metadata.xml]
Resource missing. [HTTP HEAD: https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.pom]
Resource missing. [HTTP HEAD: https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.zip]
Resource missing. [HTTP GET: https://snapshots.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/maven-metadata.xml]
Found locally available resource with matching checksum: [https://snapshots.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.pom, /var/lib/jenkins/.gradle/caches/modules-2/files-2.1/org.elasticsearch.plugin/x-pack/6.2.5-SNAPSHOT/e967615372a620a1af270d846ff6a03f800fc5d/x-pack-6.2.5-SNAPSHOT.pom]
So it looks like Gradle is doing HEAD requests to find out where the artifact is located and then downloading using a GET. This gave me the idea that Amazon S3 (which we use as repository) was lying to one of Gradle's HEAD requests so that it would mislead Gradle to download the artifact from the wrong URL.
I tested this by running
until curl --fail --head "https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.pom"; do echo "Retrying..."; done
and, lo and behold:
13:38 $ until curl --fail --head "https://artifacts.elastic.co/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.pom"; do echo "Retrying..."; done
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
curl: (22) The requested URL returned error: 404 Not Found
Retrying...
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 1010
Content-Type: application/octet-stream
Date: Fri, 01 Jun 2018 11:38:57 GMT
ETag: "e1a35602bf2421354fcb18fd08a44289"
Last-Modified: Thu, 31 May 2018 18:30:50 GMT
Server: nginx/1.4.6 (Ubuntu)
x-amz-expiration: expiry-date="Sun, 01 Jul 2018 00:00:00 GMT", rule-id="auto expiration"
x-amz-id-2: aFpOLUxLpw3uuqPVgXVboHovU9sxzQVmsWU5q/Ij5BhUr5LcmMkpreXAhwfJ+Flppk6+kRFDbIw=
x-amz-request-id: 340D20BD083452F8
x-amz-server-side-encryption: AES256
x-ngx-hostname: www05
Connection: keep-alive
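The effect of such a stray 200 can be illustrated with a toy model. This is only a sketch, not Gradle's actual resolver: it assumes the resolver walks the repositories in declaration order, settles on the first one that serves the POM, and then requests the zip from that repository only, which is consistent with the "Searched in the following locations" output above listing a single URL. The function name `resolve_zip_host` and the set-based inputs are hypothetical.

```python
from typing import Optional, Set

# Repository hosts in the order they are probed in the build log above.
REPOS = ["repo.maven.apache.org", "artifacts.elastic.co", "snapshots.elastic.co"]


def resolve_zip_host(pom_hosts: Set[str], zip_hosts: Set[str]) -> Optional[str]:
    """Return the host the zip is downloaded from, or None if resolution fails.

    Assumption (not taken from Gradle's source): the first repository that
    answers for the POM is the only one asked for the zip afterwards.
    """
    for host in REPOS:
        if host in pom_hosts:
            # The resolver has committed to this host; no fallback to later repos.
            return host if host in zip_hosts else None
    return None


# Healthy case: only snapshots.elastic.co knows about the -SNAPSHOT module.
healthy = resolve_zip_host({"snapshots.elastic.co"}, {"snapshots.elastic.co"})

# Faulty case: artifacts.elastic.co spuriously answers 200 for the POM,
# but the zip is still only available from snapshots.elastic.co.
misled = resolve_zip_host(
    {"artifacts.elastic.co", "snapshots.elastic.co"}, {"snapshots.elastic.co"}
)

print(healthy)  # snapshots.elastic.co
print(misled)   # None
```

Under these assumptions, a single wrong answer for the POM is enough to make the build fail with "Could not find x-pack.zip", even though the artifact exists in a repository further down the list.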
The problem here is that S3 provides read-after-write semantics but neither read-after-update nor read-after-delete semantics. This means that when we upload our daily snapshot artifacts to S3, there will be a period of time during which S3 has not yet become consistent and we will not necessarily be able to see these artifacts, leading to these failures. A while ago I noticed a correlation between when our daily snapshots are uploaded and when these builds fail due to these artifacts not being accessible, but I had not yet reached the conclusion that it was due to the eventual consistency properties of S3. We also upload these artifacts to a new path during the daily snapshot uploads, for which we would have read-after-write guarantees. Therefore, I think we need to change the build to locate the most recent daily snapshot and modify the repositories to be based on that location. Unfortunately, the manifests that we upload currently overwrite each other, so we don't currently have a manifest available to locate the artifacts. As these are not public buckets, we can't enumerate the objects in snapshots.elastic.co without credentials; that is not something I want to inject into the build, and it won't help us locally anyway. Therefore, we need to address the issue with the overwriting manifests internally, and then we can fix this issue. I will take this up internally.
I have opened internal infra issues to see what we can do here.
@jasontedor can you provide more explanations? I don't follow your observations here. What I've observed was that artifacts.elastic.co (not snapshots.elastic.co) sometimes returned 200 for a HEAD request to a -SNAPSHOT artifact, which looks unexpected to me as I thought that snapshot artifacts should only appear at snapshots.elastic.co. The second thing is that, even in the presence of S3's lacking read-after-update semantics, we should still be seeing either the old or the new artifact if we do not explicitly first delete the old snapshot artifacts before uploading the new ones.
@ywelsch I have no idea why artifacts.elastic.co would be lying to us. It makes so little sense that when I read your post I misread it and thought the problem had to be snapshots.elastic.co. The internal requests I made for improvements to the snapshots still stand, but now I will also chase down what is going on with artifacts.
The problem here as discovered by our amazing infra team is that the proxy that serves artifacts.elastic.co and snapshots.elastic.co caches the S3 objects that are requested through the proxy. However, the cache key did not include the host (meaning artifacts.elastic.co versus snapshots.elastic.co). This means that a successful lookup from snapshots.elastic.co for a snapshot artifact would get cached and then artifacts.elastic.co could reuse that cached object when answering a GET or HEAD request. The issue is intermittent because of proxy load balancing and because the cache is periodically purged (otherwise we would be caching all of our artifacts and snapshots on the proxies). Our infra team will deploy a change to include the host in the cache key which will address this issue.
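The cache-key problem described above can be reproduced in a small simulation. This is a minimal sketch under stated assumptions, not the real proxy configuration: the `CachingProxy` class, its `include_host_in_key` flag, and the in-memory `origins` mapping are all hypothetical, and only successful lookups are cached, as the comment describes.

```python
from typing import Dict, Set, Tuple, Union

POM = "/maven/org/elasticsearch/plugin/x-pack/6.2.5-SNAPSHOT/x-pack-6.2.5-SNAPSHOT.pom"


class CachingProxy:
    """Toy proxy in front of per-host backing stores (stand-ins for the S3 buckets)."""

    def __init__(self, origins: Dict[str, Set[str]], include_host_in_key: bool):
        self.origins = origins
        self.include_host_in_key = include_host_in_key
        self.cache: Set[Union[str, Tuple[str, str]]] = set()

    def _key(self, host: str, path: str) -> Union[str, Tuple[str, str]]:
        # The bug: keying on the path alone lets both hosts share cache entries.
        return (host, path) if self.include_host_in_key else path

    def head(self, host: str, path: str) -> int:
        key = self._key(host, path)
        if key in self.cache:
            return 200  # served from cache without consulting the origin
        if path in self.origins.get(host, set()):
            self.cache.add(key)  # only successful lookups are cached
            return 200
        return 404

    def purge(self) -> None:
        self.cache.clear()


origins = {"artifacts.elastic.co": set(), "snapshots.elastic.co": {POM}}

buggy = CachingProxy(origins, include_host_in_key=False)
before = buggy.head("artifacts.elastic.co", POM)   # 404: cache is cold
warm = buggy.head("snapshots.elastic.co", POM)     # 200: populates the shared entry
after = buggy.head("artifacts.elastic.co", POM)    # 200: wrong host, answered from cache
buggy.purge()
purged = buggy.head("artifacts.elastic.co", POM)   # 404 again once the cache is purged

fixed = CachingProxy(origins, include_host_in_key=True)
fixed.head("snapshots.elastic.co", POM)
fixed_after = fixed.head("artifacts.elastic.co", POM)  # 404: entries are per host

print(before, warm, after, purged, fixed_after)  # 404 200 200 404 404
```

The purge step mirrors why the failure was intermittent: artifacts.elastic.co only answers 200 while a snapshots.elastic.co lookup for the same path is still cached.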