openjdk-infrastructure: A download server for Linux packages for AdoptOpenJDK

Created on 19 Nov 2019 · 10 comments · Source: AdoptOpenJDK/openjdk-infrastructure

We've been using an Artifactory instance sponsored by JFrog for roughly half a year to host Linux packages of AdoptOpenJDK. There's work underway to host our flavour of JMC there, too. I'd like to reconsider whether we're on the right track here:

  • What are our needs?
  • Do we favour a managed solution like Artifactory?

    • Is Artifactory the right solution, after all?

  • Do we prefer to host it ourselves?

There are various reasons why I'd like to reconsider our choice of Artifactory:

  • We're dependent on an external provider, in this case JFrog.
  • We don't have full control over the URL (it's https://adoptopenjdk.jfrog.io/). As a consequence, migrating to a different service (if we ever have to do it) is going to be a pain and take a long time.
  • The automatic generation of Debian package indices is broken (an upstream issue), which renders our automation inoperative. According to JFrog, a fix is expected within the next two quarters.
  • The support for Eclipse update sites (JMC) isn't great.
  • We might want to host things that don't fit into Artifactory at all, like packages for Alpine Linux.

My objective is to collect a list of requirements first so that we can check the various options out before coming up with an actionable proposal.

enhancement


From the perspective of the Linux packages:

  • We've amassed approx. 20 GB of Debian/Ubuntu packages and 180 GB of RPMs since May 2019 (6 months). I expect that we need around 500 GB of storage per year for releases alone, because in the first months we didn't build all OpenJ9 variants, so the volume so far understates future growth. On top of that, we have 5 TB of nightly builds.
  • I'm leaning towards using upstream tooling to host the package feeds, i.e., reprepro and createrepo. This means I'd need a server that has a complete local copy of all the packages so that the package indices and metadata can be generated and signed with our GPG key (which requires tighter security).
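As a rough cross-check of the 500 GB estimate, a straight-line extrapolation of the volume observed so far can be sketched as follows. It assumes linear growth, which the missing OpenJ9 variants make optimistic. The helper name `annual_release_storage_gb` is made up for illustration.

```python
# Observed on Artifactory, May to Nov 2019 (about 6 months), per the comment above.
DEB_GB = 20          # Debian/Ubuntu packages so far
RPM_GB = 180         # RPM packages so far
MONTHS_OBSERVED = 6


def annual_release_storage_gb(deb_gb: float, rpm_gb: float, months: int) -> float:
    """Hypothetical helper: linearly extrapolate observed release storage to 12 months."""
    return (deb_gb + rpm_gb) * 12 / months


estimate = annual_release_storage_gb(DEB_GB, RPM_GB, MONTHS_OBSERVED)
print(f"{estimate:.0f} GB per year")  # prints "400 GB per year"
```

The result, about 400 GB per year, sits below the 500 GB estimate. That fits the expectation that output grows once all OpenJ9 variants are built.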

Sounds excellent. Having the JMC update sites on a download server would be great. Something along the lines of:

https://<baseurl>/jmc/updatesites/latest/ide/
https://<baseurl>/jmc/updatesites/latest/rcp/
https://<baseurl>/jmc/updatesites/7.0.0/ide/
https://<baseurl>/jmc/updatesites/7.0.0/rcp/
https://<baseurl>/jmc/updatesites/7.1.0/ide/
https://<baseurl>/jmc/updatesites/7.1.0/rcp/

Note that once we have published the update sites, we should re-spin and re-publish the application builds, including 7.0.0 and 7.1.0, with correct overrides for the URLs. Then it will finally be possible to install the optional plug-ins. :)
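The proposed layout is regular enough to generate: a `latest` alias plus one directory per JMC release, each with `ide` and `rcp` sites. The sketch below reproduces the list above. `<baseurl>` is kept as the same placeholder, the function name is hypothetical, and whether `latest` would be a copy or a redirect isn't decided here.

```python
BASE_URL = "https://<baseurl>"  # placeholder; the real host has not been chosen
KINDS = ("ide", "rcp")          # IDE plug-in site and RCP application site


def update_site_urls(versions: list[str], base_url: str = BASE_URL) -> list[str]:
    """Hypothetical helper: the 'latest' alias first, then one directory per release."""
    urls = []
    for version in ["latest", *versions]:
        for kind in KINDS:
            urls.append(f"{base_url}/jmc/updatesites/{version}/{kind}/")
    return urls


for url in update_site_urls(["7.0.0", "7.1.0"]):
    print(url)
```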

Rough idea using AWS terminology:

[Figure: architecture sketch]

The Jenkins nodes push build artifacts to an upload server using restricted SFTP. The upload server keeps a local copy of all files. It is responsible for generating package indices and signing files. This cannot be done on the Jenkins nodes because reprepro needs all packages on a local disk to generate the package indices. The upload server syncs its local copy of all files with an S3 bucket. From there, our users download the files via CloudFront.

The AdoptOpenJDK GPG key needs to be stored on the upload server. Therefore, the upload server has to be locked down.
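One publish run on the upload server could look roughly like the sketch below. It covers deb and rpm only, uses a hypothetical directory layout under `repo_root`, and only builds the command lines instead of executing them. The tool invocations (`reprepro includedeb`, `createrepo_c --update`, `gpg --detach-sign`, `aws s3 sync --exclude`) are standard CLI usage, but the exact flags a real setup needs are an assumption.

```python
import shlex
from pathlib import Path

# Hypothetical layout on the upload server (not decided in this thread):
#   <repo_root>/deb  reprepro base directory (conf/, db/, dists/, pool/)
#   <repo_root>/rpm  tree of RPMs indexed by createrepo_c


def publish_commands(debs: list[Path], repo_root: Path,
                     codename: str, bucket: str) -> list[list[str]]:
    """Return the commands for one publish run, in execution order.

    Each command could be run with subprocess.run(cmd, check=True);
    this sketch only builds them.
    """
    deb_base = repo_root / "deb"
    rpm_dir = repo_root / "rpm"
    cmds: list[list[str]] = []
    for deb in debs:
        # Adds the package to the local pool and regenerates the APT indices.
        # reprepro signs the Release file itself if conf/distributions has SignWith.
        cmds.append(["reprepro", "-b", str(deb_base), "includedeb", codename, str(deb)])
    # Rebuild repodata/ from the complete local RPM tree.
    cmds.append(["createrepo_c", "--update", str(rpm_dir)])
    # Detached, ASCII-armored signature (repomd.xml.asc) for the RPM metadata.
    cmds.append(["gpg", "--batch", "--yes", "--detach-sign", "--armor",
                 str(rpm_dir / "repodata" / "repomd.xml")])
    # Copy the public files to S3; reprepro's private state stays on the server.
    cmds.append(["aws", "s3", "sync", str(repo_root), f"s3://{bucket}/",
                 "--exclude", "deb/conf/*", "--exclude", "deb/db/*"])
    return cmds


if __name__ == "__main__":
    example = publish_commands([Path("incoming/example.deb")], Path("/srv/repo"),
                               "buster", "example-bucket")
    for cmd in example:
        print(shlex.join(cmd))
```

Signing happens only on the upload server, so the GPG key never has to leave it. That is the reason it needs to be locked down.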

Questions:

  • Does that look okay to you?
  • I used AWS terminology, but which provider would we prefer?
  • Does SFTP work for everyone?

As soon as we have a proposal everybody is happy with, I'll do a test setup so that we can verify that it actually works as expected.

Sounds reasonable to me, but I'm not directly involved in these parts. Patrick (@reinhapa), what do you think?

I have no strong opinion about this, but I will need some help getting the update sites working later in the process.

@aahlenst - when do you think the test setup will be available?

I cannot give any estimates. It won't happen before mid-February, unless someone steps up to help. I'm happy to talk anyone through it.

Requirements we have:

  • Support for deb, rpm, apk (Alpine), Eclipse P2
  • 500 GB of releases per year
  • 2 TB of nightly builds per year
  • 20 TB of bandwidth per month via CDN
  • Custom domain served over SSL/TLS (e.g., packages.adoptopenjdk.net)

The storage and bandwidth requirements are estimates. It's very hard to get that info out of Artifactory.

I did some further research on options:

  • Self-hosting with OSS tooling (createrepo and friends) is rather expensive because of the storage and bandwidth requirements. We need a full local view of the entire package trees to generate and sign the indices, so we need several TB of block storage. Just to fulfill the requirements for one year, we'd need to spend around $3,200 for the machine at AWS (t3a.medium with 3 TB of EBS, no backup). Bandwidth via CloudFront is another $20,000 per year (which seems a bit high?). We might be able to reduce the bandwidth cost with Cloudflare, Fastly, etc. Backup and the people to operate it would come on top. I looked at Hetzner, too, where we host Jenkins. They would be significantly cheaper, but they don't really have good backup facilities for that amount of data. Another drawback of self-hosting: no API.
  • Self-hosting with Sonatype Nexus Pro: This would be cheaper because Nexus Pro can use S3 (and only S3, not Azure Blob Storage or similar), and S3 is significantly cheaper than EBS. Nexus OSS does not support S3. Drawbacks: we'd still have to operate everything ourselves, and Sonatype did not seem very interested in working with us.
  • Using GitHub, Azure Artifacts, AWS CodeArtifact, GCP ArtifactRegistry: Do not support the formats we need.
  • PackageCloud.io: They do not support P2 or APK, but they are still interesting.
  • CloudSmith: They do not support P2, but are willing to add it.
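The self-hosting figures can be sanity-checked against the requirements list with a bit of arithmetic. The sketch assumes decimal units (1 TB = 1000 GB) and a flat per-GB price with no tiered discounts. It computes the price implied by the $20,000 estimate rather than quoting AWS's price list, which would need checking separately.

```python
# Figures from the requirements list and the self-hosting estimate above.
RELEASES_TB_PER_YEAR = 0.5
NIGHTLIES_TB_PER_YEAR = 2.0
EBS_TB = 3                        # volume size in the t3a.medium estimate
CDN_TB_PER_MONTH = 20
CLOUDFRONT_USD_PER_YEAR = 20_000  # estimate quoted above

storage_tb = RELEASES_TB_PER_YEAR + NIGHTLIES_TB_PER_YEAR  # data after one year
headroom_tb = EBS_TB - storage_tb                          # spare space on the volume
egress_gb_per_year = CDN_TB_PER_MONTH * 12 * 1000          # decimal TB -> GB
implied_usd_per_gb = CLOUDFRONT_USD_PER_YEAR / egress_gb_per_year

print(f"storage after one year: {storage_tb} TB (headroom {headroom_tb} TB)")
print(f"implied CDN price: ${implied_usd_per_gb:.3f}/GB")
```

At 240 TB of egress per year, the $20,000 figure works out to roughly 8 cents per GB. So the total is mostly a consequence of volume, and any per-GB saving from another CDN scales linearly.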

Hey folks / @aahlenst, Lee from @cloudsmith-io here. We're happy to help if we can. We're firm believers in data portability and reducing vendor lock-in, which is why we offer things like custom domain support. If P2 is critical, we can see about prioritising it for you.
