Dataverse: EPIC: small footprint container usable for development, testing and production purposes

Created on 8 Nov 2018 · 4 Comments · Source: IQSS/dataverse

IMHO this is an epic, not a single story.

This issue is the successor to #5187 and closes it.

It is intended to serve as a base for solutions or make life easier in:

  • #5068 - Running integration tests for more than just API
  • #4172 - Payara 5 Upgrade
  • #5072 - Refactoring of S3 storage driver
  • #4665 - Docker in production and projects related to that (IQSS/dataverse-docker and the like)
  • #4040 - OpenShift / Kubernetes deployments

This is blocked by the following issues and relies on some prior work being done first:

  • #5288 - Dependency Housekeeping Done!
  • #5274 - Stripping AWS dependency and TrueZIP from POM
  • #5360 - Dependency Housekeeping, second round
  • #5293 - Simplify configuration of the application.
  • #5345 - Make EJB timers non-persistent to drop database requirement.
  • #5361 - Bootstrapping empty instance
  • payara/Payara#3506 and payara/docker-payaraserver-full#72 - Logging to console eats 1 CPU

Things to consider:

  • Add a HEALTHCHECK CMD that uses a to-be-built API endpoint reporting on health status?
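
A minimal sketch of what such a check could look like as a shell script that a Dockerfile HEALTHCHECK CMD invokes. The endpoint path, the HEALTH_URL variable and the function names are hypothetical, since the API endpoint does not exist yet; the sketch simply treats HTTP 200 as healthy.

```shell
#!/bin/sh
# Hypothetical health probe for a Dockerfile HEALTHCHECK CMD.
# HEALTH_URL is an assumed placeholder; the endpoint is still to be built.
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/api/info/health}"

# Succeeds only if the given HTTP status code is 200.
is_healthy() {
  [ "$1" = "200" ]
}

# Prints the HTTP status code returned by the URL ("000" if unreachable).
fetch_status() {
  curl --silent --output /dev/null --max-time 5 \
       --write-out '%{http_code}' "$1" || true
}

# Returns 0 (healthy) or 1 (unhealthy), the exit codes Docker expects.
probe() {
  if is_healthy "$(fetch_status "$1")"; then
    return 0
  fi
  return 1
}
```

Ending the script with `probe "$HEALTH_URL"` would make its exit status usable directly from a HEALTHCHECK CMD line.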

Vision / Proposal

Currently, when running integration tests or deploying Dataverse to Docker/Kubernetes, only fairly heavyweight solutions exist: the DockerAIO image for integration tests, and most (all?) Docker/Kubernetes/OpenShift approaches, which rely on the installer script.

I encourage the following vision:

  1. Build a new image directly from a Maven target, requiring only a development environment plus Docker installed and running (obsolete once img support is in place...)
  2. Make this image as small as possible: an application server only, plus the dependencies and the application.
  3. Anything else lives in other containers, following the microservices credo.
  4. Make the application container itself stateless, also following the microservices credo. (This does not affect the use of volumes/...)
  5. Make the configuration a breeze: don't use the install script inside the container. Instead, provide options to inject the configuration from external sources.
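
As an illustration of point 5, a container entrypoint could resolve each setting from external sources instead of running the install script. The sketch below looks up a setting in an environment variable first, then in a file inside a mounted secrets directory, then falls back to a default. The function name, the SECRETS_DIR variable and its default path are assumptions for illustration, not existing Dataverse options.

```shell
#!/bin/sh
# Hypothetical helper; SECRETS_DIR and its default path are assumptions.
SECRETS_DIR="${SECRETS_DIR:-/run/secrets}"

# resolve_setting NAME DEFAULT
# Prints the value of NAME, looked up in this order:
#   1. shell/environment variable NAME
#   2. file $SECRETS_DIR/NAME (e.g. a mounted Docker/Kubernetes secret)
#   3. DEFAULT
resolve_setting() {
  name="$1"
  default="$2"
  # Reject anything that is not a valid variable name before using eval.
  case "$name" in
    ''|[0-9]*|*[!A-Za-z0-9_]*) return 2 ;;
  esac
  # POSIX sh has no indirect expansion, hence eval.
  eval "value=\${$name:-}"
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
  elif [ -r "$SECRETS_DIR/$name" ]; then
    cat "$SECRETS_DIR/$name"
  else
    printf '%s\n' "$default"
  fi
}
```

An entrypoint might then use something like `DB_HOST="$(resolve_setting DB_HOST localhost)"`, where DB_HOST is again only an example name.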

To get there I suggest using:

Things to keep in mind:

  1. Ideally this is based on Payara 5, not 4.
  2. Let users still use the "old" WAR file approach in parallel! Somebody might rely on that. (That's why I killed #5187)
  3. Let the configuration methods currently known to all users still work. Somebody might rely on that!

Give it a shot! (Testing)

To test, just have Docker, Maven, Git and Java installed.
Then do:

git clone https://github.com/poikilotherm/dataverse -b 5292-small-container
cd dataverse
mvn -Pcontainer clean package docker:build docker:run -DskipTests

Please keep in mind that this is a feature branch. If you already have a cloned dataverse repo, you might be better off using:

git remote add poikilotherm https://github.com/poikilotherm/dataverse
git fetch poikilotherm 5292-small-container
git checkout -b 5292-small-container poikilotherm/5292-small-container

I regularly update this feature branch to be based on the latest develop. This involves rebasing, which will cause your local branch to diverge from mine. In that case, simply use git reset --hard poikilotherm/5292-small-container after a fetch.
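
The fetch-and-reset step can be wrapped in a small helper. This is only a sketch with a made-up function name. Note that git reset --hard discards local commits and uncommitted changes on the current branch, so it is only appropriate if you have no work of your own there.

```shell
#!/bin/sh
# sync_to_remote_branch REMOTE BRANCH
# Fetches BRANCH from REMOTE and hard-resets the currently checked-out
# branch to it. WARNING: local commits and uncommitted changes are lost.
sync_to_remote_branch() {
  remote="$1"
  branch="$2"
  git fetch "$remote" "$branch" || return 1
  git reset --hard "$remote/$branch"
}
```

For this issue the call would be `sync_to_remote_branch poikilotherm 5292-small-container`, run while the local feature branch is checked out.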

All 4 comments

@pdurbin and others: initial work on the building part has been pushed to my feature branch.

Right now this will (of course) not work. The Postgres driver is missing right now and the config part (see #5293) has to be addressed first.

As I wrote in the commit message, the upstream container project is in some parts not very responsive/active (see here, here, here and here).

I will try my best to get things upstream, but it may be better to fork and try to get this upstream later:

  • [ ] Fix the missing signal handling, including graceful shutdowns
  • [ ] Get inspiration from the stuff that @shoeper did
  • [ ] Address more issues from upstream.

Also wondering if automated builds and security scans from quay.io could be interesting for this.

I am currently working on an updated Payara image, see https://github.com/poikilotherm/docker-payaraserver-full/tree/refactor . I will try to get my work upstream; just a few hours ago they merged stuff :-D

Since upstream merged a lot of changes that clean up most of the issues (see here), I switched back to using the upstream images. Some issues still exist with the init system.

I just opened PR payara/docker-payaraserver-full#61 and hope things will get merged.

Current niceness blockers:

  • It would be a great relief not to include the bootstrap shell scripts but have Dataverse do the bootstrapping itself. #5361, #7256
  • It would be so great not to fiddle with the Postgres driver, but include it in the WAR. #6819
  • It would be so nice not to create a tedious init script configuring Dataverse on container boot. This is true for multiple resources.
    A start can be made with the DB connection, using @DataSourceDefinition, see #6819. JMS etc. to follow.
  • No more fiddling with password aliases by using secrets via MicroProfile Config. #7000, #5293
  • No more fiddling with configuration parsing during startup by using MicroProfile Config. #7000
  • Some other, smaller things to do...

But maybe we can just deal with it for now. Jenkinsfiles are still missing. A Solr image is still missing.
