Jib: Split dependencies into two layers (silent and mutable)

Created on 15 Jun 2018 · 21 Comments · Source: GoogleContainerTools/jib

Jib currently separates your application into multiple layers (dependencies, resources, and classes), but one problem remains: the dependencies layer is still very big, so we could split it further. Why create a new layer for dependencies? I did some research on the dependencies of Java applications. Some JARs, for example the Groovy and Hibernate JARs, are very big but stable, and developers rarely change their versions. Developers change company-internal artifacts regularly, and SNAPSHOT artifacts change automatically. For example, a Spring Boot application is 30+ MB. If we create a new layer (the silent dependencies layer) containing the artifacts listed below, the original dependencies layer shrinks to about 5 MB, and the new 25 MB silent layer can stay unchanged for a long time. If a developer changes an internal artifact version, only the 5 MB layer (the mutable dependencies layer) has to be pushed, instead of the whole original 30 MB layer.

Another consideration in Java is SNAPSHOT versions. During the development and testing phases, most teams use SNAPSHOT versions, and these JARs are updated regularly by the continuous integration system. In some cases, for example, security JARs are always delivered as SNAPSHOT versions. A change to a single 100 KB SNAPSHOT JAR forces you to rebuild the whole dependencies layer.

We can introduce a silent dependencies layer that groups big, stable JARs by wildcard matching on groupId and artifactId, plus a mutable dependencies layer for the JARs that change often.
jib-maven-plugin configuration:

<silentDependencies>org.springframework:*, org.hibernate:*, *:commons-*, org.webjars:*</silentDependencies>
<mutableDependencies>com.yourcompany:*,*:*:*-SNAPSHOT</mutableDependencies>

The Dockerfile would then copy the silent JARs into a new layer of their own:

COPY silent-libs /app/libs/
COPY mutable-libs /app/libs/

All artifacts should end up in the same libs directory to keep dependency checks easy.
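A minimal sketch of how the silentDependencies/mutableDependencies patterns above could be evaluated, written as a hypothetical Java helper (SilentMutableClassifier is not part of Jib). It assumes coordinates arrive as groupId:artifactId:version, that * is the only wildcard, that segments omitted from a pattern match anything, and that mutable patterns take precedence so a Spring SNAPSHOT still lands in the mutable layer. JARs matching neither list are also treated as mutable here, which is a guess the proposal leaves open. The sample coordinates are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: sorts "groupId:artifactId:version" coordinates into a
 * silent or a mutable layer using comma-separated wildcard patterns.
 */
public class SilentMutableClassifier {

    /** Turns a glob segment (only '*' is special) into an equivalent regex. */
    static String globToRegex(String glob) {
        return Arrays.stream(glob.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
    }

    /** Segment-wise match; segments missing from the pattern act like "*". */
    static boolean matches(String pattern, String coordinates) {
        String[] p = pattern.trim().split(":");
        String[] c = coordinates.split(":");
        for (int i = 0; i < c.length; i++) {
            String segment = i < p.length ? p[i] : "*";
            if (!Pattern.matches(globToRegex(segment), c[i])) {
                return false;
            }
        }
        return true;
    }

    static boolean matchesAny(String patternList, String coordinates) {
        return Arrays.stream(patternList.split(","))
                .anyMatch(p -> matches(p, coordinates));
    }

    /** Assumption: mutable patterns win, and unmatched JARs also count as mutable. */
    static String classify(String silent, String mutable, String coordinates) {
        if (matchesAny(mutable, coordinates)) {
            return "mutable";
        }
        return matchesAny(silent, coordinates) ? "silent" : "mutable";
    }

    public static void main(String[] args) {
        String silent = "org.springframework:*, org.hibernate:*, *:commons-*, org.webjars:*";
        String mutable = "com.yourcompany:*,*:*:*-SNAPSHOT";
        List<String> dependencies = List.of(
                "org.springframework:spring-core:5.0.7.RELEASE",
                "commons-io:commons-io:2.6",
                "com.yourcompany:billing-api:1.2.0",
                "org.springframework:spring-web:5.1.0-SNAPSHOT",
                "com.google.guava:guava:25.1-jre");
        for (String dependency : dependencies) {
            System.out.println(classify(silent, mutable, dependency) + "\t" + dependency);
        }
    }
}
```

Running main prints one line per dependency: spring-core and commons-io are classified as silent, while billing-api, the spring-web SNAPSHOT, and the unmatched guava JAR are classified as mutable.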

The following are the sizes of some very popular artifacts for reference; of course, developers can add other JARs. For a Spring Boot application, the silent dependencies layer makes up roughly 80% or more of the dependency size, and the mutable dependencies layer roughly 20% or less.

Spring Framework related

spring all (spring, spring boot, spring cloud): 12M
reactor all: 3M

Apache

apache commons all: 1.8M

JVM languages

groovy: 4.7M
jruby: 10M
kotlin all: 3.5M
scala: 5.7M

Web server

tomcat all: 3.6M
jetty all: 2.4M
undertow all: 3M

Java EE

javax.*: stable, size is small
javaee: 2M

Driver & client

Oracle: ojdbc8.jar 4M
MySQL: 2M
H2 Database: 2M
postgresql: 0.7M
sqlserver: 0.9M
kafka: 7.5M

Misc

hibernate: 7.6M
jackson all: 2M
webjars: font-awesome (7.6M), bootstrap (1.0M), jquery (1.5M)
byte-buddy: 2.9M
aspectj (aspectj*): 3M
snappy-java: 1.1M
netty all: 2.3M
freemarker: 1.5M
thymeleaf: 1.0M
bouncycastle: 4M

linux-china

All 21 comments

Hi @linux-china, thanks for filing this very detailed issue! This suggested feature sounds like a great idea. Our team is currently at the O'Reilly Velocity conference and we will be back next week to look over this in more detail.

coollog on 15 Jun 2018

@coollog I added a consideration about SNAPSHOT versions in Java; these JARs should be treated as mutable.

linux-china on 15 Jun 2018

This feature would be a real step forward. I've been using the Spotify Docker plugin for a while, and with a carefully written Dockerfile it's possible to achieve nearly the same result as with Jib. However, if frequently updated dependencies could be treated separately from stable ones, it would greatly reduce the size of updates and in some cases could cut deployment times by minutes, which is a game changer for CI.

ivan-gammel on 10 Jul 2018

@ivan-gammel, as a first step we're currently thinking of using a simple SNAPSHOT vs. non-SNAPSHOT heuristic. Is that sufficient for your use case? @linux-china's proposal is a little more elaborate, and we'd like to flesh that idea out more before we commit to anything.

loosebazooka on 10 Jul 2018

@loosebazooka I think it's a good first step, because it will cover 90% of builds, but in the end it would be great to see the original proposal implemented.

In my projects I usually observe three different frequencies of change in dependencies. First, snapshots: changes in the current development stream, which may happen multiple times a day (e.g. parts of multi-module Maven projects). Then there are internal libraries (e.g. common data models or API interface declarations), which are patched every 2-4 weeks and get an architectural update every 4-6 months. Finally, there is the technology stack (Spring, Hibernate, etc.), which is usually updated every 6-9 months, or sooner to incorporate critical security fixes that are important for us (this has never happened so far). The snapshots and the stack are usually the biggest in size.

Because of their small size, internal libraries can be included together with the snapshots, so @linux-china's original proposal of two dependency layers fits very well.

ivan-gammel on 11 Jul 2018

Why not a configuration like this:

<layers>
  <layer>org.apache.*:*</layer>
  <layer>my.company.groupid:*</layer>
</layers>

The layer format is groupId[:artifactId][:version][:type][:scope][:classifier], inspired by the Enforcer plugin:
https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html

The example above would split the current dependencies layer into 3 layers:

The first layer contains all dependencies that did not meet any criteria.
The next layer includes the Apache JARs matching org.apache.*:*.
The last layer holds my company's JARs.
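The splitting rule described above can be sketched in Java as follows. This is a hypothetical illustration, not Jib code: OrderedLayerSplitter only handles the groupId:artifactId:version prefix of the Enforcer-style format, treats * as the only wildcard, and assumes that a dependency matching several patterns goes into the first matching layer (logging a warning for the others), since the proposal does not say what happens on overlap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of the ordered layer idea: layer 0 holds unmatched
 * dependencies, layer i holds those first matched by pattern i - 1.
 */
public class OrderedLayerSplitter {

    /** Glob segment to regex; only '*' is a wildcard. */
    static String globToRegex(String glob) {
        return Arrays.stream(glob.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
    }

    /** Compares segment by segment; trailing segments left out of the pattern match anything. */
    static boolean matches(String pattern, String coordinates) {
        String[] p = pattern.split(":");
        String[] c = coordinates.split(":");
        if (p.length > c.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (!Pattern.matches(globToRegex(p[i]), c[i])) {
                return false;
            }
        }
        return true;
    }

    static List<List<String>> split(List<String> patterns, List<String> dependencies) {
        List<List<String>> layers = new ArrayList<>();
        for (int i = 0; i <= patterns.size(); i++) {
            layers.add(new ArrayList<>());
        }
        for (String dependency : dependencies) {
            int layer = 0; // 0 = did not match any pattern
            for (int i = 0; i < patterns.size(); i++) {
                if (!matches(patterns.get(i), dependency)) {
                    continue;
                }
                if (layer == 0) {
                    layer = i + 1; // assumption: the first matching pattern wins
                } else {
                    System.err.println("warning: " + dependency + " also matches " + patterns.get(i));
                }
            }
            layers.get(layer).add(dependency);
        }
        return layers;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("org.apache.*:*", "my.company.groupid:*");
        List<String> dependencies = List.of(
                "org.apache.commons:commons-lang3:3.7",
                "my.company.groupid:core:1.0",
                "com.fasterxml.jackson.core:jackson-databind:2.9.6");
        List<List<String>> layers = split(patterns, dependencies);
        for (int i = 0; i < layers.size(); i++) {
            System.out.println("layer " + i + ": " + layers.get(i));
        }
    }
}
```

With the example patterns, layer 0 receives jackson-databind, layer 1 receives commons-lang3, and layer 2 receives core. The sample coordinates are made up for illustration.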

velo on 12 Jul 2018

@loosebazooka implemented the SNAPSHOT separation, to be released in version 0.9.7.

@velo thanks for the suggestion! That is definitely a good way to configure it, but we probably want to avoid referencing layers, since that exposes image-format implementation details in Jib's configuration. Also, it might be confusing to have layers as a configuration when we build other layers besides the dependencies ones. An alternative would be a name like matchDependencies, or something related to grouping dependencies for caching purposes.

coollog on 14 Jul 2018

Maybe dependencyIsolation?

velo on 14 Jul 2018

Or even dependencyManagement:

<dependencyManagement>
    <dependencySet id="stack">
        <!-- All project dependencies matching those defined in the Spring Boot POM
             will be included here. It's more difficult to implement, but it will reduce
             a lot of boilerplate when dealing with technology stack definitions -->
        <import>org.springframework.boot:spring-boot-dependencies:${spring-boot.version}</import>
    </dependencySet>
    <dependencySet>
        <!-- default dependency set:
             includes all dependencies that are not matching other definitions -->
        <includeUnmatched />
    </dependencySet>
    <dependencySet id="libraries">
        <include>com.mycompany.common:*:*</include>
        <include>org.apache.something:somelib:1.5.RC1</include>
    </dependencySet>
    <dependencySet id="snapshots">
        <includeCurrentProject /> <!-- same as including ${project.groupId}:*:* -->
        <includeSnapshots /> <!-- same as including *:*:*-SNAPSHOT -->
    </dependencySet>
</dependencyManagement>

ivan-gammel on 14 Jul 2018

For Gradle, one could have one layer for project dependencies or put them together with the SNAPSHOT deps.

stigkj on 14 Jul 2018

@ivan-gammel <dependencyManagement> is already used by Maven with a totally different meaning. I would suggest something else.

<includeUnmatched />: having to declare something like this implies that it is possible to create an image without the unmatched dependencies.

Also, each dependency in the list should be consumed only once. So if I have:

org.springframework.boot:*
org.springframework:*

then the org.springframework:* pattern should not include any Boot dependencies.

Another thing I wonder: if I declare org.springframework.boot:spring-boot-dependencies, should it follow and include its dependencies too?

@coollog

we probably want to avoid referencing layers

Well, this configuration exists for the sole purpose of segmenting JARs across multiple layers. If we call it something other than layers, we will need to document very clearly that this dependencySomething is in fact changing layers.

velo on 14 Jul 2018

@velo I agree that dependencyManagement might be confusing. Could you please clarify your comment regarding includeUnmatched? Some declaration for unmatched dependencies is necessary, because there can be no reasonable default layer for them. We could assume that an automatic layer will be created for them, but this behavior would not be obvious.

Regarding the springframework groups, they are different: org.springframework does not include org.springframework.boot, but org.springframework.* does. It's a good question how the algorithm should behave when a dependency matches more than one layer. Probably, it should be included in the first matched layer, and a warning should be logged for each subsequent match?

Regarding spring-boot-dependencies, it's a BOM: the idea is to match against all dependencies listed there, not to include the spring-boot-dependencies artifact itself in some layer (note that the "import" tag is used for it instead of "include").

ivan-gammel on 14 Jul 2018

includeUnmatched should be something that happens automatically. If people have to declare it and forget, Jib will generate incomplete Docker images.

velo on 15 Jul 2018

how the algorithm should behave, when a dependency matches more than one layer

I think a first-come, first-served approach.

velo on 15 Jul 2018

Regarding spring-boot-dependencies, it's a BOM

Well, BOM files don't really include dependencies. They only lock versions, so by itself a BOM should not affect anything.

velo on 15 Jul 2018

For Gradle, one could have one layer for project dependencies or put them together with the SNAPSHOT deps.

@stigkj @loosebazooka has implemented this in #584, and it will be available in the next release (version 0.9.7).

Another thing I wonder: if I declare org.springframework.boot:spring-boot-dependencies, should it follow and include its dependencies too?

@velo this is a great point and should be considered further.
There was also a rejected configuration proposal with a similar layers configuration, which looked something like:

<layers>
  <layer>
    <matchDependencies>org.springframework:*, org.hibernate:*...</matchDependencies>
  </layer>
</layers>

Regarding the springframework groups, they are different: org.springframework does not include org.springframework.boot, but org.springframework.* does. It's a good question how the algorithm should behave when a dependency matches more than one layer. Probably, it should be included in the first matched layer, and a warning should be logged for each subsequent match?

@ivan-gammel Great point too! Another option is to match the strictest pattern.
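One possible reading of "the strictest pattern" is sketched below in Java: among all matching patterns, pick the one with the most literal (non-*) characters, keeping the earlier pattern on a tie. Both this strictness metric and the StrictestPatternMatcher helper are assumptions for illustration, not anything Jib implements. Under this glob reading, org.springframework.*:* matches org.springframework.boot but not plain org.springframework, consistent with the distinction drawn in the quoted comment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch: resolve overlapping patterns by picking the strictest match. */
public class StrictestPatternMatcher {

    /** Glob segment to regex; only '*' is a wildcard. */
    static String globToRegex(String glob) {
        return Arrays.stream(glob.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
    }

    /** Compares segment by segment; trailing segments left out of the pattern match anything. */
    static boolean matches(String pattern, String coordinates) {
        String[] p = pattern.split(":");
        String[] c = coordinates.split(":");
        if (p.length > c.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (!Pattern.matches(globToRegex(p[i]), c[i])) {
                return false;
            }
        }
        return true;
    }

    /** Assumption: strictness = number of literal (non-'*') characters in the pattern. */
    static int strictness(String pattern) {
        return pattern.replace("*", "").length();
    }

    static Optional<String> strictestMatch(List<String> patterns, String coordinates) {
        String best = null;
        for (String pattern : patterns) {
            // Strict '>' keeps the earlier pattern when two are equally strict.
            if (matches(pattern, coordinates) && (best == null || strictness(pattern) > strictness(best))) {
                best = pattern;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("org.springframework.*:*", "org.springframework.boot:*");
        for (String dependency : List.of(
                "org.springframework.boot:spring-boot:2.0.3.RELEASE",
                "org.springframework.cloud:spring-cloud-commons:2.0.0.RELEASE",
                "org.springframework:spring-core:5.0.7.RELEASE")) {
            System.out.println(dependency + " -> " + strictestMatch(patterns, dependency).orElse("(unmatched)"));
        }
    }
}
```

For the three sample dependencies this selects org.springframework.boot:*, org.springframework.*:*, and (unmatched) respectively. Unlike first-match, the result does not depend on the order in which patterns are declared, except for ties.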

coollog on 15 Jul 2018

Great initiative!

As a relevant reference, I'd like to point out https://github.com/gclayburg/dockerPreparePlugin, a Gradle plugin focused exactly on this. It's very beneficial for projects with several microservices that all share a similar base (e.g. Spring Boot + some infrastructure). The plugin prepares a Dockerfile with a corresponding build staging directory structured into 3 layers:

A shared layer (commonServiceDependenciesLayer1) where all the common dependencies go. This layer is shared across the different microservices, reducing the overall deployed size. These dependencies don't change very often, but when they do, they typically change for all services.
Third-party dependencies specific to this particular module (dependenciesLayer2). These typically don't change very often either.
Classes layer for the specific module (classesLayer3).

Most commonly, a change to the module translates into a few KB in the classes layer only. The image and layers produced are also very consistent and deterministic regardless of previous build history (or which CI server instance created them). The plugin is specific to Spring Boot projects, but the technique would apply more generally to any Java project.
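A rough Java sketch of the shared-layer idea, assuming the shared layer is simply the set of JARs that every service depends on (the plugin's actual selection rule may differ). SharedLayerPlanner and the JAR names are hypothetical. Keep in mind that registries deduplicate layers by content digest, so a shared layer only saves space if every service produces a byte-identical layer (same JAR versions, same file metadata).

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: split each service's JARs into a shared layer and a module-specific layer. */
public class SharedLayerPlanner {

    /** Assumption: a JAR goes into the shared layer only if every service uses it. */
    static Set<String> sharedDependencies(Map<String, Set<String>> dependenciesByService) {
        Iterator<Set<String>> services = dependenciesByService.values().iterator();
        if (!services.hasNext()) {
            return new TreeSet<>();
        }
        Set<String> shared = new TreeSet<>(services.next());
        while (services.hasNext()) {
            shared.retainAll(services.next()); // keep only JARs common to all services so far
        }
        return shared;
    }

    /** Everything a single service needs beyond the shared layer. */
    static Set<String> moduleDependencies(Set<String> dependencies, Set<String> shared) {
        Set<String> rest = new TreeSet<>(dependencies);
        rest.removeAll(shared);
        return rest;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> byService = Map.of(
                "orders", Set.of("spring-boot-2.0.3.jar", "jackson-databind-2.9.6.jar", "postgresql-42.2.2.jar"),
                "billing", Set.of("spring-boot-2.0.3.jar", "jackson-databind-2.9.6.jar", "kafka-clients-1.1.0.jar"));
        Set<String> shared = sharedDependencies(byService);
        System.out.println("shared layer: " + shared);
        byService.forEach((service, deps) ->
                System.out.println(service + " layer: " + moduleDependencies(deps, shared)));
    }
}
```

Here the shared layer holds spring-boot and jackson-databind, leaving postgresql in the orders layer and kafka-clients in the billing layer.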

liqweed on 11 Oct 2018

Related: #1436. I think this issue subsumes #1436.

chanseokoh on 20 Mar 2019

We have settled on the following three layers for dependency JARs:

"project dependencies" (#1436 - fixed recently)
SNAPSHOT dependencies (added a long time ago)
all other dependencies

(#1436 will be included in the next Jib 1.4.0 release. We will update here once it goes live.)

Jib will automatically classify the dependencies of your Java application and put them into the corresponding layers. We think these three opinionated layers should cover most use cases reasonably well. Closing the issue.
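The three-way split can be approximated as below. This is a hypothetical Java sketch, not Jib's implementation: ThreeLayerClassifier assumes "project dependencies" means other modules of the same multi-module build (passed in as groupId:artifactId keys), and it checks that before the SNAPSHOT test so a SNAPSHOT sibling module stays in the project layer. The coordinates in main are made up.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the three dependency layers described above; not Jib's actual code. */
public class ThreeLayerClassifier {

    enum DependencyLayer { DEPENDENCIES, SNAPSHOT_DEPENDENCIES, PROJECT_DEPENDENCIES }

    /**
     * coordinates: "groupId:artifactId:version"; projectModules: "groupId:artifactId"
     * keys of modules built in the same multi-module project (assumed input).
     */
    static DependencyLayer classify(String coordinates, Set<String> projectModules) {
        String[] parts = coordinates.split(":");
        if (projectModules.contains(parts[0] + ":" + parts[1])) {
            // Assumption: checked first, so a SNAPSHOT sibling module stays in the project layer.
            return DependencyLayer.PROJECT_DEPENDENCIES;
        }
        if (parts[2].endsWith("-SNAPSHOT")) {
            return DependencyLayer.SNAPSHOT_DEPENDENCIES;
        }
        return DependencyLayer.DEPENDENCIES;
    }

    static Map<DependencyLayer, List<String>> group(List<String> dependencies, Set<String> projectModules) {
        Map<DependencyLayer, List<String>> layers = new EnumMap<>(DependencyLayer.class);
        for (DependencyLayer layer : DependencyLayer.values()) {
            layers.put(layer, new ArrayList<>());
        }
        for (String dependency : dependencies) {
            layers.get(classify(dependency, projectModules)).add(dependency);
        }
        return layers;
    }

    public static void main(String[] args) {
        Set<String> modules = Set.of("com.example:shared-model");
        List<String> dependencies = List.of(
                "org.springframework:spring-core:5.1.8.RELEASE",
                "com.example:shared-model:1.0-SNAPSHOT",
                "com.example.security:auth-client:2.3-SNAPSHOT");
        group(dependencies, modules).forEach((layer, jars) -> System.out.println(layer + ": " + jars));
    }
}
```

In the example, spring-core lands in DEPENDENCIES, shared-model in PROJECT_DEPENDENCIES, and auth-client in SNAPSHOT_DEPENDENCIES.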

chanseokoh on 21 Jun 2019

v1.4.0 has been released with finer dependency layers, as @chanseokoh described!

TadCordle on 16 Jul 2019

FYI @velo @liqweed @linux-china @ivan-gammel @stigkj @d5nguyenvan @Mart-Bogdan @Ameausoone @steven-sheehy @mdiskin @yamass

The Jib Extension Framework is now available with the latest Jib versions. You can easily extend and tailor the Jib plugins' behavior to your liking.

We've written a general-purpose layer-filter extension that enables fine-grained layer control, including deleting files and moving files into new layers.

For general information about using and writing extensions, take a look at the Jib Extensions repo.

chanseokoh on 11 Jun 2020
