Lombok: [BUG] Non-deterministic bytecode generation in certain environments

Created on 12 Feb 2020  ·  6 Comments  ·  Source: projectlombok/lombok

Describe the bug
When using a certain combination of annotations, lombok may produce different bytecode for exactly the same source and environment on subsequent compilations. This prevents me from making a reproducible build.

To Reproduce
bug-example.zip
This sample project should demonstrate the problem. Just run build-until-sha-different.sh (assuming you have bash/diff/sha1sum/maven). It builds the same project repeatedly until two subsequent builds produce class files that are not byte-for-byte equal.
I've used some classes from the guava library, but I doubt this is a hard requirement. The problem is very sensitive to some unknown factors, though, so I describe my environment in more detail below. The problem is not reproducible without lombok annotations, so I assume it happens because of lombok.
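
The repeat-until-different loop described above can be sketched roughly as follows. This is not the script from bug-example.zip; the function names, the `BUILD_CMD` variable, and the default `mvn -q clean compile` command are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, not the script shipped in bug-example.zip.

# Print one SHA-1 digest covering every .class file under directory $1.
classes_digest() {
    find "$1" -name '*.class' -type f -print0 | sort -z |
        xargs -0 -r sha1sum | sha1sum | cut -d' ' -f1
}

# Rebuild up to $2 times; succeed as soon as two consecutive builds differ.
# BUILD_CMD (assumed name) overrides the default Maven invocation.
build_until_different() {
    out_dir=$1 max_runs=$2 prev='' i=1
    while [ "$i" -le "$max_runs" ]; do
        ${BUILD_CMD:-mvn -q clean compile} >/dev/null || return 2
        cur=$(classes_digest "$out_dir")
        if [ -n "$prev" ] && [ "$cur" != "$prev" ]; then
            echo "run $i differs from run $((i - 1))"
            return 0
        fi
        prev=$cur
        i=$((i + 1))
    done
    echo "no difference in $max_runs runs"
    return 1
}
```

From the project root, `build_until_different target/classes 20` would rebuild up to 20 times and report the first pair of consecutive runs whose class files differ.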

Expected behavior
I should get exactly the same bytecode each time if nothing has changed in the source or environment.

Version info (please complete the following information):
The lombok version is the latest release:

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.12</version>
            <scope>provided</scope>
        </dependency>

This is happening in a docker container based on alpine:3.11 with java and maven installed:

# java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (IcedTea 3.15.0) (Alpine 8.242.08-r0)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
# mvn -version
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /usr/share/java/maven-3
Java version: 1.8.0_242, vendor: IcedTea, runtime: /usr/lib/jvm/java-1.8-openjdk/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-1050-kvm", arch: "amd64", family: "unix"



# cmp -lb 1_SimpleBug.class 2_SimpleBug.class
4053  66 6     65 5



1_SimpleBug.class
   #35 = Utf8               supSetStringParameter10
   #36 = Utf8               Lcom/google/common/base/Supplier;

2_SimpleBug.class
   #35 = Utf8               supSetStringParameter10
   #36 = Utf8               Lcom/google/common/base/Supplier;

They are identical, so why are these pointers suddenly different?
So it looks like lombok, during bytecode generation, had shifted these values somehow.
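
To read the `cmp -lb` line above: `cmp -l` prints the 1-based byte offset followed by the two differing byte values in octal (the `-b` flag adds the printable character). A small helper, with a hypothetical name, converts those values to decimal and hex:

```shell
# Decode one line of `cmp -l` output.
# $1 = byte offset, $2 = octal byte value in file 1, $3 = octal byte value in file 2
decode_cmp_line() {
    # A leading 0 makes shell arithmetic parse the value as octal.
    printf 'offset %d: %d (0x%02x) vs %d (0x%02x)\n' \
        "$1" "$((0$2))" "$((0$2))" "$((0$3))" "$((0$3))"
}

decode_cmp_line 4053 66 65
```

For the line above this prints `offset 4053: 54 (0x36) vs 53 (0x35)`, so the differing byte is off by one between the two files. A constant-pool listing like the one shown here is what `javap -v 1_SimpleBug.class` prints, so diffing that output for both files is one way to look for entries that moved.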


All 6 comments

I was curious and tried it on my machine (openjdk version "11.0.6" 2020-01-14, no docker) and could not reproduce it. I wrote a different script hoping to make the search faster (collecting all n results instead of comparing n pairs), but this was probably just bad math on my part. Anyway, I'm attaching it: build-until-sha-different.pl.txt

So it looks like lombok, during bytecode generation, had shifted these values somehow.

Lombok does no bytecode generation; it just modifies the AST. As for the non-determinism, I have no clue. Could you try to reproduce the problem without docker, or using delombok?

Disclaimer: I'm not a project owner.
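
For the delombok experiment suggested above, lombok's jar can be run as a command-line tool that writes plain Java sources with the annotations expanded. A minimal sketch, where the function name and the paths in the usage line are assumptions:

```shell
# Expand lombok annotations into plain Java sources.
# $1 = path to lombok.jar, $2 = source directory, $3 = output directory
delombok_sources() {
    java -jar "$1" delombok "$2" -d "$3"
}
```

For example, `delombok_sources lombok-1.18.12.jar src/main/java target/delombok`; compiling the generated sources without lombok on the classpath then shows whether the difference still appears.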

This problem is very sensitive to some unknown factors in the environment and code structure. E.g. I also cannot reproduce it with the same JDK version (1.8.0_242, though an AdoptOpenJDK build) on my Windows machine. It does not reproduce on every build (I need to run the build several times). I cannot reproduce it without the NonNull annotation on at least one of the parameters. I cannot reproduce it with delombok. I can't even reproduce it without the guava Supplier class as the type of the parameters. Only the combination of linux + JDK build version (alpine uses musl, so the JDK is slightly different) + docker + a certain structure of the code + the lombok Builder and NonNull annotations makes this issue appear. So maybe this is not even a lombok problem but a JDK/compiler problem... I'm not 100% sure. I don't think docker actually makes any difference, though; it is just a convenient way of creating the exact environment I need. Here is the docker setup that I've used:
This opens a shell in the container (1):

# docker run -it alpine:3.11 /bin/ash
/ # apk add openjdk8 maven

In another bash session, copy the bug-example project into the container:

# docker cp bug-example 339a7bda:/root/

Switch back to (1):

/ # cd ~/bug-example
/ # ./build-until-sha-different.sh

Even for me this does not always work, i.e. sometimes it does produce exactly the same bytecode. With my real code it happens 100% of the time, but I can't post it here: it is too big and has way too many dependencies.

Here is a docker image that has everything baked in:
https://drive.google.com/uc?id=1i04_dVL0Rp5rxXCMuHaS4LYREkZjAAW1&export=download
You can try to reproduce it yourself with:

# docker load -i bugexample.img
# docker run -w /root/bug-example --name bugtest bugexample /bin/ash build-until-sha-different.sh

I find it plausible that lombok is (part of) the problem here, and welcome any PR that fixes it, but, given these 3 facts:

  • There is a high likelihood that there's either absolutely nothing we can do about it (javac issue), or that the fix would be man-months of work (if it's because the nodes we generate have the same position, and javac, ordinarily generating elements based on file position, then goes by some secondary sorting order, one which is non-deterministic per VM, such as System.identityHashCode; we can fix that, by pretty much rewriting how lombok works. Not really feasible to do for something like this).

  • Non-deterministic code gen usually does not matter; you'd have to care about deterministic bytecode and most java users do not. We'd love to cater to those that do, of course, but, the # of people who are suffering even if this bug occurs is naturally low.

  • And that, as @AngryGami's (excellent!) reporting seems to indicate, the bug requires some fairly exotic scenarios to even show up in the first place.

Throw those aspects into a blender and I'm afraid this has extremely low priority. We'll get around to fixing it probably sometime before the year 3854.

:) "nondeterministing" was a word :), but thank you 👍

All of this was for nothing. Maven was the main culprit. The first time I run my build, it downloads lots of stuff that maven requires (apart from my project dependencies) and then starts compilation in the same JVM. The second time the build runs there is nothing to download, so compilation starts immediately. I don't know exactly why, but the first invocation of the compiler is somehow affected by the state of the JVM, and this causes it to produce slightly different bytecode.

The solution was to add true to my pom.xml file, and the build is now totally reproducible... though it takes ~2x more time :)
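
One setting that matches this description is the maven-compiler-plugin `fork` option, which runs javac in a separate process instead of inside the Maven JVM. Whether that is the exact element the comment refers to is an assumption. Under that assumption, the same setting can be tried from the command line through the plugin's `maven.compiler.fork` user property; the function name below is hypothetical:

```shell
# Compile with javac forked into its own process (assumed to be the fix
# described above); extra arguments are passed through to Maven.
build_forked() {
    mvn -Dmaven.compiler.fork=true clean compile "$@"
}
```

In pom.xml the equivalent would be `<fork>true</fork>` inside the plugin's `<configuration>` block.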
