Presto: Mixed usage of Joda and j.u.time leads to incorrect result

Created on 10 Jun 2017  路  8Comments  路  Source: prestodb/presto

In Presto 0.178 (or earlier) running Java 8u101 (or later), we get

presto> select date_trunc('day', FROM_UNIXTIME(1496548289) AT TIME ZONE 'Pacific/Easter');
                 _col0
----------------------------------------
 2017-06-02 23:00:00.000 Pacific/Easter

This bug is mitigated by commit 12188921904004d969423e2181930773f731c851. However, the root cause persists.

The cause is that Presto uses both joda and java.util.time. Therefore, when tzdata version mismatches, the result can be completely wrong (with respect to any tzdata version).

As joda and java upgrades independently, there will inevitably be mismatch of tzdata. When that happens, issue like the above will happen again (in a different timezone).

Permanent solutions are more complicated. Possibilities under investigation by @electrum include

  • Use joda exclusively in Presto (this would involve shading joda in spi)
  • Make joda and java.util.time use the same tzdata

In the particular case shown above:

  • DateTimeFunctions.truncateDate uses Joda
  • @JsonValue SqlTimestampWithTimeZone.toString() uses java.util.time
  • Joda v2.8.2 uses tzdata 2015f. See Joda Release Notes 2.9+, Joda Old Release Notes
  • Java 8u101 or later uses tzdata 2016d or later. See Oracle: Timezone Data Versions in the JRE Software
  • This combination results in the incorrect result shown above. In 2016, Chile changed its DST rules from always on (+1 year-round) to only on in summer (+1 Aug-May, 0 May-Aug). This change was incorporated into IANA tzdata in 2016c.
stale

Most helpful comment

@electrum @findepi

Joda gets default zone provider this way:

    private static Provider getDefaultProvider() {
        // approach 1
        try {
            String providerClass = System.getProperty("org.joda.time.DateTimeZone.Provider");
            if (providerClass != null) {
                try {
                    Provider provider = (Provider) Class.forName(providerClass).newInstance();
                    return validateProvider(provider);
                } catch (Exception ex) {
                    throw new RuntimeException(ex);
                }
            }
        } catch (SecurityException ex) {
            // ignored
        }
        // approach 2
        try {
            String dataFolder = System.getProperty("org.joda.time.DateTimeZone.Folder");
            if (dataFolder != null) {
                try {
                    Provider provider = new ZoneInfoProvider(new File(dataFolder));
                    return validateProvider(provider);
                } catch (Exception ex) {
                    throw new RuntimeException(ex);
                }
            }
        } catch (SecurityException ex) {
            // ignored
        }
        // approach 3
        try {
            Provider provider = new ZoneInfoProvider("org/joda/time/tz/data");
            return validateProvider(provider);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        // approach 4
        return new UTCProvider();
    }

As a result, we can implement a custom org.joda.time.tz.Provider (public interface in joda) ourselves (by delegating to ZoneInfoProvider probably).

We can put all versions of tzdata in Presto distribution. Have our implementation of Provider detect the joda version of the current JVM, invoke ZoneInfoCompiler (which compiles to joda specific format) on the fly, and then construct ZoneInfoProvider with the compiled tzdata. All calls to our implementation of Provider will then delegate.

Alternatively, we can also precompile the tzdata with ZoneInfoCompiler and ship that (to avoid the compilation step at runtime).

Now, the question is how to detect what tzdata the current JVM is using. We could hardcode the mapping according to https://www.oracle.com/technetwork/java/javase/tzdata-versions-138805.html , but that poses two problems:

  • Users won't be able to run with non Oracle Java version or future Java version (we can probably provide an escape hatch that allows users to manually specify a system property)
  • tzdata can be updated independently of Java version using Oracle provided Timezone Updater Tool.

Since Java 8, there is a public API to get current tzdata version in JVM: answer from Andreas (currently 2nd) in https://stackoverflow.com/questions/7956044/java-find-tzdata-version-in-use-regardless-of-jre-version

All 8 comments

This bug is mitigated by commit 1218892. However, the root cause persists.

Shall we rename the issue to "Combined usage of Joda Time and Java Time leads to repeated problems due to time zone database discrepancies" ?

Possibilities [...] Use joda exclusively in Presto

Perhaps it would be worth noting, why using java time exclusively is not a considered option.

I just renamed it. Is it better now?

@dain did a bunch of investigation a long time ago on java.util.time performance. The primary problem is that java.util.time doesn't work with utcMillis (as long) directly. It is required that one provide an actual object, which has year/month/day/hour/... fields eagerly populated. This introduces both cpu/gc issues.

@dain can give you more context on this matter.

I think it might be more preferable to use joda exclusively. Updating joda version is easier than changing jvm version, also joda's time database got updated more frequently.

@electrum @findepi

Joda gets default zone provider this way:

    private static Provider getDefaultProvider() {
        // approach 1
        try {
            String providerClass = System.getProperty("org.joda.time.DateTimeZone.Provider");
            if (providerClass != null) {
                try {
                    Provider provider = (Provider) Class.forName(providerClass).newInstance();
                    return validateProvider(provider);
                } catch (Exception ex) {
                    throw new RuntimeException(ex);
                }
            }
        } catch (SecurityException ex) {
            // ignored
        }
        // approach 2
        try {
            String dataFolder = System.getProperty("org.joda.time.DateTimeZone.Folder");
            if (dataFolder != null) {
                try {
                    Provider provider = new ZoneInfoProvider(new File(dataFolder));
                    return validateProvider(provider);
                } catch (Exception ex) {
                    throw new RuntimeException(ex);
                }
            }
        } catch (SecurityException ex) {
            // ignored
        }
        // approach 3
        try {
            Provider provider = new ZoneInfoProvider("org/joda/time/tz/data");
            return validateProvider(provider);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        // approach 4
        return new UTCProvider();
    }

As a result, we can implement a custom org.joda.time.tz.Provider (public interface in joda) ourselves (by delegating to ZoneInfoProvider probably).

We can put all versions of tzdata in Presto distribution. Have our implementation of Provider detect the joda version of the current JVM, invoke ZoneInfoCompiler (which compiles to joda specific format) on the fly, and then construct ZoneInfoProvider with the compiled tzdata. All calls to our implementation of Provider will then delegate.

Alternatively, we can also precompile the tzdata with ZoneInfoCompiler and ship that (to avoid the compilation step at runtime).

Now, the question is how to detect what tzdata the current JVM is using. We could hardcode the mapping according to https://www.oracle.com/technetwork/java/javase/tzdata-versions-138805.html , but that poses two problems:

  • Users won't be able to run with non Oracle Java version or future Java version (we can probably provide an escape hatch that allows users to manually specify a system property)
  • tzdata can be updated independently of Java version using Oracle provided Timezone Updater Tool.

Since Java 8, there is a public API to get current tzdata version in JVM: answer from Andreas (currently 2nd) in https://stackoverflow.com/questions/7956044/java-find-tzdata-version-in-use-regardless-of-jre-version

@hellium01 Using joda exclusively could be hard. I'll let @electrum and @martint to weigh in on whether that is doable if we have a strong will.

Yes, I think we can get the tzdata version by getVersion method on ZoneInfoFile. So the method @haozhun proposed do seem favorable.

Reopening this issue as we removed the joda-to-java-time-bridge registration until the cache thrashing issue is resolved.

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

aandis picture aandis  路  4Comments

electrum picture electrum  路  4Comments

haozhun picture haozhun  路  4Comments

shigechuanqi picture shigechuanqi  路  3Comments

sopel39 picture sopel39  路  3Comments