Openj9: JCL patch to remove the references to java.lang.String field coder

Created on 3 Oct 2017  路  8Comments  路  Source: eclipse/openj9

This is a question brought up during discussion of https://github.com/eclipse/openj9/pull/177#discussion_r142498127.

OpenJDK j.l.String has three instance fields as following:
private final byte[] value; private final byte coder; private int hash;

On the other hand, OpenJ9 j.l.String has three instance fields as following:
private final byte[] value; private final int count; private int hashCode;
OpenJ9 bundles coder and count into a single field thus getting the benefit of having a count which speeds up substring operations and StringBuilder interaction while also supporting string compression.
However this breaks Java 18.3 raw builds as following stacktrace:
Exception in thread "main" java/lang/NoSuchFieldError: java/lang/String.coder at java/io/UnixFileSystem.canonicalize0 (java.base@9/NativeMethod:4294967295) at java/io/UnixFileSystem.canonicalize (java.base@9/UnixFileSystem.java:170) at java/io/File.getCanonicalPath (java.base@9/File.java:618) at java/io/File.getCanonicalFile (java.base@9/File.java:643) at jdk/internal/loader/URLClassPath.toFileURL (java.base@9/URLClassPath.java:253) at jdk/internal/loader/URLClassPath.<init> (java.base@9/URLClassPath.java:188) at jdk/internal/loader/ClassLoaders.<clinit> (java.base@9/ClassLoaders.java:85) at java/lang/ClassLoader.initializeClassLoaders (java.base@9/ClassLoader.java:183) at java/lang/Thread.initialize (java.base@9/Thread.java:422) at java/lang/Thread.<init> (java.base@9/Thread.java:153)
In addition, user applications attempting reflection access of j.l.String field coder will succeed in OpenJDK but fail at OpenJ9 which usually causes confusion and defect against J9.
Also we can't afford to add another field to j.l.String which will double the footprint of each string instance because existing 3 instance fields along with J9Class* occupy 16 bytes, and adding another field actually will need 16 more bytes due to the unit of allocation is 16 bytes (i.e., everything is 16-bytes aligned).

So the question is: can java.lang.String field count be renamed to coder, and still keep the performance benefit provided by current field count?

fyi @fjeremic @DanHeidinga @pshipton

vm jdk10

All 8 comments

I'm uncertain this is possible. The values coder can take is strictly defined as one of these two:
https://github.com/eclipse/openj9/blob/dcd5bbd7a3854ed77792457094b292853af30f89/jcl/src/java.base/share/classes/java/lang/String.java#L74-L75

Bundling it with a count could result in false positives for someone reflectively accessing this field as in the example above. How would one handle that? Not to mention it is incredibly confusing to union these two fields of different types (int vs byte).

A better question should be why is java/io/UnixFileSystem.canonicalize0 accessing private fields of the String class?

I don't think matching what OpenJDK does is a particularly strong argument. Both OpenJDK and OpenJ9 are independent implementations of the same standard. Nothing in the standard says certain private fields must exist in the java.lang.String class.

Direct access to the internal implementation details of an implementation specific class would seem to be the problem as @fjeremic points out. In the original issue I did discuss a bit of why we have a count field and the performance optimizations it permits would outweigh removing it in my opinion based on past experience with the class.

Though the stacktrace shows java/io/UnixFileSystem.canonicalize0 throws NoSuchFieldError: java/lang/String.coder, grep says it is jni_util.c referring the field in question as following:
````
String_coder_ID = (*env)->GetFieldID(env, strClazz, "coder", "B");

......

jbyte coder = (*env)->GetByteField(env, jstr, String_coder_ID);
if (coder != java_lang_String_LATIN1) {
    return getStringBytes(env, jstr);
}

``` It appears this native depends oncoder` value via reflection to determine how to retrieve the string content.

It appears this native depends on coder value via reflection to determine how to retrieve the string content.

We have a coder() method for this though. They can just as easily call the method which would work for both OpenJDK and OpenJ9:

https://github.com/eclipse/openj9/blob/dcd5bbd7a3854ed77792457094b292853af30f89/jcl/src/java.base/share/classes/java/lang/String.java#L83-L89

This is a package protected method so I assume other parts of OpenJDK already call this. No reason why java/io/UnixFileSystem.canonicalize0 shouldn't either.

It appears the reasonable approach here is to apply a JCL patch which removes all references to j.l.String field coder.

Please engage with @andrew-m-leonard and company to get a patch for this change submitted to OpenJDK's 18.3 (formerly Java 10) repo. Getting this kind of thing fixed upstream is the best outcome for everyone - more maintainable code for OpenJDK and fewer patches for OpenJ9

An OpenJDK patch should replace all occurrences of j.l.String field coder reflection with method reflection for coder() instead. A code snippet to get coder of a string instance should look like following:
jbyte getStringCoder(JNIEnv *env, jstring strObject) { jclass strClass = NULL; jbyte coder = 0; strClass = (*env)->FindClass(env, "java/lang/String"); if (NULL != strClass) { jmethodID coderMethod = (*env)->GetMethodID(env, strClass, "coder", "()B"); if (NULL != coderMethod) { coder = (*env)->CallByteMethod(env, strObject, coderMethod); } } return coder; }
The target file to be patched is jni_util.c.
@andrew-m-leonard, pls let us know if anything else is needed to submit a patch for this change to OpenJDK Java 18.3 repo.

This is resolved by #1786

Was this page helpful?
0 / 5 - 0 ratings

Related issues

pshipton picture pshipton  路  3Comments

JamesKingdon picture JamesKingdon  路  5Comments

PowerUser1234 picture PowerUser1234  路  3Comments

Mesbah-Alam picture Mesbah-Alam  路  5Comments

BillOgden picture BillOgden  路  6Comments