Graal: Introduce TruffleString

Created on 26 May 2020 · 7 Comments · Source: oracle/graal

The goal of this issue is to publicly collect requirements and document the design and implementation of TruffleString, a language-agnostic string representation for string-like objects within Truffle languages.

Quoting @chumer:

> The current design is immutable. Buffers are a different story we won't cover in the first design. [...] lots of Ruby requirements coming in from the Ruby team. Now TruffleString (is) in the hand of the regex team, which is naturally a cross language team.

| Language Implementation | Type | Comments |
|:----------|:----------:|:-----------|
| Espresso | immutable | Implementation not (yet) public. |
| FastR | mutable | Uses byte[] and String within its CharSXPWrapper, which is used in RStringVecNativeData |
| GraalPython | immutable | Uses Java CharSequence as part of its PString. |
| Graal.js | immutable | Uses Java CharSequence within a DynamicObject as part of its JSString. |
| SOMns | immutable | Uses Java String as internal string representation. Has support for immutable symbols. |
| TruffleRuby | mutable | Uses ropes to provide mutability, has multiple encodings, not all encodings fully compatible with Unicode, zero-copy concatenation critical for performance of production code. |
| TruffleSqueak | mutable | Supports ByteString (byte[]) and WideString (int[]). A ByteString becomes a WideString when a value outside the byte range (e.g., a Unicode character) is put into it. |
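The ByteString-to-WideString promotion described for TruffleSqueak can be illustrated with a small Java sketch. The class `WideningString` and its methods are hypothetical and are not TruffleSqueak's actual classes; the sketch only shows the idea of keeping characters in a compact `byte[]` until a value outside 0 to 255 is stored, at which point the storage is copied once into an `int[]`.

```java
/**
 * Hypothetical mutable string that starts with compact byte storage
 * (like a ByteString) and widens to int storage (like a WideString)
 * the first time a value outside the byte range is stored.
 */
final class WideningString {
    private byte[] bytes; // used while every value fits in 0..255
    private int[] ints;   // used after widening; bytes is then null

    WideningString(int length) {
        this.bytes = new byte[length];
    }

    boolean isWide() {
        return ints != null;
    }

    int length() {
        return isWide() ? ints.length : bytes.length;
    }

    int get(int index) {
        // Mask with 0xFF so byte values 128..255 read back as unsigned.
        return isWide() ? ints[index] : bytes[index] & 0xFF;
    }

    void set(int index, int codePoint) {
        if (!isWide() && (codePoint < 0 || codePoint > 0xFF)) {
            widen();
        }
        if (isWide()) {
            ints[index] = codePoint;
        } else {
            bytes[index] = (byte) codePoint;
        }
    }

    /** Copies the byte storage into int storage exactly once. */
    private void widen() {
        int[] wide = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            wide[i] = bytes[i] & 0xFF;
        }
        ints = wide;
        bytes = null;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.appendCodePoint(get(i));
        }
        return sb.toString();
    }
}
```

Reads stay cheap in both modes; the one-time copy in `widen()` is the price paid when the first out-of-range value appears.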

Please put this issue on the Truffle project and assign it to @chumer and @djoooooe.

Edit 1: Add Espresso.
Edit 2: Incorporate @chrisseaton's Ruby comment extension (see https://github.com/oracle/graal/issues/2505#issuecomment-633982977).

Labels: tracking, truffle

All 7 comments

For TruffleSOM and derivates (SOMns, Moth (a Grace)) we indeed expect Strings and Symbols (interned Strings) to be immutable like in Java.
Rope-like append and sharing of substrings might be nice things to have, though I am not too sure about the tradeoffs.
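Sharing substrings means a substring refers to a range of its parent's storage instead of copying it. Below is a minimal sketch, assuming a hypothetical `SharedSlice` type over an immutable `char[]`; it is not any language's actual implementation. One of the tradeoffs is visible here: a tiny slice keeps the whole parent array reachable.

```java
/**
 * Hypothetical immutable string slice: subSequence() shares the parent's
 * char[] and only records a new offset and length (no copy).
 */
final class SharedSlice implements CharSequence {
    private final char[] data; // shared, never mutated after construction
    private final int offset;
    private final int length;

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[offset + index];
    }

    @Override
    public SharedSlice subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start " + start + ", end " + end);
        }
        // No copy: the new slice points into the same backing array.
        return new SharedSlice(data, offset + start, end - start);
    }

    /** True if both slices are views of the same backing array. */
    boolean sharesStorageWith(SharedSlice other) {
        return data == other.data;
    }

    @Override
    public String toString() {
        return new String(data, offset, length);
    }
}
```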

I wonder if the ropes optimization could somehow be implemented in an optional manner, so that language implementations could decide whether to use it. Reimplementing ropes in each language that needs them seems redundant.
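One way to read "optional" ropes is a concatenation entry point where the caller chooses between a lazy, zero-copy tree node and an eager flat copy. The following is a minimal sketch with hypothetical names (`Rope`, `LeafRope`, `ConcatRope`, and the `lazy` flag); it is not the TruffleRuby rope implementation and omits rebalancing of deep trees.

```java
/** Hypothetical rope: lazy concatenation builds a tree node instead of copying. */
abstract class Rope {
    abstract int length();

    abstract char charAt(int index);

    /** Writes this rope's characters into out, starting at pos. */
    abstract void copyTo(char[] out, int pos);

    /**
     * Concatenates two ropes. With lazy == true the result shares both
     * children (zero copy); otherwise the characters are copied into a
     * new flat leaf immediately.
     */
    static Rope concat(Rope left, Rope right, boolean lazy) {
        if (lazy) {
            return new ConcatRope(left, right);
        }
        char[] out = new char[left.length() + right.length()];
        left.copyTo(out, 0);
        right.copyTo(out, left.length());
        return new LeafRope(out);
    }

    @Override
    public String toString() {
        char[] out = new char[length()];
        copyTo(out, 0);
        return new String(out);
    }
}

/** Flat leaf; takes ownership of the array passed to it. */
final class LeafRope extends Rope {
    private final char[] chars;

    LeafRope(String s) {
        this(s.toCharArray());
    }

    LeafRope(char[] chars) {
        this.chars = chars;
    }

    int length() {
        return chars.length;
    }

    char charAt(int index) {
        return chars[index];
    }

    void copyTo(char[] out, int pos) {
        System.arraycopy(chars, 0, out, pos, chars.length);
    }
}

/** Interior node; its length is cached so length() is O(1). */
final class ConcatRope extends Rope {
    private final Rope left;
    private final Rope right;
    private final int length;

    ConcatRope(Rope left, Rope right) {
        this.left = left;
        this.right = right;
        this.length = left.length() + right.length();
    }

    int length() {
        return length;
    }

    char charAt(int index) {
        int leftLength = left.length();
        return index < leftLength ? left.charAt(index) : right.charAt(index - leftLength);
    }

    void copyTo(char[] out, int pos) {
        left.copyTo(out, pos);
        right.copyTo(out, pos + left.length());
    }
}
```

The lazy path makes concatenation O(1) at the cost of slower indexing through the tree; a production design would also flatten or rebalance when trees grow deep.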

Also, I'm of course interested in what TruffleString means for interop: will it replace String as the exchange representation between languages? Does my language have to create TruffleStrings on the fly for interop if it does not use it internally?

The current plan is to replace String with TruffleString in interop but keep native support for String: the actual parameter type in Java will be Object, so passing a String would still be valid.
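With an `Object`-typed parameter, a receiving language has to accept both representations. A minimal sketch of that dispatch follows; `HypotheticalTruffleString` and `InteropSketch` are illustrative names only and do not reflect the actual TruffleString class or the Truffle interop API.

```java
/** Hypothetical stand-in for TruffleString; the real API is not modeled here. */
final class HypotheticalTruffleString {
    private final String value;

    HypotheticalTruffleString(String value) {
        this.value = value;
    }

    String toJavaString() {
        return value;
    }
}

/** Hypothetical interop helpers whose parameter type is Object. */
final class InteropSketch {
    static boolean isString(Object value) {
        return value instanceof String || value instanceof HypotheticalTruffleString;
    }

    static String asJavaString(Object value) {
        if (value instanceof String) {
            // A plain java.lang.String is still accepted as-is.
            return (String) value;
        }
        if (value instanceof HypotheticalTruffleString) {
            return ((HypotheticalTruffleString) value).toJavaString();
        }
        throw new IllegalArgumentException("not a string: " + value);
    }
}
```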

| Language Implementation | Type | Comments |
|:----------|:----------:|:-----------|
| TruffleRuby | mutable | Uses ropes to provide mutability, has multiple encodings, not all encodings fully compatible with Unicode, zero-copy concatenation critical for performance of production code. |

You might want to consider supporting a view of strings as sequences of extended grapheme clusters, which modern languages (e.g., Dart, Swift) increasingly offer.

https://medium.com/flutter-community/working-with-unicode-and-grapheme-clusters-in-dart-b054faab5705
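To show why a grapheme view differs from code-unit and code-point views, here is a minimal sketch using the JDK's `java.text.BreakIterator.getCharacterInstance()`. The helper name `Graphemes.split` is hypothetical. Older JDK releases may not implement every Unicode extended grapheme cluster rule (for example, emoji ZWJ sequences), so treat this as an illustration rather than a conformance reference. For "café" written with a combining accent, the string has 5 UTF-16 code units and 5 code points but only 4 grapheme clusters.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

final class Graphemes {
    /** Splits s into user-perceived characters using the JDK character BreakIterator. */
    static List<String> split(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        List<String> clusters = new ArrayList<>();
        int start = it.first();
        // Each consecutive pair of boundaries delimits one cluster.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            clusters.add(s.substring(start, end));
        }
        return clusters;
    }
}
```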

Tracking internally as GR-17176.

Thanks Boris. Please engage with us early and often on your designs and prototypes. My use-case is very sensitive to string performance and I can experiment on real workloads.
