Boa: String interning

Created on 17 Mar 2020  路  10Comments  路  Source: boa-dev/boa

Will there be a string interner for more affective memory use and string comparisons.

Like rustc does it.

optimisation

Most helpful comment

I think that the string-interner crate is the way to go. It already provides a really nice, safe interner, and they are even discussing the optimisations laid out in the post you mentioned in https://github.com/Robbepop/string-interner/issues/16.

This could probably allow avoiding the use of String in TokenKind, which would make it Copy pretty easily, and reduce the memory footprint and improve the CPU cache usage.

All 10 comments

More specifically rustc interner.

Rustc Symbol doc:

An interned string.

Internally, a `Symbol` is implemented as an index, and all operations
(including hashing, equality, and ordering) operate on that index. The use
of `rustc_index::newtype_index!` means that `Option<Symbol>` only takes up 4 bytes
...

Which makes comparisons between symbols trivial, it is just a comparison between two u32.

The cost is when interning a string, since it requires to do a HashMap lookup.
Getting the string from a symbol is trivial since it is only indexing into an array(Vec).

I found a implementation of this string-interner

Yep, this is interesting.
The plan was always to move to something more efficient but start with strings first. (just to make it easier for implementation against the spec)

I had a conversation the with Spidermonkey devs about how we use strings for internal slots, but later want to move to something more faster. I think they do something like this.

I don't know very much about https://crates.io/crates/string-interner so this will be new to me, but im happy to let this issue be the conversation point.

Nice! This is pretty mush how rustc does it. and how string-interner crate is implemented.

I think that the string-interner crate is the way to go. It already provides a really nice, safe interner, and they are even discussing the optimisations laid out in the post you mentioned in https://github.com/Robbepop/string-interner/issues/16.

This could probably allow avoiding the use of String in TokenKind, which would make it Copy pretty easily, and reduce the memory footprint and improve the CPU cache usage.

I will give this a look, and maybe #336 too, at the same time.

Hi @Razican

Correct me if I'm wrong!!!

String interning only has a good effect if the string that's it is interning is supposed to be immutable, like identifiers.
I don't think string interning should be used in ValueData::String since it can mutate.
For Example:

var x = "";
for(var i = 0; i < 100; i++) {
    x += "1"
}

Without string interning in ValueData::String, x will be one string in memory.
With string interning in ValueData::String there will be in memory "", "1", "11", "111", etc.
And these values will last till interner goes out of scope (pretty much the end of the program).
So were in a way leaking memory (not really we still have ponters to the strings in memory).

String interning only has a good effect if the string that's it is interning is supposed to be immutable, like identifiers.

Yes, I was just trying a couple of things, but it does seem that way!

Interesting review of interners here. This could maybe co-exist with #565, or replace it.

With this review in mind, it's maybe worth taking a look at lasso too.

This might partially conflict with #736. We might need to create our own interner that allows using UTF-16 strings.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

elasmojs picture elasmojs  路  4Comments

Razican picture Razican  路  5Comments

IovoslavIovchev picture IovoslavIovchev  路  4Comments

gadget114514 picture gadget114514  路  3Comments

Razican picture Razican  路  4Comments