Vector: Benefits and downsides of `Atom`s

Created on 21 Feb 2020 · 3 Comments · Source: timberio/vector

In this issue I want to discuss whether there are still any real benefits to using Atoms instead of Strings as keys in LogEvent.

It seems to me that switching to BTreeMap (which compares the actual strings instead of comparing hashes) and the introduction of default log schemas (https://github.com/timberio/vector/pull/1769, which reduced the use of static atoms) have greatly reduced the benefits of using Atoms instead of plain Strings. On the other hand, code that works with Strings directly is easier to maintain, and some invocations of clone across the codebase could potentially be avoided with them.
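
To make the BTreeMap point concrete, here is a minimal sketch, not Vector's actual code. `InternedKey` is a hypothetical stand-in for an interned key (it is not the `Atom` type Vector uses), and the field names are made up for illustration. It shows that an ordered map places keys by comparing string contents whichever key type is used, and that a `String`-keyed map can be queried with a plain `&str`.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

/// Hypothetical stand-in for an interned key (NOT Vector's `Atom`).
/// Cloning only bumps a reference count, but the derived `Ord` still
/// compares the underlying string contents.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct InternedKey(Rc<str>);

impl InternedKey {
    pub fn new(s: &str) -> Self {
        InternedKey(Rc::from(s))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Event fields keyed by plain `String`s (field names are illustrative).
pub fn string_keyed() -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();
    fields.insert("message".to_string(), "hello".to_string());
    fields.insert("host".to_string(), "localhost".to_string());
    fields
}

/// The same fields keyed by the hypothetical interned key type.
pub fn interned_keyed() -> BTreeMap<InternedKey, String> {
    let mut fields = BTreeMap::new();
    fields.insert(InternedKey::new("message"), "hello".to_string());
    fields.insert(InternedKey::new("host"), "localhost".to_string());
    fields
}

/// Looks up a field by `&str`; no key object has to be built first,
/// because `String` implements `Borrow<str>`.
pub fn lookup<'a>(fields: &'a BTreeMap<String, String>, key: &str) -> Option<&'a str> {
    fields.get(key).map(String::as_str)
}
```

Both maps iterate their keys in the same order, since in this sketch both key types are ordered by string content. Whether a real `Atom` gains anything in a BTreeMap depends on how its `Ord` is implemented, which this sketch does not model.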

data model performance idea tech debt

All 3 comments

This is a great question! They definitely add a bit of friction to the development process. Their clone is very cheap, so I wouldn't worry about that too much.

I'd be curious to see the effects on throughput due to memory use (duplicate string keys on the heap) and CPU (interning and reusing atoms vs allocating and deallocating strings). With some numbers, we'd be able to weigh ergonomics in the code vs any performance differences.
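
The memory side of that trade-off can be sketched with a toy interner. This is an assumption-laden illustration, not how Vector or any particular atom library works: `Interner` and its methods are hypothetical names, and a real implementation would also have to deal with thread safety and eviction.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical minimal interner, for illustration only.
#[derive(Default)]
pub struct Interner {
    seen: HashSet<Arc<str>>,
}

impl Interner {
    /// Returns a shared handle to `s`. A new string is allocated only the
    /// first time a given value is seen; later calls pay for a hash lookup
    /// and a reference-count increment instead.
    pub fn intern(&mut self, s: &str) -> Arc<str> {
        if let Some(existing) = self.seen.get(s) {
            return Arc::clone(existing);
        }
        let handle: Arc<str> = Arc::from(s);
        self.seen.insert(Arc::clone(&handle));
        handle
    }

    /// Number of distinct strings stored so far.
    pub fn len(&self) -> usize {
        self.seen.len()
    }
}

/// Builds `n` copies of the same key as independent `String`s,
/// each with its own heap buffer.
pub fn duplicate_strings(key: &str, n: usize) -> Vec<String> {
    (0..n).map(|_| key.to_string()).collect()
}
```

Interning the same key twice yields two handles to one allocation, while `duplicate_strings` produces one allocation per copy. Which approach wins on throughput depends on the lookup cost versus the allocation cost, which is what the benchmarks would need to measure.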

@a-rodin Would using Strings ease integration with Lua and JavaScript transforms? It seems like it would for WASM/FFI in general.

Some simple benchmarks show that we get about 6-7% more throughput by removing Atom in favor of std::string::String.

Better performance and better ergonomics make this a clear win, so we should do it.
