Entitas-csharp: Data locality / contiguous memory

Created on 20 May 2016 · 29Comments · Source: sschmid/Entitas-CSharp

This is an R&D issue. I'd be happy if you guys participate if you have ideas or experience in this field. The following problem is inherent to ECS architecture afaik.

History / current state / Motivation

Over time Entitas became faster and faster as I constantly optimize and profile all the features. The biggest leap in performance was to not store components in Dictionaries anymore but use Arrays instead. Component access time improved by a factor of 60x. This approach sacrifices memory in favor of speed. With the introduction of multiple pools I provided a mechanism to improve the memory footprint of entities to address this issue.
Currently entities store components in an array of a fixed size. If you have 200 components, each entity that you create has an array of capacity 200 to potentially store all components. Using multiple pools and assigning components to one or more of those pools results in entities with a smaller array. E.g. you have 2 pools, each with 100 components assigned (instead of 200), an entity only has to reserve memory for 100 components instead of 200. Awesome.

Currently this array is of type IComponent[], this means it stores pointers to the components. When you have a _class_ PositionComponent, the pointer to this object will be stored. If you create a _struct_ PositionComponent it will be boxed because the array is of type IComponent[] and again, the pointer is stored.

This means components are all over the place in the heap.
This is actually not really a problem. Entitas is battle tested on low-end mobile devices and proofed to be very efficient even with lots and lots of entities.

But since I'm striving for the best of the best performance possible it would be nice to find a solution to improve data locality / contiguous memory / CPU cache misses.

Ideas

Components as struct

This would currently not improve anything, because of boxing. Any ideas no avoid this?

Drop the array

Don't use an array at all... Use the code generator to generate actual fields in the entity. Generate multiple Entity types based on the pool to avoid every entity having fields for all possible components.

Your suggestions (share your ideas)

enhancement research rfc

Source

sschmid

Most helpful comment

https://jacksondunstan.com/articles/3453
https://jacksondunstan.com/articles/3468
https://jacksondunstan.com/articles/3325
https://jacksondunstan.com/articles/3740

AGeorgy on 3 Dec 2017

👍2

All 29 comments

If you have ideas that involves radical changes, I'm also fine with that :)

sschmid on 20 May 2016

The objective is to improve component access or component data access ?

I mean, to improve "entity.GetComponent()" or "component.someProperty" ?

What i did in Entitas++ was to assign a unique integer ID to each component type in runtime and then store in each entity a "Dictionary" of component pointers by their keys

std::map<ComponentId, IComponent*> mComponents;

I think this improves memory (and cache locality because entity's size?) but i think your solution is more faster anyway (or not? note it's a ordered map... key lookup should be fast, not as an array but...)

JuDelCo on 20 May 2016

I'm kind of dreaming about components being structs and they are all "next to each other" (contiguous memory) and no need to look up the reference first.

Using a dict in c# would not improve data locality afaik, at least as long the value is a ref to the actual component. I remember the component access (dict[key]) being a bottleneck

sschmid on 20 May 2016

Haven't checked yet, but aren't dict values are boxed, too (if they are structs)

sschmid on 20 May 2016

Blog post Data Structures for Entity Systems: Contiguous memory

sschmid on 20 May 2016

I think you can't do what you want because an entity is mutable though time, can have 2 components and suddenly add 10 more.

Maybe the solution is to offer a new type of "Entity" with fixed size and having always the components specified. An example use: "EntityTransform" (type) will have always a transform component (and not more, by previous declaration/design). This way there's no need to lookup for components and can be stacked in arrays. It's like having a Pool with only 1 component (or two, three...) but without lookup and always having those components.

JuDelCo on 20 May 2016

If I understand this correctly, the question is how to pack the components of _one entity_ together as tightly as possible, right? I think we need to reframe that question, since that isn't really what we want most of the time. When we get component A of an entity, we usually don't want component B of that same entity right after. What we rather want is component A of _another_ entity.

So how about this: All components of the same type are stored together in a shared array (so there will be one such array per component type, managed by the pool). An entity just holds an index into that array. When we loop over all the entities that have for example a Position component, all those components will probably already be cached. And this would even allow the components to be structs.

Storing components like this would need a bit of memory management (i.e. if we run out of space, we need to reallocate a bigger array) but that should be manageable.

Can this work or did I miss something?

arne-schroppe on 22 May 2016

@sschmid structs in dicts shouldn't be boxed as long as they implement IEquatable<T>.

I have written a few ECS's (too many), basically going with the approach @arne-schroppe describes. I use classes for components rather than structs. These get pre-allocated at startup so they are in the same area of the heap. While technically the CLR can move your stuff around in the heap, it's my, fairly limited, understanding things will generally stay put (open to clarification).

The topic of using managed memory for contiguous storage came up in one of the talks on the LMAX Disruptor (https://lmax-exchange.github.io/disruptor/). They basically said the same thing, I realise the original implementation is written in Java but I have tested the .net version and got almost identical results.

There are other options of course but none as simple as pre-allocated arrays of classes, imo. There will be the issue of growing the array as @arne-schroppe also mentioned. This would require a full copy of the current array, or, just accept you are going to have a cache miss at the end of the original array size, not really a big deal. The other thing, as mentioned in the t-machine post, the array will require defragmenting once you start adding removing entities. This can be solved, as I think is mentioned, with a lookup table (entityId -> instanceId).

I have more I wanted to write but have run out of time, will add soon.

braaad on 22 May 2016

Throwing in some links while I'm digging deeper into the matter

Modern C++: What You Need to Know (jump to 23:30)
Tips for Improving Time-Critical Code
Data Locality

From what I've read so far, there's a big potential to gain speed!

@braaad Looking forward for your next post :) That's exactly the kind of stuff I want to read :)

sschmid on 23 May 2016

So many repeated comments xD (And so many email notifications for the same)

Github bug?

JuDelCo on 23 May 2016

I'm sorry, yes, github had some issues
bildschirmfoto 2016-05-23 um 18 32 22

sschmid on 23 May 2016

@sschmid If you are going to go ahead with this, maybe a set of benchmarks need to be created similar to what you already have but doing a full simulation of some kind. It would need a range of scenarios, e.g. a large number of entities with few components, a small number of entities with a large number of components, frequent adding and removal of components etc. etc. There isn't really a goto implementation that is the best in all circumstances (that I am aware of). A compromise needs to be made to handle the general use case, or, provide the ability to swap the internal memory management out for different implementations easily.

I have been making a few notes as I have gotten time so here they are, not sure if they are useful or not. Mostly just from my own research / experience, nothing really out of the ordinary.

It's best to avoid "monolithic" entities and by that I mean storing state on the entity object itself. What I have done, and is a common way of doing it, is to just use struct for the entity which contains an ID. This ID is used as the key to access all it's relevant parts. I just have a free entity stack which I set it's initial size to be the max number of entities the system can handle. When you allocate a new entity you grab one from the stack and vice versa. The "component manager" can then make available that entity index for all it's component arrays.

Bit fields (or masks, whatever you want to call them) are pretty much the fastest, and easiest, way to handle entity filtering. With checking if an entity has all components being as simple as this return (this & other) == other;. However, you need 1 bit per component type (obviously) so can get large quite quickly. I went straight to using 128bit using two ulongs. Your use of pools though would alleviate this issue. Each bit field can be indexed by it's entity ID, like everything else.

I generate code for Copy methods on my components, I use this for templates(blueprints) and shuffling components around in memory (since nothing gets new'd at runtime). I don't use this for general immutability.

Some ideas I have had that I never got around to testing are:

The concept of static and dynamic components (the name static is probably a bit ambiguous). Static components can only be added when allocating a new entity. An example of static component would be Unity's transform component.
Treat components used a flags only (no data, just used for filtering) differently. Just add them to the component mask for the entity they are attached to. You already have this concept in Entitas.
Treat single instance components differently (once again you already have this concept). These gain no benefit from cache coherence and it's a waste to allocate an entire component array just for these that won't get used.

braaad on 26 May 2016

I had a discussion with @jamesmintram once about the size of the entity and why we use multiple pools. He proposed something interesting. As a pointer on an 64bit arch takes 8 bytes, what if the entity would store not an array of pointers but an array of indexes which can be 1 or 2 bytes big. Now if we say components are managed by a coordinator which has an array of components pre wormed, as @arne-schroppe and @braaad suggested, than entities can ask component coordinator/manager to replace, "destroy" or give them new components, where they get only an index. Components than can be structs or classes. As the coordinator holds only particular type of component no boxing should happen.
We should have an annotation, which let user define how many entities s/he intend to have in the game. If it is lower than 255 than we could even use 1byte index inside the array. 2bytes -> 65535 entities should be enough for most of the games. The annotation could be added to every component or just to the pool.

All that sad I think the most important point was mentioned by @braaad . We need a few very good tests in order to see if what we do make sense not only on "paper" but in real world scenario.
Also I liked the idea of FlagComponents being just part of the bit field/mask.

mzaks on 27 May 2016

👍1

Nice, that might improve both, memory footprint of an entity and data locality. I love where it's going :)

sschmid on 27 May 2016

Here is another relevant article
http://ithare.com/c-for-games-performance-allocations-and-data-locality/

mzaks on 29 May 2016

I sketched something in Swift
https://gist.github.com/mzaks/c6692a7a3c974ecedd3493322f017afc
Some features are language specific. e.g. I don't even need Component structs as Swift has Tuples, but generally the code reflects what I suggested on May 27th.
The definition of components can be done through unity UI or a DSL. And this way Entity, ComponentManager and Pools (I didn't introduced Pool in my sketch but I guess it is almost obvious how it would look) can be generated.

mzaks on 6 Jul 2016

@mzaks How would you go about efficiently combining position and velocity components for a movement system? I.e. how would you efficiently match a position component to it's according velocity component. To get the best performance from the data layout both component arrays should be iterated over in tandem. Your sketch makes it look like positions could be iterated over sequentially, and a second velocity array would basically require random access because the slot allocation doesn't enforce anything.

paraboxx on 14 Jul 2016

@paraboxx well to be precise my proposal will not even grant sequential access when iterating through positions only. Let's say we create 10 entities and give them all position components. This will result in entity 1 having position component at index 1 in position array, entity 2 at position 2 and so on. But what happens when now we destroy entities 5, 6, 7 and create entities 11, 12, 13, 14. Entity 11 will have its position at index 5, entity 12 at 6, entity 13 at 7 and entity 14 at 11. So we end up jumping around in the array anyways.
What I am trying to achieve with this architecture is to put the whole game state as compact as possible, hoping that it might fit into L2 or at least L3 cache. Making the footprint of an entity smaller and hoping that Component Manager stays also in reasonable size.
Lets say we have 200 entities with a position and velocity. I was very generous with position. I guess in a simple example a X and Y of two byte short should be enough, same for velocity. This means that our entity take 200 * 2 * 1 = 400 bytes and component manager 200 * 4 * 2 = 1600 bytes So all together we have 2000 bytes which is not that bad. Hm we also have the two nextOpenXYZ sets in the ComponentManager. To be honest I am not sure how much memory it consumes and if it will be loaded in to the memory as well, when we just using the get component methods on entities. But let's say we give every component another byte so 200 * 2 * 1 = 400 bytes. This makes a total of 2400 bytes.
To put thing into perspective, in current architecture the Entity is backed by an array of pointers. A pointer on a 64 bit CPU architecture is 8 bytes. Meaning that the entities alone will occupy 200 * 2 * 8 bytes = 3200 bytes

mzaks on 14 Jul 2016

@mzaks Ok yes, fitting as much as possible into the cache is a factor, too. I'm not sure how much good it does, though. For that to help there has to be a need to access the same data multiple times in close succession without anything big happening in between. There will be lots of other stuff going on between two movement updates that all potentially flush our data, Unity, rendering, physics, other programs running on the computer. This is why I think the most straightforward thing to do is to make sure data is accessed sequentially, to help the prefetcher do its work. But this is all guesswork without benchmarks and actual use cases.

paraboxx on 15 Jul 2016

@paraboxx The logical systems which change only the entities are executed on the same thread sequentially. Meaning that there should be no interrupts in between and there should be not that much data loaded into the cache during the execution of all logical systems. That sad I guess it also means that grouping of the systems is important. If you have systems which need to interact with GameObjects for example, it would be better to put them at the end of the systems list, so that logical systems can have the fastest throughput working only with Entities and components.

However I totally agree with you, those are just speculations without actually tests, they are purely theoretical.

mzaks on 15 Jul 2016

CppCon 2015: Vittorio Romeo
“Implementation of a component-based entity system in modern C++”

https://www.youtube.com/watch?v=NTWSeQtHZ9M

sschmid on 24 Jul 2016

Hey all, I'm a little late to the conversation but I've got some thoughts. For topics like this I think it's helpful to envision the ideal future and work backwards from there.

Smaller Entities

I imagine _entities_ as tightly packed structs containing only a few ulongs. As others eluded, bit fields are wonderful for conserving memory ~~but I didn't see anyone suggest using bit fields for the components themselves.~~ _EDIT: @braaad mentioned this and I think already described what I am describing here..._

Essentially, the code generator would create pool-bases entities and allocate the exact amount of space needed to store all the components and flags. Since we are using bit fields this is super efficient:

``` c#
// apologies if the code below isn't syntactically correct - I'm new to C#

[StructLayout(LayoutKind.Explicit, Pack = 1, Size = 28)] // size is dynamic (code gen determined)
public struct PlayerEntity : IEntity {
[FieldOffset(0)]
public UInt32 id; // unique entity id
[FieldOffset(4)]
public byte[2] flags; // bit field for component flags
[FieldOffset(10)] // this offset is dynamic (code gen determined)
public byte[22] components; // bit field for attached components
};
```

This means that the actual size of an entity is 4 bytes + 1 byte per 8 flags + 1 byte per 8 components. Currently entities have a lot more responsibility than what is depicted above but perhaps that can be moved elsewhere.

If you have an Enemy pool that contains 200 components and 20 flags that gives you 32 bytes per entity. This means storing 1000 enemy entities only takes 32KB of memory. Of course each component's data would need to live somewhere but that is a different problem. From this, if we wanted a particular component's data we have a unique entity id and component offset (it's position in the bit field) that can be used as a unique index to get that data.

Component Managers

@arne-schroppe: All components of the same type are stored together in a shared array (so there will be one such array per component type, managed by the pool). An entity just holds an index into that array.

Yes! Except I don't think the entities need an index.

Since entities no longer store references to components they will need a component manager that takes the entity's id + the request component offset in the bit field and returns the actual component data. Again, in this architecture entities do not have a component "index" - they simply have a bit that tells them whether they have the component or not. The component manager takes the entity's id + bit position of component (which is unique for each pool) and solely based on that returns the data.

This means component managers would be tied to pools.

Best Practices

Use Pools

Pool usage == optimization opportunity

Logically Group Systems

@mzaks: If you have systems which need to interact with GameObjects for example, it would be better to put them at the end of the systems list, so that logical systems can have the fastest throughput working only with Entities and components.

I love this. This would be great in a best practices section.

cconstable on 3 Aug 2016

I'll also leave a couple of links:

https://www.youtube.com/watch?v=rX0ItVEVjHc
https://blog.molecular-matters.com/2013/07/24/adventures-in-data-oriented-design-part-3c-external-references/ <-- specifically about data oriented design for ECS's components
https://www.youtube.com/watch?v=ZHqFrNyLlpA

Stals on 27 Sep 2016

I recently made some tests with ideas I had. The results where more than exciting! It's fast - damn fast. And the memory is way less compared to Unity's approach. I might share more once I have the time to work more on that

sschmid on 31 Jan 2017

Which ideas did you try that made that "damn fast" execution?

KumoKairo on 31 Jan 2017

Short explanation:
Components as structs, preallocated array for each component type, entities as int.
Compared it to the Unity way, was a lot faster and about 10x less memory

sschmid on 31 Jan 2017

So a really basic ECS implementation in the end. But I'm thinking to build ontop of that.

sschmid on 31 Jan 2017

Thanks for you're input. I played with different solutions. While there is a potential performance benefits there will also be a drop of ease of use. I think Entitas is a great mix of high performance and simple-to-use. Will close

sschmid on 9 Apr 2017

https://jacksondunstan.com/articles/3453
https://jacksondunstan.com/articles/3468
https://jacksondunstan.com/articles/3325
https://jacksondunstan.com/articles/3740

AGeorgy on 3 Dec 2017

👍2

Was this page helpful?

0 / 5 - 0 ratings