Over the past five days I've dedicated a large block of time into looking at alternatives to EF Core. This is a consideration that we have been talking about internally for a while now, for the following reasons:
store.Refresh() to retrieve a "live" version which can be modified and updated correctly.The first step was to decide whether we should be sticking with SQLite
Pro:
Con:
I an initial effort to move forward with simplicity in mind, I attempted switching to dapper + LiteDB. LiteDB is a very lightweight (and not heavily maintained or used) project which allowed to quickly test things out without changing our models too much.
The branch above is in a mostly working state, but brought up a few crippling issues:
While this isn't a killer, it feels like a step backwards in terms of code quality, readability and maintainability. Rider will give inspections on the SQL to ensure correctness to some extent, but this relies on the table name being present in the queries (which it isn't in my branch, to use our existing store layout).
I ran into this issue. It has a multi-year history and no sign of being fixed without local workarounds. This isn't necessarily Dapper's fault either – it's a shortcoming in the communications between the sqlite native/wrappers and the Dapper mappings which can't easily be resolved without a workaround that would affect other database engines too.
At this point I stopped looking towards dapper as it didn't feel like the correct direction, and would leave a lot of the issues that we have with EF core.
Realm is a lightweight database that boasts a "zero-copy" architecture. All queries on the database return live IQueryable<T> instances, and all models and relations are lazy-marshalled on access. This means very low memory and allocation overhead on retrieval. It also means you do not need to specify relations to include on retrieval – you have access to all of them to an infinite recursion depth. Here's some good further reading to get up to speed on the points of pain that can be encountered with realm.
For the record, this is my second attempt at using realm. This time I was more confident as over the last year I have been thinking about how we can work around the limitations it has which affect us:
One example which required changes is DatabasedKeyBinding : KeyBinding : RealmObject. This can be quite easily solved by flattening the inheritance somewhat. Another was LegacyScoreInfo, which tidied up very nicely, resulting in better code quality and better defined behaviour than before.
There are a few remaining cases such as APIBeatmap which is deriving BeatmapMetadata – this is a very ugly inheritance and one which I believe we should be looking to fix anyway.
I believe this limitation will actually benefit us from a code quality perspective, and make our data models more portable.
Each thread needs its own realm context. This is the same as EF and slots in nicely in our DatabaseContextFactory with no issue. The caveat is that objects retrieved on one thread cannot be accessed on any other thread.
This is something we do absolutely everywhere. Consider a simple class:
public class DifficultyIcon : CompositeDrawable, IHasCustomTooltip
{
private readonly BeatmapInfo beatmap;
private readonly RulesetInfo ruleset;
public DifficultyIcon(BeatmapInfo beatmap, RulesetInfo ruleset = null, bool shouldShowTooltip = true)
{
this.beatmap = beatmap ?? throw new ArgumentNullException(nameof(beatmap));
this.ruleset = ruleset ?? beatmap.Ruleset;
}
[BackgroundDependencyLoader]
private void load(OsuColour colours)
{
iconContainer.Children = new Drawable[]
{
new CircularContainer
{
Child = new Box
{
RelativeSizeAxes = Axes.Both,
Colour = colours.ForDifficultyRating(beatmap.DifficultyRating),
// ^ invalid in realm
},
},
new ConstrainedIconContainer
{
Icon = ruleset?.CreateInstance()?.CreateIcon()
// ^ invalid in realm
}
};
}
}
Another example is that we store a Bindable<BeatmapInfo> at an OsuGame level. Without special attention the whole game breaks due to these cross-thread accesses.
I did a good amount of reading on how realm should be used and it seems to favour the mobile app development model, where you:
For us, we use a different model with our BackgroundDependencyLoader async flow. We expect visuals or UI elements to take longer than a frame to load in some cases and don't want to be inhibited by the main UI thread. This is a very different approach but one that will not be changing any time soon, as it is one of the core features/paradigms of osu-framework.
So, we have two options available to use realm in our ecosystem:
Copying the data from realm into an in-memory version would completely solve this issue with no side-effects, but does limit the usefulness and performance benefits of realm (and in some people's eyes is outright blasphemy).
While I agree we definitely shouldn't be doing this everywhere as the copy-to-memory overhead is actually quite large when compared to EF core, it does have its place. In my implementation so far there are two use-cases which require this:
BeatmapInfo but does not wish to affect the databased version.RulesetStore and RulesetInfo in general, which is passed around and available game-wide as a bindable, but (currently) never changes. The overhead of using a realm-backed object here would be higher than detaching/copying, even if threading was not an issue.I have implemented this using Automapper, via the following implementation:
private static readonly IMapper mapper = new MapperConfiguration(c =>
{
c.ShouldMapField = fi => false;
c.ShouldMapProperty = pi => pi.SetMethod != null && pi.SetMethod.IsPublic;
c.CreateMap<BeatmapDifficulty, BeatmapDifficulty>();
c.CreateMap<BeatmapInfo, BeatmapInfo>();
c.CreateMap<BeatmapMetadata, BeatmapMetadata>();
c.CreateMap<BeatmapSetFileInfo, BeatmapSetFileInfo>();
c.CreateMap<BeatmapSetInfo, BeatmapSetInfo>()
.ForMember(s => s.Beatmaps, d => d.MapFrom(s => s.Beatmaps))
.ForMember(s => s.Files, d => d.MapFrom(s => s.Files))
.MaxDepth(2);
c.CreateMap<DatabasedKeyBinding, DatabasedKeyBinding>();
c.CreateMap<DatabasedSetting, DatabasedSetting>();
c.CreateMap<FileInfo, FileInfo>();
c.CreateMap<ScoreFileInfo, ScoreFileInfo>();
c.CreateMap<SkinInfo, SkinInfo>();
c.CreateMap<RulesetInfo, RulesetInfo>();
}).CreateMapper();
public static T Detach<T>(this T obj) where T : RealmObject
{
if (!obj.IsManaged)
return obj;
return mapper.Map<T>(obj);
}
Mapping specifications must be made for each model, including collection members and lookup depth. I believe the above can be further optimised by reducing inclusions further, as I believe there are some circular references present at a lookup depth of 2.
This is the recommended approach and realm actually provides an API for passing objects between threads. While the API doesn't work so well for us, because it requires explicit implementation at every usage, the concept can still be used by us by re-fetching via primary key from each thread.
Doing this manually would be quite a pain, so I came up with another proposal...
I ended up creating a class to manage passing realm objects around inside osu!:
public class RealmWrapper<T> : IEquatable<RealmWrapper<T>>
where T : RealmObject, IHasPrimaryKey
{
public string ID { get; }
private readonly ThreadLocal<T> threadValues;
public readonly IDatabaseContextFactory ContextFactory;
public RealmWrapper(T original, IDatabaseContextFactory contextFactory)
{
ContextFactory = contextFactory;
ID = original.ID;
var originalContext = original.Realm;
threadValues = new ThreadLocal<T>(() =>
{
var context = ContextFactory?.Get();
if (context == null || originalContext?.IsSameInstance(context) != false)
return original;
return (T)context.Find(typeof(T).Name, ID);
});
}
public T Get() => threadValues.Value;
public RealmWrapper<TChild> WrapChild<TChild>(Func<T, TChild> lookup)
where TChild : RealmObject, IHasPrimaryKey => new RealmWrapper<TChild>(lookup(Get()), ContextFactory);
public static implicit operator T(RealmWrapper<T> wrapper)
=> wrapper?.Get().Detach();
public bool Equals(RealmWrapper<T> other) => other != null && other.ID == ID;
}
This is the core of the realm implementation and probably the most important class. It allows us to use realm with minimal changes (initially) but also expand our usage to include realm's observable changes and zero-copy optimisations as we require. In my branch (at the time of writing this) RealmWrapper is used correctly by the BeatmapCarousel and MusicController, as these are scenarios where performance is important.
All data stores should return RealmWrapper<T> (BeatmapManager already does in my branch). Consumers that want to benefit from the realm-backed object should accept a RealmWrapper<T> instead of a T, then "unwrap" it locally via Get(), on usage.
What makes this tick:
Get() will ensure you have a realm-backed object usable on the current thread, via a re-lookup using the primary key.T allows it to be used in all existing usages. This will force a copy-to-memory via Detach(), so it does come with an overhead.Further improvements I would like to make to RealmWrapper<T>:
Live<T> or LiveData<T> to better denote what it is.ThreadLocal is probably overkill. I think storing the last instance is enough (and will be more memory/performance efficient).IsSameInstance check into Get() depends on realm instance lifetime management direction.IBindable or similar if we can.ctor which creates an unmanaged RealmWrapper via WrapAsUnmanaged. This allows classes provided models (especially tests) to do so without explicitly changing the code to wrap inside the RealmWrapper class.Using the above two methods combined, my branch is in a usable state for importing beatmaps and playing the game. Some auxiliary functionality like selecting the current skin in settings will crash the game, still. I wanted to get the song select and gameplay loop working so I could benchmark the new structure and ensure we have not regressed, before continuing any further.
We save around 1.5 seconds on startup which used to be consumed by EF core, on a multi-core system. The benefit will be larger with fewer cores, I believe.


Startup memory usage is also lower than EF:

...although things seem to even out after a longer play session (needs further investigation)
Realm is made to be run on non-spinning solid storage media. In order to test realm under the worst scenario possible, I ran some read-heavy benchmarks on an average speed USB drive.
RealmObjectRealmObject.RealmObject.RealmObject using a primary key (in a table with 10,000 rows). Basically what we will be doing on each thread when accessing a RealmWrapper<T> the first time.RealmObject.With no external load:
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------------------- |-----------:|----------:|----------:|-------:|-------:|------:|----------:|
| TestUnmanagedRealmWithConstruction | 827.3 ns | 16.52 ns | 20.90 ns | 0.0544 | 0.0267 | - | 288 B |
| TestUnmanagedRealm | 637.7 ns | 4.02 ns | 3.56 ns | 0.0496 | - | - | 264 B |
| TestManagedRealm | 2,098.5 ns | 24.94 ns | 23.33 ns | 0.0839 | - | - | 456 B |
| TestManagedRealmWithFetch | 9,945.9 ns | 143.28 ns | 127.01 ns | 0.1984 | 0.0916 | - | 1096 B |
| TestBasicWithConstruction | 378.2 ns | 2.17 ns | 1.93 ns | 0.0472 | - | - | 248 B |
| TestBasic | 626.3 ns | 3.51 ns | 3.28 ns | 0.0496 | - | - | 264 B |
Worst case is around one order of magnitude difference in property retrieval performance.
Let's try again with high external write load induced by dd if=/dev/urandom of=/Volumes/UNTITLED/test-output bs=1024 count=40000000000 (100% IO saturation):
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------------------- |------------:|----------:|----------:|-------:|-------:|------:|----------:|
| TestUnmanagedRealmWithConstruction | 859.1 ns | 16.77 ns | 22.96 ns | 0.0544 | 0.0267 | - | 288 B |
| TestUnmanagedRealm | 626.3 ns | 9.26 ns | 8.66 ns | 0.0496 | - | - | 264 B |
| TestManagedRealm | 1,953.2 ns | 39.06 ns | 73.36 ns | 0.0839 | - | - | 456 B |
| TestManagedRealmWithFetch | 10,317.6 ns | 284.44 ns | 338.60 ns | 0.1984 | 0.0916 | - | 1096 B |
| TestBasicWithConstruction | 384.3 ns | 2.53 ns | 2.36 ns | 0.0472 | - | - | 248 B |
| TestBasic | 632.4 ns | 12.45 ns | 11.64 ns | 0.0496 | - | - | 264 B |
Thankfully, it looks like either realm or OS file cache is enough to ensure little performance deviation. Writes on the other hand are noticeably slower (during the test setup when inserting rows, this is noticeable), but this is to be expected and out of the scope of testing for now (writes will generally be async so are not relevant).
In a more optimal scenario, here's the same benchmarks on my desktop with SSD backing:
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------------------- |------------:|----------:|----------:|-------:|-------:|------:|----------:|
| TestUnmanagedRealmWithConstruction | 839.4 ns | 15.75 ns | 16.17 ns | 0.0544 | 0.0267 | - | 288 B |
| TestUnmanagedRealm | 642.6 ns | 8.02 ns | 7.11 ns | 0.0496 | - | - | 264 B |
| TestManagedRealm | 2,036.5 ns | 37.74 ns | 33.45 ns | 0.0839 | - | - | 456 B |
| TestManagedRealmWithFetch | 10,130.0 ns | 113.70 ns | 100.79 ns | 0.1984 | 0.0916 | - | 1096 B |
| TestBasicWithConstruction | 384.5 ns | 2.14 ns | 1.67 ns | 0.0472 | - | - | 248 B |
| TestBasic | 613.3 ns | 5.80 ns | 5.14 ns | 0.0496 | - | - | 264 B |
It's safe to say that for read workloads, at least on macOS under my testing conditions, there is no issue with slower IO. This should probably be tested on windows for comparison, but I think it should behave sanely enough to not be an issue.
I will add more benchmark results as to how this translates into usage in osu! as they become available to me, but the following shows what a worst-case scenario can look like (this is around 10 filters running at song select doing >100k comparisons and many more property retrievals)

Right now, realm is consuming more space on disk than I would hope for. Initially, the realm database was 2gb after an import of 564 beatmap sets. This was somewhat alleviated by reducing the number of transactions during the import process (196mb after). Performing a manual compact on the realm database brings it within sane bounds, at about 75% the size of our SQLite database:

There's existing documentation around online about database file bloat and this will require further reading and investigation, but as it can be resolved by a manual compact after large import processes, is not a blocker.
Realm.Refresh() to be run regularly on main thread, and not run every Get()ItemAdded/ItemRemoved to remove schedulesPrimaryKeys, including fixing RulesetInfo's IDlong primary key instead of GUIDRealmWrapperAPIBeatmap : BeatmapMetadata and DummyRulesetInfo : RulesetInfo)I have spend around 50 hours on this so far (~40 of that in realm). I do still plan to continue investigating realm as it does seem to be, at very least, objectively a better solution than EF core.
Until now, our models and stores have been quite haphazard. In some cases, I'm not even sure how things are working correctly. The threading model realm uses forces some degree of consistency and structure that will help us better define out data usage.
At the point of writing this I estimate another 50 hours of work required to get the full game into a flawlessly working state, including addressing all the points above. The remaining work feels like the downhill part of this journey – I am only posting this as most of the points of pain have been resolved.
I plan to do this over the coming weeks.
The most value of EF are change tracking and cascade loading. If change tracking is unacceptable, EF will be not so friendly.
FYI: change tracking can have better performance if using notification, which requires a lot of boilerplate code. https://docs.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.changetrackingstrategy?view=efcore-3.0
EF also has a large startup overhead and memory presence (mostly reflection related). I understand that change tracking can actually improve performance, but as mentioned, our threading model is very non-standard and leads to headaches when attempting to use it.
And at the end of the day, SQLite is a bad backend for EF or anything else, as it doesn't type so nicely, nor migrate.
As a side note for anyone looking to test the attached realm branch, please build it twice for now to bypass the compile-time errors (there should be three remaining). It still works as expected.
Also ignore the errors in test projects. I have not updated tests yet.
a lot of boilerplate code
Do you mean PropertyChanged calls? It is not that bad.
OsuDbContext.Event calls can be simplified. It is already done in Prism framework.
I've written such helpers like Prism in my UI application. Experience is still bad because it still requires 6 lines of one property.
As peppy said, change tracking is not the only problem, even not the major one.
You're welcome to attempt that. It doesn't work without fixes (many other users have reported the same).
Things are feeling good, so I began looking into the realm filesize issue. The problem here is that we are holding realm contexts open on all background threads indefinitely.
My first thought was to clean them up. I added read usage tracking so we can use a finalizer in DatabaseUsage (new base class) to close the instance. Unfortunately, realm does not allow closing open instances unless on the original thread.
This doesn't work with RealmWrapper<T>, where we really can't manually dispose via a using.
In its current form anyway. Two paths forward are:
IDisposable usages from RealmWrapperThe Get() method should be made to return a disposable usage. This could be disposed after usage to show that the underlying realm context can be freed. Probably getting a bit too complicated to the point it's probably not what we are looking for.
My current leaning is that we should be detaching/copying data away from realm whenever it is to be persisted to a field/property, or passed over a thread. This should probably be a given for all Manager class retrieval methods and should be denoted with a Get prefix or similar.
If a component wished to usage realm-sourced data in more efficient or complex ways, it would access the realm context directly or use methods exposed by manager classes with a prefix that explained this. Such usages would not be cross-thread safe and handling that would be up to the consumer.
Current path forward:
RealmWrapper outside of manager classes. This will require experimentation in MusicController and BeatmapCarousel specifically, to more efficiently query and consume data across threads.I don't foresee this being something we can make a switch to within this month, but will continue investigation.
Also I'd note that the same issues exist with EF-core and are not specific to realm. We were getting by until now by basically detaching all data in all cases.
I remember having file size issues with Realm earlier this year, and it was mostly resolved by ensuring we didn't do too many bulk writes. IIRC, since transactions are copy-on-write, doing a large transaction will reserve a lot of space in the DB file.
In our example we were moving around hundreds of megabytes of serialised CLLocation objects and we were easily going over 1GB within a few days' usage. We solved it by batching the writes into smaller chunks.
That's not the issue, it's definitely what I explained in my last post. Already ensured transactions are of realistic sizes (one beatmap set at a time).
Realm 6.0 beta details have now been released. Not yet available via .NET API but at a glance, "Frozen Objects" look to solve the one major issue I was facing until now.
I will continue keeping my realm branch up-to-date with master, but not make any further changes until 6.0 becomes available for our consumption.
Moving out of milestone for now. Will become a current task once we have Realm 6.0 to test against.
Thanks for attempting to be useful, but it has only been released for cocoa, not .NET.
Oddly enough, is says that in January, listing it as GA meant that .NET would be supported, but according to your comment, it's not. The Realm blog hasn't been updated, either since the Realm 6.0 post..
Please refrain from these comments. Software takes time and it's not ready yet. You can be rest assured I'm following progress.
Just as an update, it looks like May at earliest for a revisit to this.
Most helpful comment
Please refrain from these comments. Software takes time and it's not ready yet. You can be rest assured I'm following progress.