Mobx: Question: How do you avoid "leaking" memory while making sure there is only one instance of each domain object?

Created on 1 Mar 2017 · 21Comments · Source: mobxjs/mobx

One of the benefits of MobX is using real references to domain objects and knowing that you only have one instance of each domain object. The documentation suggests using stores to instantiate domain objects and to ensure there is only one instance of each. To do this the store needs to keep track of existing instances based on a domain object ID, for example using a simple ID-to-DomainObject lookup structure like Map<string,DomainObject>.

The thing I cannot figure out is how to avoid "leaking" those domain object instances when they are no longer needed by the application.

Example

Suppose you are implementing a typical forum with threads for discussion. A central domain object would be the Thread. There will typically be thousands of threads on the server. The user arrives at the front page which displays the 50 most recent threads as well as the 10 most active (hot) threads. Some of the hot threads are also in the list of 50 most recent threads. The ThreadStore creates 50-60 Thread instances depending on how many are common between the two lists.

The user moves on to the second page of most recent threads. The list of 10 hot threads remains the same, while the 50 most recent threads are replaced with the 51-100 most recent. If there was no global store referencing the Thread instances then the previous threads could be garbage collected. However the ThreadStore doesn't know if the "hot threads" list (or some other part of the application) is still referencing any of those Threads so it must keep tracking them. The ThreadStore cannot forget the previous 1-50 instances, so there are now 40-50 unused Thread instances which are still in memory. If the user keeps navigating through the pages then there will be hundreds and eventually thousands of Thread instances remaining in memory. If the app is intended to be realtime then there will also be subscriptions per Thread causing wasted network traffic and work for the server.

MobX vs other systems

This challenge isn't really specific to MobX. Any system which tries to only have a single instance of each domain object will have the same problem. I think most systems avoid the problem by not having single instances. Instead they let each component own and manage the instances themselves. Instances can be garbage collected when no components reference them anymore since there are no global/long-living store keep them in memory.

How to approach this?

So I am wondering how people are dealing with this (if at all). I even expected to find libraries which handle this kind of maintenance since it seems like it would be a common challenge, but so far I have come up short. The lack of libraries and discussions about this makes me suspect that I am overthinking this, but I also can't see a simple and clean solution.

Some of the approaches I have considered:

Don't worry about it. Users will normally not accumulate enough instances for this to be a problem. Probably applies to 9 out of 10 apps. However, some applications (mail client, issue tracking system, dashboards, facebook, etc) can potentially be left running for days and weeks without reload.
Don't worry about it, but handle the long-running edge cases by triggering a page reload when total number of instances reaches a certain threshold. Dirty, but reload would be a rare failsafe. Caching and saving state in local storage could make it nearly invisible to the user.
Manual reference counting. Any time a component keeps a reference to an instance it should notify the store through something like object.takeRef(), and release when finished using object.releaseRef(). The store or domain object maintain a reference counter, and the store can forget an instance when its reference count reaches 0. This would be slightly annoying and fairly error prone though.
Aggregate domain objects in query result sets to avoid explicitly calling .takeRef(). A component can create needed queries on construction and destroy them on unmount. The query result set calls .takeRef() and .releaseRef() automatically as domain objects are added to or removed from the result set. The problem with this is that nothing stops a component from passing a domain object along to a different part of the application, which has the same downside as point 3.
Don't try to have a single instance for each domain object. This removes the need for a central store keeping references to all domain objects, so instances will be garbage collected when no component are using them any more. Need to use 'object.id' to compare identities. If instances are to share a subscription to be updated then the component will need to remember to unsubscribe. This could cause issues if the component has passed the instance along to another part of the application since the instance will still exist yet its subscription is ended.

Is there an obvious way to think about this that I am missing here? Most examples seem to go with point 1 (I think).

Thanks for reading! :)

Source

sveinatle

👍9

Most helpful comment

My preferred solution is "don't worry about it" combined with "don't use global singleton stores for lists of things" (instead, use per-"page" stores)

So once a user navigates away "sufficiently" from the original UI, that UI "page"'s object and all its stores go away. If data needs to be saved, its better to save it to a more persistent storage anyway to avoid data loss in the event of a page reload.

Additionally, you could have a routing system that allows page stores to pass model objects to eachother, thereby preserving them only while necessary.

spion on 2 Mar 2017

👍4

All 21 comments

Good questions :). I often tend to go for option 3.

There are some useful tools in mobx that can help you though to automatie more stuff, especially the onBecome(Un)observed hooks of atoms. With these you could check if any reaction (component) is still using a certain value. Not perse elegant, but an interesting approach would be something like:

const todo = new Todo(json)
mobx.extras.getAtom(todo, "id").onBecomeUnobserved = () => {
  disposeTheTodo(todo)
}

This will call the onBecomeUnobserved handler as soon as as the "id" field is not in use anywhere as far as mobx nows. (Note that this does not guaruantee that the reference to the object is not used anymore! So you need to do some invariant checking that the todo is not used accidentally. For exeample mobx-utils fromResource also uses these handlers to stop update subscriptions with the server.

Also, have you considered using weak maps and always look up items by id? In that case the GC can do the work for you. Although debugging leaks might still be tedious.

mweststrate on 2 Mar 2017

@sveinatle

Funnily enough, I'm trying to tackle exactly this problem right now. My current approach is also to manage references. Not through counting, but through a set of reference objects. I've uploaded a work-in-progress, as yet untested, unused, of what I'm currently experimenting with:

https://gist.github.com/jamiewinder/944e038c86e19d81c4f5a455288b48f2

Basically, it acts as a central store for data models (which are assumed to be classes) which are keyed by type, or type + key, and accessed in the same way. Accessing a data can be done anonymously (null ref) or by passing any object along that identifies who is using that data. Once a part of your system has finished with this, it can unref and this can cause data objects to be removed.

For example, you getOrAdd your 'hot topics' where the ref is 'hottopics', and your page topics for page 1 where ref is 'pagetopics-1'. If your page changes, you unref 'pagetopics-1' which will remove any topics whose only ref was the page, but preserving those still used by hot topics. In reality, I don't use strings for these, but object instances (often this makes sense).

Hopefully this is explaining well enough! Let me know if this is at all helpful. The code itself is hours old, so is probably not 100% there.

EDIT: Here's my original non-observable version (uses ES2015 Map and Set rather than observable.shallowMap which makes things more straightforward since their keys can be any type):
https://gist.github.com/jamiewinder/d148f7289ee980ce4275d5e75d470304

jamiewinder on 2 Mar 2017

My preferred solution is "don't worry about it" combined with "don't use global singleton stores for lists of things" (instead, use per-"page" stores)

Additionally, you could have a routing system that allows page stores to pass model objects to eachother, thereby preserving them only while necessary.

spion on 2 Mar 2017

👍4

The tricky part comes when you want to say 'topic#xxx's title is now ABC'. If you aren't tracking your models centrally then you won't have any one model to update.

jamiewinder on 2 Mar 2017

@jamiewinder Yeah that will work while you're on the page. But when you navigate away from the "current UI" (e.g. open a new page) all such stores would go away. Works well for some use cases...

In the OP example, each page would instantiate their own ThreadStore (a map of id to topic). As the user navigates, the HotThreads list is the one that gets passed around due to it being shared for all the views. HotThreads will pre-populate each page's newly created ThreadStore, then the rest of the threads (PageThreads) will be fetched and consolidated with those already in TopicStore from HotTopics.

When the user goes away from a page, that page's TopicStore goes away too, so only the HotTopics references remain.

The general idea is to not keep things in memory that aren't necessary for the current page.

spion on 2 Mar 2017

The general idea is to not keep things in memory that aren't necessary for the current page.

Which can be tricky to ascertain on complex systems.

I think my own problem relates fairly well to the OP example, so I'll use this as an example.

My system is a job manager. A 'Job' is a fundamental data model, which can be loaded from a service through a number of means. I have a 'current day jobs' request which'll return information for all jobs assigned today. I have an 'unassigned jobs' request which'll return a page of jobs that aren't yet assigned. I have an 'late jobs' request which'll tell me which jobs are late etc.

I also have a web socket connection to the server which'll tell me if the job properties have changed. If they are, the job model is updated to match. In order to be able to do this, I have a central 'Jobs' collection so if I get told 'Job #x is now called 'Xyz'' then I can just do jobs.get(x).setName('Xyz').

This is why I think I need a central collection.

However a central collection being populated from a number of requests can lead to other complications. For example, if I load a page of my 'late jobs' then I may find I've already got a model for some of my jobs from the 'current day jobs' request. That's the role getOrAdd in my example gist; either given me the model for job#x, or create a model for job#x from some data).

Secondly, you start to need to track references because you want to avoid the collection 'leaking' like OP describes it. If I change from page 1 to page 2 of my 'late jobs' list, I want to remove the models from page 1 unless other parts of the system are using it. That's what the refs do; they serve to identify parts of the application that are using the data model. So for example:

Job 1 - Refs: [Today, LatePage]
Job 2 - Refs: [LatePage]
Job 3 - Refs: [Today]
Job 4 - Refs: [Today, LatePage]
Job 5 - Refs: [LatePage]

If I switch pages on my late jobs, I can unref 'LatePage' which here will remove jobs 2 and 5 because it'd leave them without references:

Job 1 - Refs: [Today]
~~Job 2 - Refs: []~~
Job 3 - Refs: [Today]
Job 4 - Refs: [Today]
~~Job 5 - Refs: []~~

The alternative is that if I get told that a job has changed, that I have to look for the duplicate models in the various stores in my application - TodaysJobs, LateJobs, UnassignedJobs - and update them all. This doesn't scale very well and can easily trip you up as the system grows. I think the above approach scales much better after the initial pain of having to track references.

OP - is this the kind of problem you're describing?

jamiewinder on 2 Mar 2017

This is why I think I need a central collection.

Hmm, not sure I'm explaining it right. Yes, a central collection - but a new central collection for every new page being opened, thats the idea.

Basically, instead of removing models you don't need, you copy references to models that you do need and discard the old cache.

The pattern would be a JobStore class that can load() any item by ID:

class JobStore {
  constructor() { this.jobMap = new Map(); } 
  load(id) { fetch from server, or cached in jobMap. Makes sure jobMap is used whenever possible }
  loadPage(id) { same as above, but entire page of items }
  preload(items) { preload items from a list into jobMap }
  // includes websocket subscription methods
}

The page viewmodel initializes a new JobStore and preloads late or unassigned jobs from previous page's store, if any. (They are only preloaded if they're also necessary on the current page)

class TodaysJobsPage {
  constructor(params) {
    this.jobStore = new JobStore()
    let {lateJobs, unassignedJobs} = params.previousPage;
    if (lateJobs) this.lateJobs = this.jobStore.preload(lateJobs)
    if (unassignedJobs) this.unassignedJobs = this.jobStore.preload(lateJobs)
    this.jobStore.loadPage(n).then(items => this.todayJobs = items);
  }
}

Once the constructor is finished preloading items, the old jobService goes away and becomes available for garbage collection, together with the old jobs. Only preloaded references from the previous page survive. The new service contains a fresh deduplicated collection of jobs, which are also referenced by the TodaysJobsPage model from normal arrays and objects. Therefore if there is a method such as

setSelected(id) {
  this.selectedJob = this.todayJobs[id];
}
updateSelected(newTitle) {
  this.selectedJob.title = newTitle;
}

this will affect the same job in todayJobs, which is the same reference present in the current JobService, which in turn ensured that its the same reference in lateJobs or unassignedJobs if they're present.

spion on 2 Mar 2017

I think the problem is that while you use ID-to-DomainObject structure to prevent state duplication of domain objects (so there is just 1 instance of each domain object), that structure itself is a state duplication. The list of IDs is duplicated here.
I think the proper solution is to make the list of IDs in ThreadsStore computed from IDs provided by HotThreads and RecentThreads stores.
So whenever IDs of HotThreads or RecentThreads changes, the list of IDs in ThreadStore is recomputed and so is the array with Thread instances.
The threads provided by HotThreads and RecentThreads are computed from the threads provided by ThreadStore (based on identity match).
There is a room for modifications/optimizations, but the idea is the same - make IDs in ID-to-DomainObject structure computed from the storages they require them...

urugator on 2 Mar 2017

@mweststrate

onBecomeUnobserved is interesting. Although I cannot see a way to take advantage of it yet since as you say it doesn't guarantee that the reference to the object is not used any more. The object could still be in the .state of a component, and its fields could become observed again through that.

I have thought about weak maps, but they don't help in these situations because the key must be an object. Strings are not supported. That means that you still need to find a specific instance to perform the lookup, so we're back at square one.

sveinatle on 2 Mar 2017

@jamiewinder

OP - is this the kind of problem you're describing?

Yes, that is a good example of what I'm talking about.

Your approach is as far as I can tell somewhat similar in principle to option 4, in the sense that you maintain the references via collections which can then be discarded as a whole. My phrasing in option 4 muddles two concepts because it was based on an experiment where I tried to handle refs at the same time as I tried to provide a dynamic result set from a query.

The QueryResult I referred to in option 4 plays a similar role as 'pagetopics-1' in your case. My motivation was that all queries are represented by component-owned Query-instances which maintain a QueryResultSet<DomainObjectType> each. The component which owns the query could adjust the arguments on the query instance (e.g. change page number, change sort order, change filter). The query object would then fetch a new result from the server and add/remove domain objects in its QueryResultSet as needed (while adjusting the ref count for each object it adds/removes). Since components need to query/subscribe to data somehow anyway, I thought it would be nice to wrap the refcounting into that to make it automatic.

The advantage with your implementation is that it focuses on handling the refs, so it is more reusable. Another advantage with saving refs the way you do is that it makes it easier to track down leaks. In my case I only had the refcount so I could never know who forgot to release the object.

sveinatle on 2 Mar 2017

@spion

So once a user navigates away "sufficiently" from the original UI, that UI "page"'s object and all its stores go away.

How do you know when the user has navigated sufficiently away though? I can see that working if the application is sufficiently "page-oriented". In that case any navigation to a new page means that all domain state can be forgotten except the parts that you manually transfer as you described. It does require some care to make sure you always pass along all the necessary objects though. If you forget to pass something along then there will be long-living components keeping references to domain objects which the new page store doesn't know about. The next time something else requests an instance of one those objects then they will get a new instance from the store. At that point you have two instances of the same domain object. Less page-oriented user interfaces with more long-living components might not have particularly clean transitions, which would make this a bit more tricky.

sveinatle on 2 Mar 2017

@urugator

Yes, the problem is caused by the store keeping references to the objects. I like your idea for avoiding that. If everyone who consumes a domain object reports back to the central store then this could work. It doesn't need to suffer performance wise either (from searching) if the store keeps track of which consumers can provide an instance of each domain object: let consumer = store.consumers[objectId][0]; let object = consumer.objects[objectId]. Making sure all consumers report their existence back to the central store reminds me a bit of adding a ref like @jamiewinder is doing though. Not sure how different it would be in practise.

sveinatle on 2 Mar 2017

So far it seems like there is no single silver bullet that works in all situations. Care will be required no matter which strategy is selected. Either manually adding/removing references, manually passing data from one page store to the next page store or making sure all consumers report back to the store.

sveinatle on 2 Mar 2017

@sveinatle, haven't you considered using Maps of maximum size, so the last used (via .get()) keys are being deleted on exceeding this limit?

andykog on 2 Mar 2017

👍3

If you forget to pass something along then there will be long-living components keeping references to domain objects which the new page store doesn't know about. The next time something else requests an instance of one those objects then they will get a new instance from the store.

No, if you forget to pass things around, they would simply be lost. The worst case scenario here is that things will be no longer cached anywhere and will have to be re-fetched - duplication or memory leaks will not happen.

I can see that working if the application is sufficiently "page-oriented".

Indeed, although I imagine you'd be able to stretch it quite a lot for whenever there is a single "changeable" list component (displays different pages) and several non-paginated ones (for a certain entity) - the changeable one will own the store. But I also think that avoiding global store singletons at all costs as a general strategy might be a helpful starting point for other cases.

edit: merging with next comment: Looks like @andykog 's idea might be the most promising one - a LRU map of observable objects. although the map might need to be large enough to ensure that currently used objects are never thrown out, otherwise it would loose track of them. A LRU map based on a mobx spy that tracks whenever any of its elements are observed to set the timestamp?

spion on 3 Mar 2017

@sveinatle

which consumers can provide an instance of each domain object

I am not sure if we understand each other... I don't suggest to keep domain object instances in HotThreads/RecentThreads, only the information needed to obtain them (IDs).
Here is simplified, unoptimized, untested example of what I have in mind:

class ThreadsStore {  

  @observable threads = [];         

  @computed get ids() {
     // Collect all IDs
     return _.union(this.hotThreads.ids, this.recentThreads.ids); 
  }  

  constructor() {
    this.hotThreads = new HotThreads(this);
    this.recentThreads = new RecentThreads(this);
    // Fetch threads when this.ids change and update this.threads
    reaction(() => this.ids, ids => {
      fetchThreads(ids).then(action(threads => this.threads = threads)); 
    })
    // Introduce cache to avoid re-fetching/updating all instances
  }      
}

class HotThreads {
  @observable ids = [];  // expose hot threads ids  

  constructor(threadsStore) {    
    this.threadsStore = threadsStore;
  }

  refresh() {
    // update hot threads ids
    fetchHotThreadIds().then(action(hotThreadIds => this.ids = hotThreadIds));
  }  

  @computed get threads() {    
    // Use ThreadStore to obtain thread instances
    // EDIT: doing it like this you may loose ordering, consider it only as demonstration
    return this.threadsStore.threads.reduce((threads, thread) => {      
      if (untracked(() => this.ids.includes(thread.id))) { // untracked because we don't want to recompute when this.ids changes, but only when this.threadStore.threads changes
        threads.push(thread); 
      }
      return threads;
    }, [])
    // instead of computing, the threads could be set from ThreadsStore reaction (which would be a better approach I quess)    
  } 
}

The idea is in principle the same as with facebook's Relay/GraphQL.
The components (HotThreads,RecentThreads) exposes the information about what data they need, these informations are collected by ThreadsStore and used to provide these data to the components.
In the example I use IDs instead of GraphQL query fragments, so obtaining thread requires two server roundtrips (first to get IDs and second to get actual domain objects). That can be avoided if you expose "fetch function" instead of a set of IDs. Let the "fetch function" run in ThreadStore and pair the result with store...now you know which store needs which instances and also which instances are overall needed.

EDIT: Btw the same problem is often solved by EntityManagers (just in different context).

EDIT2: I also want to point out that ensuring an existence of single instance of domain object is not enought to ensure overall consitency. The problem is that identity map doesn't capture the relation between the objects. It is true that if you change Thread name at one place, this change will get propagated to other places where the thread is used, but if there is some relation like ordering and the relation can't be resolved client side, then you still have a problem. And notice that client side object can also relate to remaining objects in DB.

urugator on 3 Mar 2017

@spion

No, if you forget to pass things around, they would simply be lost.

I was thinking about the case where there is a global/long living component holding references to things (which necessitate passing those things along to the next page in the first place). If nothing global is holding references then yeah, they will lost.

@andykog

haven't you considered using Maps of maximum size, so the last used (via .get()) keys are being deleted on exceeding this limit?

Doesn't this assume that components and mobx stores (e.g. UiState) never hold references to things, and always access them via .get() every time? If they first get an object through .get() and then store that object somewhere in their own state, then the LRU order will not be kept up to date?

@urugator

I am not sure if we understand each other...

Yeah, I misunderstood. So the store is responsible for pulling in lists of IDs from all active consumers (hotThreads, recentThreads). The difference from typical reference counting is that your solution reacts to changes in consumers automatically without manually managing references. So consumers need to notify the store when they are destroyed (for the store to stop including their id-lists), instead of notifying the store about individual references.

For my project I ended up doing basic reference counting. Just a single, simple store with .get(entityId : string) : Entity, .remember(obj : Entity) and .forget(obj : Entity). The entities themselves provide .take() and .release() and are responsible for notifying the store to remember and forget as needed. So far this works well enough, though I might add something to the entities so that they will complain if they are accessed after being destroyed (would indicate someone forgot to .take()). Also maybe add an argument to .take() identifying who is taking in order to debug missing .release() more easily (similar to @jamiewinder's RefType).

sveinatle on 20 Mar 2017

@sveinatle

Doesn't this assume that components and mobx stores (e.g. UiState) never hold references to things, and always access them via .get() every time? If they first get an object through .get() and then store that object somewhere in their own state, then the LRU order will not be kept up to date?

What problem are you concerned about with storing references in component? If we are talking about avoiding memory leaks while rendering collections, it will still work ok as far as the list renderer uses map.get() for filtering, the list will be rerendered after the item is no longer available in the map and the component will be unmounted. If you are talking about possibility of keeping link to a “dead” or duplicating item in component that doesn't depend on presence in the collection (e.g. “order #243 page”) then the same problem can arise even without using LRU cache maps, right?

andykog on 21 Mar 2017

@andykog

Agreed, the list renderer would not be a problem in that case assuming it is reactive wrt to its source collection. And yes, the problem I am concerned about is the "dead" and subsequent duplicate when something receives a non-reactive reference. For example if a user selects an item from search result list and that item is given to a different component to render more details. The detail-component is not a child of the search result list and doesn't care if then list no longer contains item, or whether the list even still exists. The detail-component should just keep rendering the same item until the user selects something else to look at, and to do that it will need to save a reference either in its local state or some dedicated place in an encompassing UiState or similar. If the detail-component doesn't somehow regularly access map.get() for that item (to maintain its LRU status) then it will eventually be cleaned up by the store (e.g. if the user keeps loading different search results until the selected item is pushed out of the cache).

Yes, the same problems arise without LRU cache. That was the purpose of posting this question, to hear how people are dealing with this :)

sveinatle on 21 Mar 2017

This is exactly why WeakMaps were created. Use them and please don't add tons hard to maintain buggy code to your apps.