Mobx-state-tree: RFC: initialize instances upon usage

Created on 10 May 2018  Â·  19Comments  Â·  Source: mobxjs/mobx-state-tree

Right now MST can have performance issues when re-hydrating huge arrays or large trees.
This happens because MST converts snapshots to their observable counterpart upon snapshot application, and that may take a lot of resources.

So why not we create first the internal Node from the initial snapshot and eventually create the observable instance only when the value is first referenced?

e.g. If we have a huge list of Todos, the observable array instance will be created, then the react-virtualized component will access the first 20 items, so only that 20 todo instances will be accessed and their observable counterpart will be created, next we scroll the view and the other instances will be created while scrolling.

Only change would be the afterCreate hook firing upon value conversion (so afterAttach may fire first).
But we can alternatively keep the current execution order and value will be converted only if the self is actually used by the afterCreate hook.

This change would require "Node" to store the initialSnapshot and return that into "@computed get snapshot" if the value is not initialized yet.

This would also solve situations where validate runs twice.

Looking forward comments and wild ideas! :)

breaking change has PR

Most helpful comment

Good day!
I've done a raw implementation of this idea and have following results to share.
I made a simple repo for tests (can be found here), but the baseline is

const Child = types
  .model('Child', {
    type: types.optional(types.enumeration(['A', 'B', 'C']), () => {
      const rnd = Math.random();
      if (rnd < 0.33) {
        return 'A';
      } else if (rnd < 0.66) {
        return 'B';
      } else {
        return 'C';
      }
    }),
    num: types.optional(types.number, () => getRandomArbitrary(1, 100)),
    children: types.array(GrandChild)
  })
  .extend(self => {
    const views = {
      get parent() {
        return getParent(self, 2);
      },
      get numCopy() {
        return self.num;
      }
    };

    const actions = {
      setNum(value: number) {
        self.num = value;
      }
    };

    return {views, actions};
  });

const Parent = types
  .model('Parent', {
    first: types.string,
    last: types.string,
    num1: types.number,
    num2: types.number,
    children: types.array(Child)
  })
  .extend(self => {
    const views = {
      get name() {
        return `${self.first} + ${self.last}`;
      },
      get sum() {
        return self.num1 + self.num2;
      }
    };

    const actions = {
      setNum1(value: number) {
        self.num1 = value;
      },
      setNum2(value: number) {
        self.num2 = value;
      }
    };

    return {views, actions};
  });

const ParentSnapshotBase = {
  first: 'Foo',
  last: 'Bar',
  num1: 1,
  num2: 2,
  children: Array(100).fill({
    children: Array(10).fill({})
  })
};

function CreateOneParent() {
  console.time('create');
  const parent = Parent.create(ParentSnapshotBase);
  console.timeEnd('create');
  console.log('Parent:', parent);
  return parent;
}

We create one parent, which has 100 children, each child having 10 own children in turn, resulting in 1001 object nodes.

Below are four screenshots of memory heaps + creation time.
1) current master in dev
01-master-dev
2) deferred implementation in dev
02-deferred-dev
3) current master in prod
03-master-prod
4) deferred implementation in prod
04-deferred-prod

Results seem quite impressive. Speedtest also shows significant improvement (master@left, deferred@right):

Small Model

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 31ms | x 100 | 0.3ms avg |<--->| 30ms | x 100 | 0.3ms avg |
| 287ms | x 1,000 | 0.3ms avg |<--->| 295ms | x 1,000 | 0.3ms avg |
| 2,571ms | x 10,000 | 0.3ms avg |<--->| 2,590ms | x 10,000 | 0.3ms avg |
| 298ms | x 1,000 | 0.3ms avg |<--->| 297ms | x 1,000 | 0.3ms avg |
| 31ms | x 100 | 0.3ms avg |<--->| 31ms | x 100 | 0.3ms avg |

Medium Model

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 59ms | x 100 | 0.6ms avg |<--->| 59ms | x 100 | 0.6ms avg |
| 552ms | x 1,000 | 0.6ms avg |<--->| 549ms | x 1,000 | 0.5ms avg |
| 5,149ms | x 10,000 | 0.5ms avg |<--->| 5,125ms | x 10,000 | 0.5ms avg |
| 563ms | x 1,000 | 0.6ms avg |<--->| 563ms | x 1,000 | 0.6ms avg |
| 59ms | x 100 | 0.6ms avg |<--->| 61ms | x 100 | 0.6ms avg |

Large Model - 0 children

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 286ms | x 100 | 2.9ms avg |<--->| 204ms | x 100 | 2.0ms avg |
| 2,739ms | x 1,000 | 2.7ms avg |<--->| 2,002ms | x 1,000 | 2.0ms avg |
| 293ms | x 100 | 2.9ms avg |<--->| 207ms | x 100 | 2.1ms avg |

Large Model - 10 small & 10 medium children

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 529ms | x 50 | 10.6ms avg |<--->| 236ms | x 50 | 4.7ms avg |
| 2,561ms | x 250 | 10.2ms avg |<--->| 1,180ms | x 250 | 4.7ms avg |
| 536ms | x 50 | 10.7ms avg |<--->| 235ms | x 50 | 4.7ms avg |

Large Model - 100 small & 0 medium children

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 1,501ms | x 50 | 30.0ms avg |<--->| 560ms | x 50 | 11.2ms avg |
| 7,539ms | x 250 | 30.2ms avg |<--->| 2,951ms | x 250 | 11.8ms avg |
| 1,534ms | x 50 | 30.7ms avg |<--->| 570ms | x 50 | 11.4ms avg |

Large Model - 0 small & 100 medium children

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 2,852ms | x 50 | 57.0ms avg |<--->| 1,099ms | x 50 | 22.0ms avg |
| 14,928ms | x 250 | 59.7ms avg |<--->| 5,478ms | x 250 | 21.9ms avg |
| 2,992ms | x 50 | 59.8ms avg |<--->| 1,100ms | x 50 | 22.0ms avg |

Implementation currently lives at my fork and all mob-x-stat-tree package tests are passing. Although I had to rewrite some of them.

Two most notable breaking changes are:

  1. afterCreate and afterAttach hooks now happen at the same time
  2. postProcessSnapshot now must be defined on type directly (as preProcessSnapshot) as there are cases when we want to postprocess initial snapshot without actually initializing observables

Besides that there are many internal changes, which should be reviewed carefully.

I didn't create merge request, because many tests from other packages are failing. But I hope i'll manage to fix them by the end of the week.

All 19 comments

Will the data validate upfront, or also lazy?
Because if you will get data validation error lazy, its very unexpected behavior

No, data validation will be performed synchronusly upon Node creation :)

..so, lazily ;-)?

Op do 10 mei 2018 om 12:11 schreef Mattia Manzati <[email protected]

:

No, data validation will be performed synchronusly upon Node creation :)

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/mobxjs/mobx-state-tree/issues/810#issuecomment-388012598,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABvGhIrdWXytkkYhKL0HhPEf2KdmLF-5ks5txBJcgaJpZM4T5nMg
.

Whops, I was not clear enough :)
The entire node tree will be validated and then created (as happens right now), only the observable instance will be delayed to the first access :)

Here's some pseudocode, basically you end up creating just the entire Node tree instead of the whole observable tree :)

class Node{

  constructor(public readonly type: IType<any, any>, public readonly initialSnapshot: any, public parent: Node){
    typecheck(type, initialSnapshot)

    // create child nodes here
    childNodes = type.initializeChild(initialSnapshot)
  }

  get snapshot(){
    return this.hasInitialized ? this.type.getSnapshot(this) : this.initialSnapshot
  }

  get value(){
    if(!this.hasInitialized) this.storedValue = this.createNewInstance()
    return this.storedValue
  }
}

So the validation is not the main CPU intensive task? do you have benchmarks on it?

Not yet, but there were issues reporting slow creation even with typecheck turned of :)

Right, i get it; lazily instantiate when walking into the tree. Like immer
does with proxying.

That sounds quite smart. I think it is even easy to implement.

There might be a risk with lifecycle hook. For example if an object keeps
itself in sync with a server, that connection is only established when
first accessed instead of immediately. Not sure if that is a real use case
though

Op do 10 mei 2018 18:18 schreef Mattia Manzati notifications@github.com:

Not yet, but there were issues reporting slow creation even with typecheck
turned of :)

—
You are receiving this because you commented.

Reply to this email directly, view it on GitHub
https://github.com/mobxjs/mobx-state-tree/issues/810#issuecomment-388104539,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABvGhKS1uEXVhjTYcbver48ItbuxaBcQks5txGhmgaJpZM4T5nMg
.

Yep, that was totally inspired by immer!
Basically this would add only a small overhead for creating the node and for snapshot validation (if not disabled by production mode), as the initialSnapshot will not be duplicated but just referenced and already allocated in memory.

I've been playing around with this idea for some time.
First of all, it seems like creating new observable instance (node.storedValue) is cheap, but necessary, because $treenode/toJSON/applyPatch/applySnapshot are applied to it at the very beginning of node creation. The heavy part is finalizeNewInstance, so I moved it into separate method of node.
Second, i added isLazy parameter all through type.instantiate, createNode and node.constructor.
The next step was to implement get value() {} logic from this comment, but most code through mst uses storedValue directly, so i moved storedValue (result of createNewInstance, empty observable) to a new field baseValue and changed storedValue to

public get storedValue() {
        if (!this._valueInitialized) this._finalizeValue()
        return this.baseValue
    }

```typescript
private _finalizeValue() {
this._valueInitialized = true
let sawException = true
try {
this._isRunningAction = true
this._finalizeNewInstance(this, this._initialValue)
this._isRunningAction = false

        if (!this._parent) this.identifierCache!.addNodeToCache(this)
        else this._parent.root.identifierCache!.addNodeToCache(this)

        this.fireHook("afterCreate")
        this.state = NodeLifeCycle.CREATED
        sawException = false
    } finally {
        if (sawException) {
            // short-cut to die the instance, to avoid the snapshot computed starting to throw...
            this.state = NodeLifeCycle.DEAD
        }
    }

    const snapshotDisposer = reaction(
        () => {
            return this.snapshot
        },
        snapshot => {
            this.emitSnapshot(snapshot)
        },
        {
            onError(e) {
                throw e
            }
        }
    )
    this.addDisposer(snapshotDisposer)

    this.finalizeCreation()
}
Also I had to changle all `finalizeNewInstance` implementations to use `objNode.baseValue` instead of `objNode.storedValue`, but that's much less changes, than swapping all `.storedValue` to `.value`.

So the pseudocode of object-node's constructor looks like:
```typescript
constructor(
        type: IType<any, any>,
        parent: ObjectNode | null,
        subpath: string,
        environment: any,
        initialValue: any,
        storedValue: any,
        finalizeNewInstance: (node: INode, initialValue: any) => void = noop,
        isLazy: boolean = false
    ) {
        ...
        this.baseValue = storedValue
        this._initialValue = initialValue
        this._finalizeNewInstance = finalizeNewInstance
        ...
        if (finalizeNewInstance === noop || !isLazy) {
            this._finalizeValue()
        }
}

At this point all tests are passing, because isLazy is always false so all nodes are still fully initialized.

For now i'm tryig to use isLazy = true for array's childen, which goes down to this code (complex-types/array.ts):

function valueAsNode(...) {
    ...
    // nothing to do, create from scratch
    return childType.instantiate(parent, subpath, parent._environment, newValue, isLazy: true)
}

The first problem is that order of hooks firing changed ("it should trigger lifecycle hooks" test) from:

[
//creating tree depth first
"new todo: Get coffee",
"new todo: Get biscuit",
"new todo: Give talk",
"new store: 3",
//finalizeNodeCreation
"attach todo: Get coffee",
"attach todo: Get biscuit",
"attach todo: Give talk",
//first action from test - detaching talk todo
"detach todo: Give talk",
...
]

to:

[
//creating tree depth first and finalizing it, no "new todo" logs, because those are lazy thus not created
"new store: 3", 
//first action from test - detaching talk todo, but reconciling array children touches every item, so those are created "on fly" and immediately attached
"new todo: Get coffee",
"attach todo: Get coffee",
"new todo: Get biscuit",
"attach todo: Get biscuit",
"new todo: Give talk",
"attach todo: Give talk",
"detach todo: Give talk", "-",
...
]

So the first question, maybe we should provide new hook (i.e. initialized?) in place of deferred afterCreate? We can not leave "afterCreate" in place, becuase user-code may depend on properties, which are not set (finalizeNewInstance haven't run yet).

The second problem is reference type. It heavely uses identifierCache. Later one uses node.identifier, which in turn tries to access node.storedValue[this.identifierAttribute]. This means we can't fulfill identifierCache before finalizeNewInstance was called, which effectively eliminates laziness.
Array initialization goes this way: finalizeNewInstance -> applySnapshot -> willChange (through intercept) -> reconcileArrayChildren -> valueAsNode
The only place we can interfere is reconcileArrayChildren/valueAsNode:

function reconcileArrayChildren(childType, parent,...) {
    ....
    const childIsLazy = canChildBeLazy(parent, childType)
    ...
    valueAsNode(childType, parent, "" + newPaths[i], newValue, oldNode, childIsLazy)
}
function valueAsNode(..., isLazy?) {
    ...
    // nothing to do, create from scratch
    return childType.instantiate(parent, subpath, parent._environment, newValue, isLazy)
}

canChildBeLazy is quite complex:

  • isLazy = childType.shouldAttachNode
  • for each parent up to root:

    • if parent is ModelType, getMembers(...)

    • for every property



      • if its instanceof BaseReferenceType - store it


      • if its instanceof ArrayType and its subtype is instanceof BaseReferenceType - store it



    • if any reference properties are stored for this parent:

    • if childType is instance of any of stored reference properties - isLazy = false

    • if childType is union and one of it's subtypes is... well, you know: isLazy = false

  • if at any point isLazy === false - break and return it

I'm pretty sure that i've missed a case or two :) Moreover, this approach forces initialization of all children, but only one/few of them could be really referenced. But it will get even more complex to understand which.
So the second question is am I missing something? Maybe we should use different approach for references?

Thrid problem is that using this approach:

get snapshot(){
    return this.hasInitialized ? this.type.getSnapshot(this) : this.initialSnapshot
  }

We loose types.optional defaults if we access .snapshot before .(stored)Value, because initialValue === "payload given to .create()". Should we somehow get them during craeteNode and and enrich initalValue? Or should we use isLazy = false for any models with optional types?

Any comments and ideas are welcome, because there is no point in going further (i.e. trying to make maps and nested models lazy too) before we have answers.

Fist of all, thanks a lot @k-g-a for your work! :D
I think I did'nt explained it so well and you missed a pretty important point of the proposal; back to the pseudo code:

class Node{
  constructor(public readonly type: IType<any, any>, public readonly initialSnapshot: any, public parent: Node){
    typecheck(type, initialSnapshot)

    // create child nodes here
    childNodes = type.initializeChild(initialSnapshot)
  }
//...
}

As you can see, the child nodes are now stored inside the parent node (instead of storing them inside the observable instance).
This means you can have collection without initializing the observable array and all of their members to observable objects.
Upon snapshot application we make the entire tree, but instead of making it of observable objects, we make it of node instances.
This brings to the answer to your question:

  1. Should we replace the current afterCreate hook?
    We could do that, but I'd prefer to slowly deprecate it if no one actually use it. With the new deferred logic we can still implement that by checking when creating the node if there's an afterCreate hook in the type.
  2. What about reference types?
    Identifiers are resolved from the snapshot by the type. Nodes are now hold by other nodes and all created upon root node snapshot application. This means we can continue to scan for reference as we do now; only change is that instead of looping throw the observable object for childs with identifier to register in the cache, we loop into the childNodes of the node.
    That's an important change, and that's why I purposed it for MST 3 :)
    Hope I explained myself, I'm sorry, but i've been tinkering with this for a while, and maybe I skipped right to the end of the argument.

Thanks for your time @k-g-a !

Good day, @mattiamanzati!
Thanks for the explanation, it seems much clearer now :)
Before I proceed to further implementation, one more question: what about optional values? Those are currently evaluated and filled during observable instance initialization. So should we enrich initialSnapshot from your pseudocode with those values?

Yep, exactly! :)

@mattiamanzati, any chance you know the answer to this question on spectrum? As I'm going to work around object-node, could touch this part too.

Not sure, but may be a left over from the code when we didn't use scoped functions from self

Good day!
I've done a raw implementation of this idea and have following results to share.
I made a simple repo for tests (can be found here), but the baseline is

const Child = types
  .model('Child', {
    type: types.optional(types.enumeration(['A', 'B', 'C']), () => {
      const rnd = Math.random();
      if (rnd < 0.33) {
        return 'A';
      } else if (rnd < 0.66) {
        return 'B';
      } else {
        return 'C';
      }
    }),
    num: types.optional(types.number, () => getRandomArbitrary(1, 100)),
    children: types.array(GrandChild)
  })
  .extend(self => {
    const views = {
      get parent() {
        return getParent(self, 2);
      },
      get numCopy() {
        return self.num;
      }
    };

    const actions = {
      setNum(value: number) {
        self.num = value;
      }
    };

    return {views, actions};
  });

const Parent = types
  .model('Parent', {
    first: types.string,
    last: types.string,
    num1: types.number,
    num2: types.number,
    children: types.array(Child)
  })
  .extend(self => {
    const views = {
      get name() {
        return `${self.first} + ${self.last}`;
      },
      get sum() {
        return self.num1 + self.num2;
      }
    };

    const actions = {
      setNum1(value: number) {
        self.num1 = value;
      },
      setNum2(value: number) {
        self.num2 = value;
      }
    };

    return {views, actions};
  });

const ParentSnapshotBase = {
  first: 'Foo',
  last: 'Bar',
  num1: 1,
  num2: 2,
  children: Array(100).fill({
    children: Array(10).fill({})
  })
};

function CreateOneParent() {
  console.time('create');
  const parent = Parent.create(ParentSnapshotBase);
  console.timeEnd('create');
  console.log('Parent:', parent);
  return parent;
}

We create one parent, which has 100 children, each child having 10 own children in turn, resulting in 1001 object nodes.

Below are four screenshots of memory heaps + creation time.
1) current master in dev
01-master-dev
2) deferred implementation in dev
02-deferred-dev
3) current master in prod
03-master-prod
4) deferred implementation in prod
04-deferred-prod

Results seem quite impressive. Speedtest also shows significant improvement (master@left, deferred@right):

Small Model

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 31ms | x 100 | 0.3ms avg |<--->| 30ms | x 100 | 0.3ms avg |
| 287ms | x 1,000 | 0.3ms avg |<--->| 295ms | x 1,000 | 0.3ms avg |
| 2,571ms | x 10,000 | 0.3ms avg |<--->| 2,590ms | x 10,000 | 0.3ms avg |
| 298ms | x 1,000 | 0.3ms avg |<--->| 297ms | x 1,000 | 0.3ms avg |
| 31ms | x 100 | 0.3ms avg |<--->| 31ms | x 100 | 0.3ms avg |

Medium Model

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 59ms | x 100 | 0.6ms avg |<--->| 59ms | x 100 | 0.6ms avg |
| 552ms | x 1,000 | 0.6ms avg |<--->| 549ms | x 1,000 | 0.5ms avg |
| 5,149ms | x 10,000 | 0.5ms avg |<--->| 5,125ms | x 10,000 | 0.5ms avg |
| 563ms | x 1,000 | 0.6ms avg |<--->| 563ms | x 1,000 | 0.6ms avg |
| 59ms | x 100 | 0.6ms avg |<--->| 61ms | x 100 | 0.6ms avg |

Large Model - 0 children

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 286ms | x 100 | 2.9ms avg |<--->| 204ms | x 100 | 2.0ms avg |
| 2,739ms | x 1,000 | 2.7ms avg |<--->| 2,002ms | x 1,000 | 2.0ms avg |
| 293ms | x 100 | 2.9ms avg |<--->| 207ms | x 100 | 2.1ms avg |

Large Model - 10 small & 10 medium children

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 529ms | x 50 | 10.6ms avg |<--->| 236ms | x 50 | 4.7ms avg |
| 2,561ms | x 250 | 10.2ms avg |<--->| 1,180ms | x 250 | 4.7ms avg |
| 536ms | x 50 | 10.7ms avg |<--->| 235ms | x 50 | 4.7ms avg |

Large Model - 100 small & 0 medium children

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 1,501ms | x 50 | 30.0ms avg |<--->| 560ms | x 50 | 11.2ms avg |
| 7,539ms | x 250 | 30.2ms avg |<--->| 2,951ms | x 250 | 11.8ms avg |
| 1,534ms | x 50 | 30.7ms avg |<--->| 570ms | x 50 | 11.4ms avg |

Large Model - 0 small & 100 medium children

|Time|Runs|Average|<--->|Time|Runs|Average|
|----|----|-----|---|----|----|-----|
| 2,852ms | x 50 | 57.0ms avg |<--->| 1,099ms | x 50 | 22.0ms avg |
| 14,928ms | x 250 | 59.7ms avg |<--->| 5,478ms | x 250 | 21.9ms avg |
| 2,992ms | x 50 | 59.8ms avg |<--->| 1,100ms | x 50 | 22.0ms avg |

Implementation currently lives at my fork and all mob-x-stat-tree package tests are passing. Although I had to rewrite some of them.

Two most notable breaking changes are:

  1. afterCreate and afterAttach hooks now happen at the same time
  2. postProcessSnapshot now must be defined on type directly (as preProcessSnapshot) as there are cases when we want to postprocess initial snapshot without actually initializing observables

Besides that there are many internal changes, which should be reviewed carefully.

I didn't create merge request, because many tests from other packages are failing. But I hope i'll manage to fix them by the end of the week.

Closing, will be part of MST 3

There might be a risk with lifecycle hook. For example if an object keeps itself in sync with a server, that connection is only established when first accessed instead of immediately. Not sure if that is a real use case though

So I have this exact problem now. Any suggestions on how to handle it? Is there a way to indicate that a node should not be lazily instantiated? I'd like to set up syncing with a server using afterCreate, regardless of whether the node is being accessed locally yet...

Loop over the children from some parent

On Wed, 13 Nov 2019, 17:36 Speros Kokenes, notifications@github.com wrote:

There might be a risk with lifecycle hook. For example if an object keeps
itself in sync with a server, that connection is only established when
first accessed instead of immediately. Not sure if that is a real use case
though

So I have this exact problem now. Any suggestions on how to handle it? Is
there a way to indicate that a node should not be lazily instantiated? I'd
like to set up syncing with a server using afterCreate, regardless of
whether the node is being accessed locally yet...

—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
https://github.com/mobxjs/mobx-state-tree/issues/810?email_source=notifications&email_token=AAN4NBCVSWXRWN5EYZEJFQLQTQ3KBA5CNFSM4E7GOMQKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOED7AETA#issuecomment-553517644,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAN4NBCE3DEZOAGO26JLQPTQTQ3KBANCNFSM4E7GOMQA
.

Was this page helpful?
0 / 5 - 0 ratings