Orm: Data hydration caching

Created on 14 Aug 2016 · 5 Comments · Source: doctrine/orm

Doctrine's hydration becomes a bottleneck when the database returns a lot of data. We're reluctant to enable the query result cache, because our queries return results quickly enough and we would have to manage cache invalidation.

Instead, I propose implementing caching in hydrateRowData() (or somewhere near it): generate a hash of the array returned from the database and use it as the cache ID, then deserialize the hydrated data from the cache. The great thing about this approach is that as long as the entity class has not changed, we don't have to worry about cache invalidation. Correct me if I'm wrong.
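To make the proposal concrete, here is a minimal, language-neutral sketch in Python. It is not Doctrine code: `User`, `hydrate_row`, `row_cache_key`, and the in-memory dictionary are all hypothetical stand-ins, and a real cache would store a serialized form in a shared backend rather than copying objects in process.

```python
import copy
import hashlib
import json
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


# Hypothetical in-memory cache, keyed by a hash of the raw database row.
_hydration_cache = {}
stats = {"hydrations": 0}


def hydrate_row(row):
    """Stand-in for the expensive row -> object conversion."""
    stats["hydrations"] += 1
    return User(id=int(row["u_id"]), name=row["u_name"])


def row_cache_key(row):
    # Sort keys so that equal rows always produce the same hash.
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def hydrate_row_cached(row):
    key = row_cache_key(row)
    if key not in _hydration_cache:
        _hydration_cache[key] = hydrate_row(row)
    # Return a copy so callers never share mutable state with the cache.
    return copy.deepcopy(_hydration_cache[key])
```

The key depends only on the row contents, so changed data simply produces a new key. Whether hashing plus a cache lookup is actually cheaper than hydrating the row would have to be measured.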

Can't Fix Improvement

All 5 comments

Looks like you are talking about the second-level cache, which is already implemented.

No, the current implementation of the second-level cache generates a hash of the input SQL query. The cache I'm talking about generates a hash of the database output. It doesn't skip the database call. Its purpose is to speed up specifically this process:
[Image: Blackfire profile screenshot, blackfire io_2016-08-15_00-19-23]

Interesting idea. Hashing the SQL means that if the data changes, the cached result is stale. Hashing the scalar output from the database means that only the hydration logic can be out of date (and that is unlikely to change without a code change).

I'm not sure how feasible this is, but it sounds like a good performance gain for those who don't want to or can't cache the database query.
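The difference between the two cache keys can be illustrated with a small Python sketch. Everything here (`SqlKeyedCache`, `OutputKeyedCache`, `hydrate`, `digest`) is hypothetical and only models the behaviour discussed above; it is not how Doctrine implements either cache.

```python
import hashlib
import json


def digest(value):
    """Stable hash of any JSON-serializable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def hydrate(rows):
    """Stand-in for the expensive rows -> objects conversion."""
    return [{"id": int(r["id"]), "name": r["name"].title()} for r in rows]


class SqlKeyedCache:
    """Keyed by query text: skips the database, but can serve stale data."""

    def __init__(self):
        self.store = {}

    def fetch(self, sql, run_query):
        key = digest(sql)
        if key not in self.store:
            self.store[key] = hydrate(run_query(sql))
        return self.store[key]


class OutputKeyedCache:
    """Keyed by database output: always queries, skips only hydration."""

    def __init__(self):
        self.store = {}

    def fetch(self, sql, run_query):
        rows = run_query(sql)  # the database call is never skipped
        key = digest(rows)
        if key not in self.store:
            self.store[key] = hydrate(rows)
        return self.store[key]
```

When the underlying rows change, the SQL-keyed cache keeps returning the old result until it is invalidated, while the output-keyed cache gets a different key and hydrates again.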

Ocramius has already played around with some ideas for making hydration faster,
if you want a look at where Doctrine may be heading.

https://ocramius.github.io/blog/doctrine-orm-optimization-hydration/
https://github.com/Ocramius/GeneratedHydrator

That hydration is a bottleneck is a known fact. We cannot fix it now, but we are trying to come up with solutions for 3.x (or later, depending on what we will be able to do).

As @kimhemsoe said, research on this is already happening.

For 2.x, this is not going to change. You can use the hydration cache at your own risk though: https://github.com/doctrine/doctrine2/blob/31a0c02b066764ec7b55eedc5f3cab78235a4e4b/lib/Doctrine/ORM/AbstractQuery.php#L479-L519

Meanwhile, a glimpse of what is going to happen (hopefully):

  • come up with a "hydration DSL" which says how you get from record -> object
  • convert that "hydration DSL" into a closure
  • write a code inliner that can integrate the generated closure into callers
  • generate custom hydrators on a per-DQL or per-record-structure basis

Note that hydration will always remain an O(n) or greater operation: what we can do is just reduce the cost of every single operation.

Closing here. There is no resolution path until we have figured out all the bits.
