Cache slamming is a big issue that few people know about, and it makes most caching systems far less effective than they could be, including those in Phalcon.
Adding process synchronisation to the cache would raise Phalcon to a higher level, well above other frameworks in that regard.
So what's the problem with Phalcon's cache system, regardless of whether it's based on Files, Memcached or anything else?
It's the lack of process synchronisation.
Here is an example:
Let's say we have very resource-consuming work that we want to cache. It involves several DB calls and takes a few seconds to complete overall, which is a very long time on a busy system.
Now, on a busy system with several or more requests per second demanding that resource, here is what happens when the resource is not cached, or has expired:
The first process fails to find the cached resource, so it begins to create it, which takes a few seconds and a lot of server power (processor/memory/IO).
In the meantime, while the first process is creating the resource and consuming server resources, the other processes try to read the cache, fail, and start doing the same work as process no. 1.
A performance downspike happens, everything slows down, and it continues until the last of the processes finally puts the resource in the cache.
This is called cache slamming, and it's wrong!
There should be only one process creating the resource, while the others wait and sleep until process no. 1 finishes and puts the resource in the cache; after that, the waiting processes should be woken up and read the resource from the cache.
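The flow described above can be sketched with nothing more than PHP's built-in `flock()` and double-checked locking; the function and file names here are purely illustrative, not part of any particular library:

```php
<?php
// Minimal sketch of "only one process creates, the others wait", using an
// exclusive flock() on a per-key lock file. All names are made up for
// illustration; a real implementation would live in a cache adapter.
function getOrCreate(string $key, int $ttl, callable $create)
{
    $cacheFile = sys_get_temp_dir() . '/cache_' . md5($key);
    $lockFile  = $cacheFile . '.lock';

    // Fast path: a fresh cache entry exists, no locking needed.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return unserialize(file_get_contents($cacheFile));
    }

    $fp = fopen($lockFile, 'c');
    flock($fp, LOCK_EX);             // other processes sleep here
    try {
        // Double-check: another process may have created the entry
        // while we were waiting for the lock.
        if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
            return unserialize(file_get_contents($cacheFile));
        }
        $value = $create();          // only one process runs the expensive work
        file_put_contents($cacheFile, serialize($value));
        return $value;
    } finally {
        flock($fp, LOCK_UN);
        fclose($fp);
    }
}
```

Every waiter that wakes up re-checks the cache first, finds the value and returns without redoing the expensive work, which is exactly what prevents the stampede.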
The best solution I've found is to use the Sync library and its reader/writer class, as it's very reliable and works the same under Windows and Linux:
http://php.net/manual/en/class.syncreaderwriter.php
but it requires a different approach to creating the resource, NOT like this:
if (!resource in cache) {
    1. create resource
    2. put in cache
}
It must be done like this:
an object with an interface for creating the resource (let's say a method: createResourceForCache);
the object is passed to a cache manager, the cache manager does the synchronisation and finally returns the resource created by the object passed to it.
Something like this:
$cachedResourceCreator = new \MyClassImplementingInterfaceForCreatingResource();
$manager = new \CacheManager( $cacheMethod, $expireTimeInSeconds );
$resourceWeWant = $manager->getCachedResource($cachedResourceCreator);
Where $cacheMethod can be filesystem, Memcached, and so on...
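The interface and manager from the snippet above could look roughly like this. All class and method names here are hypothetical, and the array-backed storage is only a stand-in for a real file or Memcached adapter; the synchronisation itself is marked where it would go:

```php
<?php
// Hypothetical sketch of the creator interface plus a manager that owns
// both the storage backend and the synchronisation.
interface CachedResourceCreatorInterface
{
    public function createResourceForCache();
}

class CacheManager
{
    private $storage = [];           // stand-in for a real backend
    private $expireTimeInSeconds;

    public function __construct(int $expireTimeInSeconds)
    {
        $this->expireTimeInSeconds = $expireTimeInSeconds;
    }

    public function getCachedResource(string $key, CachedResourceCreatorInterface $creator)
    {
        $entry = $this->storage[$key] ?? null;
        if ($entry !== null && $entry['expires'] > time()) {
            return $entry['value'];  // cache hit, no lock needed
        }
        // A real implementation would acquire the write lock here,
        // re-check the cache, and only then run the creator, so the
        // expensive work executes exactly once across all processes.
        $value = $creator->createResourceForCache();
        $this->storage[$key] = [
            'value'   => $value,
            'expires' => time() + $this->expireTimeInSeconds,
        ];
        return $value;
    }
}
```

The important design point is that the manager, not the caller, decides whether the creator runs, which is what makes synchronisation possible at all.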
Cache slamming is a big issue on busy systems that do a lot of caching, and it's worth doing this properly in the next big version of Phalcon (4?).
Well written and describes the issue thoroughly. There's always a fine line from where the framework ends and what can be implemented in user-land. I'm not sure if this belongs in Phalcon's default adapters but like you said adding another layer that doesn't impact existing usage via some sort of CacheSynchronizationManager API where original adapters can be used independently (thus, no issues with those who don't want to incur the overhead of synchronization) may be the optimal approach.
Will definitely need to investigate the approaches for syncing overhead though. Curious as to the penalty for this.
I hope I will find time to finish my project for a universal "no slamming cache" for PHP, which will be based on the synchronisation solutions I'm already using, just gathered together into one universal package.
I will create a repository/project on GitHub, and it could be a starting point for further discussion and implementation in Phalcon/Zephir. ETA 14-21 days.
Agreed, it's a problem; we should solve it in the next Phalcon version.
I'm also thinking about multiple options for creating the resource:
When I read about this, I immediately think of deadlocks.
The thing is, it's a very nice idea, and it should get as much attention to detail as possible.
If we're talking about deadlocks, the synchronisation solution I'm proposing (the Sync library and its ReaderWriter class: http://php.net/manual/en/class.syncreaderwriter.php) forbids nested locks; that is, if you try to acquire another lock within a process, the previous lock is released.
This prevents deadlocks.
BUT
I'm thinking about a synchro interface where an advanced user would have the opportunity to swap the default lock mechanism (the Sync library) for a different one that allows nested locks.
Anyway, I think the default synchro mechanism should be one without the possibility of nested locks/deadlocks, because IMO it's too advanced and confusing a topic for most users; they just want a framework that's easy to use, without potential problems like deadlocks.
Exactly my point. I've never used that PECL lib before, but from a theoretical point of view the concern is the orchestration of such a process, i.e. the avoidance of deadlocks. ;)
:100: agree with your point to exclude nested locks - that's simply too much for a human brain and not required in the first place.
Actually, while working on it, I've discovered that the SyncReaderWriter class does allow nested write locks, and thus creates the possibility of deadlocks.
I'm thinking about protecting against that by tracking whether any lock has already been acquired using a global static variable, but that will not prevent deadlocks caused by locks taken outside the Cache class...
The other thing that somewhat protects against deadlocks is a lock timeout. It will be set to, for example, 30 seconds by default, and will be configurable at cache construction time, independently for every cache object instance.
But the best approach, which I always use, is to never touch the cache inside the process that creates the value for caching; that's why I never had a problem with deadlocks despite thinking nested locks were not allowed by the Sync extension.
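The timeout idea above maps directly onto the PECL Sync extension: `SyncReaderWriter::writelock()` accepts a maximum wait in milliseconds and returns false when it expires. A minimal sketch, assuming the `sync` extension is installed; the function name and its defaults are made up for illustration:

```php
<?php
// Sketch: acquiring a write lock with a configurable timeout using the
// PECL "sync" extension (pecl install sync). Returns the created value,
// or null if the lock could not be obtained in time.
function createWithTimeout(string $lockName, int $timeoutSec, callable $create)
{
    $rw = new \SyncReaderWriter($lockName);
    // writelock() takes the maximum wait in milliseconds; -1 means wait forever.
    if (!$rw->writelock($timeoutSec * 1000)) {
        return null; // give up instead of deadlocking
    }
    try {
        return $create();
    } finally {
        $rw->writeunlock();
    }
}
```

Failing fast on timeout turns a potential deadlock into a recoverable error the caller can handle.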
Example of working code:
//////////////
// Very simple two-line code for getting (and, if needed, creating) the resource
// of group "products" and id/key "11500", cached for 20 sec
$cache = new \inopx\cache\CacheMethodFile('inopx_cache');
echo $cache->getCachedValue_('products', 11500, 20, function() { return 'I was created at '.date('Y-m-d H:i:s'); });
A more sophisticated example:
//////////////
// The Cache instance, save method = file
$cache = new \inopx\cache\CacheMethodFile('inopx_cache');
$lifetimeInSeconds = 20;
$group = 'products';
$key = 11500;
//////////////
// The Creator closure/callback
$firstNames = ['Albert', 'Mark', 'Fiona', 'Helen', 'Alex', 'Robert', 'Rachel', 'Jeff'];
$lastNames = ['Smith', 'Washington', 'Jones', 'Jackson', 'Green', 'Harris', 'Walker', 'Hall', 'Turner'];
$create = function() use($firstNames, $lastNames) {
error_log('Executing create closure at '.date('Y-m-d H:i:s'));
// Sleep for testing concurrency
sleep(8);
// Our resource is the current datetime plus a random first and last name
return date('Y-m-d H:i:s').' '.$firstNames[rand(0, count($firstNames)-1)]. ' ' .$lastNames[rand(0, count($lastNames)-1)];
};
//////////////
// Get the cached value, creating it first if it does not exist or has expired
$value = $cache->getCachedValue_($group, $key, $lifetimeInSeconds, $create);
echo 'value = '.$value;
By the way, for the file cache method I'm using a special "clustering dir" algorithm to limit the number of files and subdirectories per cache directory. It's important when there is a huge number of entries in the cache, as many files in one directory may increase disk seek times on some filesystems. Next week I'll create the repository and explain more on that matter.
There is also another thing worth explaining: why use a group, not just a key.
It's simply convenient, for at least a few reasons. First, in a file caching system the group can be the name of the directory storing the cached resources.
Second, a cached resource often represents some kind of entity coming from an SQL table, and it's convenient to use the SQL table name as the group and the entity id as the key, instead of combining the two into one unique key. The caching system does that on its own when needed.
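One way such a clustering scheme might work (purely an illustration, not necessarily what the library actually does): keep the group as the top-level directory and use short hash prefixes as intermediate subdirectory levels, so no single directory accumulates too many files:

```php
<?php
// Illustrative "clustering dir" layout: group as the top directory, two
// hash-derived subdirectory levels to keep per-directory file counts low.
function cachePathFor(string $baseDir, string $group, string $key): string
{
    $hash = md5($key);
    // e.g. inopx_cache/products/<xx>/<yy>/<md5>.cache
    return sprintf('%s/%s/%s/%s/%s.cache',
        $baseDir, $group, substr($hash, 0, 2), substr($hash, 2, 2), $hash);
}
```

With two hex characters per level, each group fans out into up to 256 x 256 subdirectories, which keeps individual directories small even with millions of entries.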
Hello,
I've created a repository called php-no-slam-cache (https://github.com/tztztztz/php-no-slam-cache), where the first, tested version of the caching system resides.
It comes with a testing script intended to be run concurrently in several command-line windows.
In the next few days I will add detailed documentation and some examples, and then we can talk about porting this mechanism into Phalcon.
Despite the deadlock danger, it can be used fairly safely for some internal Phalcon caching, like model definitions or annotations; building those can take some time, and synchronisation would help there.
Besides that, synchronisation could be disabled by default.
The whole system uses callbacks for creating resources, and it's generally simple.
One thing that I may change in the next few days is the transformation process of a resource FROM <-> TO storage.
Currently it's safe but conservative: serialisation, then base64 encoding for safe storage of special characters/binary data.
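That transformation pair is straightforward (the function names here are just for illustration): any PHP value in, a printable string out, and back again:

```php
<?php
// The conservative FROM <-> TO storage transform described above:
// serialize() handles arbitrary PHP values, base64 makes the result safe
// to store as plain text regardless of special characters or binary data.
function toStorage($value): string
{
    return base64_encode(serialize($value));
}

function fromStorage(string $stored)
{
    return unserialize(base64_decode($stored));
}
```

The base64 step costs roughly 33% extra storage, which is the "conservative" part of the trade-off.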
Thank you for contributing to this issue. As it has been 90 days since the last activity, we are automatically closing the issue. This is often because the request was already solved in some way and it just wasn't updated or it's no longer applicable. If that's not the case, please feel free to either reopen this issue or open a new one. We will be more than happy to look at it again! You can read more here: https://blog.phalconphp.com/post/github-closing-old-issues
pity this was closed?
Well, when I find some time (hopefully in the next 30 days), I will build a bridge between my cache system with synchro (https://github.com/tztztztz/php-no-slam-cache) and Phalcon, for easy replacement of Phalcon's native caching system. If anyone is interested, please follow my repository.
Well, sorry guys, but I've just realised that it's quite impossible to implement the php-no-slam-cache system in Phalcon.
Here's why.
Typical / Phalcon cache system: a first call to check for/get the item from the cache, and if it's not there, a second call to save the freshly created item - two separate cache calls.
php-no-slam-cache:
You define a recipe for creating the resource in the form of a callback.
You call the cache method (files, memcached, etc.) providing the key and the recipe - the first and only call to the php-no-slam-cache system.
The caching method/manager executes the callback only if the item does not exist in the cache, synchronising the whole thing, preventing cache slamming/stampede and protecting your system.
Sadly, it can't be done with the Phalcon cache interfaces, as they are built around the traditional double-call cache flow, which is not very efficient under heavy load.
You can still use php-no-slam-cache in Phalcon for critical parts, but you can't register it as a cache provider that would, for example, automatically intercept ORM queries and cache them in the background.
Anyway, I will keep developing the library, as it is very useful, at least for me.