October: CodeParser concurrent access error

Created on 1 Jul 2015  路  24Comments  路  Source: octobercms/october

Receiving concurrent access to my october website is throwing this exception:

exception 'Symfony\Component\Debug\Exception\FatalErrorException' with message 'Class 'Cms5593c06736213_2556179689Class' not found' in /var/app/current/modules/cms/classes/CodeParser.php:162
Stack trace:

0 {main}

Trying to figure out what is happening, I found a bug(?) on php (or Apache or OS ?) that when a file is being creating and another process o the same time check its existence it returns true, indicating that the file exists. But when try to access, the file is not totally created, thowing this exception.
Its happening very often. I receive ~100 simultaneous visitors along the day.

I could reproduce this problem executing Apache Bench with 5 or more concurrent requests.
ab -n 100 -c 5 http://localhost/

It's annoying me.
Anyone have this problem?
Perhaps it could be a server problem? (we use amazon servers)

High Completed Bug

Most helpful comment

I am attempting to use October for the web of my OS project, https://github.com/bearishsun/bansheeengine/. But I appear to have ran into this same problem:

PHP Fatal error: 聽Class 'Cms57a7a2f63233a_132285811Class' not found in /data/web/e53796/html/BansheeWeb/modules/cms/classes/CodeParser.php on line 162

I have found no exact cause of the error. The site sometimes runs fine, and sometimes pops the error. After a few moments (can be a few seconds, to a few hours) it starts running fine again.

Aside from the error in CodeParser.php I am also getting same type of errors in various .html pages of my theme.

All 24 comments

I didn't mention what I've already tried to solve it.
I tried to lock the file. First passing the thrid param to File::put($path, $fileContents, true), then with flock. Neither one worked.

I've tried to use another file working like semaphore for each file parsed. Even writing these files on /dev/shm witch I think is faster, it didn't work.

Finally I've tried to use real semaphores (like sem_get, sem_acquire...) and It works! But when I deployed, was a matter of time to start to throw errors about the semaphore.

Well, I hope this help someone to help me hehe.

After some research about this problem, I have a reasonable solution for me.
It didn't solve completely but reduced the problem drastically.

https://gist.github.com/shina/17171ba5be986959ebad

Can you point out what the differences are in this class?

Basically it uses the Cache class instead of file manipulation.

https://gist.github.com/shina/3b37a2cf23efea3f2de3/revisions?diff=split

@shina It would appear the gist has disappeared...

I am attempting to use October for the web of my OS project, https://github.com/bearishsun/bansheeengine/. But I appear to have ran into this same problem:

PHP Fatal error: 聽Class 'Cms57a7a2f63233a_132285811Class' not found in /data/web/e53796/html/BansheeWeb/modules/cms/classes/CodeParser.php on line 162

I have found no exact cause of the error. The site sometimes runs fine, and sometimes pops the error. After a few moments (can be a few seconds, to a few hours) it starts running fine again.

Aside from the error in CodeParser.php I am also getting same type of errors in various .html pages of my theme.

We added some logic to check for these instances, but it looks like this is caused by some race condition. Can you provide some replication instructions?

Sadly no, it appears random on my end. Can you give me more insight in when is CodeParser supposed to generate those cache files/classes (I have no knowledge of October or Laravel internals)? It feels like that is something that should be generated once during install and not during runtime, am I right? If I know when and why they're created I might be able to track down the reproduction steps.

The file is used for caching the page contents, it should only be generated once, that's true. However it appears to be conflicting when generated twice at the same time, since the class name is randomly generated.

I wonder if this might have something to do with it:
http://blog.kevingomez.fr/til/2015/07/26/why-is-uniqid-slow/

This isn't a fix, but possibly a way to reduce the problem significantly. CodeParser creates a cached object containing all the file paths and modified times of the generated files, and that data gets cached for 24 hours. When the cache expires (after 24h) the CodeParser will rebuild the cache data, but it will also re-parse the files. This is when the error seems to happen.

The second step of re-parsing the files should not be necessary. If the file exists it should be enough to compare the time modified of the source and generated file. If they match they can just be added to the cache as they are.

By not generating the files excesively the error should be minimized. However the true source of the problem eludes me. I have actually tried disabling the cache so the file parsing happens on every hit, but was still unable to reproduce the problem - it's still random, my site worked fine for the last three days and now it started acting up again.

We managed to find out more about the issue by adding a bunch of trace logs, and reducing the cache time to 5 minutes to make the error more reproducible. The site crashed as usual, but now we have a view of everything that was happening before hand. I have included the log (with only relevant information for that particular access) and modified CodeParser.php (so you know where in code are the log messages coming from).

CodeParserCrash.zip

The error happens when two users request the page concurrently (or if Apache launches two threads accessing the page for whatever other reason). As you can see from the log two requests for the exact same layout/pages are being made at the same time, both executing the same code pretty much line by line.

As a specific example you can look at the beginning of the log where CodeParser::parse is running for "banshee3d/layouts/default.htm" concurrently. The request that actually fails is the one for "banshee3d/partials/head.htm".

As long as the two requests run in order, everything works fine. One request just overwrites another. You can see that with every layout/page parse in the log, until the last one. Last one fails because the first request finished writing to the file after the second request, yet the second request gets to record the class name. The system them tries to instantiate the class name that it doesn't recognize and the issue happens.

I haven't thought of a solution yet, but it likely involves using mutexes to prevent mutual execution of that code. Also the cache rebuilding should be avoided as I wrote above, unless the file was actually modified, just for performance if nothing else.

Actually mutexes wouldn't work since technically different requests would each have their own copy. Rather some sort of lock file should be used for mutual exclusion using flock().

We might have a fix, currently testing it for about a couple of days. Will see how it holds up in a few more and submit a PR if it works.

2328 should do it

Welp, it took 4 hours of straight coding and debugging, but I think the issue is solved. Thanks everyone who helped with this!

Benchmarking code used:

ab -n 100 -c 10 http://october.dev/

Worthwhile noting that Twig handles the writing of these cache files differently. Instead of using this code, like we do:

if (!@file_put_contents($path, $fileContents, LOCK_EX)) {

Twig uses:

if (false !== @file_put_contents($tmpFile, $content) && @rename($tmpFile, $key)) {

This is an interesting strategy to avoid collisions:

  1. Write to temp file first
  2. Rename temp file to target

However. collisions are still possible and still happen in October due to Twig's approach. When the destination file is locked, the rename() function call will return access denied.

During testing there were about 5 instances where Twig is now throwing an error.

[2016-08-27 03:27:10] dev.ERROR: exception 'RuntimeException' with message 'Failed to write cache file "~\storage/cms/twig/3b/3b6e2953e7ebc59583c3e5b2d3ebcfeb8ea580a427a761088115f1c7f1d7a160.php".' in ~\vendor\twig\twig\lib\Twig\Cache\Filesystem.php:82

These are the only remaining errors I was able to detect. We may need to convert Twig's file system cache to use our methodology to prevent this from occuring...

Looking at your fix, collisions still seem possible to me:
Thread A calls rebuild()
Thread B calls rebuild() (and overwrites anything A has written)
Thread B exits rebuild() and stores class name in cache
Thread A exits rebuild() and stores class name in cache

At this point class name for A will be in cache, yet class name generated for B will be in the file. Solution provided by @MarcoROG should avoid that.

Incorrect, see the LOCK_EX on the file_put_contents call

Thread A calls rebuild() (locks the file)
Thread B calls rebuild() and is sleeping (waits for thread A to unlock the file)
Thread A exits rebuild() and stores class name in cache (now has a head start on Thread B)
Thread B awakens and overwrites anything A has written
Thread B exits rebuild() and stores class name in cache
Next thread use correct class name , or self-repair if class name doesn't match (if somehow Thread B beats Thread A)

It is unfortunate that Thread B is intentionally slowed down but this really should be a 0.00000001% possibility that 2 simultaneous threads are spawn camping on a new page. Either way it won't result in an error, just trivially slow performance.

In addition, your scenario shown above is actually fine too because Thread A's contents with Thread B's class name will self repair in Thread C. Thanks to the code provided by @MarcoROG

There is another issue will cause October to choke with simultaneous threads, caused by Twig. I'm not sure what their stance is on this but it would appear they allow atomic threads to fail when spawn camping. Probably because it is so rare. We may end up adopting this approach to align ourselves and simplify the code.

Try to disable opcache in .htaccess.

php_flag opcache.enable Off

Or you can disable the CMS configuration option forceBytecodeInvalidation (see config/cms.php)

Last weeks i getting sometimes errors similar to this issue, looks like issue is back?

But for cache iam using Redis, iam not sure why its trying to access cache file in storage folder.

ErrorException: file_get_contents(/var/www/html/storage/cms/cache/d3/61/filter.htm.php): failed to open stream: No such file or directory in /var/www/html/modules/cms/classes/CodeParser.php:298

then

Symfony\Component\Debug\Exception\FatalThrowableError: Class 'Cms5cb641e7cb229649974821_39d638926a085ef8282ca859221b2b3aClass' not found in /var/www/html/modules/cms/classes/CodeParser.php:181

Was this page helpful?
0 / 5 - 0 ratings