This issue has been haunting us for a long time now. Initially we did not know why this was randomly happening (Issue #708). Later we found out that reconnecting in general was not working (Issue #716), which was fixed by Taylor (thank you!), but this now only works correctly for Redis instances that are not password protected.
In a situation where Horizon is running against a Redis instance that is password protected, and Redis is briefly (or for longer) unavailable, Horizon will not recover even after Redis is back, and spits out NOAUTH errors like this one:
ERROR: NOAUTH Authentication required. {"exception":"[object] (RedisException(code: 0): NOAUTH Authentication required. at /var/www/vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php:111)
[stacktrace]
#0 /var/www/vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php(111): Redis->lLen('commands:master...')
#1 /var/www/vendor/laravel/framework/src/Illuminate/Redis/Connections/PhpRedisConnection.php(440): Illuminate\\Redis\\Connections\\Connection->command('llen', Array)
#2 /var/www/vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php(211): Illuminate\\Redis\\Connections\\PhpRedisConnection->command('llen', Array)
#3 /var/www/vendor/laravel/framework/src/Illuminate/Redis/Connections/PhpRedisConnection.php(482): Illuminate\\Redis\\Connections\\Connection->__call('llen', Array)
#4 /var/www/vendor/laravel/horizon/src/RedisHorizonCommandQueue.php(51): Illuminate\\Redis\\Connections\\PhpRedisConnection->__call('llen', Array)
#5 /var/www/vendor/laravel/horizon/src/MasterSupervisor.php(263): Laravel\\Horizon\\RedisHorizonCommandQueue->pending('master:d4711be9...')
#6 /var/www/vendor/laravel/horizon/src/MasterSupervisor.php(240): Laravel\\Horizon\\MasterSupervisor->processPendingCommands()
#7 /var/www/vendor/laravel/horizon/src/MasterSupervisor.php(213): Laravel\\Horizon\\MasterSupervisor->loop()
#8 /var/www/vendor/laravel/horizon/src/Console/HorizonCommand.php(63): Laravel\\Horizon\\MasterSupervisor->monitor()
#9 [internal function]: Laravel\\Horizon\\Console\\HorizonCommand->handle(Object(Laravel\\Horizon\\Repositories\\RedisMasterSupervisorRepository))
#10 /var/www/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(32): call_user_func_array(Array, Array)
#11 /var/www/vendor/laravel/framework/src/Illuminate/Container/Util.php(36): Illuminate\\Container\\BoundMethod::Illuminate\\Container\\{closure}()
#12 /var/www/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(90): Illuminate\\Container\\Util::unwrapIfClosure(Object(Closure))
#13 /var/www/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(34): Illuminate\\Container\\BoundMethod::callBoundMethod(Object(App\\Extensions\\Illuminate\\Foundation\\Application), Array, Object(Closure))
#14 /var/www/vendor/laravel/framework/src/Illuminate/Container/Container.php(590): Illuminate\\Container\\BoundMethod::call(Object(App\\Extensions\\Illuminate\\Foundation\\Application), Array, Array, NULL)
#15 /var/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(202): Illuminate\\Container\\Container->call(Array)
#16 /var/www/vendor/symfony/console/Command/Command.php(255): Illuminate\\Console\\Command->execute(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Illuminate\\Console\\OutputStyle))
#17 /var/www/vendor/laravel/framework/src/Illuminate/Console/Command.php(189): Symfony\\Component\\Console\\Command\\Command->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Illuminate\\Console\\OutputStyle))
#18 /var/www/vendor/symfony/console/Application.php(1011): Illuminate\\Console\\Command->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#19 /var/www/vendor/symfony/console/Application.php(272): Symfony\\Component\\Console\\Application->doRunCommand(Object(Laravel\\Horizon\\Console\\HorizonCommand), Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#20 /var/www/vendor/symfony/console/Application.php(148): Symfony\\Component\\Console\\Application->doRun(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#21 /var/www/vendor/laravel/framework/src/Illuminate/Console/Application.php(93): Symfony\\Component\\Console\\Application->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#22 /var/www/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(131): Illuminate\\Console\\Application->run(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#23 /var/www/artisan(37): Illuminate\\Foundation\\Console\\Kernel->handle(Object(Symfony\\Component\\Console\\Input\\ArgvInput), Object(Symfony\\Component\\Console\\Output\\ConsoleOutput))
#24 {main}
Important. This ONLY happens with the phpredis driver, predis is not affected.
FROM redis:5.0.2
COPY redis.conf /usr/local/etc/redis/redis.conf
CMD [ "redis-server", "/usr/local/etc/redis/redis.conf" ]
where the redis.conf looks like this
requirepass testpassword
We would welcome a PR if you want to solve this.
I'm currently not in a position to provide a PR. I did some initial digging, starting where @taylorotwell fixed the reauth (https://github.com/laravel/framework/pull/30778), but I can not make sense of the issue.
Happy to invest time however for testing & other investigations.
This is really causing us more and more trouble. After every maintenance cycle at midnight on Tuesdays I need to restart all Horizon pods to get around this issue, or our log stack is flooded the next day.
I'll put together a standalone project with docker-compose and everything else ready for anybody to just download and docker-compose up to reproduce the issue more easily, as I am not able to identify even the place where this should be fixed in the code.
Hopefully should have this ready in the next couple of days, and would be very grateful if somebody would chip in.
This morning I took a fresh Laravel installation, added Horizon and made the necessary Dockerfiles.
To my surprise I was not able to reproduce the error.
But also going back to our existing project (where we didn't update Laravel or Horizon since reporting this issue) I'm currently only able to reproduce this sporadically. It seems to depend on what state Horizon is in when Redis goes away. As I'm not able to reproduce it reliably anymore, I'll have to investigate more.
I'm already attaching the lightweight example project that anybody can use to test horizon against a password protected redis instance.
Requirements for test environment:
To run that test environment:
docker-compose build
docker-compose run horizon composer install
docker-compose up -d
After that I am stopping Redis (in the log output from the horizon container you will see Horizon spitting out errors). In the times when I was able to reproduce the error, after restarting Redis again (and expecting Horizon to recover) Horizon would continue to spit out the 'NOAUTH' errors as mentioned in the initial ticket description.
I'll keep you posted as soon as i've found the way to reliably reproduce the error.
I believe I have finally found the issue. I can reproduce the error, but only in maybe 1 out of 20 attempts. It depends on the state of the Redis connection at the time the Redis server reboots.
The underlying problem seems not to be the way Laravel reconnects (this was correctly handled in laravel/framework#30778) but that phpredis (not predis) itself has a 'reconnect' feature that does not reliably remember to auth.
There are 2 ancient tickets that are closed but seemingly only partially resolved, https://github.com/phpredis/phpredis/issues/515 and https://github.com/phpredis/phpredis/issues/403, where exactly this behaviour is described.
My problem is that i can not force this situation to happen reliably, so until i can do that, i'm not able to create a ticket in phpredis.
However, the problem can be solved on Laravel's side if we not only catch the phrase 'went away' in src/Illuminate/Redis/Connections/PhpRedisConnection.php but also 'NOAUTH', provided there is a password in the configuration (to prevent a false positive in case you actually forgot to set a password in your config), and reconnect in that case. I tested this and it works as expected, and I can not foresee any negative impact.
This is the only way i'm able to explain this behaviour.
Can somebody doublecheck my assumptions & conclusions?
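The NOAUTH check proposed above can be sketched as a small decision function. This is an illustration in Python, not the framework's PHP code: the name `should_reconnect` and its parameters are hypothetical, and only the two message fragments ('went away' and 'NOAUTH') come from this thread.

```python
from typing import Optional


def should_reconnect(error_message: str, configured_password: Optional[str]) -> bool:
    """Decide whether a Redis error should trigger a reconnect (hypothetical helper)."""
    # Behaviour already described in the thread: a dropped connection is retried.
    if "went away" in error_message:
        return True
    # Proposed addition: treat NOAUTH as recoverable only when a password is
    # configured, so a genuinely missing password still surfaces as an error.
    if "NOAUTH" in error_message and configured_password:
        return True
    return False
```

With this shape, a NOAUTH error on a connection that has no password configured is still reported rather than retried, which is the false-positive guard described above.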
UPDATE:
I just found that a new stable version of phpredis has been released (5.2). It includes a ticket concerning persistent connections (which is irrelevant to this problem, as it happens with and without them), but part of that code change makes AUTH get sent before PING, which would make sense and could solve this issue (without Laravel having to do anything). The ticket is here: https://github.com/phpredis/phpredis/issues/1668 and I'll now be rolling out the new version of phpredis to see if this issue still persists.
I'll keep this issue updated with new information.
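Why the order of AUTH and PING could matter can be shown with a toy model. This is a sketch in Python under the assumption that a password-protected server rejects every command except AUTH on a fresh connection. `FakeRedis` and `reconnect` are hypothetical names and do not mirror phpredis internals.

```python
class FakeRedis:
    """Toy stand-in for one connection to a server started with requirepass."""

    def __init__(self, password: str) -> None:
        self._password = password
        self._authenticated = False  # every new connection starts unauthenticated

    def send(self, command: str, *args: str) -> str:
        if command == "AUTH":
            self._authenticated = bool(args) and args[0] == self._password
            return "OK" if self._authenticated else "ERR invalid password"
        if not self._authenticated:
            return "NOAUTH Authentication required."
        return "PONG" if command == "PING" else "OK"


def reconnect(server_password: str, client_password: str, auth_first: bool) -> list:
    """Open a fresh connection and return the replies to the handshake commands."""
    conn = FakeRedis(server_password)
    if auth_first:
        steps = [("AUTH", client_password), ("PING",)]
    else:
        steps = [("PING",), ("AUTH", client_password)]
    return [conn.send(*step) for step in steps]
```

In this model, pinging first produces a NOAUTH reply before authentication has happened, while authenticating first does not. Whether that is the exact failure path inside phpredis is not established here.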
Developer of phpredis here, feel free to open an issue in our repo if the work in phpredis/phpredis#1668 doesn't solve the problem.
@graemlourens we are having this same problem and have been looking for a solution for a while now. I think I have finally been able to consistently produce the problem locally.
Adding NOAUTH to the check as you suggested does help to solve the problem on laravels side.

There is no reliable way I can see right now to also do a config check for a password, and the exception is still thrown regardless. However, the Horizon worker no longer gets stuck throwing these exceptions, so it does solve that issue.
@pulkitjalan thanks for your follow-up. However, I consider this a temporary plaster, as it's not Laravel's job to handle (presumed) bugs in other components.
I'd love to report a bug in phpredis as @michael-grunder suggested, but my problem is that i can not reproduce it reliably, so making an issue is currently premature.
For now we'll have to temporarily fix our systems with this NOAUTH message catching, but I will continue to try and find conclusive evidence that it's an issue with phpredis.
If anybody has any valuable insight i'd be delighted to hear. My time on this issue is racking up fast :)
Furthermore we will be updating all our packages (laravel & horizon etc) as well as all other system components to see if this magically makes this issue disappear.
We have updated to latest 6.* laravel & 3.* horizon, as well as upgraded to phpredis 5.2 but sadly there is no difference.
@pulkitjalan could you elaborate on how you are able to reproduce this reliably locally? Is there any special thing you're doing to trigger this issue, or are you just shutting down Redis and it happens 100% of the time?
Once Redis and the queues are up and running, I'm posting lots of test jobs into the queue and then issuing the following to Redis:
redis-cli client kill skipme yes
That generally causes an endless loop of the workers throwing the NOAUTH exception.
Very interesting. Killing connections while horizon is running works as expected - all workers / supervisors reauth and reconnect. I am not able to reproduce it with your approach - even if i go into nuke mode and constantly kill all client connections.
Are you sure you're running on phpredis 5.2 (and are you using persistent connections or not?)
Hmmm that’s odd, that method works for me.
Yea, we are running the same setup: phpredis 5.2, Laravel 6.* and Horizon 3.*. Not using persistent connections, maybe that's the problem?
I tested with persistent and not persistent connections, i am not able to reproduce the error with the approach you mentioned.
Furthermore: how does that command even work for you if you're not authing against your Redis instance? Running your command without first authing is rejected.
Currently there is another bug in phpredis that has been preventing us from running further large-scale tests: https://github.com/phpredis/phpredis/issues/1721. We're waiting for phpredis 5.2.1 so we can then continue our tests, because the only place we can currently reproduce this reliably is in our Kubernetes environments. No idea what's different there.
Hmmm very odd.
Ah sorry, with auth this is the command:
redis-cli -a YOUR_PASSWORD client kill skipme yes
Update: We're now on this stack and waiting for phpredis 5.2.1 to continue testing:
Horizon Version: v3.7.2
Laravel Version: v6.18.1
PHP Version: 7.4.3
Redis Driver & Version: phpredis 5.2.0
@pulkitjalan If you're able to reliably reproduce it, do you agree that this seems to be a phpredis issue and not a Laravel issue (Laravel is re-authing correctly, but obviously not on NOAUTH exceptions)?
Are you able to reproduce this even without laravel? then we would be able to close this ticket and start creating one in phpredis. But as i'm not able to reproduce it locally, it's premature for me to create a ticket as i still don't know where the issue is originating.
I haven't tested this outside of Laravel, and I can only reproduce it with Horizon. I do agree that there is an issue with phpredis here, but I think Laravel and Horizon need to be more resilient to this as well: check for the NOAUTH exception and reconnect. Laravel handles database infrastructure issues quite well, and it should handle cache issues similarly.
I can open a pr with the additional check for laravel 6, but I don't know if that would be classed as a bug fix?
I would be very surprised if such a PR were let through; as already stated, we're trying to do a 'plaster fix' for an unknown situation. If phpredis is not the issue, then the only other explanation would be that Laravel is NOT passing the password while re-authing. I can not find any evidence of this, as it's using the same connector to reconnect as to connect initially, but I haven't gone very deep down the rabbit hole yet (as I'm still fighting to reproduce).
I suspect we'll also have to fork and work off the fork until this issue is identified, but i'd not like to see horizon merge such fixes if i'm honest.
Update: phpredis 5.2.1 was released which has fixed various issues including a segfault issue https://github.com/phpredis/phpredis/issues/1721
After that we ran several tests, including a normal maintenance of all our systems yesterday (where lots of things happen, and where until now Horizon had an issue every time), and the problem has for now magically disappeared.
It is really bothering me however that i can not pinpoint what the issue was. It could have been a combination of various factors that i'm just not able to correlate at this point. I'll leave this ticket open for another 1-2 weeks to have a few maintenance cycles behind us, and also do a few more redis failure tests.
If this doesn't happen anymore, i'll close the ticket. If anybody ever finds the reason, please let me hear about it, so i can have peace of mind :)
@pulkitjalan if you're still able to reliably reproduce it after updating to the stack i mentioned (most important latest phpredis), feel free to take over this ticket.
Current stack:
Horizon Version: v3.7.2
Laravel Version: v6.18.1
PHP Version: 7.4.3
Redis Driver & Version: phpredis 5.2.1
Did you use persistent connections?
I.e. via configuration, see this code from the framework (not horizon) => https://github.com/laravel/framework/blob/95a8f05bb9b6769dfdad58eb550777b59f398b5b/src/Illuminate/Redis/Connectors/PhpRedisConnector.php#L130
Yes, we (now) do. But when i initiated this ticket, at that time we did not. So it happened with both persistent and not persistent connections.
I'm still able to reproduce this on phpredis 5.2.1 with persistence switched off, using the testing method outlined above. It does take a few tries this time, but I think that may be down to some changes in our setup, mainly to do with minimum workers. Once the bad worker restarts or terminates, the problem goes away. If I get a chance, I'll try and run it in our staging environment, where the problem was very persistent.
I can confirm that this sometimes only happened for specific supervisors / processes. Not all of them always went into crazy mode. This points to a 'timing' / 'state' issue and is probably why it's so hard to reproduce.
The issue has happened again, so @pulkitjalan I can confirm it's not resolved yet.
Trying to find more evidence and reproducibility, but for now i'm empty handed.
Been looking for a solution. Same here. I was not able to reproduce the error yet. Laravel Error logs are getting jammed from Horizon with the error mentioned above.
@wolkenheim welcome to the self-help-group :) Are you using persistent connections (and phpredis, not predis?)
We're now in the process of migrating to laravel 7 so this will be our next testing milestone to see if for some reason this has now disappeared, which currently is our only hope.
Very grateful for any input from anybody about how to reproduce or at least what the cause could be.
I'm not sure if I am of any help here. Just some info on our setup: we run Laravel in Kubernetes. This is a one-container deployment with Supervisor inside the PHP container.
Dockerimage is php:7.4-apache and PhpRedis installed via pecl and docker-php-ext-enable redis
Laravel v7.4.0
Horizon v4.2.1
Persistent connections: Laravel defaults (which I guess is non persistent: https://laravel.com/docs/7.x/redis)
I thought that Supervisor would be our problem here: the Horizon worker starts to run amok and Supervisor is not shutting it down. For our use case a simple solution would be to run the Apache web server and the Horizon processes in their own containers. In that case it would look like this: Redis goes down, Horizon starts to produce the NOAUTH exception at some point, Kubernetes realizes via its liveness probe that the container is malfunctioning and shuts it down. When it comes back up, Redis either can or cannot be reached.
This is just a hypothesis for now. It could still be the case that Horizon, when running wild, still reports itself as alive to the liveness check, so that wouldn't work. In any case, it does not solve the above problem.
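The liveness-probe idea could look roughly like the following. This is a minimal sketch in Python, assuming the probe can read recent Horizon log lines. The function name, the 50-line window and the threshold of 10 are hypothetical values chosen for illustration, not Horizon or Kubernetes features.

```python
from typing import Iterable


def looks_stuck(log_lines: Iterable[str], window: int = 50, threshold: int = 10) -> bool:
    """Return True when the most recent log lines are dominated by NOAUTH errors."""
    # Only the tail of the log matters: old errors from an outage that has
    # since recovered should not keep failing the probe.
    recent = list(log_lines)[-window:]
    noauth_count = sum(1 for line in recent if "NOAUTH" in line)
    return noauth_count >= threshold
```

A probe script built on this would exit non-zero when `looks_stuck` returns True, so that the orchestrator restarts the container instead of someone restarting the pods by hand.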
Other options that I see:
Anyway. Thanks @graemlourens for all the time you invested here, appreciated 👏
PS: I tried to work with the horizon_test.zip docker setup. I could reproduce the NOAUTH error one time. Afterwards I only got a "Redis connection lost" error, and Horizon was able to recover afterwards: I could dispatch new jobs to the queue, which were displayed in the docker logging output. My approach was to start Redis separately with docker-compose up redis and shut it down with ctrl + c.
@wolkenheim thank you for your thoughts.
Very good to hear that you were able to reproduce it with the horizon_test.zip i provided.
Temporary solution until we find the root cause is mentioned by @pulkitjalan but requires you to fork/patch laravel framework, and is in no way a permanent or even intermediate solution.
There is a new upcoming release of PHPRedis on its way, fixing a known issue about bad persistent connection states. https://github.com/phpredis/phpredis/issues/1742
The branch phpredis 5.2.1-liveness is the one that will fix this issue.
Still i can not understand though why this also happened for non-persistent connections, but at this point i'm desperate to try anything.
This morning connections to Redis were also randomly failing in our app containers (which are also using persistent Redis connections), even though Redis was available. This leads to the conclusion that there is some situation where certain persistent Redis connections get 'bad' and fail unexpectedly when used. This would also support the theory that this issue has nothing to do with Horizon or Laravel in general.
We'll update to the new phpredis version as soon as we can and hope that i can come back with good news...
Hi everyone :wave:
I don't actually think phpredis/phpredis#1742 will solve this issue. It could theoretically be related, but that's more of a performance bug where we're sending far too many ECHO calls to Redis to keep the persistent connections from going out of sync.
That said, I do believe it's been fixed via phpredis/phpredis@35372a1f64f643cf4ce52c62c61e326e7d6a1e6e
I pulled down the docker-compose setup @graemlourens created and was able to trigger the error somewhat reliably by doing the following while tailing the horizon logs:
docker-compose exec redis bash
$ while true; do redis-cli -a testpassword client kill type normal; done
Basically I'm nuking every redis connection in a loop and eventually it always gets into the NOAUTH state when I use phpredis 5.2.1.
However, I have not been able to put it into that state using the develop branch of phpredis. I haven't bisected the problem to 100% confirm that is the commit but I suspect it is.
I'll do some more testing tomorrow and if i can confirm this commit has fixed the problem will package up a release candidate with just this change included.
Feel free to try as well and let me know if you can still trigger the issue using our develop branch.
Cheers!
Mike
@michael-grunder hey mike, that sounds great. And indeed, i linked the wrong issue/commit. I actually meant that one you wrote, thx for correcting me.
I did a quick test and it seems that the auth fix in the develop branch actually is working, but this is really just a high-level check without any deeper understanding of phpredis itself. I'll leave that kind of review to others with more knowledge about your package.
I'm really excited to roll this out to our environments as soon as you have packaged and released this new version. I can then happily give you more long term feedback when its been in Operations for a few weeks.
Kind regards, Graem
I've created a release branch for phpredis 5.2.2RC1 with fixes for this issue.
My goal is the smallest change possible so it is simply 5.2.1 with two additional commits:
If people have the time it would be awesome for others to confirm that it does in fact solve the problem.
Assuming all goes well I can get the RC out over the weekend with a GA release to follow shortly after.
@michael-grunder @graemlourens thanks to you both for all your time and help on this so far.
@michael-grunder thank you for your update. We'll be releasing a new version (of our app) with phpredis 5.2.2 in the next 2 days and then monitor the situation over the next 2 weeks. I'll get back to you if we still see any sign of trouble. After nearly 4 months this would be a delight to be able to put this one to bed! :)
That's awesome! Thanks @michael-grunder, will ship the new version soon to test.
@michael-grunder and everybody else: phpredis 5.2.2 seems to (for now) have solved the issue for the NOAUTH error. We did see very weird other behaviours when having short network interruptions during a maintenance last tuesday, but this probably has more to do in general about persistent connections, and not specifically about this issue.
@pulkitjalan @wolkenheim how is it working for you since the phpredis update?
I'll be leaving this ticket still open for confirmation of everybody involved, as well as still letting at least 2 maintenance cycles complete (1.5 weeks) and then will close the ticket.
Thanks for the update, we have not had a chance to push out the update yet, was hoping to do so today but it looks like we might not be able to for another week. I'll report back once we get a chance.
As an update: The past few maintenance cycles were without issues and the NOAUTH issue is definitively solved with phpredis 5.2.2
However persistent redis connections are still extremely unstable and making problems for us. @michael-grunder hinted us towards other things we can try, but we've now had to focus on other issues at hand so we reverted back to not-persistent connections, which are now working fine.
I'm closing this ticket and thanking everybody involved, especially @michael-grunder for all his work at phpredis!
Thanks all!