The server crashes and stops with error Fatal Python error: Cannot recover from stack overflow.
The full stacktrace can be found here: matrix-stackoverflow.log
The only thing that I guess that could help to reproduce the bug is to edit log.yaml and set every log levels to INFO or even DEBUG.
I expect that my matrix server doesn't crash.
I can't reproduce this error - can you try against python 3.6? It may be version dependent
related to #4191?
@neilisfragile seems dup, ill try to make our docker image switch to py3.6 , and it should then reproduce exactly your error.
I encountered this error while having lot of instabilities with matrix, with some strange errors due to encoding and errors loop (an error handler that failed, triggering an error handler that fail and so on).
However, I was not able to clearly identify the problem at the moment, and even now I struggle to understand why Matrix went wild. It seems that setting the error verbosity to ERROR mitigated the problem.
I am a bit lost as some stacktraces seem normal as this one:
2018-11-26 08:53:36,253 - twisted - 244 - CRITICAL - - Unexpected exception from twisted.web.client.FileBodyProducer.stopProducing
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/twisted/web/_newclient.py", line 195, in _callAppFunction
function()
File "/usr/local/lib/python3.5/dist-packages/twisted/web/client.py", line 1087, in stopProducing
self._task.stop()
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/task.py", line 497, in stop
self._checkFinish()
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/task.py", line 507, in _checkFinish
raise self._completionState
twisted.internet.task.TaskStopped
In my case, I suppose that the reported error here has been triggered by this error, which seem to be also reported in #4191 :
2018-11-26 09:04:47,977 - synapse.state.v1 - 196 - ERROR - PUT-260673-$1543223055125NpvOU:encom.eu.org-$2458-4-4-6T06252.913452|Marina's_Silly_Donkey_Sits_Snowpain's_Good_Newt:ponies.im- Failed to resolve state
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.5/dist-packages/synapse/storage/events_worker.py", line 364, in _enqueue_events
for e in res if e
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1362, in returnValue
raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: {"$2458-4-4-1T34409.752380|Queen Chrysalis's\u205fUpset\u205fKangaroo\u205fSteals\u205fCranky Doodle Ryan\u2019s\u205fFantastic\u205fLeopard:ponies.im": _EventCacheEntry(event=<FrozenEvent event_id='$2458-4-4-1T34409.752380|Queen Chrysalis's\u205fUpset\u205fKangaroo\u205fSteals\u205fCranky Doodle Ryan\u2019s\u205fFantastic\u205fLeopard:ponies.im', type='m.room.member', state_key='@/dev/ponies:ponies.im'>, redacted_event=None)}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.5/dist-packages/synapse/storage/events_worker.py", line 197, in _get_events
defer.returnValue(events)
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1362, in returnValue
raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: [<FrozenEvent event_id='$2458-4-4-1T34409.752380|Queen Chrysalis's\u205fUpset\u205fKangaroo\u205fSteals\u205fCranky Doodle Ryan\u2019s\u205fFantastic\u205fLeopard:ponies.im', type='m.room.member', state_key='@/dev/ponies:ponies.im'>]
During handling of the above exception, another exception occurred:
[...]
File "/usr/local/lib/python3.5/dist-packages/synapse/state/v1.py", line 301, in key_func
return -int(e.depth), hashlib.sha1(e.event_id.encode('ascii')).hexdigest()
builtins.UnicodeEncodeError: 'ascii' codec can't encode character '\u205f' in position 42: ordinal not in range(128)
And a verbosity greater than ERROR. But it's only speculation.
But the stacktrace is so huge, and I struggle to find where is the useful information in this stacktrace.
Here are the current logs of my server which really look like to #4191 but seem a bit different too (maybe due to python3?).
stderr.txt
Sorry for this not so useful bug report, I think you could close it in favor of #4191, even if I suspect that this error can cause many different stacktraces and even crash the whole server.
Thanks for the help, and if you want that I repost some elements from the last logfile I have attached in #4191, just ask.
For us the current workaround was to filter with CRITICAL, ERROR was not sufficient
2018-11-26 08:53:36,253 - twisted - 244 - CRITICAL - - Unexpected exception from twisted.web.client.FileBodyProducer.stopProducing
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/twisted/web/_newclient.py", line 195, in _callAppFunction
function()
File "/usr/local/lib/python3.5/dist-packages/twisted/web/client.py", line 1087, in stopProducing
self._task.stop()
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/task.py", line 497, in stop
self._checkFinish()
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/task.py", line 507, in _checkFinish
raise self._completionState
twisted.internet.task.TaskStopped
This is https://github.com/matrix-org/synapse/issues/4003.
2018-11-26 09:04:47,977 - synapse.state.v1 - 196 - ERROR - PUT-260673-$1543223055125NpvOU:encom.eu.org-$2458-4-4-6T06252.913452|Marina's_Silly_Donkey_Sits_Snowpain's_Good_Newt:ponies.im- Failed to resolve state
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.5/dist-packages/synapse/storage/events_worker.py", line 364, in _enqueue_events
for e in res if e
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1362, in returnValue
raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: {"$2458-4-4-1T34409.752380|Queen Chrysalis's\u205fUpset\u205fKangaroo\u205fSteals\u205fCranky Doodle Ryan\u2019s\u205fFantastic\u205fLeopard:ponies.im": _EventCacheEntry(event=<FrozenEvent event_id='$2458-4-4-1T34409.752380|Queen Chrysalis's\u205fUpset\u205fKangaroo\u205fSteals\u205fCranky Doodle Ryan\u2019s\u205fFantastic\u205fLeopard:ponies.im', type='m.room.member', state_key='@/dev/ponies:ponies.im'>, redacted_event=None)}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.5/dist-packages/synapse/storage/events_worker.py", line 197, in _get_events
defer.returnValue(events)
File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 1362, in returnValue
raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: [<FrozenEvent event_id='$2458-4-4-1T34409.752380|Queen Chrysalis's\u205fUpset\u205fKangaroo\u205fSteals\u205fCranky Doodle Ryan\u2019s\u205fFantastic\u205fLeopard:ponies.im', type='m.room.member', state_key='@/dev/ponies:ponies.im'>]
During handling of the above exception, another exception occurred:
[...]
File "/usr/local/lib/python3.5/dist-packages/synapse/state/v1.py", line 301, in key_func
return -int(e.depth), hashlib.sha1(e.event_id.encode('ascii')).hexdigest()
builtins.UnicodeEncodeError: 'ascii' codec can't encode character '\u205f' in position 42: ordinal not in range(128)
This is #4226, and not related to #4191 afaict.
A couple of things which might be worth trying here:
-n as per https://github.com/matrix-org/synapse/issues/4240#issuecomment-442839481.Let me know if either of the above help.
I have already set the encoding for my log file.
log.yaml
version: 1
formatters:
precise:
format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s- %(message)s'
filters:
context:
(): synapse.util.logcontext.LoggingContextFilter
request: ""
handlers:
file:
class: logging.handlers.RotatingFileHandler
formatter: precise
filename: /var/log/matrix-synapse/homeserver.log
maxBytes: 104857600
backupCount: 10
filters: [context]
level: ERROR
encoding: utf8
console:
class: logging.StreamHandler
formatter: precise
level: ERROR
loggers:
synapse:
level: ERROR
synapse.storage.SQL:
level: ERROR
ldap3:
level: ERROR
ldap_auth_provider:
level: ERROR
root:
level: ERROR
handlers: [file, console]
I was running synapse with /usr/bin/python3 -m synapse.app.homeserver -c /etc/matrix-synapse/homeserver.yaml, I have changed it for /usr/bin/python3 -m synapse.app.homeserver -n -c /etc/matrix-synapse/homeserver.yaml as suggested.
If I still encounter some bugs, I will try to switch from ERROR to CRITICAL in log.yaml.
Anyway, I will update this comment after a while to see how synapse behave.
for what it's worth, I think most of those level settings are irrelevant, because they will be inherited from the root logger.
I get the same log as here
The server crashes and stops with error
Fatal Python error: Cannot recover from stack overflow.
The full stacktrace can be found here: matrix-stackoverflow.log
Not sure if this is any useful, but for me it was enough to increase the level to ERROR in file and disabling the level in root to make synapse run again
handlers:
file:
class: logging.handlers.RotatingFileHandler
formatter: precise
filename: /var/log/synapse/homeserver.log
maxBytes: 104857600
backupCount: 10
filters: [context]
encoding: utf8
level: ERROR
console:
class: logging.StreamHandler
formatter: precise
filters: [context]
level: INFO
loggers:
synapse:
level: INFO
synapse.storage.SQL:
# beware: increasing this to DEBUG will make synapse log sensitive
# information such as access tokens.
level: INFO
root:
#level: ERROR
handlers: [file, console]
Ok no it just happened again
Since the creation of this bug report, I have updated matrix many times and have not encountered this bug anymore. Feel free to close this issue.
so the basis of the problem is:
stderrstderr to the twisted logger (at level ERROR) unless --no-redirect-stdio (aka -n) is set on the commandlineSo there are two problems here. One is that stderr is piped back through the log subsystem, which would be fixed with something like https://github.com/matrix-org/synapse/issues/1539#issuecomment-435084421, but also there was an initial error which set off the cascade.
Not sure if this is any useful, but for me it was enough to increase the level to ERROR in file and disabling the level in root to make synapse run again
This sort of thing can reduce the amount of logging that is written, thus reducing the likelihood of hitting an error in the log subsystem - but as you have discovered, is not a complete solution.
It is tricky to figure out what is going on to cause the initial error, beyond obvious things like /var/log/synapse running out of diskspace. The easiest approach is probably:
--daemonize, or if you're using synctl, set --no-daemonize).--no-redirect-stdio on the commandlineconsole from the list of handlers under the root logger, to reduce the amount of stuff going to stderrstderr to somewhere usefulstderrclosing as a dup of #4240