Synapse: federation_stream_position table should have a unique index

Created on 9 Oct 2018  路  8Comments  路  Source: matrix-org/synapse

The symptoms of this bug are:

  • People on other servers cannot read your messages
  • Typing notifications (and presence) are sent correctly
  • Synapse logs contain the following exception:
CRITICAL - - Unhandled error in Deferred:
CRITICAL - - 
Traceback (most recent call last):
  File "env/lib/python2.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "env/lib/python2.7/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "env/lib/python2.7/site-packages/synapse/metrics/background_process_metrics.py", line 193, in run
    yield func(*args, **kwargs)
  File "env/lib/python2.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "env/lib/python2.7/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "env/lib/python2.7/site-packages/synapse/federation/transaction_queue.py", line 247, in _process_event_queue_loop
    "events", next_token
  File "env/lib/python2.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "env/lib/python2.7/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "env/lib/python2.7/site-packages/synapse/storage/_base.py", line 325, in runInteraction
    *args, **kwargs
  File "env/lib/python2.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "env/lib/python2.7/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "env/lib/python2.7/site-packages/synapse/storage/_base.py", line 374, in runWithConnection
    inner_func, *args, **kwargs
  File "env/lib/python2.7/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "env/lib/python2.7/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "env/lib/python2.7/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "env/lib/python2.7/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "env/lib/python2.7/site-packages/twisted/enterprise/adbapi.py", line 306, in _runWithConnection
    compat.reraise(excValue, excTraceback)
  File "env/lib/python2.7/site-packages/twisted/enterprise/adbapi.py", line 297, in _runWithConnection
    result = func(conn, *args, **kw)
  File "env/lib/python2.7/site-packages/synapse/storage/_base.py", line 370, in inner_func
    return func(conn, *args, **kwargs)
  File "env/lib/python2.7/site-packages/synapse/storage/_base.py", line 246, in _new_transaction
    r = func(txn, *args, **kwargs)
  File "env/lib/python2.7/site-packages/synapse/storage/_base.py", line 854, in _simple_update_one_txn
    raise StoreError(500, "More than one row matched")
StoreError: 500: More than one row matched

This has now been reported by two users, @TheForcer at https://github.com/matrix-org/synapse/issues/3750#issuecomment-424516412, and @SylvainCecchetto at #4018.

The basic problem is that federation_stream_position contains too many rows, for example:

sqlite> select * from federation_stream_position;
type        stream_id 
----------  ----------
federation  -1        
events      -1    
federation  -1        
events      -1    
federation  -1        
events      13943    

I have yet to understand how this happens. If anybody else sees it, it would be very helpful to get Synapse logs from the point at which the More than one row matched error starts.

good first issue maintenance p2

All 8 comments

Hi!
It seems like the problem is still here for me :-/

However I have that:

matrix_synapse=> select * from federation_stream_position;
    type    | stream_id
------------+-----------
 federation |        -1
 events     |    101244
(2 lignes)

It has worked for at least 2 or 3 messages but now I have the same symptoms

@SylvainCecchetto unless you have the More than one row matched error in your logs it is a different problem.

I just check my log and there is still this error (only the part with the error bellow)

[...]
2018-10-09 16:57:21,135 - synapse.util.caches.response_cache - 151 - INFO - GET-275- [sync]: no cached result for [(@sylvain:matrix.cecchettosylvain.fr, 30000, u's101230_231789_4_8701_146_1_3134_201_3', u'0', False, u'SCFQMFMPWA')], calculating new one
2018-10-09 16:57:24,507 - synapse.handlers.presence - 350 - INFO - - Handling presence timeouts
2018-10-09 16:57:24,511 - synapse.handlers.typing - 77 - INFO - - Checking for typing timeouts
2018-10-09 16:57:24,531 - synapse.rest.media.v1.preview_url_resource - 388 - INFO - expire_url_cache_data-249- Running url preview cache expiry
2018-10-09 16:57:24,537 - synapse.metrics - 374 - INFO - - Collecting gc 0
2018-10-09 16:57:24,542 - synapse.rest.media.v1.preview_url_resource - 467 - INFO - expire_url_cache_data-249- Deleted 0 media from url cache
2018-10-09 16:57:24,701 - synapse.storage.TIME - 218 - INFO - - Total database time: 0.000% {_update_client_ips_batch(2): 0.000%, get_expired_url_cache(1): 0.000%, get_url_cache_media_before(1): 0.000%} {}
2018-10-09 16:57:29,507 - synapse.handlers.presence - 350 - INFO - - Handling presence timeouts
2018-10-09 16:57:29,512 - synapse.handlers.typing - 77 - INFO - - Checking for typing timeouts
2018-10-09 16:57:32,334 - synapse.access.https.8448 - 218 - INFO - POST-276- ::ffff:212.47.245.0 - 8448 - Received request: POST /_matrix/federation/v1/groups/+loria:sh.dalleau.me/renew_attestation/@sylvain:matrix.cecchettosylvain.fr
2018-10-09 16:57:32,339 - synapse.federation.transport.server - 130 - INFO - POST-276- Request from sh.dalleau.me
(HERE) ==> 2018-10-09 16:57:32,351 - synapse.http.server - 85 - INFO - POST-276- <SynapseRequest at 0x7fecf80bfb48 method='POST' uri=u'/_matrix/federation/v1/groups/+loria:sh.dalleau.me/renew_attestation/@sylvain:matrix.cecchettosylvain.fr' clientproto='HTTP/1.1' site=8448> SynapseError: 500 - More than one row matched
2018-10-09 16:57:32,355 - synapse.access.https.8448 - 287 - INFO - POST-276- ::ffff:212.47.245.0 - 8448 - {sh.dalleau.me} Processed request: 0.020sec/0.001sec (0.012sec, 0.000sec) (0.001sec/0.003sec/1) 59B 500 "POST /_matrix/federation/v1/groups/+loria:sh.dalleau.me/renew_attestation/@sylvain:matrix.cecchettosylvain.fr HTTP/1.1" "Synapse/0.24.1" [0 dbevts]
2018-10-09 16:57:32,356 - synapse.metrics - 374 - INFO - - Collecting gc 0
2018-10-09 16:57:34,507 - synapse.handlers.presence - 350 - INFO - - Handling presence timeouts
2018-10-09 16:57:34,511 - synapse.handlers.typing - 77 - INFO - - Checking for typing timeouts
2018-10-09 16:57:34,531 - synapse.rest.media.v1.preview_url_resource - 388 - INFO - expire_url_cache_data-250- Running url preview cache expiry
2018-10-09 16:57:34,543 - synapse.rest.media.v1.preview_url_resource - 467 - INFO - expire_url_cache_data-250- Deleted 0 media from url cache
2018-10-09 16:57:34,701 - synapse.storage.TIME - 218 - INFO - - Total database time: 0.000% {_update_client_ips_batch(2): 0.000%, get_expired_url_cache(1): 0.000%, update_remote_attestion(1): 0.000%} {}
2018-10-09 16:57:39,507 - synapse.handlers.presence - 350 - INFO - - Handling presence timeouts
2018-10-09 16:57:39,512 - synapse.handlers.typing - 77 - INFO - - Checking for typing timeouts
[...]

hum, it sounds like you have the same problem with a different table, which is causing those errors. They should not affect outgoing messages, though. Could you open a new issue?

for those affected, this sql should fix it (shut down synapse first for safety):

CREATE TEMPORARY TABLE f1 (LIKE federation_stream_position);
INSERT INTO f1 (SELECT type, max(stream_id) FROM federation_stream_position GROUP BY type);
DELETE FROM federation_stream_position;
INSERT INTO federation_stream_position (SELECT * FROM f1);

I think one cause of this might be if people create a new database, and then attempt to restore a backup over it - so you end up with the -1 from the clean database, as well as the later stream position from the backup.

An easy fix to stop people getting caught out for it would be for us to put a unique index on the table (I don't know why it doesn't have one)

I think one cause of this might be if people create a new database, and then attempt to restore a backup over it - so you end up with the -1 from the clean database, as well as the later stream position from the backup.

I do confirm that's what I did.

My table had that list of rows:

type    | stream_id

------------+-----------
federation | -1
events | -1
federation | -1
events | [someBigNumber]
(4 rows)

The command you gave solved the problem perfectly.

Thanks a lot :smiley:

Was this page helpful?
0 / 5 - 0 ratings