The Russian government is currently trying to block Telegram and, in the process, is blocking connections to many subnets.
As a result, our Synapse homeserver has lost its direct connection to some other Matrix servers, e.g. t2bot.io.
The problem is that messages sent from the t2bot.io server (to which a direct connection is blocked) are not received by our homeserver until some other server sends an event to the room. Once an event arrives from another Matrix server (one we can reach directly), the room syncs normally and receives all the missed messages.
Here are the log records:
2018-04-26 12:22:06,540 - synapse.http.matrixfederationclient - 223 - WARNING - None- {PUT-O-2992} Sending request failed to t2bot.io: PUT matrix://t2bot.io/_matrix/federation/v1/send/1524682873693/: TimeoutError('',)
2018-04-26 12:22:06,541 - synapse.http.outbound - 247 - INFO - None- {PUT-O-2992} [t2bot.io] Result: TimeoutError('',)
2018-04-26 12:22:06,541 - synapse.federation.transaction_queue - 499 - WARNING - - TX [t2bot.io] Failed to send transaction: User timeout caused connection failure.
2018-04-26 12:32:39,985 - synapse.federation.transaction_queue - 590 - INFO - None- TX [t2bot.io] {1524682873896} Sending transaction [1524682873896], (PDUs: 0, EDUs: 2, failures: 0)
2018-04-26 12:32:39,986 - synapse.http.outbound - 165 - INFO - None- {PUT-O-3199} [t2bot.io] Sending request: PUT matrix://t2bot.io/_matrix/federation/v1/send/1524682873896/
2018-04-26 12:32:50,384 - synapse.http.matrixfederationclient - 223 - WARNING - None- {PUT-O-3199} Sending request failed to t2bot.io: PUT matrix://t2bot.io/_matrix/federation/v1/send/1524682873896/: TimeoutError('',)
So this can cause trouble when people try to talk across two isolated networks using one server as the main relay (e.g. a relay from the regular internet to .onion).
As a workaround for this kind of problem, maybe the homeserver could periodically ask the other federated servers in a room for updates when a direct connection to one of that room's servers has been unavailable for a long time?
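To make the proposal above a bit more concrete, here is a minimal sketch of the decision step only: given when each remote server was last reached, pick the rooms that contain a long-silent server and list the still-reachable servers that could be asked for updates. Everything here is hypothetical (the function name `plan_catchup`, the `max_silence` threshold, the room IDs and `reachable.example`); it is not an existing Synapse feature or API.

```python
def plan_catchup(room_servers, last_contact, now, max_silence=600.0):
    """Return {room_id: [servers to ask]} for rooms with a long-silent server.

    room_servers: room ID -> list of remote server names in that room.
    last_contact: server name -> timestamp (seconds) of last successful contact.
    All names and the threshold are illustrative assumptions.
    """
    plan = {}
    for room_id, servers in room_servers.items():
        # Servers we have not reached for longer than max_silence.
        stale = [s for s in servers if now - last_contact.get(s, 0.0) > max_silence]
        # Servers we can still talk to and could ask for missed events.
        fresh = [s for s in servers if s not in stale]
        if stale and fresh:
            plan[room_id] = sorted(fresh)
    return plan


room_servers = {
    "!blocked:example.org": ["t2bot.io", "reachable.example"],
    "!fine:example.org": ["reachable.example"],
}
last_contact = {"t2bot.io": 0.0, "reachable.example": 990.0}
plan = plan_catchup(room_servers, last_contact, now=1000.0)
```

Only the room that includes the unreachable server ends up in `plan`, paired with the server that could act as an intermediary. How the actual catch-up request would be made is left out on purpose.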
ftr, t2bot.io is not located in Russia. It is located in Germany and should have an appropriate IP address in that region. During the times in that log, t2bot.io seemed to be struggling for unknown reasons, so it's very possible that whatever load it was trying to process was causing timeout errors.
The problem is that Russian providers, while trying to block Telegram, are blocking access from Russian IPs to seemingly random IP subnets all over the world: https://github.com/sxiii/russian-blackout
This issue is not about t2bot.io specifically, but about the general problem where two federated Matrix servers have no direct access to each other and can only reach each other via a third server.
AFAIK this is a known problem in Synapse, if not in Matrix protocol itself (I don't know which version is true). As of now, Synapse's federation works strictly within "push" model. A homeserver will never pull events from other homeservers on its own; its only chance to understand that it missed something is when a transaction comes in with unknown ancestors. At this point, it will start to actively "pull" the missing ancestor nodes, but nothing more.
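The behaviour described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Synapse code: `Event`, `ToyHomeserver` and `receive_push` are made-up names, and the "fetch" callback stands in for whatever request a real server would make to the sender. The point is that missing events are only discovered when a pushed event references parents the receiver has never seen.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: str
    prev_events: list  # IDs of this event's parents in the room graph


@dataclass
class ToyHomeserver:
    """Hypothetical stand-in for a homeserver; not Synapse's implementation."""
    known: dict = field(default_factory=dict)

    def receive_push(self, event, fetch_from_sender):
        """Store a pushed event and pull any ancestors we have not seen.

        Returns the IDs that had to be pulled, in the order they were fetched.
        """
        pending = [event]
        requested = set()
        fetched = []
        while pending:
            current = pending.pop()
            if current.event_id in self.known:
                continue
            self.known[current.event_id] = current
            for parent_id in current.prev_events:
                if parent_id not in self.known and parent_id not in requested:
                    requested.add(parent_id)
                    fetched.append(parent_id)
                    pending.append(fetch_from_sender(parent_id))
        return fetched


# A linear room history A <- B <- C <- D.
room = {
    "A": Event("A", []),
    "B": Event("B", ["A"]),
    "C": Event("C", ["B"]),
    "D": Event("D", ["C"]),
}

server = ToyHomeserver()
server.receive_push(room["A"], room.__getitem__)

# B and C come from a server we cannot reach, so they are never pushed to us.
# Nothing happens until a reachable server pushes D, which references C.
fetched = server.receive_push(room["D"], room.__getitem__)
```

After the second push, `fetched` is `["C", "B"]`: the gap is filled only because D arrived with an unknown ancestor. Until then the toy server has no reason to ask anyone for B or C, which matches the symptom reported in this issue.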
A possible solution is to provide two IP addresses for the domain that has the broken connection, and to configure a proxy server for incoming/outgoing messages. Example:
As a result, I would be able to repair the federation problem without moving my Synapse server to another host.
this is basically a dup of https://github.com/matrix-org/synapse/issues/1386, with a bit of https://github.com/matrix-org/matrix-doc/issues/469 thrown in
(and dup #2528)