Synapse: Federated room on homeserver is not updated when the connection to one of the other servers is broken

Created on 26 Apr 2018 · 6 Comments · Source: matrix-org/synapse

The Russian government is currently trying to block Telegram and is denying connections to many subnets.
As a result, our Synapse homeserver has lost its direct connection to some other Matrix servers, e.g. t2bot.io.

The problem is that messages sent from the t2bot.io server (to which direct connections are denied) are not received by our homeserver until some other server sends an event to the room. Once an event arrives from another Matrix server (one we can reach directly), the room syncs normally and receives all the missed messages.

Here are the log records:

2018-04-26 12:22:06,540 - synapse.http.matrixfederationclient - 223 - WARNING - None- {PUT-O-2992} Sending request failed to t2bot.io: PUT matrix://t2bot.io/_matrix/federation/v1/send/1524682873693/: TimeoutError('',)
2018-04-26 12:22:06,541 - synapse.http.outbound - 247 - INFO - None- {PUT-O-2992} [t2bot.io] Result: TimeoutError('',)
2018-04-26 12:22:06,541 - synapse.federation.transaction_queue - 499 - WARNING - - TX [t2bot.io] Failed to send transaction: User timeout caused connection failure.
2018-04-26 12:32:39,985 - synapse.federation.transaction_queue - 590 - INFO - None- TX [t2bot.io] {1524682873896} Sending transaction [1524682873896], (PDUs: 0, EDUs: 2, failures: 0)
2018-04-26 12:32:39,986 - synapse.http.outbound - 165 - INFO - None- {PUT-O-3199} [t2bot.io] Sending request: PUT matrix://t2bot.io/_matrix/federation/v1/send/1524682873896/
2018-04-26 12:32:50,384 - synapse.http.matrixfederationclient - 223 - WARNING - None- {PUT-O-3199} Sending request failed to t2bot.io: PUT matrix://t2bot.io/_matrix/federation/v1/send/1524682873896/: TimeoutError('',)

This can cause problems when people try to talk across two closed networks via one server acting as the main relay (e.g. a relay from the normal internet to .onion).

As a workaround for this type of problem, maybe periodically ask the other federated servers for updates on the room if the direct connection to one of the room's federated servers has been denied for a long time?
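
To make the suggestion concrete, here is a minimal sketch of the decision logic only, in Python. It is not Synapse code: `RoomPeers`, `servers_to_poll` and the `stale_after` threshold are hypothetical names invented for illustration, and how the actual "ask for updates" request would be made is left out.

```python
from dataclasses import dataclass, field


@dataclass
class RoomPeers:
    """Hypothetical bookkeeping for one federated room (not a Synapse structure)."""
    room_id: str
    # server name -> UNIX time of the last successful contact (None if never reached)
    last_contact: dict = field(default_factory=dict)


def servers_to_poll(room, now, stale_after=600.0):
    """Return the reachable servers to ask for room updates.

    The list is empty unless at least one server in the room has been
    unreachable for longer than `stale_after` seconds (an assumed threshold).
    """
    def is_stale(ts):
        return ts is None or now - ts > stale_after

    unreachable = [s for s, ts in room.last_contact.items() if is_stale(ts)]
    if not unreachable:
        return []  # every peer is reachable, the normal push model is enough
    return sorted(s for s, ts in room.last_contact.items() if not is_stale(ts))
```

With t2bot.io unreachable and one other server still reachable, only the reachable server would be asked; if every peer is reachable, nothing is polled.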

All 6 comments

ftr, t2bot.io is not located in Russia. It is located in Germany and should have an appropriate IP address in that region. During the times in that log, t2bot.io seemed to be struggling for unknown reasons, so it's very possible that whatever load it was trying to process was causing timeout errors.

The problem is that Russian providers, in trying to block Telegram, block access from Russian IPs to random IP subnets all over the world: https://github.com/sxiii/russian-blackout

This issue is not about t2bot.io specifically, but about the general problem where two federated Matrix servers have no direct access to each other, only via a third server.

AFAIK this is a known problem in Synapse, if not in the Matrix protocol itself (I don't know which is the case). As of now, Synapse's federation works strictly within a "push" model. A homeserver will never pull events from other homeservers on its own; its only chance to notice that it missed something is when a transaction comes in with unknown ancestors. At that point, it will start to actively "pull" the missing ancestor nodes, but nothing more.
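
A rough sketch of that behaviour in Python, under simplifying assumptions: events are plain dicts whose `prev_events` field is a list of event IDs, and `fetch_event` is a hypothetical callable standing in for a request to a remote server. None of these names are Synapse's own.

```python
def missing_prev_events(known, event):
    """Return the parents of `event` that are not stored locally, in order."""
    return [e for e in event.get("prev_events", []) if e not in known]


def pull_missing_ancestors(known, incoming_event, fetch_event, limit=10):
    """Walk back from an incoming event, fetching unknown ancestors.

    `known` maps event ID -> event and is updated in place.
    `fetch_event(event_id)` is a hypothetical remote lookup.
    Nothing is fetched unless the incoming event references an unknown parent,
    which is why a server that receives no transactions never catches up.
    """
    to_fetch = missing_prev_events(known, incoming_event)
    fetched = []
    while to_fetch and len(fetched) < limit:
        event_id = to_fetch.pop()
        if event_id in known:
            continue  # already retrieved via another branch of the graph
        event = fetch_event(event_id)
        known[event_id] = event
        fetched.append(event_id)
        to_fetch.extend(missing_prev_events(known, event))
    return fetched
```

The pull is triggered only by an incoming event, which matches the symptom in the report: the missed messages show up only after some reachable server sends something to the room.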

A possible solution is to provide two IP addresses for the domain that has the broken connection, and to configure a proxy server for incoming/outgoing messages. Example:

  • The t2bot.io server has IP 178.63.27.77.
  • My Synapse homeserver has the domain name mymatrix.org, pointing to IP 1.1.1.1.
  • My homeserver's datacenter blocks direct connections from IP 1.1.1.1 to IP 178.63.27.77 and back.
  • I add an additional A record for mymatrix.org, pointing to another server with IP 2.2.2.2, which is located in a different datacenter and has no blocked connections to IP 1.1.1.1 or IP 178.63.27.77.
  • I set up the second server at IP 2.2.2.2 and configure a proxy on it that forwards incoming requests (from IP 178.63.27.77) to IP 1.1.1.1, and outgoing requests (from IP 1.1.1.1) to 178.63.27.77.
  • The t2bot.io server must try to send events to IP 1.1.1.1 and 2.2.2.2 at random, and retry when sending fails.
  • Additionally, I must configure my homeserver to send outgoing messages to t2bot.io via my proxy server (e.g. by rewriting the t2bot.io IP to my proxy server's IP, with an nginx reverse proxy on the proxy side).
  • I must also force my homeserver's clients to connect to IP 1.1.1.1; for this I will create a new domain, direct.myhomeserver.org, and configure it in the Matrix clients so that they don't use IP 2.2.2.2 for connections.

As a result, I will repair the federation problem without moving my Synapse server to another host.
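
This setup only works if the sending server actually falls back to the second address. A minimal Python sketch of the retry behaviour assumed above (random order, retry on failure); `send_with_fallback` and the `send` callable are hypothetical and do not describe how Synapse really resolves or retries destinations:

```python
import random


def send_with_fallback(addresses, send, max_rounds=3, rng=random):
    """Try each address in random order until one accepts the transaction.

    `send(address)` is a hypothetical callable that raises OSError
    (which includes TimeoutError and ConnectionError) on failure.
    Returns the address that worked, or None if every attempt failed.
    """
    for _ in range(max_rounds):
        candidates = list(addresses)
        rng.shuffle(candidates)  # no address is preferred over another
        for address in candidates:
            try:
                send(address)
                return address
            except OSError:
                continue  # blocked or timed out, try the next address
    return None
```

If the remote server instead sticks to the first A record it resolves, traffic to the blocked address would keep timing out, so this part of the plan depends on behaviour outside my control.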

(and dup #2528)
