Currently we use TCP for intra-cluster connections. Why don't we use SCTP for cluster connections?
SCTP should be ideal for intra-cluster connections, since each commit could get its own stream and be transmitted as an individual message without blocking other commits if a packet is lost. This could improve overall throughput and latency on slow or lossy connections.
Also, SCTP is able to mark messages as high priority, which would give us intra-connection QoS and might reduce the latency of heartbeats and other important messages as well.
If you decide to implement SCTP, TCP should still be supported, either as a fallback or as an explicit option in the config file. :)
This partially overlaps with https://github.com/rethinkdb/rethinkdb/issues/2331 , though using SCTP sounds like it might be the better option.
I'm going to put this into the backlog, because high packet loss is probably not an issue for most deployments and there are a bunch of other things I think we should do before this.
Prioritization of messages would be very interesting though to optimize the allocation of network bandwidth!
Great, thank you for your fast response. :)
I think high RTTs on interconnections are inevitable in multi-datacenter environments. As long as we use TCP for all intra-cluster communication, one lost packet affects all queries at the same time, because the receiving TCP stack waits until the retransmission has arrived before it makes any of the remaining data available to the application on the socket.
So with a usual RTT of about 60ms and some packets being lost per second, caused by a full link and a RED qdisc, the cluster performance would degrade immediately, even if the lost packets belong to a bulk data transfer which in itself isn't time critical.
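To make the head-of-line blocking argument concrete, here is a rough toy model in Python. It is only a sketch under simplifying assumptions: the function name `delivery_times`, the packet tuples, and the stream names are made up for illustration, and a lost packet is assumed to arrive exactly one extra RTT late, which ignores loss detection timers. It compares one ordered byte stream (TCP-like) with independent ordering per stream (SCTP-like).

```python
def delivery_times(packets, rtt_ms, per_stream):
    """Return the time (ms) at which each packet becomes readable by the application.

    packets    -- list of (send_ms, stream_id, lost) tuples, in send order
    rtt_ms     -- round-trip time of the link
    per_stream -- False: one ordering domain for everything (TCP-like)
                  True:  ordering is enforced only within a stream (SCTP-like)
    """
    one_way = rtt_ms / 2
    latest = {}   # ordering domain -> latest delivery time seen so far
    ready = []
    for send_ms, stream_id, lost in packets:
        # Assumption: a lost packet is retransmitted and arrives one RTT late.
        arrival = send_ms + one_way + (rtt_ms if lost else 0)
        domain = stream_id if per_stream else "single-connection"
        # In-order delivery: data cannot be handed over before earlier data
        # in the same ordering domain has been delivered.
        deliver = max(arrival, latest.get(domain, 0))
        latest[domain] = deliver
        ready.append(deliver)
    return ready


# One lost bulk packet, followed by a heartbeat, a write ack and more bulk data.
packets = [
    (0, "bulk", True),
    (1, "heartbeat", False),
    (2, "write_ack", False),
    (3, "bulk", False),
]

tcp_like = delivery_times(packets, rtt_ms=60, per_stream=False)
sctp_like = delivery_times(packets, rtt_ms=60, per_stream=True)
print("single stream:", tcp_like)   # everything waits for the retransmission
print("per stream:   ", sctp_like)  # only the bulk stream waits
```

In this model the heartbeat and the write ack are held back for about one full RTT on the single ordered stream, while with per-stream ordering only the bulk stream is delayed.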
> Prioritization of messages would be very interesting though to optimize the allocation of network bandwidth!
Well ... intra-connection QoS does not increase the network bandwidth, but it would let bulk transfers such as cluster reorderings or bulk write operations run in the background while heartbeats, write acks and other important messages keep their low latency.
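As an illustration of what such prioritization means on the sending side, here is a small Python sketch of a priority send queue. This is not RethinkDB code and not an SCTP API; the class `PrioritySendQueue`, the priority numbers, and the message names are all hypothetical. It only shows the scheduling idea: latency-sensitive messages overtake bulk data that is already queued.

```python
import heapq
import itertools


class PrioritySendQueue:
    """Send queue that hands out the most urgent message first.

    Lower numbers mean higher priority. Messages with equal priority
    keep their insertion order (FIFO) thanks to the sequence counter.
    """

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def push(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._seq), message))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Hypothetical priority levels for illustration only.
HEARTBEAT, WRITE_ACK, BULK = 0, 1, 10

queue = PrioritySendQueue()
queue.push(BULK, "bulk-1")
queue.push(BULK, "bulk-2")
queue.push(HEARTBEAT, "heartbeat")
queue.push(WRITE_ACK, "write-ack")

send_order = [queue.pop() for _ in range(len(queue))]
print(send_order)
```

The bulk chunks were queued first but leave last, so the total bandwidth stays the same and only the order in which it is spent changes.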
This has been a growing issue as we continue to scale our application. It's really hampering our ability to grow, which in turn makes RethinkDB a less viable solution in the long run.
Obviously the problem lies in our network conditions (thanks Level3), but having better resilience to such problems would be really nice. The real struggle is that this prevents us from maintaining a 12-factor app infrastructure with RethinkDB.
-A long time RethinkDB fanboy