Observed behaviour:
I discovered I could trigger the behaviour using cat bigfile, and after digging around some more, found this to be an MTU issue.
Using tcpdump on the client side at the VPN tunnel I see:
14:01:58.018503 IP xxxxxxx.60001 > vpn-10-50-36-1.xxxxx.57929: UDP, bad length 1272 > 1168
14:01:58.018843 IP xxxxxxx.60001 > vpn-10-50-36-1.xxxxx.57929: UDP, length 779
I suspect this is a consequence of buggy behaviour on the part of the VPN ... failing to advertise a lower MTU and simply truncating packets in flight.
This ticket might be useful in providing another data point if there's work to make mosh more tolerant of MTU in general.
That’s really odd, if the VPN is truncating rather than dropping oversized packets.
What version of Mosh do you have on the server?
What is the MTU on the VPN? What type of VPN is it?
And can you get a tcpdump capture with -v so we can see if the DF bit is set?
Also, the easy way to diagnose this is to start tmux/screen, do whatever hangs your session. Then type the keystrokes to switch to a new/empty screen. If your screen comes alive, then it’s almost certainly MTU.
The VPN is Cisco AnyConnect on Mac, and the observed behaviour is very odd.
I observe this behaviour on 1.3.2. (I also reverted to my stable version of 1.2.4 to confirm.)
Recompiling the server to force a small MTU makes the problem go away.
The tcpdump expression I used was incorrect. Applying a correct pattern I see:
4 0.019986 10.184.151.232 10.50.36.1 IPv4 1200 Fragmented IP protocol (proto=UDP 17, off=0, ID=a501) [Reassembled in #5]
5 0.020046 10.184.151.232 10.50.36.1 UDP 108 60001 → 52963 Len=1252
6 0.020290 10.184.151.232 10.50.36.1 UDP 847 60001 → 52963 Len=815
The VPN interface has MTU 1200, so this looks good.
My client is running on OS X. Using dtruss, I see the following sequence repeated:
pselect(0x10, 0x10CFF6AEC, 0x0) = 1 0
recvmsg(0xB, 0x7FFF52C56960, 0x80) = -1 Err#35
recvmsg(0xC, 0x7FFF52C56960, 0x80) = -1 Err#35
recvmsg(0xD, 0x7FFF52C56960, 0x80) = 821 0
.. repeats ..
So the client receives the short packets (around 800 octets), but the reassembled UDP packet (longer than 1200 octets) is not delivered to the client even though it seems to be correctly fragmented as it comes out of the VPN.
The UDP fragments are being delivered (otherwise Wireshark wouldn't see them).
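For anyone reading the dtruss trace: Err#35 on macOS is EAGAIN, meaning a non-blocking read found nothing queued. So the trace shows two descriptors with no datagram waiting and one (0xD) that returned 821 bytes. A minimal sketch of that classification follows. It uses recv with MSG_DONTWAIT instead of the recvmsg call in the trace, and the names RecvResult and try_recv are made up for illustration, not taken from mosh.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Outcome of a single non-blocking read attempt (hypothetical helper type).
enum class RecvResult { Data, WouldBlock, Error };

// Try to read one datagram without blocking.
// On success, *got holds the number of bytes received.
RecvResult try_recv(int fd, char* buf, std::size_t cap, ssize_t* got) {
    ssize_t n = recv(fd, buf, cap, MSG_DONTWAIT);
    if (n >= 0) {
        *got = n;
        return RecvResult::Data;
    }
    *got = 0;
    // EAGAIN is what dtruss prints as Err#35 on macOS: nothing to read yet.
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return RecvResult::WouldBlock;
    }
    return RecvResult::Error;
}
```

If the reassembled datagram had reached the socket layer, one of these reads would have returned roughly 1252 bytes instead of WouldBlock.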
Possibilities I can think of:
Although at this point the problem does not seem to lie with mosh directly, a capability to (gradually) reduce the size of the UDP packet transmitted by mosh would be a way to work around scenarios such as this.
And can you get a tcpdump capture with -v so we can see if the DF bit is set?
Here are the capture summaries of the Ethernet frames named above (4, 5 and 6) that show that DF is not set.
Frame 4: 1200 bytes on wire (9600 bits), 1200 bytes captured (9600 bits) on interface 0
Null/Loopback
Internet Protocol Version 4, Src: 10.184.151.232, Dst: 10.50.36.1
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x02 (DSCP: CS0, ECN: ECT(0))
Total Length: 1196
Identification: 0xc482 (50306)
Flags: 0x01 (More Fragments)
Fragment offset: 0
Time to live: 245
Protocol: UDP (17)
Header checksum: 0x0be9 [validation disabled]
[Header checksum status: Unverified]
Source: 10.184.151.232
Destination: 10.50.36.1
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
Reassembled IPv4 in frame: 5
Data (1176 bytes)
Frame 5: 108 bytes on wire (864 bits), 108 bytes captured (864 bits) on interface 0
Null/Loopback
Internet Protocol Version 4, Src: 10.184.151.232, Dst: 10.50.36.1
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x02 (DSCP: CS0, ECN: ECT(0))
Total Length: 104
Identification: 0xc482 (50306)
Flags: 0x00
Fragment offset: 1176
Time to live: 245
Protocol: UDP (17)
Header checksum: 0x2f9a [validation disabled]
[Header checksum status: Unverified]
Source: 10.184.151.232
Destination: 10.50.36.1
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
[2 IPv4 Fragments (1260 bytes): #4(1176), #5(84)]
[Frame: 4, payload: 0-1175 (1176 bytes)]
[Frame: 5, payload: 1176-1259 (84 bytes)]
[Fragment count: 2]
[Reassembled IPv4 length: 1260]
[Reassembled IPv4 data: ea61fe2904eced8d80000000000012536457a225cc7ce3a7...]
User Datagram Protocol, Src Port: 60001, Dst Port: 65065
Data (1252 bytes)
Frame 6: 827 bytes on wire (6616 bits), 827 bytes captured (6616 bits) on interface 0
Null/Loopback
Internet Protocol Version 4, Src: 10.184.151.232, Dst: 10.50.36.1
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x02 (DSCP: CS0, ECN: ECT(0))
Total Length: 823
Identification: 0xc483 (50307)
Flags: 0x00
Fragment offset: 0
Time to live: 245
Protocol: UDP (17)
Header checksum: 0x2d5d [validation disabled]
[Header checksum status: Unverified]
Source: 10.184.151.232
Destination: 10.50.36.1
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
User Datagram Protocol, Src Port: 60001, Dst Port: 65065
Data (795 bytes)
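The fragment sizes in the capture above can be checked by hand. A rough sketch of the IPv4 fragmentation arithmetic, assuming a 20-byte IP header with no options (as the capture shows) and using made-up names (Fragment, fragment_ipv4):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One IPv4 fragment of a larger datagram (illustrative type).
struct Fragment {
    std::size_t offset;   // byte offset of this fragment's data in the IP payload
    std::size_t length;   // bytes of IP payload carried by this fragment
    bool more_fragments;  // value of the MF flag
};

// Split an IP payload of payload_len bytes for a link with the given MTU.
// Assumes a 20-byte IPv4 header (no options).
std::vector<Fragment> fragment_ipv4(std::size_t payload_len, std::size_t mtu) {
    const std::size_t ip_header = 20;
    std::vector<Fragment> out;
    if (mtu < ip_header + 8) {
        return out;  // MTU too small to carry even one 8-byte fragment unit
    }
    if (payload_len + ip_header <= mtu) {
        out.push_back({0, payload_len, false});  // fits whole, no fragmentation
        return out;
    }
    // Every fragment except the last must carry a multiple of 8 bytes.
    const std::size_t chunk = ((mtu - ip_header) / 8) * 8;
    std::size_t offset = 0;
    while (offset < payload_len) {
        const std::size_t remaining = payload_len - offset;
        const std::size_t len = remaining < chunk ? remaining : chunk;
        out.push_back({offset, len, offset + len < payload_len});
        offset += len;
    }
    return out;
}
```

Calling fragment_ipv4(1260, 1200) for the 1260-byte IP payload (8-byte UDP header plus 1252 bytes of data) gives fragments of 1176 and 84 bytes, matching frames 4 and 5. The 803-byte payload of frame 6 fits in one packet.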
Overall it seems to me that for mosh to be robust, it cannot rely on UDP fragmentation ever working properly across the diverse environments in which it will be used.
/*
 * IPv4 MTU. Don't use full Ethernet-derived MTU,
 * mobile networks have high tunneling overhead.
 *
 * As of July 2016, VPN traffic over Amtrak Acela wifi seems to be
 * dropped if tunnelled packets are 1320 bytes or larger. Use a
 * 1280-byte IPv4 MTU for now.
 *
 * We may have to implement ICMP-less PMTUD (RFC 4821) eventually.
 */
static const int DEFAULT_IPV4_MTU = 1280;
Looking in RFC 4821, I read:
It is RECOMMENDED that search_low be initially set to an MTU size that is likely to work over a very wide range of environments. Given today's technologies, a value of 1024 bytes is probably safe enough. The initial value for search_low SHOULD be configurable.
Given that, perhaps DEFAULT_IPV4_MTU = 1024; would be a good compromise until mosh is capable of performing its own MTU probes.
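To put numbers on that suggestion: the 1252-byte datagrams seen in the capture are consistent with a 1280-byte IPv4 MTU minus the 20-byte IP header and 8-byte UDP header. A small sketch of the budget, with helper names (max_udp_payload, needs_fragmentation) invented here and not taken from the mosh source:

```cpp
#include <cassert>

// Header sizes fixed by the protocols: IPv4 without options, and UDP.
constexpr int IPV4_HEADER_BYTES = 20;
constexpr int UDP_HEADER_BYTES = 8;

// Largest UDP payload that fits in one unfragmented IPv4 packet of ip_mtu bytes.
constexpr int max_udp_payload(int ip_mtu) {
    return ip_mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES;
}

// True if a UDP payload of this size has to be fragmented on a path
// whose MTU is path_mtu bytes.
constexpr bool needs_fragmentation(int payload, int path_mtu) {
    return payload > max_udp_payload(path_mtu);
}
```

With an assumed MTU of 1280 the payload limit is 1252 bytes, which exceeds what the 1200-byte VPN interface can carry unfragmented (1172). With 1024 the limit drops to 996 bytes, which fits.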
By the book, RFC 791 says that hosts are permitted to drop fragmented packets larger than 576 octets:
All hosts must be prepared to accept datagrams of up to 576 octets (whether they arrive whole or in fragments). It is recommended that hosts only send datagrams larger than 576 octets if they have assurance that the destination is prepared to accept the larger datagrams.
If mosh is doing its own MTU assessments, maybe consider the following:
I had this same issue with OpenVPN, where I had set the MTU to 1200. I removed the setting in advanced settings and just used the default (which was 1500), and now all is well with the world.
The source where this gets defined is here for others who may need to recompile mosh to get it to work with their network setup.
It would definitely be nice for mosh to auto detect MTU and adjust accordingly.
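As a rough idea of what auto-detection could look like, here is a plain binary search between a known-good size (search_low, in RFC 4821 terms) and an upper bound. This is only a sketch: it is not RFC 4821's full procedure, the Probe callback is hypothetical (a real one would send a padded datagram and wait for an acknowledgement with a timeout), and it assumes probes of a given size either always succeed or always fail.

```cpp
#include <cassert>
#include <functional>

// Hypothetical probe: returns true if a datagram of the given size was
// acknowledged by the peer.
using Probe = std::function<bool(int)>;

// Find the largest size in [search_low, search_high] that the probe accepts.
// Assumes search_low is already known to work, and that acceptance is
// monotonic (every size below a working size also works).
int find_path_mtu(int search_low, int search_high, const Probe& probe) {
    int good = search_low;      // largest size known to work
    int bad = search_high + 1;  // smallest size treated as not working
    while (bad - good > 1) {
        const int candidate = good + (bad - good) / 2;
        if (probe(candidate)) {
            good = candidate;
        } else {
            bad = candidate;
        }
    }
    return good;
}
```

On a path like the one in this thread, a probe that succeeds up to 1200 bytes would make find_path_mtu(1024, 1500, probe) settle on 1200. Real networks lose packets for other reasons too, so a practical version would need retries before declaring a size bad.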