Termux-packages: Laggy SSH

Created on 2 Aug 2017 · 45 Comments · Source: termux/termux-packages

Problem: Laggy SSH. Noted particularly over local LAN WiFi connections.

This is a problem that affects both Termux and Connectbot, so it's probably not strictly a Termux issue.

Symptoms

Very slow / laggy response to SSH input. Display fails to respond for several seconds, up to 24 measured (see below), and possibly longer experienced. It's frequently faster to disconnect and reconnect (under screen or termux) than wait for the remote system to respond.

I've conducted a number of experiments to try to get some scope of the problem.

Repeated SSH connections, comparing local and remote timestamps

The test: Check the local time, connect to the remote system, check its time, and compare the two (subtracting remote from local, in seconds from epoch). Lack of clock coordination (each _should_ be on a timeserver ... sigh) means that remote is 3 seconds ahead of local. Any value > -3 indicates a lag.

Note the value of 21 (which is to say, a lag of _24 seconds_), on the 11th row of output.

$ for i in {0..16}; do check-ssh-lag ; done
 -2  -3  -2  -3  -2  -2  -2  -3  -3  -2  -2  -2   5  -2  -2  -2  -2  -2   2  -2
  1   0  -2   0  -2  -3  -2  -3  -2  -3  -3  -3  -3  -3  -3  -3  -3  -2  -3  -3
 -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -3  -3
 -2  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -2  -3  -2  -3  -3  -3  -3
 -2  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3
 -2  -3  -2  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -2  -3  -2  -3
 -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -2
 -3  -3  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -2  -3  -2  -3  -3  -3  -3
 -3  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -3  -3  -2  -3
 -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -2  -3  -2  -3  -3  -3
 -3  -2  -2  -3  -3  -3  -2  -2  -2   3  -1  21  -2  -2  -3  -2  -3  -3  -3  -3
 -3  -3  -2  -3  -3  -3  -3  -2  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3
 -3  -2  -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3
 -3  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -2  -3  -3  -3  -3  -3
 -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -2  -3  -2  -3  -3  -3  -3
 -3  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -2  -3  -2  -3  -3
 -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -2  -3  -3  -3  -3  -3  -3  -3  -3

The bash function:

check-ssh-lag () 
{ 
    for ((i=0; i<20; i++ ))
    do
        printf "%3i " "$( echo "-1 * $(date +%s) + $(ssh remote date +%s)" | bc)";
    done;
    echo
}

(ssh-agent is running to provide key-based auth.)
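To pick the outliers out of a run like the one above, something along these lines may help. It is only a sketch: `lag_outliers` is a hypothetical helper name, the baseline of -3 is the clock offset observed in this particular run, and the one-second tolerance for rounding jitter is an assumption.

```shell
# Hypothetical helper: reads whitespace-separated check-ssh-lag samples on
# stdin and prints every sample that clearly exceeds the clock offset.
# $1 is the baseline offset in seconds (default -3, as in the run above).
lag_outliers() {
    awk -v base="${1:--3}" '{
        for (n = 1; n <= NF; n++)
            if ($n > base + 1)      # tolerate one second of rounding jitter
                printf "sample=%d lag=%ds\n", $n, $n - base
    }'
}

# Usage:
#   for i in {0..16}; do check-ssh-lag; done | lag_outliers -3
```

With a baseline of -3, the sample of 21 is reported as a lag of 24 seconds, matching the figure quoted above.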

Simultaneously running an interactive shell in a 'screen' split terminal, with a loop printing the date every second, there was no observable lag.

This suggests that the problem lies with data being sent from the Android device, rather than data received by it, and not with the network itself.

tcpdump

I've installed tcpdump on the remote host and am watching what it's doing.

I've already noted that whilst locally generated keystrokes aren't being registered remotely, if I'm running a split-screen session with some process generating constant output (e.g., top, or tailing some logfile), I _will_ see the generated output, but cannot _send_ data.

I've been logging tcpdump to a file as below:

tcpdump -c 1000000 -U -C  1 -w /tmp/tcpdump-android-ssh \
    '(src host 192.168.xx.xx) and (port ssh)'

And (in Yet Another Screen Window), am tailing that under watch:

watch 'sudo tcpdump -r /tmp/tcpdump-android-ssh | tail -8 '

What I'll see are apparently a bunch of null-length packets being received, then, after some seconds of delay, the data start to flow again.

The empty packets seem to have length 0, though I want to confirm.

This strongly suggests, though, that _communications_ isn't the problem but _ssh on the local (Android) host not sending data_ is.
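One way to confirm the zero-length observation might be to count packets by payload length in the text output of tcpdump. This is a rough sketch: `count_empty_packets` is a made-up helper name, and it assumes the usual tcpdump text format in which each TCP line ends in `length N`.

```shell
# Reads `tcpdump -nn -r FILE` text output on stdin and counts packets with
# an empty TCP payload versus packets that carry data.
count_empty_packets() {
    awk '
        / length 0$/      { empty++; next }
        / length [0-9]+$/ { data++ }
        END { printf "empty=%d data=%d\n", empty, data }
    '
}

# Usage (capture path taken from the tcpdump command above):
#   sudo tcpdump -nn -r /tmp/tcpdump-android-ssh | count_empty_packets
```

A burst of `length 0` lines during a stall would be consistent with the client only acknowledging received data while sending no payload of its own.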

Terminal behaviour

One thing I've noticed is that straight text entry is rarely problematic. E.g., open a vim session on the remote host and type:

Now is the time for all good men to come to the aid of their party.

... repeatedly. I don't observe lag.

But ... if I am engaged in a complex editing task (moving around the file repeatedly), or commandline interactions (I do a lot of on-shell editing via Readline, etc.), I observe the problem far more frequently.

This suggests that _terminal interactions_, possibly on Termux itself, are a hangup.

Given that I've also identified Termux lag in scrollback as an issue ... I've got some reason to suspect it here as well.

https://github.com/termux/termux-app/issues/368

Further steps

Identifying whether / how the local ssh client is sending or receiving data would be useful. I suspect that's going to involve debuggers.

bug report

All 45 comments

could it be a rendering issue, instead of an issue with the data being sent/received?

i use tmux, and i notice some weird issues sometimes, and if i bump the terminal width a bit it re-renders and everything appears as expected.

do this, setup a tmux session, and connect to it from two devices. on the other device, use a linux box, or osx, or whatever... then connect them both to the same tmux session.

see if your input is rendered on one and not the other.

that way you can really see if there's a problem sending/receiving data, or a problem with rendering it locally.

Could also perhaps be the same issue as https://github.com/termux/termux-packages/issues/491, i.e. (probably) wifi chip going to sleep when not used on device.

so setup two devices over same tmux session... one over wifi, one over usb ethernet. should be obvious immediately if there's a wifi issue vs a hard wired connection.

or just test it both ways...

I also had massive problems SSHing into my device via WLAN. Things got even worse after the screensaver activated.

After some investigation I found that pinging the gateway from the smartphone keeps the WLAN alive.
This safely prevents that WLAN crap from going to sleep:-)

So I wrote a "pinging-daemon". I simply call it from my .profile. By the way, the script automatically starts "sshd" after the first login, which is very convenient.

The "pinging-daemon", if not called interactively, logs its activity to the file /data/data/com.termux/files/home/bin/s_ping.log

That "solved" all SSH-lag issues for me so far.

Just take care that programs like 'gawk', 'pgrep' et al. are installed in their full versions. Busybox versions might not work for this. The package 'termux-exec' is also installed here.

/data/data/com.termux/files/home/.profile:

# place these lines somewhere in your profile
export R_=$PREFIX
sshs start

/data/data/com.termux/files/home/bin/sshs:

#!/bin/sh

[ $1"" = start ] && {
    pgrep -f sshd$ > /dev/null || {
        echo starting sshd
        /data/data/com.termux/files/usr/bin/sshd
    }
    pgrep -f 's_ping_[0-9]+.awk' > /dev/null || {
        echo starting s_ping
        start-stop-daemon -S -b -x $HOME/bin/s_ping
    }
}

[ $1"" = stop ] && {
    pgrep -f sshd$ > /dev/null && {
        echo stopping sshd
        start-stop-daemon -K -p $R_/var/run/sshd.pid
    }
    pgrep -f 's_ping_[0-9]+.awk' > /dev/null && {
        echo stopping s_ping
        start-stop-daemon -K -p $R_/var/run/s_pingd.pid; rm $R_/var/run/s_pingd.pid
    }
}

/data/data/com.termux/files/home/bin/s_ping:

#!/bin/sh

[ -t 0 ] || exec >> $0.log 2>&1

cat > $0_$$.awk << !
func pr(a \
    )
{
    print strftime("%y-%m-%d %T") ": " a
    fflush()
}

BEGIN {
    system("rm $0_$$.awk")
    CMD = "$R_/var/run/s_pingd.pid"; print "-" PROCINFO["pgrpid"] > CMD; close(CMD)

    pr("-" PROCINFO["pgrpid"] " > " CMD)
    while (1) {
        pr("----------- LOOPING -----------")
        CMD = "netstat -rn 2>&1"
        while (CMD | getline line > 0) {
            if (match(line, "^0.0.0.0 +([^ ]+) .*wlan0$", a)) {
                DEF_GWY = a[1]
            }
        }
        close(CMD)
        if (DEF_GWY) {
            CMD = "ping -W 10 -q -i 0.2 " DEF_GWY " 2>&1"
            pr("CMD: " CMD)
            line = ""
            CMD | getline line; close(CMD)
            pr(">" line "<")
        } else {
            pr("no gwy, retrying...")
        }
        CMD = "sleep 5"; CMD | getline; close(CMD)
    }
}
!
eval exec awk -f $0_$$.awk

Why not just use keepalive?

From the client side you can set ServerAliveInterval

ServerAliveInterval
Sets a timeout interval in seconds after which if no data has been received from the server, ssh(1) will send a message through the encrypted channel to request a response from the server. The default is 0, indicating that these messages will not be sent to the server.
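As an illustration of the client-side setting, the option can be passed on the command line. This is a minimal sketch: `keepalive_opts` is a hypothetical helper, the 30 second default is an arbitrary example value, and `user@remote` is a placeholder.

```shell
# Prints ssh options that enable client-side keepalive probes.
# $1: probe interval in seconds (default 30, an illustrative value).
keepalive_opts() {
    printf '%s\n' "-o ServerAliveInterval=${1:-30} -o ServerAliveCountMax=3"
}

# Usage (word splitting of the options is intended here):
#   ssh $(keepalive_opts 30) user@remote
```

The same two options can also be set per host in ~/.ssh/config instead of on the command line.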

because that does not help if the server is not reachable since WLAN is already (half) down.

Instead you must keep flowing some constant traffic initiated by the server and going through the WLAN connection to prevent WLAN from going to sleep at all. A good target for this is your gateway.

Is there some configuration option on Android to prevent WLAN from going to sleep? Probably not because they do not support a server running on a typical smartphone device.

Instead you must keep flowing some constant traffic initiated by the server and going through the WLAN connection to prevent WLAN from going to sleep at all.

That's exactly what ServerAliveInterval does.

A good target for this is your gateway.

Maybe, if your goal is just to keep the wifi up. Then this has nothing to do with termux.
If you just ping your gateway, how then, do you know if the connection behind the gateway is down? Your gateway should already be sending probes anyway. If your device is ignoring these, or sleeping the radio, maybe you should try to solve that issue, instead of devising workarounds?

Can't you just turn on the setting under your Wifi that keeps it on during sleep?

Alternatively... Settings -> Power -> ... menu -> Power/Battery Optimization

Look for "Google Connectivity Services" or similar, and disable power optimization.

Enjoy the battery drain.

As an alternative, you should try out mosh: https://mosh.org/

Or you can use something like tasker to disable power optimization as-needed on demand.

the ServerAliveInterval needs an 'active' client connection to be effective.

But the lag-problems also occur 'before' any client connection exists.

That's why ServerAliveInterval does not help for this issue.

I made some more experiments. The results show that Android 6.0.1 on my LG K3 LTE is unsuitable for providing network services unless you intervene with ugly workarounds (ping-daemon 's_ping', see above).

Android version: 6.0.1
Device manufacturer: LGE
Device model: LG-K100

common to all my tests: the screen is kept on all the time (i.e. screen timeout deactivated)


Test1:

  • start termux
  • start sshd
  • check there currently is no heavy use of the WLAN connection
  • ping the phone from an external host:

$ ping -i 10 192.168.143.119 # issued against the phone from somewhere in the network
PING 192.168.143.119 (192.168.143.119) 56(84) bytes of data.
64 bytes from 192.168.143.119: icmp_seq=1 ttl=63 time=357 ms
64 bytes from 192.168.143.119: icmp_seq=2 ttl=63 time=10634 ms
64 bytes from 192.168.143.119: icmp_seq=3 ttl=63 time=395 ms
64 bytes from 192.168.143.119: icmp_seq=4 ttl=63 time=215 ms
64 bytes from 192.168.143.119: icmp_seq=5 ttl=63 time=35.1 ms
64 bytes from 192.168.143.119: icmp_seq=6 ttl=63 time=65.1 ms
64 bytes from 192.168.143.119: icmp_seq=7 ttl=63 time=10324 ms
64 bytes from 192.168.143.119: icmp_seq=8 ttl=63 time=189 ms
64 bytes from 192.168.143.119: icmp_seq=9 ttl=63 time=218 ms
64 bytes from 192.168.143.119: icmp_seq=10 ttl=63 time=38.2 ms
64 bytes from 192.168.143.119: icmp_seq=11 ttl=63 time=77.9 ms
64 bytes from 192.168.143.119: icmp_seq=12 ttl=63 time=11348 ms
64 bytes from 192.168.143.119: icmp_seq=13 ttl=63 time=1213 ms
64 bytes from 192.168.143.119: icmp_seq=14 ttl=63 time=214 ms
64 bytes from 192.168.143.119: icmp_seq=15 ttl=63 time=34.0 ms
64 bytes from 192.168.143.119: icmp_seq=16 ttl=63 time=264 ms
64 bytes from 192.168.143.119: icmp_seq=17 ttl=63 time=10127 ms
64 bytes from 192.168.143.119: icmp_seq=18 ttl=63 time=119 ms
64 bytes from 192.168.143.119: icmp_seq=19 ttl=63 time=147 ms
64 bytes from 192.168.143.119: icmp_seq=20 ttl=63 time=2.46 ms
64 bytes from 192.168.143.119: icmp_seq=21 ttl=63 time=209 ms
64 bytes from 192.168.143.119: icmp_seq=22 ttl=63 time=28.8 ms
64 bytes from 192.168.143.119: icmp_seq=23 ttl=63 time=61.1 ms
64 bytes from 192.168.143.119: icmp_seq=24 ttl=63 time=83.8 ms
64 bytes from 192.168.143.119: icmp_seq=25 ttl=63 time=112 ms
64 bytes from 192.168.143.119: icmp_seq=26 ttl=63 time=12.0 ms
64 bytes from 192.168.143.119: icmp_seq=27 ttl=63 time=10210 ms
64 bytes from 192.168.143.119: icmp_seq=28 ttl=63 time=41.0 ms
64 bytes from 192.168.143.119: icmp_seq=29 ttl=63 time=262 ms
64 bytes from 192.168.143.119: icmp_seq=30 ttl=63 time=86.6 ms
64 bytes from 192.168.143.119: icmp_seq=31 ttl=63 time=117 ms
64 bytes from 192.168.143.119: icmp_seq=32 ttl=63 time=138 ms
64 bytes from 192.168.143.119: icmp_seq=33 ttl=63 time=178 ms
64 bytes from 192.168.143.119: icmp_seq=34 ttl=63 time=210 ms
64 bytes from 192.168.143.119: icmp_seq=35 ttl=63 time=40.4 ms
64 bytes from 192.168.143.119: icmp_seq=36 ttl=63 time=65.2 ms
64 bytes from 192.168.143.119: icmp_seq=37 ttl=63 time=10134 ms
64 bytes from 192.168.143.119: icmp_seq=38 ttl=63 time=85.1 ms
64 bytes from 192.168.143.119: icmp_seq=39 ttl=63 time=114 ms
64 bytes from 192.168.143.119: icmp_seq=40 ttl=63 time=136 ms

About every 50 s there is a response delay roughly as long as the ping interval. Watching the ping output, the delayed reply and the following one appear at almost the same time. Referring to the example above, icmp_seq=8 apparently triggers the response for icmp_seq=7, and icmp_seq=13 triggers the response for icmp_seq=12.

I don't know what the phone does every 50 s. Does Google scan the neighbourhood for WLAN access points since the phone is in WLAN-client mode?

These delay numbers are absolutely unacceptable when you want to use ssh with the phone.
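To make the periodic pattern easier to see, the slow replies can be filtered out of the ping output. The following is only a sketch: `slow_replies` is a hypothetical helper, the 1000 ms default threshold is an arbitrary choice, and it assumes the ping line format shown above.

```shell
# Reads ping output on stdin and prints "icmp_seq time_ms" for every reply
# slower than the threshold in $1 (milliseconds, default 1000).
slow_replies() {
    awk -v limit="${1:-1000}" '
        /icmp_seq=/ && /time=/ {
            seq = $0; sub(/.*icmp_seq=/, "", seq); sub(/ .*/, "", seq)
            t = $0;   sub(/.*time=/, "", t);       sub(/ .*/, "", t)
            if (t + 0 > limit) print seq, t
        }
    '
}

# Usage:
#   ping -i 10 192.168.143.119 | slow_replies 1000
```

With a 10 s ping interval, slow replies five sequence numbers apart correspond to the roughly 50 s period described above.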


Test2:

  • start termux
  • start sshd
  • start ping on termux against your gateway with a sufficiently small interval time:

$ ping -W 10 -q -i 0.2 192.168.143.195 # issued on the phone (via a termux shell)

  • ping the phone from an external host:

$ ping 192.168.143.119 # issued against the phone from somewhere in the network
PING 192.168.143.119 (192.168.143.119) 56(84) bytes of data.
64 bytes from 192.168.143.119: icmp_seq=1 ttl=63 time=2.86 ms
64 bytes from 192.168.143.119: icmp_seq=2 ttl=63 time=2.79 ms
64 bytes from 192.168.143.119: icmp_seq=3 ttl=63 time=3.18 ms
64 bytes from 192.168.143.119: icmp_seq=4 ttl=63 time=3.10 ms
64 bytes from 192.168.143.119: icmp_seq=5 ttl=63 time=2.84 ms
64 bytes from 192.168.143.119: icmp_seq=6 ttl=63 time=2.95 ms
64 bytes from 192.168.143.119: icmp_seq=7 ttl=63 time=2.90 ms
64 bytes from 192.168.143.119: icmp_seq=8 ttl=63 time=2.84 ms
64 bytes from 192.168.143.119: icmp_seq=9 ttl=63 time=3.92 ms
64 bytes from 192.168.143.119: icmp_seq=10 ttl=63 time=2.83 ms
64 bytes from 192.168.143.119: icmp_seq=11 ttl=63 time=2.81 ms
64 bytes from 192.168.143.119: icmp_seq=12 ttl=63 time=2.87 ms
64 bytes from 192.168.143.119: icmp_seq=13 ttl=63 time=2.85 ms
64 bytes from 192.168.143.119: icmp_seq=14 ttl=63 time=2.47 ms
64 bytes from 192.168.143.119: icmp_seq=15 ttl=63 time=2.44 ms
64 bytes from 192.168.143.119: icmp_seq=16 ttl=63 time=2.85 ms
64 bytes from 192.168.143.119: icmp_seq=17 ttl=63 time=2.88 ms
64 bytes from 192.168.143.119: icmp_seq=18 ttl=63 time=2.86 ms
64 bytes from 192.168.143.119: icmp_seq=19 ttl=63 time=2.79 ms
64 bytes from 192.168.143.119: icmp_seq=20 ttl=63 time=717 ms
64 bytes from 192.168.143.119: icmp_seq=21 ttl=63 time=5.70 ms
64 bytes from 192.168.143.119: icmp_seq=22 ttl=63 time=2.47 ms
64 bytes from 192.168.143.119: icmp_seq=23 ttl=63 time=2.44 ms
64 bytes from 192.168.143.119: icmp_seq=24 ttl=63 time=2.56 ms
64 bytes from 192.168.143.119: icmp_seq=25 ttl=63 time=21.6 ms
64 bytes from 192.168.143.119: icmp_seq=26 ttl=63 time=2.50 ms
64 bytes from 192.168.143.119: icmp_seq=27 ttl=63 time=16.3 ms
64 bytes from 192.168.143.119: icmp_seq=28 ttl=63 time=2.65 ms
64 bytes from 192.168.143.119: icmp_seq=29 ttl=63 time=2.56 ms
64 bytes from 192.168.143.119: icmp_seq=30 ttl=63 time=2.39 ms
64 bytes from 192.168.143.119: icmp_seq=31 ttl=63 time=3.77 ms
64 bytes from 192.168.143.119: icmp_seq=32 ttl=63 time=4.51 ms
64 bytes from 192.168.143.119: icmp_seq=33 ttl=63 time=2.42 ms
64 bytes from 192.168.143.119: icmp_seq=34 ttl=63 time=2.52 ms
64 bytes from 192.168.143.119: icmp_seq=35 ttl=63 time=2.40 ms
64 bytes from 192.168.143.119: icmp_seq=36 ttl=63 time=2.68 ms
64 bytes from 192.168.143.119: icmp_seq=37 ttl=63 time=2.50 ms
64 bytes from 192.168.143.119: icmp_seq=38 ttl=63 time=2.22 ms

which gives me dreamy response times compared to the above.
This clearly shows that, for me, pinging the gateway is an acceptable method to greatly reduce the lag.


Test3:

  • configure the phone to operate as hotspot
  • use your laptop as WLAN client and connect to the phone
  • start termux
  • start sshd
  • check there currently is no heavy use of the WLAN connection

$ ping -i10 192.168.43.1 # issued against the phone from the laptop
PING 192.168.43.1 (192.168.43.1) 56(84) bytes of data.
64 bytes from 192.168.43.1: icmp_seq=1 ttl=64 time=3.38 ms
64 bytes from 192.168.43.1: icmp_seq=2 ttl=64 time=2.58 ms
64 bytes from 192.168.43.1: icmp_seq=3 ttl=64 time=2.58 ms
64 bytes from 192.168.43.1: icmp_seq=4 ttl=64 time=4.00 ms
64 bytes from 192.168.43.1: icmp_seq=5 ttl=64 time=2.60 ms
64 bytes from 192.168.43.1: icmp_seq=6 ttl=64 time=7.36 ms
64 bytes from 192.168.43.1: icmp_seq=7 ttl=64 time=3.81 ms
64 bytes from 192.168.43.1: icmp_seq=8 ttl=64 time=5.37 ms
64 bytes from 192.168.43.1: icmp_seq=9 ttl=64 time=3.96 ms
64 bytes from 192.168.43.1: icmp_seq=10 ttl=64 time=3.80 ms
64 bytes from 192.168.43.1: icmp_seq=11 ttl=64 time=2.63 ms
64 bytes from 192.168.43.1: icmp_seq=12 ttl=64 time=2.60 ms
64 bytes from 192.168.43.1: icmp_seq=13 ttl=64 time=3.82 ms
64 bytes from 192.168.43.1: icmp_seq=14 ttl=64 time=6.62 ms
64 bytes from 192.168.43.1: icmp_seq=15 ttl=64 time=5.29 ms
64 bytes from 192.168.43.1: icmp_seq=16 ttl=64 time=3.70 ms
64 bytes from 192.168.43.1: icmp_seq=17 ttl=64 time=2.57 ms
64 bytes from 192.168.43.1: icmp_seq=18 ttl=64 time=11.0 ms
64 bytes from 192.168.43.1: icmp_seq=19 ttl=64 time=3.93 ms
64 bytes from 192.168.43.1: icmp_seq=20 ttl=64 time=2.66 ms
64 bytes from 192.168.43.1: icmp_seq=21 ttl=64 time=10.5 ms
64 bytes from 192.168.43.1: icmp_seq=22 ttl=64 time=3.80 ms
64 bytes from 192.168.43.1: icmp_seq=23 ttl=64 time=2.40 ms
64 bytes from 192.168.43.1: icmp_seq=24 ttl=64 time=18.4 ms
64 bytes from 192.168.43.1: icmp_seq=25 ttl=64 time=4.25 ms
64 bytes from 192.168.43.1: icmp_seq=26 ttl=64 time=2.54 ms
64 bytes from 192.168.43.1: icmp_seq=27 ttl=64 time=14.3 ms
64 bytes from 192.168.43.1: icmp_seq=28 ttl=64 time=3.99 ms

these response times also are excellent compared to the first example.
Even without the help of any ping-daemon workaround.

Probably because in AP-mode Google can't scan the neighbourhood for other accesspoints:-)


Conclusion:

hammering the gateway with echo request packets from the phone for me is the best known method to avoid the lag.

Should anybody know about a better workaround please let me know.
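For readers who only want the core of the workaround without the full daemon, a stripped-down sketch could look like this. `default_gateway` is a hypothetical helper name; the interface name wlan0 and the ping flags are taken from the s_ping script and Test2 above, and the `netstat -rn` column layout is assumed to match what that script parses.

```shell
# Reads `netstat -rn` output on stdin and prints the default gateway
# of the wlan0 interface (first matching route only).
default_gateway() {
    awk '$1 == "0.0.0.0" && $NF == "wlan0" { print $2; exit }'
}

# Usage on the phone, from a Termux shell:
#   gw=$(netstat -rn | default_gateway) && ping -W 10 -q -i 0.2 "$gw"
```

Unlike s_ping, this does not restart the ping when the gateway changes or the connection drops.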

@sp4rkie

did you try the ssh setting as i suggested, though? if your complaint is about SSH performance, that should do what you need without relying on a ping daemon or scripts, etc...

sure, maybe your initial connection could be slow, if the radio is asleep, but once connected the keep alive should do the trick if you set it to below 50 seconds.

@gordol
thanks for the suggestion. You start with 'Alternatively... Settings -> Power ...' but I can't find such settings. Maybe LG hid them? Is some special application needed to expose them?

you're right the connection once being setup may be kept fast enough with serveraliveinterval. But when I run the test of the entry post (@dredmorbius provided) I get even worse numbers:

$ for i in {0..7}; do check-ssh-lag ; done
 0  -1   0   0   0  -1   1   0  -1   1  -1   0   0   0  -1   1  -1   0   1  -1
 0   0   0  -1   1  -1   0   0   0   0  -1   0  -1   1  -1   0   0   0  -1   1
-1   0   1   0   0   0   0  -1   0  -1   0   0   0  -1   0  -1   0  -1   0   0
 0  -1   0   0  -1   0  -1   1  63   0   0  -1   0   0   0  -1   0  -1   1  -1
 0   0   0   0  -1   0  -1   0   0   0  -1   0   1   0  -1   0   0   1   0  14
37   0   0  11   1   5   0   2   0  -1   0   2   3   2   9   3   1  10   1  -1
 0   0  -1   2  17   0   4  28   2   1  10   0   6   1   0   2   8   8   7   9

with 'ping-daemon' s_ping of https://github.com/termux/termux-packages/issues/1193#issuecomment-333140664 there is no noticeable delay when issuing the initial connect.

Last but not least all this probably can't be improved by termux nor is it a problem of termux I think.

@sp4rkie it may be settings -> battery or similar... what version of android are you on?

Version and model I already mentioned here: https://github.com/termux/termux-packages/issues/1193#issuecomment-333305098

I found 2 related options:

Settings->Battery&Power saving->Battery saver->Turn Battery saver on
(is disabled of course)
and
Settings->Wi-Fi->Advanced Wi-Fi->Keep-Wi-Fi on when screen is off
(is enabled of course)

there are no other interesting settings concerning the problem

Coming back to this: one thought that occurs is that this somewhat resembles GC lag I've seen in Java VMs.

I don't know much about Android and its components, but I understand it _is_ a Java-based environment, running inside one (or more) VMs.

Could this be some Java housekeeping getting in the way of SSH / Termux activity, or other operations (networking, etc.)?

A long shot but I'll throw it out there.

Disabling Chrome browser in Android has significantly improved network speed and throughput. It appears that nonupdated and updated Android Chrome browsers made numerous calls to unknown Internet addresses such as: Host dswtcxjbvpzzt lookup failed: Host not found. Host pqafnctfbeao lookup failed: Host not found. Host adtqifgasnurhbu lookup failed: Host not found. when Chrome browsers were enabled.

What is more interesting is that these calls to unknown Internet addresses were happening sporadically, making their source more difficult to diagnose. This was probably due to finding and making connections to rogue Internet sites, instead of incomplete connections which are duly noted and shared.

Disabling Chrome browser in Android has significantly improved network speed and throughput for each and every application that uses network access. This might be the cause of, "Problem: Laggy SSH. Noted particularly over local LAN WiFi connections".

@sdrausty How did you determine this?

@dredmorbius in Termux by looking at network traffic while at same time disabling and enabling apps on various devices. It was tedious time-consuming work.

Chrome browser does not cause SSH lag. This is due to wifi power saving feature which is enabled by default in kernel. SSH lag only when device screen is off.

There are 2 workarounds:

  1. Enable wifi hotspot.
  2. Continuous ping of remote host (for example 192.168.1.1).
    I wrote script for that:
#!/data/data/com.termux/files/usr/bin/sh
##
##  Start network wakelock
##

if [ -z "${1}" ] || [ "${1}" = "enable" ] || [ "${1}" = "on" ]; then
    PID=$(pgrep -u "${TERMUX_UID}" -f "abduco -n netping ping")

    if [ -z "${PID}" ]; then
        if abduco -n netping ping -i 0.3 -s 0 192.168.1.1 > /dev/null 2>&1; then
            echo "[*] Starting network wakelock."
        else
            echo "[!] Failed to start network wakelock."
            exit 1
        fi
    else
        echo "[!] Network wakelock already running."
        exit 1
    fi
elif [ "${1}" = "disable" ] || [ "${1}" = "off" ]; then
    PID=$(pgrep -u "${TERMUX_UID}" -f "abduco -n netping ping")

    if [ ! -z "${PID}" ]; then
        if kill -TERM "${PID}" > /dev/null 2>&1; then
            echo "[*] Network wakelock stopped."
            exit 0
        else
            echo "[!] Failed to stop network wakelock."
            exit 1
        fi
    else
        echo "[!] Network wakelock already stopped."
        exit 1
    fi
else
    echo
    echo " Usage: net-wakelock [enable|disable]"
    echo
    exit 0
fi

@xeffyr Thank you for suggesting these workarounds.

Chrome browser does not cause SSH lag.

There can be more than one reason for this issue, "Problem: Laggy SSH. Noted particularly over local LAN WiFi connections". Are you still 100% sure that your statement is correct?

Disabling Chrome browser in Android has significantly improved network speed and throughput for each and every application that uses network access.

Months of meticulous work with Termux on devices have made this result available. The simplest way to confirm this statement seems to be to forgo the use of Chrome browser on devices to see whether an overall surge in speed and throughput occurs when accessing the Internet through LAN WiFi connections.

Are you still 100% sure that your statement is correct?

Yes, wifi speed is limited when screen is off on kernel side.

About GoogleChrome's requests to invalid hostnames like pqafnctfbeao. This is needed to determine if your ISP is hijacking "DNS Name not found" results and this consume very small amount of traffic.

this consume very small amount of traffic.

Time and throughput consumption was causing diagnostics, and ultimately drove these results here to this issue.

when screen is off on kernel side.

Can termux-wake-lock help keep SSH awake? Should termux-wake-lock be expanded with options?

About GoogleChrome's requests to invalid hostnames like pqafnctfbeao. This is needed to determine if your ISP is hijacking

Thank you for this very interesting information! Do you have any links you can share regarding these Chrome browser lookups?

termux-wake-lock will not help here. Wifi power management can only be disabled at kernel compile time.
I have tried to disable it with iwconfig wlan0 power off, but after a few seconds Android sets it back to on, so it has to be disabled permanently, which will increase battery drain.
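Since Android reverts the setting, one conceivable stopgap is to re-apply it whenever it flips back. This is an untested sketch: `wifi_power_state` is a hypothetical helper, the loop needs root, the 5 s polling interval is arbitrary, and it assumes iwconfig reports the state as `Power Management:on` or `Power Management:off`.

```shell
# Reads `iwconfig wlan0` output on stdin and prints the power management
# state ("on" or "off"); prints nothing if the field is missing.
wifi_power_state() {
    sed -n 's/.*Power Management:\([a-z]*\).*/\1/p'
}

# Usage as root on the phone:
#   while sleep 5; do
#       [ "$(iwconfig wlan0 2>/dev/null | wifi_power_state)" = on ] &&
#           iwconfig wlan0 power off
#   done
```

This trades battery life for latency in the same way as the continuous ping, and lag can still occur in the window before the setting is re-applied.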

To read more about Chrome lookups, read this: https://serverfault.com/questions/235307/unusual-head-requests-to-nonsense-urls-from-chrome and this: https://groups.google.com/a/chromium.org/forum/#!topic/chromium-discuss/F70-k_PGhEg

Another possible source of SSH lag is packet loss, which causes TCP retransmissions. Use iperf3 to test your LAN.

@xeffyr:

SSH lag only when device screen is off.

That's not consistent with my observations, see initial discussion above.

@dredmorbius, i don't have lag when screen is on.

I only get lag when the screen is off. If I run iwconfig wlan0 power off as root, there is no lag even with the screen off, but after some time Android re-enables wifi power management and the lag appears again.

As I wrote before, the problem comes from the drivers (kernel), not from Java, Google Chrome or the like, and I guess this was done to save battery. Moreover, this problem affects not only SSH from Termux and ConnectBot but also many other networking programs.
@sp4rkie did tests with ping which confirm this:

...
64 bytes from 192.168.143.119: icmp_seq=24 ttl=63 time=2.56 ms
64 bytes from 192.168.143.119: icmp_seq=25 ttl=63 time=21.6 ms
64 bytes from 192.168.143.119: icmp_seq=26 ttl=63 time=2.50 ms
64 bytes from 192.168.143.119: icmp_seq=27 ttl=63 time=16.3 ms
64 bytes from 192.168.143.119: icmp_seq=28 ttl=63 time=2.65 ms
64 bytes from 192.168.143.119: icmp_seq=29 ttl=63 time=2.56 ms
...

A latency jump from 2.5 ms to over 16 ms is not normal.

I've just verified my test results from above. I can reproduce everything I said. What I forgot to mention:

common to all my tests is: the screen is kept on all the time (i.e. screen timeout deactivated)
but the lag nevertheless strikes unless my pinging workaround is activated.

To me it appears that the Android device simply is not meant to operate as a 'server', unless you configure it to work as a hotspot. It's as simple as that.

My personal workaround is connecting to it via ADB over USB. When I'm on the computer my phone is almost always plugged in anyway. Over ADB there is no lag and the SSH connection is much more stable over USB anyway.
I've written a guide on how to set this up: https://glow.li/technology/2016/9/20/access-termux-via-usb/

@Neo-Oli Understanding that, this doesn't match my experience, as documented above, showing significant SSH lag whilst I was directly interacting with the Android tablet.

That is a large part of what makes this issue so significant: intermittent lag whilst I'm _not_ directly interacting with the tablet, and, say, doing background data or file transfers, really doesn't much concern me. 20 second lapses whilst engaged in an SSH session, file editing, etc., etc., does.

@Neo-Oli USB - ADB forward - SSHD is a good way, very low lag, I have been using it.

But a few minutes after the screen turns off, the SSH connection is lost, even though adb devices still shows the right serial and adb shell also works. The sshd process is alive and the port is open, but I cannot connect to it.

The SSH connection only comes back once I turn the screen on. I had run termux-wake-lock before that.
I don't know why; maybe some security rules limit the network?

And @dredmorbius, if you get such bad lag with USB-ADB-sshd, you could try with only an adb connection (adb shell, then ping LOCALNETWORK-IP) and post the result here.
In my experience, USB-ADB-sshd has nearly no lag, and the same goes for plain USB-ADB.

Timings requested (summary && statistics).

This is not necessarily "doze mode" always, on my android if I ping constantly in local network my router I get pings ~80ms all the way to the 500ms, usually around ~200ms. Lots of fluctuation.

But if I ping external IP I get normal stable pings, e.g. ~30ms. Which is kind of funny that local network is slower than external.

This is not necessarily "doze mode" always

It is a wifi power-saving mode, implemented purely in the related kernel module. It also works independently of doze mode. At least, the "wifi wakelock" has no effect on it.

ping constantly in local network my router I get pings ~80ms all the way to the 500ms
But if I ping external IP I get normal stable pings, e.g. ~30ms
Which is kind of funny that local network is slower than external

Ping isn't a tool for measuring network performance.
80 ms ping to router in LAN indicates some problems with it. Should be less than 1-2 ms.

$ ping -c 3 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.08 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.892 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=1.05 ms

--- 192.168.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 17ms
rtt min/avg/max/mdev = 0.892/1.007/1.078/0.086 ms

Since this issue has reawakened and I'm the original reporter: I've not noted this issue for quite some time, probably a year or more.

The major non-Android change of which I'm aware is that the systems I'm remotely connecting to all are now wired connections themselves. I'm also making somewhat less intensive use of SSH connections (though I still use it fairly frequently) due to failure of the Android's bluetooth keyboard. And, come to think, there's another wireless-only system that doesn't seem to exhibit the behaviours, though I'm not connecting to it frequently.

But otherwise, SSH sessions tend to be highly responsive.

Not sure what specifically resolved this, though I'm not complaining.

I'm going to leave the issue open for now, though am considering closing it. Likely will do that myself in a month (or when I get around to it).

Don't close it, it's still happening, although it seems to depend on the ROM and environment.

OK, I'll leave it.

In my case, ssh (or mosh) works perfectly fine while the screen is on, no matter if termux is in the background or if the screen is locked. But a few seconds after I turn it off termux stops responding at all until I turn the screen on again (although it is still running). Weirdly, I can ping my phone just fine, although while the screen is off I get wildly fluctuating response times:

47 packets transmitted, 47 received, 0% packet loss, time 46045ms
rtt min/avg/max/mdev = 7.012/328.051/763.962/203.358 ms

instead of ~70ms while it is on:

57 packets transmitted, 57 received, 0% packet loss, time 56059ms
rtt min/avg/max/mdev = 5.229/67.074/180.247/35.941 ms

Pinging the gateway or any other external IP, as sp4rkie recommended, does not help while the screen is off, because it starts spitting out "ping: sendmsg: operation not permitted" after a few seconds. It does improve the ping times while the screen is on, though:

107 packets transmitted, 107 received, 0% packet loss, time 106110ms
rtt min/avg/max/mdev = 3.141/8.004/26.010/3.107 ms

The longer ping times and "operation not permitted" messages seem to coincide with the mosh connection being temporarily disconnected.
I also tried connecting using a usb hub and lan cable instead, which works perfectly fine while the screen is turned off:

241 packets transmitted, 241 received, 0% packet loss, time 240260ms
rtt min/avg/max/mdev = 0.869/1.757/8.254/0.655 ms

So the culprit does seem to be the wifi power management. I wonder how it decides to enable this power saving mode? Downloading large files with wget never fails. Also, if messengers can manage to receive messages almost instantly while the screen is off, is it really impossible to keep an ssh connection alive?


This is on a Mate 20 Pro, pings with my PC connected via LAN

$ termux-info
/data/data/com.termux/files/usr/bin/termux-info: 23: /data/data/com.termux/files/usr/bin/termux-info: awk: not found
Updatable packages:
All packages up to date
Subscribed repositories:
System information:
Linux localhost 4.9.148 #1 SMP PREEMPT Tue Jun 4 19:48:38 CST 2019 aarch64 Android
Termux-packages arch:
aarch64
Android version:
9
Device manufacturer:
HUAWEI
Device model:
LYA-L29

> I wonder how it decides to enable this power saving mode?

It is enabled automatically when the screen is off or when outgoing data transmission has ended. It is simply an effective way to save battery, especially since, from the manufacturer's point of view, Android smartphones aren't intended to be used as servers.

Note that "wifi tethering" is an exception here: when it is on, the power saving is disabled.

> Also, if messengers can manage to receive messages almost instantly while the screen is off, is it really impossible to keep an ssh connection alive?

A high ping (or even some packet loss) doesn't mean that the network is unusable. Unlike SSH, messengers don't send data continuously, so they don't suffer from the "laggy connection" issue.

> But a few seconds after I turn it off termux stops responding at all until I turn the screen on again (although it is still running).

Have you tried exempting termux from the 'power saving feature'?

on Android 8: Optimize battery usage: "Not optimized" for termux
on Android 9: Power saving feature: "Excepted" for termux

for other apps (e.g. K-9 Mail) it's essential (at least for my devices) to configure this to keep IMAP push connections active after shutting the screen down

> on Android 9: Power saving feature: "Excepted" for termux

It's called manage automatically/manually for me (Settings > Battery > App launch). I've set everything to manual for termux (Auto-launch, Secondary launch and Run in background).

Here is the latency difference between wifi powersaving enabled and disabled. I used iwconfig to control this setting.

  1. With power saving:
[xeffyr]:~:$ ping -c 10 192.168.1.155
PING 192.168.1.155 (192.168.1.155) 56(84) bytes of data.
64 bytes from 192.168.1.155: icmp_seq=1 ttl=64 time=71.8 ms
64 bytes from 192.168.1.155: icmp_seq=2 ttl=64 time=913 ms
64 bytes from 192.168.1.155: icmp_seq=3 ttl=64 time=3.49 ms
64 bytes from 192.168.1.155: icmp_seq=4 ttl=64 time=755 ms
64 bytes from 192.168.1.155: icmp_seq=5 ttl=64 time=677 ms
64 bytes from 192.168.1.155: icmp_seq=6 ttl=64 time=601 ms
64 bytes from 192.168.1.155: icmp_seq=7 ttl=64 time=521 ms
64 bytes from 192.168.1.155: icmp_seq=8 ttl=64 time=442 ms
64 bytes from 192.168.1.155: icmp_seq=9 ttl=64 time=365 ms
64 bytes from 192.168.1.155: icmp_seq=10 ttl=64 time=284 ms

--- 192.168.1.155 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 11ms
rtt min/avg/max/mdev = 3.488/463.390/912.965/276.067 ms
  2. Without powersaving (executed iwconfig wlan0 power off as root in Termux):
[xeffyr]:~:$ ping -c 10 192.168.1.155
PING 192.168.1.155 (192.168.1.155) 56(84) bytes of data.
64 bytes from 192.168.1.155: icmp_seq=1 ttl=64 time=3.88 ms
64 bytes from 192.168.1.155: icmp_seq=2 ttl=64 time=4.01 ms
64 bytes from 192.168.1.155: icmp_seq=3 ttl=64 time=4.11 ms
64 bytes from 192.168.1.155: icmp_seq=4 ttl=64 time=3.92 ms
64 bytes from 192.168.1.155: icmp_seq=5 ttl=64 time=3.73 ms
64 bytes from 192.168.1.155: icmp_seq=6 ttl=64 time=4.08 ms
64 bytes from 192.168.1.155: icmp_seq=7 ttl=64 time=3.96 ms
64 bytes from 192.168.1.155: icmp_seq=8 ttl=64 time=4.71 ms
64 bytes from 192.168.1.155: icmp_seq=9 ttl=64 time=4.36 ms
64 bytes from 192.168.1.155: icmp_seq=10 ttl=64 time=4.11 ms

--- 192.168.1.155 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 21ms
rtt min/avg/max/mdev = 3.727/4.086/4.712/0.267 ms

Doze mode and wifi powersaving are completely independent. The latter is applied regardless of any kind of "battery optimizations".
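
To compare runs like the two above without eyeballing them, the summary line can be parsed in plain shell. This is a minimal sketch, not something from the thread: the function names rtt_field and rtt_verdict and the 50 ms threshold are made up for illustration, and it assumes the iputils summary format shown above. It avoids awk, which is not always installed in Termux.

```shell
#!/bin/sh
# Pull one value out of an iputils ping summary line such as
#   rtt min/avg/max/mdev = 3.727/4.086/4.712/0.267 ms
# $1 = summary line, $2 = position (1=min, 2=avg, 3=max, 4=mdev)
rtt_field() {
    values=${1#*= }        # drop everything up to and including "= "
    values=${values% ms}   # drop the trailing unit
    echo "$values" | cut -d/ -f"$2"
}

# Report whether the average RTT reaches a whole-millisecond threshold.
# $1 = summary line, $2 = threshold in ms (integer, arbitrary choice)
rtt_verdict() {
    avg=$(rtt_field "$1" 2)
    # Compare only the integer part, since sh has no floating-point test.
    if [ "${avg%.*}" -ge "$2" ]; then
        echo "laggy (avg ${avg} ms)"
    else
        echo "ok (avg ${avg} ms)"
    fi
}

# The two summary lines quoted above, checked against a 50 ms threshold.
rtt_verdict "rtt min/avg/max/mdev = 3.488/463.390/912.965/276.067 ms" 50
rtt_verdict "rtt min/avg/max/mdev = 3.727/4.086/4.712/0.267 ms" 50
```

With the numbers from the two runs, the first line is reported as laggy and the second as ok. Passing 4 instead of 2 to rtt_field gives mdev, which is arguably the better indicator of the fluctuation described in this thread.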

With Android Pie (AOSP/GSI), the following temporarily solves my performance issues:

$ termux-wake-lock; su -c 'iw wlan0 set power_save on; iw wlan0 set power_save off'

Caveat: this requires root and must be run again each time the power state changes on the phone.

It should be possible to trigger this in an intent receiver for android.intent.action.SCREEN_ON and android.intent.action.SCREEN_OFF actions.
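
Short of writing such a receiver, a cruder option is to re-apply the toggle on a timer. The following is only a sketch under stated assumptions: the IFACE and DRY_RUN variables, the function names, and the 60-second interval are hypothetical, it still needs root, and polling like this is a swapped-in workaround rather than the intent-based approach suggested above.

```shell
#!/bin/sh
# Sketch only: re-apply the power_save toggle quoted above.
# IFACE, DRY_RUN and the function names are illustrative assumptions.
IFACE=${IFACE:-wlan0}

# Print the root command that switches wifi power saving on and then off.
powersave_reset_cmd() {
    printf "iw %s set power_save on; iw %s set power_save off" "$1" "$1"
}

# Run the toggle once; with DRY_RUN=1 only print what would be executed.
reset_wifi_powersave() {
    cmd=$(powersave_reset_cmd "$IFACE")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "su -c '$cmd'"
    else
        su -c "$cmd"
    fi
}

# Example loop (commented out): hold the wake lock, then re-apply the
# toggle every 60 seconds. The interval is an arbitrary choice.
# termux-wake-lock
# while true; do reset_wifi_powersave; sleep 60; done
```

Running it with DRY_RUN=1 prints the command instead of executing it, which is handy for checking the interface name first.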

I can use SSH to connect to my computer, but I can't if I want to connect my computer to my phone.

For lag-free access via ssh, it proved useful on some devices to switch the screen on.

To switch the screen on from within termux I use the command

termux-telephony-call <fake-number-always-returning-busy>

Is there a more elegant way (i.e. another CLI command) to switch the screen on with termux?

@xeffyr -- thank you for the update! Has this issue been resolved by a particular pull request?

The last thing that can be done from the Termux side is https://github.com/termux/termux-app/pull/1216.
