Currently the mailservers from the fleet are selected randomly, which results in me getting connected to the Hong Kong mailservers almost every time, even though I'm in Europe and should be getting the European ones.
This results in a really bad experience for users: slow sync times and bad latency.
The simplest solution would be to just do an ICMP ping to a few mailservers, group them into response-time buckets, and pick a random one from the fastest bucket.
Example ping output:
$ for DC in "do-ams3" "gc-us-central1-a" "ac-cn-hongkong-c"; do \
ping -c1 mail-01.${DC}.eth.beta.statusim.net | grep 'time='; \
done
64 bytes from 206.189.243.162 (206.189.243.162): icmp_seq=1 ttl=55 time=29.1 ms
64 bytes from 210.19.188.35.bc.googleusercontent.com (35.188.19.210): icmp_seq=1 ttl=50 time=127 ms
64 bytes from 47.91.156.93 (47.91.156.93): icmp_seq=1 ttl=47 time=279 ms
As you can see, for me in Poland the ranking of access times is simple: 29.1 ms < 127 ms < 279 ms.
I'm not sure where this should be implemented though, status-go or status-react. My first guess would be status-go, but I'm actually not sure where the code that selects the mailserver lives.
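To illustrate the bucketing idea from above, here's a rough sketch of what the selection could look like (purely illustrative, not code from any of our repos):

package mailserver

import (
	"math/rand"
	"sort"
	"time"
)

type pingResult struct {
	Addr string
	RTT  time.Duration
}

// pickFastest sorts reachable mailservers by RTT and picks a random one
// from those within `bucket` of the fastest, so we spread the load instead
// of always hammering the single closest node.
func pickFastest(results []pingResult, bucket time.Duration) string {
	if len(results) == 0 {
		return ""
	}
	sort.Slice(results, func(i, j int) bool { return results[i].RTT < results[j].RTT })
	fastest := results[0].RTT
	var candidates []string
	for _, r := range results {
		if r.RTT-fastest <= bucket {
			candidates = append(candidates, r.Addr)
		}
	}
	return candidates[rand.Intn(len(candidates))]
}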
Thanks to Eric I now know where the selection code currently lives:
https://github.com/status-im/status-react/blob/02ce0782b1fc773a4ad2a18749426ee2863d17cc/src/status_im/mailserver/core.cljs#L70-L83
This could even live in status-protocol-go, probably in a separate package. We have two issues there:
This must be the method that actually tells the node to connect to the mailserver:
https://github.com/status-im/status-react/blob/2e8e1a35add435c044d6f61f208da80386fa0d24/src/status_im/mailserver/core.cljs#L214-L233
According to Adam the call to get the fastest mailservers should be an RPC call.
A request should send a list of mailservers to check (and possibly a timeout for the check) and return a list of objects with round-trip times:
[
{ addr: "A", rtt: 123 }, // milliseconds
{ addr: "B", rtt: null }, // timeout
{ addr: "C", rtt: 554 }, // milliseconds
]
New RPC calls are exposed by adding methods here:
https://github.com/status-im/status-go/blob/develop/services/mailservers/api.go
For more details read:
https://github.com/status-im/status-go/blob/develop/services/mailservers/README.md
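I'm not sure yet what the exposed method should look like exactly, but based on that README it would be something roughly along these lines (the type names, field names, and the stub API struct are my assumptions, not the actual status-go code):

package mailservers

import "context"

// Assumed request/response shapes, mirroring the JSON example above.
type PingQuery struct {
	Addresses []string `json:"addresses"`
	TimeoutMs int      `json:"timeoutMs"`
}

type PingResult struct {
	Address string `json:"addr"`
	RTTMs   *int   `json:"rtt"` // nil on timeout
}

// API is a stand-in for the service API type defined in api.go.
type API struct{}

// Ping would probe every address with the given timeout and return one
// result per address; a method named Ping on the mailservers service
// would be callable over RPC as mailservers_ping.
func (api *API) Ping(ctx context.Context, query PingQuery) ([]PingResult, error) {
	results := make([]PingResult, 0, len(query.Addresses))
	// TODO: probe each address (see the prober sketch below) and fill in RTTs.
	return results, nil
}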
Example interface for ICMP probing from Adam:
package mailserver

type Prober interface {
	Probe() ([]node.ID, error)
}

type ICMPProber struct {
	items []node.ID
}

func NewICMPProber(addresses []string) *ICMPProber {
	return &ICMPProber{items: parseAddresses(addresses)}
}

type ProtocolProber struct {
}
At first I thought ICMP would be the simplest, but it appears it is not, because sending ICMP packets requires creating a raw socket, which requires special permissions:
EPERM The user doesn't have permission to open raw sockets. Only
processes with an effective user ID of 0 or the CAP_NET_RAW
attribute may do that.
And fails on Linux without sudo with:
$ ./icmp_ping
Prober {mail-01.do-ams3.eth.beta.statusim.net}
Err: listen ip4:icmp 0.0.0.0: socket: operation not permitted
__NOTE:__ This is true for Darwin as well.
So the most sensible thing to do will be a TCP SYN > SYN-ACK > RST handshake, and to time that.
This package could be used as a template to write such a TCP handshake ping:
https://github.com/tevino/tcp-shaker
Example command line tool test:
~/go/src/github.com/tevino/tcp-shaker/app/tcp-checker master*
caspair > ./tcp-checker -c 3 -a mail-01.do-ams3.eth.beta.statusim.net:443
Checking mail-01.do-ams3.eth.beta.statusim.net:443 with the following configurations:
Timeout: 1s
Requests: 1
Concurrency: 3
Finished 1/1 checks in 31.271279ms
Succeed: 1
Errors: connect 0, timeout 0, other 0
One thing that worries me though is that it's only for Linux, and their non-Linux code is a dummy:
https://github.com/tevino/tcp-shaker/blob/master/checker_nonlinux.go
But I think the Linux code should work fine on Android.
I tried reading the code of https://github.com/tevino/tcp-shaker but it's really obtuse. Syscalls everywhere, weird use of sync.Map and sync.Pool, as well as Linux epoll. Really hard to follow.
I could just use it, but epoll doesn't exist on macOS, so I don't know if that's gonna fly on Apple devices. I'd have to implement the same code for Darwin, probably using something like kqueue to wait for the opened connection to close.
MacOS lacks syscall.SOL_TCP but its equivalent is syscall.IPPROTO_TCP.
There's also no TCP_QUICKACK. We could use TCP_NODELAY instead, but:
Turning on TCP_NODELAY has similar effects, but can make throughput worse for small writes. If you write a loop which sends just a few bytes (worst case, one byte) to a socket with "write()", and the Nagle algorithm is disabled with TCP_NODELAY, each write becomes one IP packet. This increases traffic by a factor of 40, with IP and TCP headers for each payload.
https://stackoverflow.com/questions/7286592/set-tcp-quickack-and-tcp-nodelay
But we don't intend to send any bytes, just close immediately, so it shouldn't have any effect.
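In code the substitution would be roughly this (just a sketch, assuming we already have a raw socket fd):

// Linux (what tcp-shaker does):
// syscall.SetsockoptInt(fd, syscall.SOL_TCP, syscall.TCP_QUICKACK, 0)
// Closest Darwin equivalent, since SOL_TCP and TCP_QUICKACK don't exist there:
err := syscall.SetsockoptInt(fd, syscall.IPPROTO_TCP, syscall.TCP_NODELAY, 1)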
When I make those two changes listed above, the only build errors appear to be related to the lack of epoll:
../../socket.go:63:12: undefined: syscall.EpollCreate1
../../socket.go:63:33: undefined: syscall.EPOLL_CLOEXEC
../../socket.go:74:12: undefined: syscall.EpollEvent
../../socket.go:75:17: undefined: syscall.EPOLLOUT
../../socket.go:75:36: undefined: syscall.EPOLLIN
../../socket.go:77:12: undefined: syscall.EpollCtl
../../socket.go:77:39: undefined: syscall.EPOLL_CTL_ADD
../../socket.go:85:34: undefined: syscall.EpollEvent
So in theory I could port this code to use kqueue and it might work.
This is a nice example of how Kqueue is used in Go:
https://gist.github.com/nbari/386af0fa667ae03daf3fbc80e3838ab0
Here's Go's implementation:
https://golang.org/src/syscall/syscall_darwin.go
And a nice example of kqueue usage in C:
https://jmmv.dev/2004/10/example-of-kqueue.html
This might help me build a standalone binary for Android using Gomobile:
Apps written entirely in Go have a main function, and can be built with
gomobile build, which directly produces runnable output for Android and iOS.
https://godoc.org/golang.org/x/mobile/app
This way I could verify if this TCP knocking will even work.
I made an example App using the x/mobile/app module:
package main

import (
	"context"
	"log"
	"time"

	"github.com/tevino/tcp-shaker"
	"golang.org/x/mobile/app"
)

var addresses = []string{
	"206.189.243.162:443",
	"35.188.19.210:443",
	"47.91.156.93:443",
}

func Ping(address string, timeout time.Duration) error {
	c := tcp.NewChecker()

	ctx, stopChecker := context.WithCancel(context.Background())
	defer stopChecker()
	go func() {
		if err := c.CheckingLoop(ctx); err != nil {
			log.Print("checking loop stopped due to fatal error: ", err)
		}
	}()

	log.Print("Pinging:", address)
	<-c.WaitReady()

	err := c.CheckAddr(address, timeout)
	switch err {
	case tcp.ErrTimeout:
		log.Print("Connect timed out")
	case nil:
		log.Print("Connect succeeded")
	default:
		log.Print("Error while connecting: ", err)
	}
	return nil
}

func main() {
	app.Main(appMain)
}

func appMain(a app.App) {
	log.SetPrefix("TCP_PING: ")
	var timeout time.Duration = 1000 * time.Millisecond
	var i int
	for i = 0; i < len(addresses); i++ {
		err := Ping(addresses[i], timeout)
		if err != nil {
			log.Print("Err:", err)
		}
	}
}
And it fails on Android with this:
GoLog : TCP_PING: Pinging:206.189.243.162:443
GoLog : TCP_PING: Error while connecting: permission denied
GoLog : TCP_PING: Pinging:35.188.19.210:443
GoLog : TCP_PING: Error while connecting: permission denied
GoLog : TCP_PING: Pinging:47.91.156.93:443
GoLog : TCP_PING: Error while connecting: permission denied
Which I guess suggests that Android blocks use of some of the socket flags that tcp-shaker uses.
If you look at the SocketOptions documentation for Android:
https://developer.android.com/reference/java/net/SocketOptions
There is no TCP_QUICKACK available, only TCP_NODELAY.
I tested the ICMP ping using golang.org/x/net/icmp in a simple app and got back:
GoLog : ICMP_PING: Err:listen ip4:icmp 0.0.0.0: socket: permission denied
A similar thing happens if I try to use a UDP ping:
GoLog : ICMP_PING: Err:socket: permission denied
Both fail on the icmp.ListenPacket() call, first with ip4:icmp and second with udp.
So it doesn't look like ICMP ping is something that can be done on Android without extra permissions.
I narrowed down the permission denied error on the TCP pinger to this call:
syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
It seems like one of those flags is not allowed. But which one?
GLORIOUS SUCCESS!
I managed to make the permission denied error go away by specifying a custom AndroidManifest.xml which contained the following permission:
<uses-permission android:name="android.permission.INTERNET" />
Looks like the default is not to give network access to an app.
Now it fails with tcp.ErrTimeout instead, but that's an improvement.
Adding the INTERNET permission doesn't seem to help the ICMP test app though:
GoLog : ICMP_PING: Err:bind: permission denied
icmp_ping_test: type=1400 audit(0.0:916061): avc: \
denied { node_bind } for scontext=u:r:untrusted_app_25:s0:c512,c768 \
tcontext=u:object_r:node:s0 tclass=rawip_socket permissive=0
Interestingly enough TCP_QUICKACK is not listed in the Android socket docs:
https://developer.android.com/reference/java/net/SocketOptions
But it doesn't throw any errors when I set it so maybe it's just undocumented:
syscall.SetsockoptInt(fd, syscall.SOL_TCP, syscall.TCP_QUICKACK, 0)
The timeout happens even though the epoll that listens for the fd events has quite the range:
event.Events = syscall.EPOLLOUT | syscall.EPOLLIN | epollET
Where epollET is:
const epollET = 1 << 31 // 2147483648
Which shifts the first bit 31 places to the left and gives us 2147483648, which I assume is supposed to represent the EPOLLET flag. That's weird, because EPOLLET is available in the syscall package:
EPOLLET = -0x80000000 // -2147483648
Which is the negative of epollET, but as a 32-bit value it has the same bit pattern (0x80000000), so 1 << 31 stands for the same flag. I also think the value of EPOLLET may be architecture dependent.
The EPOLLET flag means:
If the rfd file descriptor has been added to the epoll interface using the EPOLLET (edge-triggered) flag, the call to epoll_wait(2) done in step 5 will probably hang despite the available data still present in the file input buffer; meanwhile the remote peer might be expecting a response based on the data it already sent. The reason for this is that edge-triggered mode delivers events only when changes occur on the monitored file descriptor. So, in step 5 the caller might end up waiting for some data that is already present inside the input buffer. In the above example, an event on rfd will be generated because of the write done in 2 and the event is consumed in 3. Since the read operation done in 4 does not consume the whole buffer data, the call to epoll_wait(2) done in step 5 might block indefinitely.
An application that employs the EPOLLET flag should use nonblocking file descriptors to avoid having a blocking read or write starve a task that is handling multiple file descriptors. The suggested way to use epoll as an edge-triggered (EPOLLET) interface is as follows:
i  with nonblocking file descriptors; and
ii by waiting for an event only after read(2) or write(2) return EAGAIN.
By contrast, when used as a level-triggered interface (the default, when EPOLLET is not specified), epoll is simply a faster poll(2), and can be used wherever the latter is used since it shares the same semantics.
I tried using syscall.EPOLLET but I get this during build:
socket_linux.go:85:52: constant -2147483643 overflows uint32
So I assume the local definition of epollET is some kind of workaround for that.
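For reference, it's the constant conversion that fails; converting through a variable at runtime works fine, which is probably why they defined their own constant:

// event.Events = syscall.EPOLLOUT | syscall.EPOLLIN | syscall.EPOLLET
// --> constant -2147483643 overflows uint32 (compile-time constant conversion)
et := syscall.EPOLLET // the untyped constant becomes an int variable here
event.Events = syscall.EPOLLOUT | syscall.EPOLLIN | uint32(et) // runtime conversion wraps to 0x80000000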
I don't get why the value of syscall.EPOLLET in Go is a negative number when the EpollEvent structure uses an unsigned uint32 for its flags:
type EpollEvent struct {
	Events uint32
	Fd     int32
	Pad    int32
}
https://golang.org/pkg/syscall/#EpollEvent
As if they wanted it to be unusable? Or is it simply a mistake?
Interestingly the syscall.EPOLL_NONBLOCK flag is not available:
socket_linux.go:90:62: undefined: syscall.EPOLL_NONBLOCK
Instead of relying on the pipe that the default tcp-shaker code uses I tried calling the syscall.EpollWait method myself:
events := make([]syscall.EpollEvent, 1)
nc, err := syscall.EpollWait(c.PollerFd(), events, int(timeout.Milliseconds()))
And what I get back is an event with the Events attribute set to 4, which is EPOLLOUT, meaning the connection is ready for writing. So this method can work, but for some reason it doesn't trigger through the reading of the pipe in the waitPipeTimeout() method:
https://github.com/tevino/tcp-shaker/blob/9f5b7a96d888d963841cdd52c87c72bef159dc91/checker_linux.go#L187-L194
Turns out that tcp-shaker uses EpollWait() itself:
func pollEvents(pollerFd int, timeout time.Duration) ([]event, error) {
	var timeoutMS = int(timeout.Nanoseconds() / 1000000)
	var epollEvents [maxEpollEvents]syscall.EpollEvent
	nEvents, err := syscall.EpollWait(pollerFd, epollEvents[:], timeoutMS)
	if err != nil {
		if err == syscall.EINTR {
			return nil, nil
		}
		return nil, os.NewSyscallError("epoll_wait", err)
	}

	var events = make([]event, 0, nEvents)

	for i := 0; i < nEvents; i++ {
		var fd = int(epollEvents[i].Fd)
		var evt = event{Fd: fd, Err: nil}

		errCode, err := syscall.GetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_ERROR)
		if err != nil {
			evt.Err = os.NewSyscallError("getsockopt", err)
		}
		if errCode != 0 {
			evt.Err = newErrConnect(errCode)
		}
		events = append(events, evt)
	}
	return events, nil
}
I need to debug why this call doesn't catch the socket being ready.
Okay, I can see what is failing on Android, and that's this call:
errCode, err := syscall.GetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_ERROR)
https://github.com/tevino/tcp-shaker/blob/9f5b7a96/socket_linux.go#L100
Which results in an error:
getsockopt: socket operation on non-socket
This seems to be caused by the fact that the file descriptor in the event has the value 0, which is not the descriptor of our socket.
This wouldn't be an issue if the implementation just listened for one fd at a time, since then I could assume the file descriptor number, but tcp-shaker seems to use a sync.Map and a sync.Pool to juggle multiple pipes in the CheckingLoop().
I made the simplest possible implementation of the TCP ping here:
https://github.com/status-im/infra-utils/blob/master/tcp-ping/main.go
And this implementation works just fine on my amd64 architecture:
$ make amd64
GOARCH=amd64 go build -o tcp-ping-amd64 main.go
$ ./tcp-ping-amd64
TCP_PING:2019/11/09 16:42:26 PINGING: 8.8.8.8:53
TCP_PING:2019/11/09 16:42:26 Socket FD: 3
TCP_PING:2019/11/09 16:42:26 Epoll FD: 4
TCP_PING:2019/11/09 16:42:26 EpollWait #: 1
TCP_PING:2019/11/09 16:42:26 SO_ERROR: 0
TCP_PING:2019/11/09 16:42:26 EpollEvent: {4 3 0}
TCP_PING:2019/11/09 16:42:26 SUCCESS
We can see that the EpollEvent comes back with the socket file descriptor number 3 in its second field.
But when I push the arm64 binary to my Android phone and run it:
$ make arm64
GOARCH=arm64 go build -o tcp-ping-arm64 main.go
$ adb push tcp-ping-arm64 /sdcard/
tcp-ping-arm64: 1 file pushed. 28.4 MB/s (2933445 bytes in 0.099s)
$ adb shell
OnePlus6:/ $ cd /data
OnePlus6:/data $ su
OnePlus6:/data # mv /sdcard/tcp-ping-arm64 ./
OnePlus6:/data # chmod +x tcp-ping-arm64
OnePlus6:/data # ./tcp-ping-arm64
TCP_PING:2019/11/09 15:43:52 PINGING: 8.8.8.8:53
TCP_PING:2019/11/09 15:43:52 Socket FD: 3
TCP_PING:2019/11/09 15:43:53 Epoll FD: 4
TCP_PING:2019/11/09 15:43:53 EpollWait #: 1
TCP_PING:2019/11/09 15:43:53 SO_ERROR: 0
TCP_PING:2019/11/09 15:43:53 EpollEvent: {4 0 0}
TCP_PING:2019/11/09 15:43:53 SUCCESS
The EpollEvent I get back is clearly showing a value of 0, which makes no fucking sense.
The only thing I can think of now is to write the exact same thing in pure C and see if that works.
I have implemented essentially the same thing but in pure C:
https://github.com/status-im/infra-utils/blob/master/tcp-ping/main.c
And on Linux it returns the socket file descriptor as expected:
$ make c-amd64
gcc main.c -o tcp-ping-amd64
$ ./tcp-ping-amd64
PINGING: 8.8.8.8:53
epoll event flags: 4
epoll event fd: 3
SUCCESS
And the same thing done on Android:
$ make c-arm64
aarch64-linux-gnu-gcc -static -Werror main.c -o tcp-ping-arm64
/usr/lib/gcc-cross/aarch64-linux-gnu/9/../../../../aarch64-linux-gnu/bin/ld: /tmp/ccCHN6C9.o: in function `main':
main.c:(.text+0x80): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
$ adb push tcp-ping-arm64 /sdcard/
tcp-ping-arm64: 1 file pushed. 22.7 MB/s (642032 bytes in 0.027s)
$ adb shell
OnePlus6:/ $ su
OnePlus6:/ # cd /data
OnePlus6:/data # mv /sdcard/tcp-ping-arm64 ./
OnePlus6:/data # chmod +x tcp-ping-arm64
OnePlus6:/data # ./tcp-ping-arm64
PINGING: 8.8.8.8:53
epoll event flags: 4
epoll event fd: 3
SUCCESS
Which clearly shows that the issue is - as I suspected - with Go and how it handles epoll events.
I have found out that the syscall package is on a code freeze, and that the golang.org/x/sys/unix package contains the more up-to-date implementations.
I have modified the simple Go implementation to use x/sys/unix and it turns out that it works that way:
OnePlus6:/data # ./tcp-ping-arm64
TCP_PING:2019/11/09 17:05:05 PINGING: 8.8.8.8:53
TCP_PING:2019/11/09 17:05:05 Socket FD: 3
TCP_PING:2019/11/09 17:05:05 Epoll FD: 4
TCP_PING:2019/11/09 17:05:05 EpollWait #: 1
TCP_PING:2019/11/09 17:05:05 SO_ERROR: 0
TCP_PING:2019/11/09 17:05:05 EpollEvent: {4 0 3 0}
TCP_PING:2019/11/09 17:05:05 SUCCESS
This time around it does contain file descriptor number 3.
And surprisingly the EpollEvent on Android (arm64) has a slightly different structure, with 4 fields instead of 3:
type EpollEvent struct {
	Events uint32
	PadFd  int32
	Fd     int32
	Pad    int32
}
http://www.golang.ir/src/cmd/vendor/golang.org/x/sys/unix/ztypes_linux_arm64.go
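The core of the working approach boils down to something like this, a simplified Linux/Android-only sketch rather than the exact code from the infra-utils repo linked above:

package main

import (
	"fmt"
	"log"
	"time"

	"golang.org/x/sys/unix"
)

// tcpPing does a non-blocking connect and waits for EPOLLOUT, which fires
// once the SYN-ACK arrives (or the connect fails), then checks SO_ERROR.
func tcpPing(addr [4]byte, port int, timeout time.Duration) (time.Duration, error) {
	fd, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM|unix.SOCK_NONBLOCK, 0)
	if err != nil {
		return 0, err
	}
	defer unix.Close(fd)

	start := time.Now()
	// A non-blocking connect returns EINPROGRESS immediately.
	if err := unix.Connect(fd, &unix.SockaddrInet4{Port: port, Addr: addr}); err != nil && err != unix.EINPROGRESS {
		return 0, err
	}

	epfd, err := unix.EpollCreate1(unix.EPOLL_CLOEXEC)
	if err != nil {
		return 0, err
	}
	defer unix.Close(epfd)

	ev := unix.EpollEvent{Events: unix.EPOLLOUT, Fd: int32(fd)}
	if err := unix.EpollCtl(epfd, unix.EPOLL_CTL_ADD, fd, &ev); err != nil {
		return 0, err
	}

	events := make([]unix.EpollEvent, 1)
	n, err := unix.EpollWait(epfd, events, int(timeout.Milliseconds()))
	if err != nil {
		return 0, err
	}
	if n == 0 {
		return 0, fmt.Errorf("timeout after %s", timeout)
	}

	// With the x/sys/unix EpollEvent the Fd field comes back correctly on
	// arm64, and SO_ERROR tells us whether the connect actually succeeded.
	errCode, err := unix.GetsockoptInt(int(events[0].Fd), unix.SOL_SOCKET, unix.SO_ERROR)
	if err != nil {
		return 0, err
	}
	if errCode != 0 {
		return 0, fmt.Errorf("connect failed: errno %d", errCode)
	}
	return time.Since(start), nil
}

func main() {
	rtt, err := tcpPing([4]byte{8, 8, 8, 8}, 53, time.Second)
	if err != nil {
		log.Fatal(err)
	}
	log.Print("RTT: ", rtt)
}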
Finally some progress!
This issue appears to be somehow related: https://github.com/golang/go/issues/32192
It talks about missing fields in the EpollEvent, but in x/sys/unix rather than syscall, which is how I found out that it even existed.
I have opened an issue with the Go repo: https://github.com/golang/go/issues/35479
Maybe it will at least help someone else not waste hours researching the same thing as me.
Also created a PR for tcp-shaker to fix this: https://github.com/tevino/tcp-shaker/pull/18
I'm trying to port tcp-shaker to Darwin but it appears that the lack of the TCP_QUICKACK flag makes it impossible to send the RST right after the SYN-ACK from the server.
On Linux the packet exchange looks correct - SYN > SYN-ACK > RST:

But on MacOS the damned thing establishes the connection fully and only later closes:

The sequence is fucked because MacOS ACKs the SYN-ACK and allows the connection to be open.
That's correct, if I don't set TCP_QUICKACK on Linux it sends the ACK too:

So I don't really see how I can make this work the same way on MacOS.
Well, apparently, if I call close() on the socket right after connect() I can get the RST I want:

But that kinda makes the whole kqueue thing and kevent listening pointless, since the socket is already closed right after I've opened it.
It looks like as long as I call unix.Close() before I call unix.Kevent() to wait for a socket event it closes correctly with a RST right after SYN-ACK.
But it also means that unix.Kevent() times out and returns an event for a non-existent socket, so calling unix.GetsockoptInt() on it makes no sense.
The issue with closing the connection right after I open it is that currently I don't know how I can actually check if it succeeded or not. I'm probably missing something, and maybe some combination of flags for unix.GetsockoptInt() would allow me to verify what actually happened to the connection.
Okay, I managed to get it working, but not exactly. The key was using the right kevent filter:
eventFilter := unix.Kevent_t{
	Ident:  uint64(fd),
	Filter: unix.EVFILT_WRITE,
	Flags:  unix.EV_ADD | unix.EV_ONESHOT,
}
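That filter then gets registered and waited on with a single unix.Kevent() call, roughly like this (a sketch, not the exact code from the branch):

kq, err := unix.Kqueue()
if err != nil {
	return err
}
defer unix.Close(kq)

// Register the write filter and wait for it in one Kevent() call.
ts := unix.NsecToTimespec(timeout.Nanoseconds())
events := make([]unix.Kevent_t, 1)
n, err := unix.Kevent(kq, []unix.Kevent_t{eventFilter}, events, &ts)
if err != nil {
	return err
}
if n == 0 {
	// timed out waiting for the socket to become writable
}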
The implementation was done in: https://github.com/status-im/tcp-shaker/commit/804f4fa394c3241e95bea701aee8949ea17554a5 (support-darwin branch)
But the issue is that it doesn't send the RST packet right away, it first ACKs the SYN-ACK:

Which means the connection is established and closed 25 microseconds later, so whatever application is listening is going to have to handle a connection that gets closed right away. Without that 3rd packet with the ACK the service never learns about the attempt, only the kernel knows.
This article was very useful in figuring this shit out:
http://eradman.com/posts/kqueue-tcp.html
@adambabik in your opinion, would status-go be fine with a SYN > SYN-ACK > ACK > RST kind of TCP ping, or could that cause problems?
Here's an initial implementation of the TCP ping RPC call in status-go:
https://github.com/status-im/status-go/pull/1672
It needs some tweaks but it works:
$ curl -s localhost:8545 -H 'content-type: application/json' \
-d '{"jsonrpc":"2.0","method":"mailservers_ping","params":[{"addresses":["1.1.1.1:53", "8.8.8.8:53", "1.1.1.1:231", "1.1.1.1:313121"], "timeoutMs": 500}],"id":1}'
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": [
    {
      "address": "1.1.1.1:313121",
      "latency": -1,
      "error": "unknown error: address 313121: invalid port"
    },
    {
      "address": "1.1.1.1:53",
      "latency": 7,
      "error": null
    },
    {
      "address": "8.8.8.8:53",
      "latency": 9,
      "error": null
    },
    {
      "address": "1.1.1.1:231",
      "latency": -1,
      "error": "tcp check timeout: I/O timeout"
    }
  ]
}
I've finally merged the status-go changes: https://github.com/status-im/status-go/pull/1672
Big kudos to Adam for helping me a lot with that PR.
@yenda you can see how the RPC call should be used here:
https://github.com/status-im/status-go/blob/develop/rtt/README.md
The call should be available as long as you use version v0.34.0-beta.9 and up. I'd appreciate it if you could find some time to implement the use of this RPC call in automatic mailserver selection.
@errorists what do you think about modifying the mailserver list to include some kind of indicator (a color dot or something else) to show which mailservers are available/unavailable/fastest? This new call returns a list with RTTs (round trip times), or possible errors if they are unavailable, so we can give users some feedback on what works and what doesn't.
as a side effect, all clients will be constantly broadcasting their IP to the list of available mailservers.
I don't know about constantly. I guess only at login, and when a connection with current one fails, or when a user triggers a check. I don't think it should be periodic at all.
as long as at the moment of choosing, the client has relatively refreshed response times, otherwise they may make a decision on stale data, leading to poor UX
this will become more necessary as the available choices move outside of our control and become community based.
True fact, but that would mean it's necessary only when the user opens the mailserver list view; otherwise a refresh of that data is only necessary when a connection to the previously used mailserver fails.
@jakubgs @corpetty the way I'm implementing it right now, it uses the method whenever it needs to connect to a mailserver if the user didn't pin one already. This means that it does it at login, whenever going back online after being offline, and when removing a custom mailserver (if there is no selected mailserver). This will be superseded soon by the Waku node anyway.
@jakubgs you asked about the UI, so this is how I imagine it would look. Split into two lists, one for recommended and the other for everything else; each mailserver lists its response time with relative ranges like Fast - Average - Slow - No response. This would leave any ambiguity out. For kicks, on the left is how it currently looks 🤢

Created a separate issue for UI improvement as #9475 doesn't address it: #9538