Zig: A plan for std.net

Created on 22 Nov 2020 · 11 Comments · Source: ziglang/zig

In #7124 a few people chimed in to ask for something more in std.net.
This ticket is meant to organize all the requests/ideas and hopefully turn them into code by the time 0.8 rolls around.

Now that network streams (net.Stream) are decoupled from files we can finally start adding more socket-specific APIs and reduce the amount of footguns linked to the use of sockets (did I really call stat on a socket? sure I did).

The Stream type is a thi(iiiiiiiii)n layer over the OS socket type: there's no guarantee that socket handles are valid file handles and vice versa (e.g. that's the case for Winsock handles), and there's no knowledge of the transport stream type (TCP, UDP, Unix...).
The latter is still up for discussion, but the two alternatives I thought of, having separate *Stream types or having a tagged union type, are IMO not so nice from an API ergonomics point of view.
The Stream type is also stateless: it has no knowledge of the connection state, which keeps the bookkeeping overhead as small as possible and avoids getting that piece of info out of sync with the kernel.
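
To make the trade-off concrete, here is a minimal sketch of the tagged-union alternative. Every name in it (`AnyStream`, `TcpStream`, `UdpStream`, `UnixStream`, `setNoDelay`) is hypothetical, no real socket is opened, and it assumes a reasonably recent Zig where `std.testing.expect` returns an error union. It only illustrates that a TCP-only option needs a runtime error path for the other tags.

```zig
const std = @import("std");

// Hypothetical stand-ins: real types would hold an OS socket handle.
const TcpStream = struct { nodelay: bool = false };
const UdpStream = struct {};
const UnixStream = struct {};

// One tagged union covering every transport.
const AnyStream = union(enum) {
    tcp: TcpStream,
    udp: UdpStream,
    unix: UnixStream,

    // TCP_NODELAY only exists for TCP, so the other tags must fail at runtime.
    fn setNoDelay(self: *AnyStream, enabled: bool) error{UnsupportedOperation}!void {
        switch (self.*) {
            .tcp => |*t| t.nodelay = enabled,
            .udp, .unix => return error.UnsupportedOperation,
        }
    }
};

test "tcp-only option on a tagged union" {
    var tcp = AnyStream{ .tcp = TcpStream{} };
    try tcp.setNoDelay(true);
    try std.testing.expect(tcp.tcp.nodelay);

    var udp = AnyStream{ .udp = UdpStream{} };
    try std.testing.expectError(error.UnsupportedOperation, udp.setNoDelay(true));
}
```

With separate types the same mistake would be a compile error instead, at the cost of callers having to name the concrete type.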

Right now the idea is to add:

More constructors (the Host variant takes a string, the suffix-less one an address-port pair):
- connectTcpHost (replaces tcpConnectToHost)
- connectTcp (replaces tcpConnectToAddress)
- connectUdp (complements connectTcp)
  - Do we need a connectUdpHost ?
- connectUnixHost (complements connectTcpHost)
- Support connection timeouts in blocking mode!
- The tentative prototype is:

      fn (destination: <[]const u8|Address>, options: struct {
          reuse_port: bool = false,
          reuse_address: bool = false,
          timeout: ?<some type> = null,
      })

More property getters/setters:
- nodelay
- keepalive
- linger
- nonblocking (?)
- Read/write timeouts (SO_RCVTIMEO and SO_SNDTIMEO)
A way to query the local address (getsockname)
- fn getLocalAddress(self) Address
A way to query the remote address (getpeername)
- fn getRemoteAddress(self) Address
A nice wrapper around shutdown()
- fn shutdown(enum { read, write, both })
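
A rough sketch of two pieces from the list above, the options struct and the shutdown enum. It makes several assumptions that are not in the proposal: the timeout is a `?u64` in nanoseconds (the issue leaves the type open), `Timeval` is a local stand-in for the OS structure used by SO_RCVTIMEO and SO_SNDTIMEO, and `toTimeval` and `shutdownFlag` are hypothetical helper names. No socket calls are made.

```zig
const std = @import("std");

// Mirrors the tentative prototype; the nanosecond u64 timeout is an assumption.
pub const ConnectOptions = struct {
    reuse_port: bool = false,
    reuse_address: bool = false,
    timeout_ns: ?u64 = null,
};

// Local stand-in for the OS `struct timeval`.
pub const Timeval = struct { sec: u64, usec: u64 };

// Splits a nanosecond timeout into seconds and microseconds, rounding down.
pub fn toTimeval(timeout_ns: u64) Timeval {
    return Timeval{
        .sec = timeout_ns / 1_000_000_000,
        .usec = (timeout_ns % 1_000_000_000) / 1_000,
    };
}

pub const ShutdownHow = enum { read, write, both };

// SHUT_RD, SHUT_WR and SHUT_RDWR are 0, 1 and 2 on Linux and the BSDs;
// Winsock's SD_RECEIVE, SD_SEND and SD_BOTH use the same values.
pub fn shutdownFlag(how: ShutdownHow) c_int {
    return switch (how) {
        .read => 0,
        .write => 1,
        .both => 2,
    };
}

test "defaults, timeout conversion and shutdown mapping" {
    // Only the timeout is spelled out; the other fields keep their defaults.
    const options = ConnectOptions{ .timeout_ns = 2_500_000_000 };
    try std.testing.expect(!options.reuse_port);

    const tv = toTimeval(options.timeout_ns.?);
    try std.testing.expectEqual(@as(u64, 2), tv.sec);
    try std.testing.expectEqual(@as(u64, 500_000), tv.usec);

    try std.testing.expectEqual(@as(c_int, 2), shutdownFlag(.both));
}
```

One thing a real implementation would have to watch: a timeout below one microsecond rounds down to an all-zero timeval, which SO_RCVTIMEO and SO_SNDTIMEO treat as "no timeout".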

Open questions:

UDP streams want recvmsg/sendmsg and/or recvfrom/sendto
Some operations make sense only for some socket types (see point above), can we really get away with a single stream type?
Peek operation. Is that useful?

cc @Aransentin @frmdstryr @MasterQ32

standard library

LemonBoy

All 11 comments

UDP streams want recvmsg/sendmsg and/or recvfrom/sendto

UDP isn't stream nor connection based. connectUdpHost and connectUdp can still be quite useful, but providing stream types for UDP doesn't make much sense as a UDP socket can receive from multiple peers at the same time.

I think it's better to separate TCP Streams (and provide .inStream() and .outStream() on them) and UDP sockets (that do provide .receiveFrom() and .sendTo() as well as receive/send for connected UDP sockets).

Unix sockets can be either stream- or packet-oriented and should use either abstraction depending on what the user wants (so wrapping Unix sockets into both Stream and UdpSock, or whatever).

Also, a TCP listener has different options than a client, so what about this (using a bit of .NET terminology, as it is fitting):

/// This is the basic socket type, guaranteed to be usable as an OsSocketHandle for APIs like poll and such
const Socket = extern struct {
    socket_fd: OsSocketHandle,
};

/// Provides a network endpoint, made of an address (v4, v6) and port, or a Unix socket
const EndPoint = struct { // "wrapper" over sockaddr

};

const TcpListener = struct { // could also be called StreamListener
    socket: Socket,

    fn listen(options: struct {
        bind_address: ?EndPoint,
        backlog: ?usize, // maybe not necessary
        reuse_address: ?bool,
        reuse_port: ?bool, // one of them isn't available on windows, the other one is on-by-default
        …,
    }) !Self;

    fn listenUnix(…) !Self; // only available on unixy systems

    fn close(self: *Self) void;

    fn accept(self: Self) !TcpClient;

    … // here be functions like getLocalEndpoint()
};

const TcpClient = struct { // could also be called StreamClient
    socket: Socket,

    const ConnectOptions = struct {
        … // here be generic stream socket options
    };

    fn connectTo(addr: EndPoint, options: ConnectOptions) !Self;

    /// `host_name` is the host or IP in text form
    /// `service` is either the port number (integer) or a service name (see SRV record)
    fn connectToHost(host_name: []const u8, service: []const u8, options: ConnectOptions) !Self;

    fn close(self: *Self) void;

    fn inStream(self: Self) Reader;
    fn outStream(self: Self) Writer;

    … // here be functions like getLocalEndpoint, getRemoteEndpoint()
};

const UdpSocket = struct { // or DatagramSocket
    socket: Socket,

    const ConnectOptions = struct {
        …, //
    };

    const BindOptions = struct { 
        …, //
    };

    // Same semantics as for TcpClient
    fn connectTo(addr: EndPoint, options: ConnectOptions) !Self;
    fn connectToHost(host: []const u8, service: []const u8, options: ConnectOptions) !Self;
    fn connectToUnix(…) !Self;

    // Creates and binds a new socket to the given addr.
    fn bindNew(addr: EndPoint, options: BindOptions) !Self;

    fn close(self: *Self) void;

    // Sends the given datagram to the connected endpoint (only allowed for connected sockets)
    fn send(datagram: []const u8) !void;

    // Receives a datagram from the connected endpoint (only allowed for connected sockets)
    fn receive(buffer: []u8) ![]u8;

    // Sends the datagram to the given target (allowed on both connected and unconnected sockets)
    fn sendTo(datagram: []const u8, target: EndPoint) !void;

    // Receives a datagram (allowed on both connected and unconnected sockets)
    fn receiveFrom(buffer: []u8) !struct { datagram: []u8, sender: EndPoint };

    // Joins a given multicast group
    fn joinMulticastGroup(group: Address) !void;
    fn leaveMulticastGroup(group: Address) !void; // afaik IP_DROP_MEMBERSHIP is not implemented on all OS
};

Additional things we should allow and encourage usage of:

Happy eyeballs (connect to ipv6 and ipv4 simultaneously, prefer v6 connection)
SRV records (provide different services on different machines via the same hostname)
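
As a rough illustration of the "prefer v6" part of Happy Eyeballs, the sketch below only reorders resolved candidates so that attempts alternate between families, starting with IPv6, in the spirit of RFC 8305. It does not race any connections. `Family`, `Candidate` and `interleave` are hypothetical names, and a real resolver would return full socket addresses rather than an `id`.

```zig
const std = @import("std");

const Family = enum { v4, v6 };

// Hypothetical candidate; `id` stands in for a full socket address.
const Candidate = struct { family: Family, id: u8 };

// Copies `input` into `output` alternating v6, v4, v6, ... while keeping the
// resolver's relative order inside each family. Once one family runs out,
// the remaining candidates of the other family follow in order.
fn interleave(input: []const Candidate, output: []Candidate) void {
    std.debug.assert(output.len == input.len);
    var next_v6: usize = 0;
    var next_v4: usize = 0;
    var out: usize = 0;
    var want_v6 = true;
    while (out < input.len) {
        // Advance each cursor to the next unused candidate of its family.
        while (next_v6 < input.len and input[next_v6].family != .v6) {
            next_v6 += 1;
        }
        while (next_v4 < input.len and input[next_v4].family != .v4) {
            next_v4 += 1;
        }
        const v6_left = next_v6 < input.len;
        const v4_left = next_v4 < input.len;
        if ((want_v6 and v6_left) or !v4_left) {
            output[out] = input[next_v6];
            next_v6 += 1;
        } else {
            output[out] = input[next_v4];
            next_v4 += 1;
        }
        out += 1;
        want_v6 = !want_v6;
    }
}

test "v6 first, then alternate" {
    const input = [_]Candidate{
        .{ .family = .v4, .id = 1 },
        .{ .family = .v4, .id = 2 },
        .{ .family = .v6, .id = 3 },
        .{ .family = .v6, .id = 4 },
        .{ .family = .v4, .id = 5 },
    };
    var output: [input.len]Candidate = undefined;
    interleave(&input, &output);

    const expected = [_]u8{ 3, 1, 4, 2, 5 };
    var i: usize = 0;
    while (i < expected.len) : (i += 1) {
        try std.testing.expectEqual(expected[i], output[i].id);
    }
}
```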

Oh. That answer came out longer than I expected, but I hope the code example makes my vision clear. I don't think it's a good idea to abstract all features over the same socket management type; I'd rather split them along semantic lines. Using the Socket type still allows casting between all the different implementations.

MasterQ32 on 22 Nov 2020

👍1

UDP isn't stream nor connection based.

The UDP interface to the kernel is optionally connection-based (as your example shows for "connected" sockets).

It's often useful to additionally consider SCTP, as it has framing and streams. I imagine a useful abstraction would be a reader/writer that operates on a single frame, and then you "end" the frame (which e.g. sets MSG_EOR on the next write)
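
A small sketch of how such a frame-oriented writer might behave, with loud assumptions: `FakeSctpSocket` and `FrameWriter` are hypothetical, nothing touches a real socket, and the `end_of_record` boolean merely plays the role that MSG_EOR would play in a real `send` call.

```zig
const std = @import("std");

// Hypothetical stand-in for an SCTP socket: it tallies what it is asked to send.
const FakeSctpSocket = struct {
    bytes_sent: usize = 0,
    chunks: usize = 0,
    records: usize = 0,

    // `end_of_record` plays the role of passing MSG_EOR to send().
    fn send(self: *FakeSctpSocket, bytes: []const u8, end_of_record: bool) void {
        self.bytes_sent += bytes.len;
        self.chunks += 1;
        if (end_of_record) self.records += 1;
    }
};

const FrameWriter = struct {
    socket: *FakeSctpSocket,
    end_pending: bool = false,

    // Marks the frame as finished; the flag travels with the next write.
    fn end(self: *FrameWriter) void {
        self.end_pending = true;
    }

    fn write(self: *FrameWriter, bytes: []const u8) void {
        self.socket.send(bytes, self.end_pending);
        self.end_pending = false;
    }
};

test "two writes form one record" {
    var socket = FakeSctpSocket{};
    var writer = FrameWriter{ .socket = &socket };

    writer.write("hello, ");
    writer.end();
    writer.write("world");

    try std.testing.expectEqual(@as(usize, 2), socket.chunks);
    try std.testing.expectEqual(@as(usize, 1), socket.records);
    try std.testing.expectEqual(@as(usize, 12), socket.bytes_sent);
}
```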

Also don't forget to consider extra data, not only for Unix sockets (where you can have e.g. file descriptors attached to a message), but also for TCP, where you need to deal with the URG flag at the right point.

daurnimator on 22 Nov 2020

That answer came out longer than I expected, but I hope the code example makes my vision clear.

No problem, I opened this ticket to work out all the details.

UDP isn't stream nor connection based. connectUdpHost and connectUdp can still be quite useful, but providing stream types for UDP doesn't make much sense as a UDP socket can receive from multiple peers at the same time.

As @daurnimator said, you can connect a UDP socket and set the default destination address, hence the idea of having a reader/writer part for symmetry with the TCP counterpart. The sendto/recvfrom part was meant to address the connectionless part, where the caller specifies the remote address.

I think it's better to separate TCP Streams (and provide .inStream() and .outStream() on them) and UDP sockets (that do provide .receiveFrom() and .sendTo() as well as receive/send for connected UDP sockets).

You mean reader() and writer() :stuck_out_tongue:
There's a lot of overlap between the two socket types, but I guess splitting the socket by type makes sense.
My only fear was that APIs wanting to use both socket types have to resort to anytype and lose the parameter type altogether.
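
One possible way around the anytype concern, sketched under assumptions: if both wrappers expose the shared `Socket` field from the proposal above, a helper can take `Socket` and keep a concrete parameter type. `pollHandle` is a hypothetical helper, and the `i32` handle is a POSIX-style assumption (Winsock uses a different handle type).

```zig
const std = @import("std");

// Assumption: a POSIX-style integer handle.
const OsSocketHandle = i32;

const Socket = extern struct { socket_fd: OsSocketHandle };
const TcpClient = struct { socket: Socket };
const UdpSocket = struct { socket: Socket };

// Hypothetical helper: it takes the shared Socket, so the parameter stays
// typed and no `anytype` is needed to accept both wrappers.
fn pollHandle(socket: Socket) OsSocketHandle {
    return socket.socket_fd;
}

test "both wrappers expose the same Socket" {
    const tcp = TcpClient{ .socket = Socket{ .socket_fd = 3 } };
    const udp = UdpSocket{ .socket = Socket{ .socket_fd = 4 } };
    try std.testing.expectEqual(@as(OsSocketHandle, 3), pollHandle(tcp.socket));
    try std.testing.expectEqual(@as(OsSocketHandle, 4), pollHandle(udp.socket));
}
```

This only covers operations that make sense for any socket, such as polling; anything transport-specific would still need the concrete wrapper type.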

Also a tcp listener has different options than a client

That's very close to the StreamListener we already have, no?

EndPoint

At the moment Address is a thin wrapper over every sockaddr type.

SRV records (provide different services on different machines via the same hostname)

Hmm, that can be part of the resolver interface, I don't see any reason to add an extra parameter (and have an extra lookup) everywhere as this is a pretty niche use case.

Happy eyeballs (connect to ipv6 and ipv4 simultaneously, prefer v6 connection)

:+1:

LemonBoy on 22 Nov 2020

You mean reader() and writer() :stuck_out_tongue:

True! I was refactoring some old code right now and messed up :grin:

As @daurnimator said, you can connect a UDP socket and set the default destination address, hence the idea of having a reader/writer part for symmetry with the TCP counterpart.

having .writer() still doesn't make sense as UDP is not a stream:

try udp_sock.writer().print("{} doesn't make {}", .{ "this", "sense" });

will result in three packets being sent: "this", " doesn't make " and "sense", which might not be received in that order, or even completely. Providing a reader/writer abstraction does imply this though and users are tempted to do the very wrong thing here and assume that they send something that is both "one thing" and "ordered", but the other side might receive this:
"sense", "this", "sense", " doesn't make "

So we shouldn't provide an API that implies writes are guaranteed to be received in the order they were made.

MasterQ32 on 22 Nov 2020

Providing a reader/writer abstraction does imply this though and users are tempted to do the very wrong thing here and assume that they send something that is both "one thing" and "ordered"

Oh well, the user I had in mind was a bit smarter than that :) The use case I had in mind involves the careful use of Reader.read and Writer.write to send whole datagrams, but I see it's easy to misuse it (and after all we can simply provide a read and write method).

I agree on the proposed naming then: TCPListener, TCPClient and UDPSocket convey pretty well the fact they are different abstractions over a socket.

LemonBoy on 22 Nov 2020

👍1

Before I forget, it also needs to be able to use send with MSG_NOSIGNAL instead of write on linux which should fix https://github.com/ziglang/zig/issues/5614 and https://github.com/ziglang/zig/issues/6590

frmdstryr on 23 Nov 2020

having .writer() still doesn't make sense as UDP is not a stream: ... will result in three packets being sent: "this", "doesn't make " and "sens

Unless you write to a buffered writer which wraps the UDP writer. Then flushing the buffered writer sends a single packet (as long as the buffer doesn't overflow). This is what I did for websockets and "it works" :tm: .

frmdstryr on 23 Nov 2020

Before I forget, it also needs to be able to use send with MSG_NOSIGNAL instead of write on linux which should fix #5614 and #6590

Sure thing, using recv/send instead of read/write should do the trick here.

Unless you write to a buffered writer which wraps the UDP writer. Then flushing the buffered writer sends a single packet (as long as the buffer doesn't overflow). This is what I did for websockets and "it works" :tm: .

Emphasis mine, you have to be _extremely_ cautious with your usage of the buffered stream. For this use-case you're better off with a FixedBufferStream (or a LinearFifo) where you assemble the packets and then write them out in a controlled fashion.
mq32 is right, the stream interface is really easy to misuse.
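
The "assemble, then write out in a controlled fashion" idea can be sketched as follows. To keep it short, this uses `std.fmt.bufPrint` on a stack buffer rather than the FixedBufferStream or LinearFifo mentioned above, and `FakeUdpSocket` is a hypothetical stand-in that only counts datagrams. It assumes a Zig version where strings are formatted with `{s}`.

```zig
const std = @import("std");

// Hypothetical stand-in for a connected UDP socket: it only counts datagrams.
const FakeUdpSocket = struct {
    datagrams: usize = 0,
    last_len: usize = 0,

    fn send(self: *FakeUdpSocket, datagram: []const u8) void {
        self.datagrams += 1;
        self.last_len = datagram.len;
    }
};

test "format into a buffer, then send once" {
    var sock = FakeUdpSocket{};
    var storage: [64]u8 = undefined;

    // bufPrint fails with error.NoSpaceLeft if the message does not fit,
    // instead of silently splitting it across several writes.
    const datagram = try std.fmt.bufPrint(&storage, "{s} doesn't make {s}", .{ "this", "sense" });
    sock.send(datagram);

    try std.testing.expectEqual(@as(usize, 1), sock.datagrams);
    try std.testing.expectEqualStrings("this doesn't make sense", datagram);
}
```

The point of the design is that the datagram boundary is decided by exactly one explicit `send` call, not by however many writes the formatter happens to issue.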

LemonBoy on 23 Nov 2020

@frmdstryr

Why have a stream API for packet semantics?
Why should Zig provide a UDP stream API using writers when UDP is not a stream, even if it "works"?

I think the underlying reasoning is different (you can lose a packet).


frett27 on 23 Nov 2020

👍2

Second @frett27 here. TCP, UDP and Unix sockets have a fundamentally different set of behaviors and guarantees. Shoehorning the same stream-like interface into all of them would unnecessarily confuse the user. Anyone who desires a common interface between different socket types can already use file descriptors and read/write syscalls directly.

mrakh on 23 Nov 2020

WebSockets are on top of TCP. The frames have a fin flag to handle a single user message split over multiple frames (which I just didn't implement yet). The end user (often) doesn't need to know or care how the framing works, hence the reason for providing a stream that handles it for them.

I guess it doesn't matter either way... people will add it if it's not there and they want it or they can ignore it if it is.

frmdstryr on 23 Nov 2020
