The following C-derived function names are not terribly Julian:
- `getaddrinfo`
- `gethostname`
- `getipaddr`
- `getpid`
- `getsockname`
- `getpeername`

I'm sure there's more, but I found these just by looking for things starting with `get`. I think we should consider renaming these and/or refactoring these APIs. I'm sure there are also others, which people could post here as well.
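For orientation, here is a minimal sketch of how two of these names read at a call site. It assumes Julia 1.x, where `gethostname` and `getpid` are exported from Base; the thread itself predates that release, so module locations may differ from what the participants were using.

```julia
# Assumes Julia 1.x: both functions are exported from Base.
host = gethostname()   # name of the machine, as a String
id = getpid()          # process ID of the running Julia process

println("host = ", host, ", pid = ", id)
```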
`is_apple`, `is_bsd`, `is_linux`, `is_unix`, `is_windows`
Most `is...` functions in Base are spelled without the underscore. I would suggest spelling these without the underscore.
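As a point of comparison, the underscore-free spelling can be sketched as below. This assumes Julia 1.x, where these predicates are available in the `Sys` module as `Sys.isapple`, `Sys.isbsd`, `Sys.islinux`, `Sys.isunix` and `Sys.iswindows`; that location is an assumption about later releases and is not stated in this thread.

```julia
# Assumes Julia 1.x, where the OS predicates live in `Sys` without underscores.
checks = [
    "apple"   => Sys.isapple(),
    "bsd"     => Sys.isbsd(),
    "linux"   => Sys.islinux(),
    "unix"    => Sys.isunix(),
    "windows" => Sys.iswindows(),
]

# Print one line per predicate for the current platform.
for (name, result) in checks
    println(rpad(name, 8), result)
end
```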
Put it like this: If the function does the same job, why should it have a different name?
Or: What problem is solved by this change?
The problem that would be solved is that these C-inherited names match neither the internally used nor outwardly recommended naming conventions for Julia. Perhaps one would expect a C-style name in Julia that matches its equivalent in C if the arguments and everything were the same, but that's rarely the case with Julia because we don't typically pass `Ptr` around in user-facing APIs. Plus, as Stefan mentioned, we should consider refactoring these APIs to be more Julian; in doing so we may diverge from the C API, and providing a different API with the same name is a little surprising. But I think the most important point is that it's better to provide a consistent user experience with the language than to inconsistently rehash another language's APIs.
I have a slight preference for keeping C names for things that are sufficiently similar to C functions. But I have a large preference for moving these functions to other modules, like Sys and Socket (which doesn't exist yet).
This was specifically spurred by being on a call today which had a dozen top Julia contributors on it and no one could remember that the function to find out your hostname is `gethostname`. Multiple people suggested `hostname()`. I finally remembered that we'd borrowed the name from C. I've frequently gone through the same thought process with process ID where I think, `pid()`, no wait, it's called `getpid()`. The fact is that we just don't tend to prefix functions that return a value with "get". Now in the case of pids, we could just have a global constant instead, assigned at startup, but that's a different question.
Philosophically, I'm ok with mirroring an API suite that's really similar to another language (see the path and file APIs, which are intentionally modeled on Python's because Python had the most sane and comprehensive API for that sort of thing). But in this case, this is just a random smattering of C-derived names in the middle of APIs for networking and I/O that don't really resemble C much at all. I think that's why even people very familiar with C have a hard time remembering `getpid` and `gethostname`.
About 30 years ago, there was a lot of discussion in the POSIX community on how best to map the POSIX API, which was originally specified only for C, into other programming languages. A distinction was made between "thin bindings", which provide an API as close as possible to what C programmers are accustomed to, and "thick bindings", which try to make full use of the native facilities and conventions of the new language (names, types, packaging, error reporting, etc.), and basically end up rewriting the full POSIX standard into the new language. IEEE Std. 1003.5 (POSIX/Ada) is probably the most elaborate "thick binding" of POSIX ever written for a language other than C. That standard "contains a very detailed rationale explaining the reasoning and analysis that led to the final shape of the binding. The rationale discusses most of the issues that confront any binding developer, including packaging, documentation style (e.g., "thick" vs "thin") and tasking safety." Perhaps [1] is still worth a read (e.g. Julia's type facilities are much closer to Ada than to C)? But in the end, I believe "thick bindings" to POSIX never gained a huge market share. Many developers are already familiar with the C bindings, and will refer to the C man pages and POSIX spec for detailed semantics, because in practice hardly anyone ever writes documentation for non-C POSIX APIs that is nearly as thorough and detailed and authoritative as the one that already exists for C. I suspect that is why most other programming languages follow the C binding of the POSIX API quite closely: using the same (or very similar) identifiers for functions and their parameters makes it easier to refer to the C API documentation.
[1] IEEE Std 1003.5-1999, Appendix B.1.5: Level of Binding to B.1.9: Mapping C features to Ada, pp 552-563. http://ieeexplore.ieee.org/document/815314/
Except we're not even in the business of providing bindings to POSIX; we're actually mostly providing bindings to libuv, which is a wildly different API with quite different names.
It would be good to compare and contrast what various languages call these functions. Presumably not everyone follows the C names, but perhaps most do.
This doesn't seem super urgent to me. For these kinds of functions, it would be easy to introduce a new API if we want, and deprecate the current names in the next major version after that.
My only issue with introducing a new API soon and waiting to deprecate the old one in the next major version is that for the 1.x series, we have two sets of names for the same things, which I think would be confusing.
Fair point, I just think these names do little enough harm that we can reasonably take them off our plate for now.
The relatively simple first work item here is to look at what Python, Ruby and Perl call these. If they're what we call them, then we can just close the issue. If there's a lot of disagreement but one of those languages calls these something sane, then we should follow the sane one (often Python).
Python's `socket` module seems to use the exact same names: https://docs.python.org/3/library/socket.html
Go's API is quite different: https://golang.org/pkg/net/
Research results:
| Julia | Python | Ruby | Perl |
| :---: | :---: | :---: | :---: |
| `getaddrinfo` | `socket.getaddrinfo` | `Socket.getaddrinfo` | `getaddrinfo` |
| `gethostname` | `socket.gethostname` | `Socket.gethostname` | `gethostname` |
| `getipaddr` | `socket.gethostbyname(socket.gethostname())` | `Socket.ip_address_list` | `Net::Address::IP::Local->public` |
| `getpid` | `os.getpid` | `Process.pid` | `$PID` |
| `getsockname` | `socket.getsockname` | `getsockname` | `getsockname` |
Rust uses the C names but from its `libc` crate.
Perl's `getpid` seems to be `$PID`, not `getppid`.
Looks like `$PID` and `getppid` are different: https://perldoc.perl.org/functions/getppid.html
Right, and `getppid` is the wrong one.
Okay, updated the table.
Let me make a proposal (subject to naming of course):

    abstract type Host; end
    abstract type NetworkEndpoint; end
    struct IPEndpoint{A<:IPAddr} <: NetworkEndpoint
        addr::A
        port::UInt16
    end
    struct LocalHost <: Host; end
    localhost = LocalHost()

- `gethostname()` -> `hostname(localhost)`
- `getaddrinfo(hostname)` -> `resolve(IPAddr, hostname)` (can use `IPv4` or `IPv6` instead for a specific protocol)
- `getipaddr` -> `ipaddr(localhost)`
- `getsockname(sock)` -> `endpoint(sock)::NetworkEndpoint` (`ipaddr` and `port` for accessors)
- `getpeername(sock)` -> `peer(sock)::NetworkEndpoint`
- `getpid` -> `pid`
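
The mapping above could be prototyped as thin wrappers over the existing functions. The sketch below is only that: it assumes Julia 1.x with the `Sockets` standard library, and every new name (`hostname`, `resolve`, `ipaddr`, `port`, `endpoint`, `peer`, `pid`, `localhost`) is the hypothetical one from the proposal, not an existing API. Imports are listed explicitly so the hypothetical `localhost` cannot clash with anything `Sockets` might export.

```julia
# Sketch only: the new names are the hypothetical ones from the proposal,
# written as thin wrappers over functions that already exist in Julia 1.x.
using Sockets: IPAddr, @ip_str, getaddrinfo, getipaddr, getsockname, getpeername

abstract type Host end
abstract type NetworkEndpoint end

struct IPEndpoint{A<:IPAddr} <: NetworkEndpoint
    addr::A
    port::UInt16
end

struct LocalHost <: Host end
const localhost = LocalHost()   # proposal's singleton for "this machine"

# No-argument C-style functions become methods on the singleton.
hostname(::LocalHost) = gethostname()
ipaddr(::LocalHost) = getipaddr()
pid() = getpid()

# resolve(IPAddr, name) accepts any address type; IPv4 or IPv6 narrows it.
resolve(::Type{T}, name::AbstractString) where {T<:IPAddr} = getaddrinfo(name, T)

# Accessors on an endpoint.
ipaddr(ep::IPEndpoint) = ep.addr
port(ep::IPEndpoint) = ep.port

# getsockname/getpeername return an (address, port) tuple; wrap it in a struct.
endpoint(sock) = IPEndpoint(getsockname(sock)...)
peer(sock) = IPEndpoint(getpeername(sock)...)

println(hostname(localhost))
println(pid())
```

Dispatching on a singleton keeps `hostname` and `ipaddr` open for other argument types later, at the cost of one extra token at each call site.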
Looks good to me. I do especially like replacing the no-argument functions with explicitly operating on `localhost`.
What does `hostname(...)` mean if you don't pass the magic token?
I'd like it to be the generic function you use for asking the hostname. So `hostname(::NetworkEndpoint)` might give the hostname of that endpoint (where possible), `hostname(worker)` might give the hostname for a worker, etc.
What about having an `endpoints` function that returns a pair of address structures, this side and the peer? That's one less function to remember, and I feel like `endpoint` and `peer` are going to be a little hard to remember. `here, there = Net.endpoints(sock)` seems easy to remember.
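
A rough sketch of that idea, assuming Julia 1.x with the `Sockets` standard library. `endpoints` is the hypothetical name from the comment above (without the suggested `Net` module, which does not exist); here it returns the raw `(address, port)` tuples from `getsockname` and `getpeername` rather than dedicated address structures. The demo connects to a throwaway server on the loopback interface.

```julia
using Sockets: @ip_str, listenany, connect, accept, getsockname, getpeername

# Hypothetical helper: local side first, then the peer.
endpoints(sock) = (getsockname(sock), getpeername(sock))

# Throwaway loopback server; 8000 is only a starting hint for listenany.
port, server = listenany(ip"127.0.0.1", 8000)
accept_task = @async accept(server)      # accept in the background
client = connect(ip"127.0.0.1", port)
conn = fetch(accept_task)

here, there = endpoints(client)
println("here = ", here, ", there = ", there)

close(conn); close(client); close(server)
```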
One thing I don't really like about using a `localhost` singleton like this is that this name is usually used to represent the loopback interface and not the host computer. The use of it in `ipaddr(localhost)` and `hostname(localhost)` can easily be confused with wanting to get information about the loopback interface, which are both valid questions and would be `ip"127.0.0.1"` and `"localhost"` respectively.
In other words, it can be pretty confusing to have `resolve(IPAddr, "localhost")` and `ipaddr(localhost)` return different results.
I do like the general idea though, as long as the singleton is not actually called `localhost`.
We could use the name `loopback` but that doesn't help much since it means the same thing.
I personally feel that the C names are more discoverable, since people know them. Of course, I would be ok with Julian names as well - in which case we need an assignee who can settle the design and get this done by feature freeze.
Even if we change these after 1.0, they are very easy to update with femtocleaner automatically.
This could be a case where we add a new API (non-breaking) whenever we get around to it, and just wait longer to deprecate these names. They're not too harmful.
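As a sketch of that "add now, deprecate later" path: Julia's `Base.@deprecate` macro can keep an old name alive as a forwarding alias. Both names below are hypothetical stand-ins (`hostname` for a new spelling, `old_gethostname` for the name being retired); a real change would apply this to the existing function in its own module. As far as I know, the deprecation warning only appears when Julia is started with `--depwarn=yes`.

```julia
# Hypothetical names: `hostname` stands in for a new spelling and
# `old_gethostname` for the name being retired.
hostname() = gethostname()

# Keep the old name as a forwarding alias; `false` means "do not export it".
Base.@deprecate old_gethostname() hostname() false

# Both spellings return the same value while the alias exists.
println(old_gethostname() == hostname())
```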
Another simple possibility here would be to put these wrappers in a standard library package.
Both Python and Ruby have this kind of functionality in a standard namespace called "socket." Based on a quick search through the repo, I think this stuff is only used in Distributed, which is being moved to the stdlib as of #24443. So it seems quite sensible to me to put this stuff in a Socket module in stdlib and add a dependency on Socket in Distributed.
Let's just leave these here (they're not actively harmful) and we can introduce a nice shiny `Sockets` standard library package in the future.
No need to abbreviate `address` to `addr`. `IPAddr` -> `IPAddress`, `getipaddr` -> `getipaddress`, etc.