Julia: make C-style APIs more Julian

Created on 1 Feb 2017 · 32 Comments · Source: JuliaLang/julia

The following C-derived function names are not terribly Julian:

getaddrinfo
gethostname
getipaddr
getpid
getsockname
getpeername

I'm sure there are more; I found these just by looking for names starting with get. I think we should consider renaming these and/or refactoring these APIs. If people know of others, they could post them here as well.

design

StefanKarpinski

👍9 😕1

All 32 comments

is_apple, is_bsd, is_linux, is_unix, is_windows

Most is... functions in Base are spelled without the underscore. I would suggest spelling these without the underscore.

jmkuhn on 1 Feb 2017

👍6 👎1
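For reference, Julia 1.0 and later do spell these predicates without the underscore and place them in the Sys module. A minimal check, which only prints the result of each predicate on the current machine:

```julia
# The underscore-free OS predicates, as found in the Sys module of Julia 1.x.
for f in (Sys.isapple, Sys.isbsd, Sys.islinux, Sys.isunix, Sys.iswindows)
    println(f, " => ", f())
end
```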

Put it like this: if the function does the same thing, why should it have a different name?
Or: What problem is solved by this change?

lobingera on 1 Feb 2017

The problem that would be solved is that these C-inherited names match neither the internally used nor outwardly recommended naming conventions for Julia. Perhaps one would expect a C-style name in Julia that matches its equivalent in C if the arguments and everything were the same, but that's rarely the case with Julia because we don't typically pass Ptr around in user-facing APIs. Plus, as Stefan mentioned, we should consider refactoring these APIs to be more Julian; in doing so we may diverge from the C API, and providing a different API with the same name is a little surprising. But I think the most important point is that it's better to provide a consistent user experience with the language than to inconsistently rehash another language's APIs.

ararslan on 1 Feb 2017

👍6 ❤1

I have a slight preference for keeping C names for things that are sufficiently similar to C functions. But I have a large preference for moving these functions to other modules, like Sys and Socket (which doesn't exist yet).

JeffBezanson on 1 Feb 2017

This was specifically spurred by being on a call today which had a dozen top Julia contributors on it and no one could remember that the function to find out your hostname is gethostname. Multiple people suggested hostname(). I finally remembered that we'd borrowed the name from C. I've frequently gone through the same thought process with process ID where I think, pid(), no wait, it's called getpid(). The fact is that we just don't tend to prefix functions that return a value with "get". Now in the case of pids, we could just have a global constant instead, assigned at startup, but that's a different question.

Philosophically, I'm ok with mirroring an API suite that's really similar to another language – see the path and file APIs which are intentionally modeled on Python's (since Python had the most sane and comprehensive API for that sort of thing). But in this case, this is just a random smattering of C-derived names in the middle of APIs for networking and I/O that don't really resemble C much at all. I think that's why even people very familiar with C have a hard time remembering getpid and gethostname.

StefanKarpinski on 2 Feb 2017
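The names people reached for on that call can be sketched as one-line aliases. This is only an illustration: hostname and pid are hypothetical names that do not exist in Base, while gethostname and getpid are the existing functions.

```julia
# Hypothetical un-prefixed aliases; neither `hostname` nor `pid` exists in Base.
hostname() = gethostname()   # existing Base function, returns the host name as a String
pid() = getpid()             # existing Base function, returns the process ID

println("host: ", hostname(), ", pid: ", pid())
```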

About 30 years ago, there was a lot of discussion in the POSIX community on how best to map the POSIX API, which was originally specified only for C, into other programming languages. A distinction was made between "thin bindings", which provide an API as close as possible to what C programmers are accustomed to, and "thick bindings", which try to make full use of the native facilities and conventions of the new language (names, types, packaging, error reporting, etc.), and basically end up rewriting the full POSIX standard into the new language. IEEE Std. 1003.5 (POSIX/Ada) is probably the most elaborate "thick binding" of POSIX ever written for a language other than C. That standard "contains a very detailed rationale explaining the reasoning and analysis that led to the final shape of the binding. The rationale discusses most of the issues that confront any binding developer, including packaging, documentation style (e.g., "thick" vs "thin") and tasking safety." Perhaps [1] is still worth a read (e.g. Julia's type facilities are much closer to Ada than to C)? But in the end, I believe "thick bindings" to POSIX never gained a huge market share. Many developers are already familiar with the C bindings, and will refer to the C man pages and POSIX spec for detailed semantics, because in practice hardly anyone ever writes documentation for non-C POSIX APIs that is nearly as thorough and detailed and authoritative as the one that already exists for C. I suspect that is why most other programming languages follow the C binding of the POSIX API quite closely: using the same (or very similar) identifiers for functions and their parameters makes it easier to refer to the C API documentation.

[1] IEEE Std 1003.5-1999, Appendix B.1.5: Level of Binding to B.1.9: Mapping C features to Ada, pp 552-563. http://ieeexplore.ieee.org/document/815314/

mgkuhn on 5 Feb 2017

Except we're not even in the business of providing bindings to POSIX – we're actually mostly providing bindings to libuv, which is a wildly different API with quite different names.

StefanKarpinski on 6 Feb 2017

It would be good to compare and contrast what various languages call these functions. Presumably not everyone follows the C names, but perhaps most do.

ararslan on 20 Jul 2017

👍1

This doesn't seem super urgent to me. For these kinds of functions, it would be easy to introduce a new API if we want, and deprecate the current names in the next major version after that.

JeffBezanson on 11 Sep 2017

My only issue with introducing a new API soon and waiting to deprecate the old one in the next major version is that for the 1.x series, we have two sets of names for the same things, which I think would be confusing.

ararslan on 11 Sep 2017

Fair point, I just think these names do little enough harm that we can reasonably take them off our plate for now.

JeffBezanson on 11 Sep 2017

The relatively simple first work item here is to look at what Python, Ruby and Perl call these. If they're what we call them, then we can just close the issue. If there's a lot of disagreement but one of those languages calls these something sane, then we should follow the sane one (often Python).

StefanKarpinski on 12 Sep 2017

❤2

Python's socket module seems to use the exact same names: https://docs.python.org/3/library/socket.html

JeffBezanson on 14 Sep 2017

Go's API is quite different: https://golang.org/pkg/net/

Keno on 14 Sep 2017

Research results:

Rust uses the C names but from its libc crate.

ararslan on 14 Sep 2017

Perl's getpid seems to be $PID, not getppid.

yuyichao on 14 Sep 2017

Looks like $PID and getppid are different: https://perldoc.perl.org/functions/getppid.html

ararslan on 14 Sep 2017

Right, and getppid is the wrong one.

yuyichao on 14 Sep 2017

Okay, updated the table.

ararslan on 14 Sep 2017

Let me make a proposal (subject to naming of course):

abstract type Host; end
abstract type NetworkEndpoint; end
struct IPEndpoint{A<:IPAddr} <: NetworkEndpoint
    addr::A
    port::UInt16
end
struct LocalHost <: Host; end
localhost = LocalHost()

gethostname() -> hostname(localhost)
getaddrinfo(hostname) -> resolve(IPAddr, hostname) (can use IPv4 or IPv6 instead for specific protocol)
getipaddr -> ipaddr(localhost)
getsockname(sock) -> endpoint(sock)::NetworkEndpoint (ipaddr and port for accessors)
getpeername(sock) -> peer(sock)::NetworkEndpoint
getpid -> pid

Keno on 14 Sep 2017

❤3
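A minimal sketch of how the proposal above might look as running code on Julia 1.x, where the networking functions live in the Sockets standard library. The module name NetSketch and the port accessor are assumptions made for illustration; the other names follow the proposal, and each new function is just a thin wrapper over the existing one.

```julia
module NetSketch  # hypothetical module name, used only to keep the sketch self-contained

using Sockets  # IPAddr, getipaddr, getaddrinfo, getsockname, getpeername (Julia 1.x)

abstract type Host end
abstract type NetworkEndpoint end

struct IPEndpoint{A<:IPAddr} <: NetworkEndpoint
    addr::A
    port::UInt16
end

struct LocalHost <: Host end
const localhost = LocalHost()

# Proposed names, implemented as thin wrappers over the existing functions.
hostname(::LocalHost) = gethostname()
ipaddr(::LocalHost) = getipaddr()
resolve(::Type{T}, name::AbstractString) where {T<:IPAddr} = getaddrinfo(name, T)
endpoint(sock) = IPEndpoint(getsockname(sock)...)  # getsockname returns (IPAddr, UInt16)
peer(sock) = IPEndpoint(getpeername(sock)...)      # getpeername returns (IPAddr, UInt16)
pid() = getpid()

# Accessors for endpoints, as suggested in the proposal.
ipaddr(ep::IPEndpoint) = ep.addr
port(ep::IPEndpoint) = ep.port

end # module

println(NetSketch.hostname(NetSketch.localhost))
```

Wrapping the definitions in a module keeps the proposal's localhost singleton from colliding with anything else in scope.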

Looks good to me. I do especially like replacing the no-argument functions with explicitly operating on localhost.

JeffBezanson on 14 Sep 2017

👍1

What does hostname(...) mean if you don't pass the magic token?

vtjnash on 14 Sep 2017

I'd like it to be the generic function you use for asking the hostname. So hostname(::NetworkEndpoint) might give the hostname of that endpoint (where possible), hostname(worker) might give the hostname for a worker, etc.

Keno on 14 Sep 2017
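The generic-function idea can be sketched with multiple dispatch. Everything here except gethostname and remotecall_fetch is hypothetical: NamedEndpoint and Worker are stand-in types invented for the example, not types from Base, Sockets, or Distributed.

```julia
module HostnameSketch  # hypothetical module name

using Distributed  # provides remotecall_fetch

struct LocalHost end      # stands in for the singleton from the proposal
struct NamedEndpoint      # hypothetical endpoint type that remembers its host name
    host::String
    port::UInt16
end
struct Worker             # hypothetical wrapper around a Distributed process id
    id::Int
end

# One generic function, one method per kind of thing that can have a host name.
hostname(::LocalHost) = gethostname()
hostname(ep::NamedEndpoint) = ep.host
hostname(w::Worker) = remotecall_fetch(gethostname, w.id)  # asks that process for its host name

end # module

println(HostnameSketch.hostname(HostnameSketch.Worker(1)))  # process 1 is the current process
```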

What about having an endpoints function that returns a pair of address structures, this side and the peer? That's one less function to remember, and I feel like endpoint and peer are going to be a little hard to remember. here, there = Net.endpoints(sock) seems easy to remember.

StefanKarpinski on 18 Sep 2017

👍1
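A sketch of the endpoints suggestion, assuming Julia 1.x with the Sockets standard library. The endpoints function is hypothetical; it simply pairs the results of the existing getsockname and getpeername, each of which returns an (IPAddr, UInt16) tuple. The demonstration opens a connection over the loopback interface so that both sides can be inspected.

```julia
using Sockets

# Hypothetical helper: returns (this side, the peer) for a connected socket.
endpoints(sock::TCPSocket) = (getsockname(sock), getpeername(sock))

# Loopback demonstration: start a server, connect to it, inspect both sides.
port, server = listenany(ip"127.0.0.1", 8000)  # 8000 is only a port hint
accepted = @async accept(server)
client = connect(ip"127.0.0.1", port)
conn = fetch(accepted)

here, there = endpoints(client)
println("this side: ", here, "  peer: ", there)

close(client); close(conn); close(server)
```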

One thing I don't really like about using a localhost singleton like this is that this name is usually used to represent the loopback interface and not the host computer. The use of it in ipaddr(localhost) and hostname(localhost) can easily be confused with wanting to get information about the loopback interface, which are both valid questions and would be ip"127.0.0.1" and "localhost" respectively.

In other words, it can be pretty confusing to have resolve(IPAddr, "localhost") and ipaddr(localhost) return different results.

I do like the general idea though, as long as the singleton is not actually called localhost.

yuyichao on 19 Sep 2017

We could use the name loopback but that doesn't help much since it means the same thing.

StefanKarpinski on 19 Sep 2017

I personally feel that the C names are more discoverable, since people know them. Of course, I would be ok with Julian names as well - in which case we need an assignee who can settle the design and get this done by feature freeze.

Even if we change these after 1.0, they are very easy to update with femtocleaner automatically.

ViralBShah on 19 Nov 2017

This could be a case where we add a new API (non-breaking) whenever we get around to it, and just wait longer to deprecate these names. They're not too harmful.

JeffBezanson on 19 Nov 2017

👍1
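The add-now, deprecate-later path can be sketched with the existing @deprecate macro. The names are hypothetical: oldhostname stands in for a C-style name such as gethostname, and hostname for its replacement. Note that recent Julia versions only print deprecation warnings when started with --depwarn=yes.

```julia
module DeprecationSketch  # hypothetical module name

hostname() = gethostname()            # hypothetical new name
@deprecate oldhostname() hostname()   # hypothetical old name keeps working and forwards to the new one

end # module

println(DeprecationSketch.oldhostname())
```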

Another simple possibility here would be to put these wrappers in a standard library package.

StefanKarpinski on 20 Nov 2017

👍2

Both Python and Ruby have this kind of functionality in a standard namespace called "socket." Based on a quick search through the repo, I think this stuff is only used in Distributed, which is being moved to the stdlib as of #24443. So it seems quite sensible to me to put this stuff in a Socket module in stdlib and add a dependency on Socket in Distributed.

ararslan on 20 Nov 2017

👍1

Let's just leave these here (they're not actively harmful) and we can introduce a nice shiny Sockets standard library package in the future.

StefanKarpinski on 20 Nov 2017

👍1

No need to abbreviate address to addr.
IPAddr -> IPAddress
getipaddr -> getipaddress etc

samoconnor on 14 Jan 2018

👍6 ❤2
