Julia: Understandable names

Created on 9 Dec 2015 · 17 Comments · Source: JuliaLang/julia

Base Julia is full of abbreviations. Functions with names like iscntrl are inscrutable. I think, before it's too late, that Julia should make every effort to be written in plain English. To this end, I think that rules like this should be put into effect:

No abbreviations should be used in base julia for any words shorter than 10 (or so) characters
All words should be separated by _

All 17 comments

I'm sorry, but this kind of sweeping issue isn't credibly actionable. Please stop opening things that are so broad.

@bramtayl Might be better to 1) make a list of at least the worst cases 2) write a PR to add new, more understandable names (including deprecating the inscrutable abbreviations) 3) wait for the fireworks!

The story of picking function names is much more complex than you'd think at first. True, iscntrl is pretty obscure, but it's inherited from C. It's possible all of those functions (about 15 of them) could be replaced by a single character-class-querying function.

We also have an avowed policy of avoiding underscores, as it tends to produce overly long names and less-well-factored functions. For example print_shortest(x) (which we currently have, unfortunately) seems worse to me than, say, print(x, shortest=true).
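To make the keyword-argument alternative concrete, here is a minimal sketch in current Julia syntax. The function name `render` and its `shortest` keyword are hypothetical and chosen only for illustration; this is not how Base's `print` or `print_shortest` is implemented.

```julia
using Printf: @sprintf

# Hypothetical function: a single name whose behavior is selected by a
# keyword, instead of two names such as `render` and `render_shortest`.
function render(x::Float64; shortest::Bool=false)
    if shortest
        # `string` on a Float64 yields the shortest text that round-trips.
        return string(x)
    else
        # Otherwise show 17 significant digits.
        return @sprintf("%.17g", x)
    end
end

render(0.1; shortest=true)   # short form
render(0.1)                  # long form
```

The caller reads the option at the call site, and adding further options later does not require inventing more compound names.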

So I think these have to be taken on a case-by-case basis. To that end, it's certainly possible that there are a few especially obscure names we can fix.

> We also have an avowed policy of avoiding underscores

@JeffBezanson Where is that policy? That doesn't seem to be what Julia Docs says:

> Word separation can be indicated by underscores ('_'), but use of underscores is discouraged unless the name would be hard to read otherwise.

The statement in Julia Docs seems reasonable. However, avoiding underscores by simply removing them flies in the face of research that has been done into the readability of names. I believe that language design should be data driven, and think it would be good to take that research into account in julia.

For things like your print example, where there _is_ a better way, with keywords or the type system, I definitely agree: it would be good to get rid of the composite names.

Another problem is adopting hard-to-understand names because they came from C (from back in the day when linkers could not handle more than 8 characters in a name, and file names were limited to 8.3 on CP/M and MS-DOS or 14 characters on Unix), Matlab, or R. I've been told a number of times that arguing that something should be done like some other language is not a good reason to do something in julia; however, it has been used as an excuse for those inscrutable names.

That's sad, because julia can do _so_ much better, with its great (and getting better) type system and multiple dispatch.

How about something like: ischar(Control, x) or ischartype(Control, x)?
(with abstract CharClass ; immutable Control <: CharClass ; end)
A package could define iscntrl(x) = ischartype(Control, x) if they really wanted to.
Looking at Base, only 4 of those functions are even used, isspace is used in 4 files, 11 times total,
isprint is used twice, in 2 files, isupper and islower are used once each in 1 file.
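The `ischartype` idea above can be sketched as follows. This is only an illustration under assumptions: it uses current Julia syntax (`abstract type` and `struct` rather than the 2015 `abstract` and `immutable` spelling), the class types `Upper` and `Space` and the helper `is_control` are hypothetical names, and delegating to the existing Base predicates is just one possible way to implement the methods.

```julia
# Character classes as types, so one generic function replaces many names.
abstract type CharClass end
struct Control <: CharClass end
struct Upper   <: CharClass end   # hypothetical class name
struct Space   <: CharClass end   # hypothetical class name

# One method per class; dispatch on the class type picks the test.
ischartype(::Type{Control}, c::AbstractChar) = iscntrl(c)
ischartype(::Type{Upper},   c::AbstractChar) = isuppercase(c)
ischartype(::Type{Space},   c::AbstractChar) = isspace(c)

# A compatibility package could still offer a short spelling on top.
is_control(c::AbstractChar) = ischartype(Control, c)

ischartype(Control, '\n')   # newline is a control character
ischartype(Upper, 'A')
```

New character classes then only need a new type and method, not a new exported function name.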

Would people accept a PR that added ischartype, keeping, say, just the short forms isspace, islower, isupper, isprint and isdigit (some of which I saw used in packages), and deprecating the rest, like iscntrl?
isblank was deprecated already.

I definitely think that it is good to have "well-factored functions". And I understand how it would be easier to inherit C function names for users familiar with C.

One thing to do would be to go through Julia and make the three changes below:
-No words longer than 10 or so characters (for example, "short" instead of "abbrev.")
-No abbreviations at all
-All words should be separated by _

And in the process, building "well-factored functions" groups where possible, and maintaining C compatibility if deemed necessary, all on a case by case basis.

From this thread, isupper, isprint, islower have the words supper, sprint, and slower within them. Unwary users could be confused by all the Apple products referenced in Julia! This ambiguity would be avoided by is_upper, is_print, and is_lower.

Here's an example: The function to remove a file is rm, taken from the Unix shell syntax. However, rm is strictly more powerful as it also includes the functionality of Unix rmdir. The same function is called remove in other languages (including C!), so this could be an entry on your list.

Note, however, that there is a group of similar functions (including e.g. mkdir, cp, mv, symlink, and a few more), and they should be named in a consistent manner.

Unix is a great example of an interface that is completely unreadable because of excessive abbreviation. Regex is another example (and there are great alternatives; see the R package rex).

This issue is also mistitled since it's not about syntax at all.

> -No words longer than 10 or so characters (for example, "short" instead of "abbrev.")
> -No abbreviations at all
> -All words should be separated by _

We are definitely not doing this. This isn't three changes; more like thousands. But the real issue is that as soon as you start trying to apply sweeping policies like this you run into hard decisions over and over. Is rand really so objectionable that it needs to be renamed random? Should abs be absolute_value, and should svd be singular_value_decomp? "Decomposition" is longer than 10 characters; the English language and history of mathematics aren't always convenient.

Is DateTime unreadable? Should it be Date_Time? Does "TCP" qualify as an abbreviation, or is that allowed by custom? Are people better off learning that "eof" in an I/O context means "end of file", or are they better off typing end_of_file every single time? Is NaN an evil abbreviation that needs to be replaced with NotANumber even though it's incredibly standard? Should kron be kronecker_product? Should peakflops be peak_floating_point_operations_per_second?

Another non-obvious point is that when we copy names from C, it's not usually because we're trying to make things easy for C programmers. Rather it's for _global_ consistency --- that is, consistency not just within julia but among software systems. When picking a word for something, one of the first things you ask is whether there is already a word for it, and if so, why not reuse it.

> see the R package rex

Isn't "rex" an abbreviation for "regular expression"? Who allowed that?

I think before a 1.0 release, the list of all the names could be given a once over. That doesn't seem unreasonable. It might allow a few particularly poor names to be noticed when viewed in a larger context.

But in the long run Julia provides capabilities to create packages which export custom sets of names; heck, the package could even be auto-generated for the most part. If someone wants a custom set of names, they can provide it themselves.
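As a rough sketch of such a package, a module can bind longer names to existing functions and export them. The module name `ReadableNames` and the three long names are hypothetical; in current Julia `kron` lives in the `LinearAlgebra` standard library, which is assumed here.

```julia
module ReadableNames  # hypothetical package name

using LinearAlgebra: kron

export absolute_value, kronecker_product, end_of_file

# Each long name is another binding to the same generic function,
# so every existing method keeps working unchanged.
const absolute_value    = abs
const kronecker_product = kron
const end_of_file       = eof

end # module

using .ReadableNames

absolute_value(-3)
kronecker_product([1, 2], [1, 1])
```

Because the aliases are plain constant bindings, they add no runtime cost and users opt in simply by loading the package.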

> But the real issue is that as soon as you start trying to apply sweeping policies like this you run into hard decisions over and over

@JeffBezanson You are kind of arguing against yourself here - while I also don't agree with @bramtayl's 3 rules, that's why I also don't agree with the "avowed policy of avoiding underscores".

DateTime isn't a good example, as that already has camelcase to help distinguish the words.
If you follow the convention that all caps implies a constant, that is where underscores are _most_ needed to help separate "words" and make things readable. They aren't needed for camelcase names (by convention, module and type names in Julia), as the capital letters break them up.
All-lowercase names, although more readable than all-uppercase ones (and there are numerous studies on that, from long before computers), do benefit greatly from underscores if they are long.

I agree with most of your other examples, except the last two - peakflops could be peak_flops, as the common abbreviation is flops, not peakflops, and maybe kron would be more readable as kron_prod (given prod is a common abbreviation for product, assuming that kron is a known abbreviation at least among mathematicians for kronecker).

My recommendation, on a case-by-case basis, would be to look at longer names with _s, and short abbreviated names such as iscntrl, and first see if there is a more julian way of factoring them.
If not, then leave the underscores alone: they do help readability, so don't remove them just because of some policy that underscores are somehow always "bad". The same goes for abbreviations such as TCP or NaN.

@mason-bially Yes, totally agree. I think short hard to understand names (i.e. to be like Matlab, Unix sh, or C) really belong in compatibility packages. One of the (many) great things about Julia is that it makes doing that sort of thing trivial.

> I think short hard to understand names (i.e. to be like Matlab, Unix sh, or C) really belong in compatibility packages. One of the (many) great things about Julia is that it makes doing that sort of thing trivial.

I would love to see that, especially as part of #5155. For example, the printf library really has no business in Base except for compatibility with a C-like environment (for which it is quite useful). But we could do with a more Python-like format that actually enables more powerful format strings. In the long run, though, those sorts of choices should be made by the people using the language. The base library should strive to have a well-factored set of names, which is why I completely agree with your points.

As Jeff has stated elsewhere, if a name is long enough to need an underscore to break it up, then it's probably actually doing too many things. Generic functions should do one thing; a long name is a sign of a missing abstraction, and something should be refactored into multiple concepts.

Continued comments on this issue are not helpful. Make concrete proposals. Adding underscores will likely be rejected.

@mason-bially As far as printf goes, I don't think it fits well with julia, and I think the type-based stuff that @tbreloff added in https://github.com/JuliaLang/Formatting.jl/pull/10 is much nicer (and more julian).

@tkelman "probably actually doing too many things", I'd agree, but that is only probably, and there _are_ a few cases where _'s are appropriate. I have tried to make a concrete proposal here, about the is* functions from utf8proc.jl, but haven't heard back yet. I'll make a PR if it looks like people might be interested.

Ok, I admit the language "avowed policy" was too strong. I should have said "strong preference" for avoiding underscores.

I agree with the goal of identifying and fixing bad names. I think various different approaches will be needed.

  • Remove the _mul_ etc. names as part of #13157
  • Move some things to packages (e.g. special functions)
  • Replace the character category functions with something like what @ScottPJones suggests
  • Replace print_ functions with something like #14052
  • Identify isolated cases of names that can just be improved
  • Move filesystem functions into their own namespace. Then we could have e.g. File.copy, rename rm to remove etc. It would also move obscure functions like issetgid out of the way. This was already started by #12819
  • Clean up the os testing macros (#4233)
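The filesystem-namespace item in the list above might look roughly like this. It is only a sketch: the module name `File` and the names `copy` and `remove` come from the comment, `move` is an added hypothetical, and aliasing the existing Base functions is an assumption about how such a namespace could be built.

```julia
module File  # namespace sketch; nothing is exported on purpose

# Qualified access (File.copy) keeps these from clashing with Base.copy.
const copy   = cp
const remove = rm
const move   = mv   # hypothetical addition

end # module

# Usage inside a temporary directory that is cleaned up afterwards.
mktempdir() do dir
    src = joinpath(dir, "a.txt")
    dst = joinpath(dir, "b.txt")
    write(src, "hello")
    File.copy(src, dst)   # same as cp(src, dst)
    File.remove(src)      # same as rm(src)
end
```

Since the names are reached only through the `File.` prefix, short generic words such as `copy` stay readable without polluting the global namespace.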

I would also prefer an ordinary function call API for formatting to any kind of format strings.

That sounds great! :100:% behind that! (I'll make a PR for the character category stuff, @nalimilan has already made some great suggestions on that)

I've recently put in a CRAN package for building time formats. It looks like the underlying logic used is similar to the print formats of @tbreloff . I might write a similar R package for sprintf. I don't think anyone's actually used the package so it's probably full of bugs, but I thought it might be useful.

https://cran.r-project.org/web/packages/strptimer/vignettes/strptimer.html
