Chapel: Standard library factory functions are not consistent

Created on 17 Oct 2019  路  23Comments  路  Source: chapel-lang/chapel

Related https://github.com/chapel-lang/chapel/issues/12712

Request for consistent identifer-naming style for standard-library factory functions.

The ones I've seen so far include: makeRandomStream, newBlockDist, and createStringWith*.

Alternatively/preferably, define them as type methods instead. Then everything can be e.g., BlockDist.new(...)

Most helpful comment

I am in favor of standardizing factory functions as type methods with a common verb used. Despite being slightly longer, I have a slight preference towards create over make, due to being clear and less overloaded of a term than make.

Some interesting points are made in answers to a SO question on naming factory functions:

'Create' fits the feature better than most other words. The next best word I can think of off the top of my head is 'Construct'. In the past, 'Alloc' (allocate) might have been used in similar situations, reflecting the greater emphasis on blocks of data than objects in languages like C.

'Create' is a short, simple word that has a clear intuitive meaning. In most cases people probably just pick it as the first, most obvious word that comes to mind when they wish to create something. It's a common naming convention, and "object creation" is a common way of describing the process of... creating objects.

...

'Build' and 'Make' are common terms for processes relating to compiling code, so have different connotations to programmers, implying a process that comprises many steps and possibly a lot of disk activity. However, the idea of a Factory "building" something is a sensible idea - especially in cases where a complex data-structure is built, or many separate pieces of information are combined in some way.

All 23 comments

@e-kayrakli: I'm particularly interested in whether we considered making the string factory functions into type methods on string (e.g., string.createWith*) to avoid relying on global standalone functions. It seems obvious in retrospect, but is there something I'm missing? (not to say that the others aren't similarly interesting, just that this was the most recently added case).

@BryantLam - Is the difference in terms make / new / create what is bothering you here? I can imagine many other ways they might not be consistent but if it is simply this term I think it's worth saying that directly in the issue description.

6698 is also related.

@bradcray -- I don't recall any discussions about making these factory functions type methods for strings/bytes. I couldn't see anything like that in the design discussion we had over at #13818

For those two types at least, the idea of converting factory functions to type methods seems appealing to me.

BlockDist.new(...)

While I would prefer new for symmetry, new is a keyword and cannot be used as an identifier. make is my next preference only because it's shorter than create.

I am in favor of standardizing factory functions as type methods with a common verb used. Despite being slightly longer, I have a slight preference towards create over make, due to being clear and less overloaded of a term than make.

Some interesting points are made in answers to a SO question on naming factory functions:

'Create' fits the feature better than most other words. The next best word I can think of off the top of my head is 'Construct'. In the past, 'Alloc' (allocate) might have been used in similar situations, reflecting the greater emphasis on blocks of data than objects in languages like C.

'Create' is a short, simple word that has a clear intuitive meaning. In most cases people probably just pick it as the first, most obvious word that comes to mind when they wish to create something. It's a common naming convention, and "object creation" is a common way of describing the process of... creating objects.

...

'Build' and 'Make' are common terms for processes relating to compiling code, so have different connotations to programmers, implying a process that comprises many steps and possibly a lot of disk activity. However, the idea of a Factory "building" something is a sensible idea - especially in cases where a complex data-structure is built, or many separate pieces of information are combined in some way.

I tend to agree with Ben's preference for create over make (but I also type really, really fast, so that extra character is unlikely to affect me as much as some people).

An interesting case to wrestle with is whether factory functions whose names contain more than just create() should also use create or whether they could use newBlah instead. E.g., the string/bytes factories will likely continue to have additional words in their names differentiating the different modes of creating the value.

I'm not wild about this, but just to put it out there: We could also support type.new() in the parser and have it desugar to some well-defined method name (similar to what we do for .domain on arrays today, though that is also controversial and perhaps likely to go away). But that introduces some pretty significant cases for confusion so I'm not advocating for it so much as putting it out for people to spit on or applaud as they see fit.

I'm okay with create(). I type fast too and I agree that "make" is overloaded.

We could also support type.new() in the parser and have it desugar to some well-defined method name

I could go either way on this. I like it, but don't like the complexity that it brings. create is a fine trade-off to make the compiler have less special-casing.

My 2 cents: create works for me.

I'd be about equally happy with type.new(). In particular I think that new being a keyword will make it more likely for people to use the convention (although they might also think that type.new() isn't just a type method). Anyway type.new() has the advantage of requiring fewer terms for users.

Just leaving a note that I prefer "create", too. All the reasons that I can think of are listed above, but just to repeat the most important ones to me:

  • "create" feels the most natural for me for a factory function
  • "make" would be overloading the term
  • "new" doesn't sound like just a function but something more special

I started working on this and got string.createWith*Buffer working in a branch, and hoping to convert others and open a PR soon.

The only tiny drawback that I can see is having two deprecated ways of creating strings on master (string.init and createStringWith*) but string.init should be removed completely in the next release. So, I don' think it is too ugly to worry about.

Another factory function that needs to change is makeIndexBuffer.

This one is slightly different in the sense that it needs to be called on a sparse domain, so it shouldn't be a type method. But still the name should be changed to createIndexBuffer to keep things consistent.

After some discussions under PRs and offline with @mppf and @ben-albrecht I want to bring up one potentially complicated case with the Random module. We currently have:

  • A type alias RandomStream that aliases the default RNG (that is PCG)
  • class PCGRandomStream
  • class NPBRandomStream
  • makeRandomStream that is deprecated in favor of createRandomStream (via #14452)

The question here is whether createRandomStream (or in general createType) is an acceptable variation of the convention we are trying to establish here. The reasons I am (partly motivated by @mppf) reluctant to add RandomStream.create:

  • createRandomStream has an algorithm argument that allows the programmer to choose a different algorithm than the default. That would not be possible with RandomStream.create.
  • This interface is also fits well with other Random module convenience functions like fillRandom, shuffle etc, where they also take an algorithm argument
  • A potential [PCG|NPB]RandomStream.create does not really add anything on top of new owned [PCG|NPB]RandomStream, as opposed to other other factory functions discussed here:

    • string/bytes ones add more verbosity/clarity

    • distributed domain/array ones avoid a lot of typing

Not as important but also changing mySparseDomain.makeIndexBuffer to mySparseDomain.createIndexBuffer is another example where a create* method is not a type method. And I don't see a way to make it a type method without making it look contrived.

The bottom line is, I am personally in favor of viewing the functions/methods that start with create as a convention rather than making the convention strictly about type methods starting with create*

* That would not be possible with `RandomStream.create`.

I'm probably missing something but why doesn't making it a type method work?

The bottom line is, I am personally in favor of viewing the functions/methods that start with create as a convention rather than making the convention strictly about type methods starting with create*

One concern is that if multiple conventions are applicable to a type, which would a developer implement? Does the developer prefer one of them (e.g., the type method), or do you implement all of them (e.g., and also a global function)? Will a user remember which was implemented? I'd rather not have to constantly look up in the documentation to find which are implemented (but maybe factory functions are more exceptional than normal and won't be a common idiom). If type methods are not applicable to some usage patterns, I think I'd rather have all global functions than some be global and others be type methods just so I don't have to remember what style of factory functions Chapel chose to implement.

I'm probably missing something but why doesn't making it a type method work?

Choosing the RNG algorithm via a function argument would not work with type method. Instead you'd have to use:

RandomStream.create(); // get an instance of default RNG (PCG)
PCGRandomStream.create(); // get an instance of PCG RNG
NPBRandomStream.create(); // get an instance of NPB RNG

Having to call different function to get different RNG streams might make user code clunkier than just passing an argument to a unified factory function. (Although, I am not sure if an application would need more than one type of random streams. And, currently you can control which one is default by the compiler flag -sdefaultRNG=(RNG.PCG|RNG.NPB) and in the application code use the first one of the above all the time.)

That being said, I believe your concern is valid. If we want to restrict the factory functions to be all type or non-type methods I see the following options:

  1. Forget type.create() and adopt createType() as the factory function convention.
  2. Keep type.create() as the convention
    2a. Remove any factory functions from the Random module. Have the current utilities fillRandom, shuffle and permutation as-is with algorithm argument. But make the user use initializers to create a random stream.
    2b. Make all the random module utility functions type methods on different RNG algorithms, instead of exposing that via an argument. This would result in some extra typing as well:
fillRandom(myArr);  // default algorithm
// will turn into
RandomStream.fillRandom(myArr);

Some of these can be discussed as a separate issue, but the ideas here can be pertinent to other (future?) modules when implementing factory/utility functions. So I am keeping the discussion here for now.

The factory functions mentioned in the issue exist for different reasons. Of course new MyType is the "normal" way to create something. This issue is all about exceptions.

  • makeRandomStream uses an argument to decide which type to create, so isn't implementable withnew
  • newBlockDist hides some implementation details that cause new Block to not generally do what is desired
  • createStringWith* include more information about how the string should be created in the function name

I think it is reasonable to have consistent terminology for alternatives to new Something (e.g. "create") but I don't think it's reasonable to expect them to always be type methods or have a specific name.

One big reason for that is that I consider only the 1st of these to be a factory pattern. The typical factory function examples that I know of actually takes a string argument specifying which thing. E.g. createEncoder("UTF-8") or createHashFunction("SHA-256"). These functions do not always return the same type, but it is common for them to return a type meeting certain requirements (e.g. implement an interface or subtype of parent class).

Back to this:

I'm probably missing something but why doesn't making it a type method work?

Choosing the RNG algorithm via a function argument would not work with type method. Instead you'd have to use:

It could work with a type method; it would just be strange. We don't have a "Random Stream Factory" type. So, we could either add such a type (but... what's the point of doing so?) or we could make the RandomStream implementations able to return a different type according to a param argument. E.g. proc type RandomStream(param algorithm = defaultRNG).

Anyway, I don't think that we should require that all factory functions are or are not type methods, because the reason for having the factory function is pretty different in these different cases.

Make all the random module utility functions type methods on different RNG algorithms, instead of exposing that via an argument.

We might decide that this is a better idea than the alternatives. If we chose this path, we could have a function returning a type and accepting the param algorithm, e.g. proc getRandomStreamType(param algorithm) type.

I think we've ended up in this situation due to an ambiguity in the Random module interface: RandomStream is both referred to as a type alias to the default RNG, but also as a common interface, as if it were a superclass to the random streams, which it is _not_. As a result, we are trying to justify an exception (createType()) to the exception (Type.create()) of creating objects with new Type().

From here, we could accept the Random interface as it is and allow exceptions to the factory function standard to work around it, or we could try to address the issues in the Random interface.

As an interface-consistency zealot, I personally lean towards the latter. I think @e-kayrakli's suggestions above (specifically bullet 2) seem to be going in the right direction.

If RandomStream is a type alias to PCGRandomStream, then it doesn't make sense for createRandomStream() or RandomStream.create() to create something that is _not_ PCGRandomStream

Remove any factory functions from the Random module. Have the current utilities fillRandom, shuffle and permutation as-is with algorithm argument..

The module-level functions fillRandom(), shuffle() should probably be reserved for the module-level random stream(s) when those get added (https://github.com/chapel-lang/chapel/issues/7225), so I think it is reasonable to require calling them as methods on existing random stream objects otherwise.

Perhaps we should spin off a separate issue to discuss specifics of Random interface design further.

I agree with @ben-albrecht.

  • newBlockDist hides some implementation details that cause new Block to not generally do what is desired

This seems more like a problem with the default behavior of new Block(...) if users are generally expected to use newBlockDist/BlockDist.create() instead.

  • createStringWith* include more information about _how_ the string should be created in the function name

Yeah, there was a discussion about this approach versus an enum-argument variant in https://github.com/chapel-lang/chapel/issues/13818#issuecomment-523626592.

This case is also somewhat different since it deals with FFI behaviors. Another approach could have been to make new types like OwnedString or BorrowedString and then implement something akin to Rust's From trait: OwnedString.from(c_str).

E.g. createEncoder("UTF-8") or createHashFunction("SHA-256"). These functions do not always return the same type
...
or we could make the RandomStream implementations able to return a different type according to a param argument. E.g. proc type RandomStream(param algorithm = defaultRNG).

I'm okay if RandomStream(int).create(algorithm = RNG.NPB) gave me a NPBRandomStream; it is what I asked for.

Anyway, I don't think that we should require that all factory functions are or are not type methods, because the reason for having the factory function is pretty different in these different cases.

In general, I would prefer type methods because it avoid polluting the global namespace. Maybe this helps the compiler find viable candidates faster? I don't know. It's also slightly easier for me to reason about where a type method (should be) defined.

But I also don't really care that much as long as everything is consistent. I do think that global functions would be easier for most people to use since that's what people are generally accustomed to already, but

  1. to me, there's an elegance that RandomStream(int).create() requires that the type/param fields of the type are specified in the left side rather than in the method argument list. Maybe not everyone agrees that it's easy to learn or to use, though.
  2. the "exceptional" cases noted also seem like they're just interfaces that could have been designed differently / more consistently to begin with. Having global functions just makes it more loosely defined as to what the consistent convention is, which means I usually have to go look it up.

I summarized the Random module specific discussion in a proposal for that module under #14589, I am holding off merging the PR that changes other factory functions (#14449) until the discussion on the Random module is settled to avoid potential back-and-forth.

I feel that the main objection I have to the direction this is going has not been sufficiently discussed.

The typical factory function examples that I know of actually takes a string argument specifying which thing. E.g. createEncoder("UTF-8") or createHashFunction("SHA-256"). These functions do not always return the same type, but it is common for them to return a type meeting certain requirements (e.g. implement an interface or subtype of parent class).

If we insist that all factory functions are type methods, what type do they apply to? Is it the interface / parent class? Will it always be the case that we can name an interface / parent class? For example, in Chapel, it is possible to have generic functions that return a type based on a param even without any interface or parent class. Are we saying that the standard library may not use that pattern?

Broadly speaking, isn't a factory function any function that returns a newly constructed thing? open fits into that category - it returns a newly constructed open file. And yet I think it would but utterly unreasonable to insist the function be named file.create() instead of open(). (at the very least because "create" means something totally different in the context of the filesystem). For that matter doesn't + on integers return a newly constructed integer, so shouldn't we be writing int.createSum(1, 2) instead of 1 + 2 ?

I think it's reasonable to have a broad convention, like "use the term create in factory functions unless you have a good reason not to". However I think that going beyond is a really bad idea. (I think any style guidance that does not include "unless you have a good reason not to" is a bad idea).

I think it's reasonable to have a broad convention, like "use the term _create_ in factory functions unless you have a good reason not to". However I think that going beyond is a really bad idea. (I think _any_ style guidance that does not include "unless you have a good reason not to" is a bad idea).

Agreed.

I care that a consistent convention exist; I don't think I particularly care enough about what the convention should be (though I clearly have opinions). In a way, this issue brought to light that the Random module was designed differently than other modules, maybe for good reasons. I don't know enough about Random to comment on it.

Will it always be the case that we can name an interface / parent class?

I would hope that the answer is "yes" only because constrained generics would require it in most cases. For example, string and bytes need a common interface for use in constrained generics with a common set of string operations to interchange between these two types.

Is there an example of a set of related types you can think of that a named interface wouldn't make sense?

For example, in Chapel, it is possible to have generic functions that return a type based on a param even without any interface or parent class. Are we saying that the standard library may not use that pattern?

If there's a good reason for this to exist, it shouldn't be disallowed. I'm having a hard time thinking of a good function factory function that returns unrelated types.

After some offline discussions with @ben-albrecht @mppf and @bradcray I think we shouldn't rush into a decision on this issue.

From design standpoint, these are probably better handled with a wider style/conventions discussion. An example of this is under #6698. Moreover, (as particularly pointed out by @mppf), some of the standard library interface will be changed when we have constrained generics (interfaces) as the language feature.

From 2.0 standpoint, string and bytes factory functions are more part of the language than the others. But still, on the spectrum of language feature vs library feature, I'd argue that they are in the gray area and even closer to being a library feature.

@BryantLam -

Re

For example, string and bytes need a common interface for use in constrained generics with a common set of string operations to interchange between these two types.

Note that we are discussing whether string and bytes return the string/bytes type or a different type when doing indexing on #13901. While either way we could have an interface covering some of the functions... an interface for string/bytes that doesn't include indexing would seem not very useful. The interface could conceivably have an associated type for the return/yielded type... Anyway if having a single interface for string/bytes is important to you that might be worth saying on #13901.

Was this page helpful?
0 / 5 - 0 ratings