Crystal: File.new API deficiencies and proposals

Created on 5 Jun 2019 · 44 Comments · Source: crystal-lang/crystal

There are several problems with the existing File.new API.

  1. No way to create a file for read/write without truncating an existing one (O_CREAT without O_TRUNC). See #7849.
  2. No way to open a file for create without overwriting or opening an existing one (O_EXCL).
  3. No way to atomically lock a file when opening (O_EXLOCK).
  4. No way to avoid following symlinks (O_NOFOLLOW).
  5. File modes as strings are runtime errors, not compile time errors.
  6. Following the POSIX fopen API may seem easy, but it doesn't scale to current or future uses.

Solutions

  1. Solutions used by other languages.
  2. Proposed solutions.

1. Creating without truncating

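With the current modes this isn't possible without a race condition: "w" truncates an existing file and "r+" fails if the file is missing, so any check-then-open sequence leaves a window for another process.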
See #7849
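For reference, the missing mode is a single open(2) call with O_CREAT and without O_TRUNC. A minimal C sketch (the helper names are hypothetical, not Crystal API): the file is created if absent and existing contents are left alone, with no check-then-open window.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `path` for writing, creating it if it is missing but never truncating
 * it: O_CREAT without O_TRUNC. Returns a file descriptor, or -1 with errno set. */
int open_create_no_trunc(const char *path) {
    return open(path, O_WRONLY | O_CREAT, 0644);
}

/* Size of `path` in bytes, or -1 if it cannot be stat'ed. */
long file_size(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) return -1;
    return (long)st.st_size;
}
```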

2. Creating a file without overwriting (O_EXCL)

Avoiding race conditions is one of the many reasons it's necessary. This flag may be combined with O_EXLOCK or O_NOFOLLOW for various uses, and many more examples are easy to find with a web search.
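In C, the O_EXCL behavior looks like this. It is a sketch with a hypothetical `create_new` helper, named after the CREATE_NEW option found in other languages; the existence check and the creation happen atomically inside open(2).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create `path` only if it does not already exist. If another process created
 * it first, open(2) fails with errno == EEXIST instead of opening or
 * truncating the existing file. */
int create_new(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```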

3-4. Atomic locking and O_NOFOLLOW

Opening a file with a lock atomically lets you easily modify files without race conditions or risk of data loss. The atomic_write shard would benefit from this.

Example of race condition free atomically rewriting a file:

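# Hypothetical flags API; "..." stands for the remaining flags.
File.open "file", O_EXLOCK | ... do |old_file|
  buf = modify old_file.gets_to_end
  # O_EXCL: fail rather than reuse a leftover or planted "file.tmp".
  File.open "file.tmp", O_CREAT | O_EXCL | O_EXLOCK | ... do |new_file|
    new_file.print buf
  end
  # Swap the new contents in while "file" is still locked.
  File.rename "file.tmp", "file"
end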
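The same pattern can be sketched in C against the raw POSIX calls. This is an illustration under assumptions, not the atomic_write shard: `rewrite_upper` is a hypothetical helper that uppercases the contents as a stand-in for `modify` and reads at most 4 KiB. It uses O_EXLOCK where it is defined (macOS, the BSDs) and otherwise falls back to flock() right after open(), which reintroduces the small race window this issue is about. The lock only serializes cooperating processes that also lock before reading.

```c
#include <assert.h>
#include <ctype.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Open with an exclusive lock: atomically via O_EXLOCK where available,
 * otherwise open() then flock(), which is NOT atomic. */
static int open_locked(const char *path, int flags, mode_t mode) {
#ifdef O_EXLOCK
    return open(path, flags | O_EXLOCK, mode);
#else
    int fd = open(path, flags, mode);
    if (fd >= 0 && flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }
    return fd;
#endif
}

/* Read `path` (up to 4 KiB), uppercase it, write the result to `tmp_path`
 * (created with O_EXCL so a stale temp file is never reused), then rename it
 * over `path` while the original is still locked. Returns 0 on success. */
int rewrite_upper(const char *path, const char *tmp_path) {
    char buf[4096];
    int old_fd = open_locked(path, O_RDONLY, 0);
    if (old_fd < 0) return -1;
    ssize_t n = read(old_fd, buf, sizeof buf);
    if (n < 0) { close(old_fd); return -1; }
    for (ssize_t i = 0; i < n; i++) buf[i] = (char)toupper((unsigned char)buf[i]);

    int new_fd = open_locked(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (new_fd < 0) { close(old_fd); return -1; }
    int ok = write(new_fd, buf, (size_t)n) == n && fsync(new_fd) == 0;
    close(new_fd);
    if (ok && rename(tmp_path, path) != 0) ok = 0;
    if (!ok) unlink(tmp_path);   /* don't leave a half-written temp file */
    close(old_fd);               /* releases the lock on the old file */
    return ok ? 0 : -1;
}
```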

Existing applications already do this. This feature is required for compatibility with other applications, since File#flock_* has unavoidable race conditions. The existing flock functions are still useful for upgradable locks or multiple readers/writers on an existing file.

It seems like all major targeted platforms already support compatible open modes.

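macOS, FreeBSD and OpenBSD support all of the following modes (Linux supports O_NOFOLLOW, but has no O_SHLOCK or O_EXLOCK; the closest substitute there is open() followed by flock(), which is not atomic):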

  • O_SHLOCK Atomically obtain a shared lock.
  • O_EXLOCK Atomically obtain an exclusive lock.
  • O_NOFOLLOW If last path element is a symlink, don't follow it.

Windows

  • CreateFile dwShareMode has similar options for locking.
  • O_NOFOLLOW can be simulated using FILE_FLAG_OPEN_REPARSE_POINT, checking if the opened file is a symlink and raising an error if it is.
  • I don't have or use Windows, so this would need verification.

Creating an OS abstraction layer isn't needed as the necessary functions are widely supported.

5. Runtime vs compile time errors:

Crystal claims to catch errors at compile time.
File.new("foo", "bar") is a runtime error. This already bit me in development: while porting a Ruby application, a rarely used code path had a typo where I translated O_CREAT to the mode "e" instead of "w".

6. The API doesn't scale.

Existing crystal file modes and their POSIX mappings:

| Crystal | POSIX |
| --- | --- |
| r | O_RDONLY |
| r+ | O_RDWR |
| w | O_WRONLY O_CREAT O_TRUNC |
| w+ | O_RDWR O_CREAT O_TRUNC |
| a | O_WRONLY O_CREAT O_APPEND |
| a+ | O_RDWR O_CREAT O_APPEND |
| rb | same as without b |
| wb | same as without b |
| ab | same as without b |
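The table amounts to a string lookup that can only fail when the program runs. A C sketch of that mapping (a hypothetical helper, not Crystal's actual implementation) makes the point: a typo such as "e" is perfectly valid syntax and only turns into an error at runtime.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>

/* Translate an fopen-style mode string into open(2) flags, following the
 * table above ("b" is accepted and ignored). Returns -1 for anything else;
 * that can only be detected when this function runs. */
int mode_to_flags(const char *mode) {
    static const struct { const char *mode; int flags; } table[] = {
        {"r",  O_RDONLY},
        {"r+", O_RDWR},
        {"w",  O_WRONLY | O_CREAT | O_TRUNC},
        {"w+", O_RDWR | O_CREAT | O_TRUNC},
        {"a",  O_WRONLY | O_CREAT | O_APPEND},
        {"a+", O_RDWR | O_CREAT | O_APPEND},
    };
    char key[3];
    size_t len = 0;
    for (const char *c = mode; *c; c++) {
        if (*c == 'b') continue;   /* "rb" is the same as "r", etc. */
        if (len == 2) return -1;   /* too long to be a valid mode */
        key[len++] = *c;
    }
    key[len] = '\0';
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(key, table[i].mode) == 0) return table[i].flags;
    return -1;
}
```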

Common uses that are missing:

  • Create a new file but don't truncate an existing one. (What I currently need)
    • O_WRONLY O_CREAT (no trunc)
    • O_RDWR O_CREAT (no trunc)
  • Create a new file but don't open an existing one. (What I currently need)
    • O_WRONLY O_CREAT O_EXCL (no trunc)
    • O_RDWR O_CREAT O_EXCL (no trunc)
  • Append to an existing file without creating it. (Errors could indicate missing data, misconfiguration, mount/network issues or application error)
    • O_WRONLY | O_APPEND
    • O_RDWR | O_APPEND
  • Append to a file only if it is newly created. An example is log rotation creating the next file. Errors may indicate a race condition with 2 processes trying to rotate at the same time.
    • O_WRONLY | O_APPEND | O_CREAT | O_EXCL
    • O_RDWR | O_APPEND | O_CREAT | O_EXCL

Any of the above may be combined with either of the 2 locking options.
Any of the above may be combined with O_NOFOLLOW.
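Two of the missing modes sketched in C with hypothetical helper names: appending to a log that must already exist, and the log-rotation case that must create a fresh file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Append to a log that must already exist. Without O_CREAT a missing file
 * (wrong path, unmounted volume, ...) fails with ENOENT instead of silently
 * starting a new, empty log. */
int open_existing_log(const char *path) {
    return open(path, O_WRONLY | O_APPEND);
}

/* Log rotation: create the next log file and append to it, but fail with
 * EEXIST if another process already rotated (created it) first. */
int open_rotated_log(const char *path) {
    return open(path, O_WRONLY | O_APPEND | O_CREAT | O_EXCL, 0644);
}
```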

So that's ~14 different options * binary_or_not * 2_locking_options_or_not * nofollow_or_not for applications I've come across or have developed. Security software is rather specific in its locking and race condition needs. There may be more exotic combinations for other apps.

That's way too many letters.

7. Solutions used by other languages:

| Language | Type | Example |
| --- | --- | --- |
| C# | almost POSIX flags | File.Open("foo", FileMode.Open) |
| Go | POSIX flags | os.OpenFile("foo", os.O_RDWR, 0644) |
| Java | multiple: almost POSIX flags, additional methods with common options, 3rd party packages that provide the native API with POSIX flags | Files.newOutputStream(Paths.get("foo"), StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW) |
| Nim | almost POSIX flags | open("foo", fmWrite) |
| Python | POSIX flags with extra work | os.fdopen(os.open("foo", os.O_RDWR), 'rb+') |
| Ruby | POSIX fopen or POSIX flags | File.new("foo", File::RDWR) |
| Rust | almost POSIX method chaining | OpenOptions::new().write(true).create_new(true).open("foo"); |

Many of the almost POSIX flags are renamed flags with 1 to 1 or 2 to 1 mappings. CREATE_NEW is often O_CREAT | O_EXCL.

How a file handle is specified as read/write also changes between languages.
For most languages it's part of the file flags.
Java specifies reader/writer using different methods with a separate file mode param.
With rust it's part of the method chain.
With python it's specified twice (when using specific file modes).

Most of the languages above have an option to not truncate existing files when creating, something that crystal lacks. Most of them also have options to not create new files if one already exists (java, rust, any POSIX, maybe more).

7a. Proposed solutions

  • "r" for read
  • "w" for write
  • "a" for append
  • "c" for creat
  • "e" for excl
  • "s" for shared lock
  • "l" for exclusive lock # don't confuse with "e"
  • "n" for symnofollow
  • "f" for file only
  • "d" for directory only

Opening a file for append, read, create, excl becomes:

  • File.new "file", "farce"

Maybe that didn't work out so well.

7b. Proposed Solutions

Use method chaining

  File.create_new.append.new "foo"
  • Advantages: Probably the most terse solution. (Other than 7a)
  • Disadvantages: Same as the custom enum (see below).

Add arguments

  File.new "foo", "w", create_exclusive: true, flock_exclusive: true
  • Advantages: 100% backwards compatibility
  • Disadvantages: Same as custom enum and well, look at it.

Create your own enum

  File.new "foo", File::Write | File::Create | File::FlockExclusive
  • Advantages: Looks prettier than POSIX.
  • Disadvantages: Documentation is custom. More work for those familiar with POSIX or other systems when deciphering what a file mode actually does. Users won't find answers on Google when equivalent answers already exist for POSIX.

Use POSIX flags

File.new "foo", File::RDWR | File::CREAT
  • Advantages: Ruby compatible. Easily googleable: "why does my file do X with O_APPEND" has hundreds of results, while searching for 'crystal File.open "a"' returns PHP and CompTIA results. Additional flags that combine existing flags, such as File::CREATE_NEW, can be added to map to common features in other languages.
  • Disadvantages: Not as pretty. Needs mapping for Windows, but that's been done by other languages/libraries and there are often 1 to 1 equivalents.

Use 7a.

  • Advantages: The perfect solution.

Do nothing.

  • Advantages: No new code.
  • Disadvantages: Some common file operations are impossible. Secure file operations are impossible. Race conditions are unavoidable. Porting from ruby or interop with other applications is sometimes impossible. (Without 3rd party libraries). 3rd party libraries will be created to fill the gap leading to more fragmentation of which library or interface to use.

Closing thoughts

POSIX flags seem like the obvious choice for documentation clarity and increasing the chances of someone finding an answer to why you can't seek+write with O_APPEND vs searching for File::CrystalCustomName.

POSIX flags have other benefits for applications that take advantage of more platform specific open flags like O_DSYNC for WAL or O_TMPFILE for automatic temp file cleanup as additional flags are easy to define in a shard or PR.

Most POSIX flags are fully or mostly portable to Windows, and the rest often have compatibility layers.

They also give 100% compatibility when porting from Ruby.

Simplifying the flags down to a few options makes writing secure software impossible. (Without nonstandard extensions)

Labels: feature, stdlib

All 44 comments

Let's not add a minigame for coming up with words out of a combination of letters.

I vote for POSIX flags (though usually I hate posix-isms). Keep using the old approach for most use cases, and for those using exotic cases -- you will know what you're doing...
With 2 different signatures it's backwards compatible too, yay.

Thanks for the very detailed RFC. Here is my opinion:

  • chaining methods: no, it doesn't fit the crystal idioms;
  • adding new letters: no;
  • mapping each letter to an option: no, it's not the crystal way (kwargs, enums are better);
  • keep current letters: they're fopen relics, limited and not very explicit; I'm not sure it's worth keeping them (maybe deprecate);

  • kwargs are simple and an acceptable API, but will lead to a bunch of boolean arguments (maybe not a problem) whose value will almost always be true (impacts readability):

    File.open("x.log", append: true)
    File.open("x.log", write: true, create: true, exclusive: true)
    
  • enums with explicit names (Create, ReadWrite) are nice with 1 argument thanks to symbols mapping to enums but ugly with many:

    File.open("x.log", :append)
    File.open("x.log", File::Options.flags(Write, Create, Exclusive))
    File.open("x.log", File::Options::Write | File::Options::Create | File::Options::Exclusive)
    

    We can have delegations to remove some noise:

    File.open("x.log", File.options(Write, Create, Exclusive))
    File.open("x.log", File::Write | File::Create | File::Exclusive)
    

    Of course it would be nice if crystal was mapping piped symbols to enum flags, but I'm not sure it would be practical to implement (@asterite ?):

    File.open("x.log", :write | :create | :exclusive)
    
  • enums with POSIX abbreviations (CREAT, RDWR) are easier to search for help, but they impact readability, and I foresee lots of typos.

> Of course it would be nice if crystal was mapping piped symbols to enum flags, but I'm not sure it would be practical to implement (@asterite ?):
> ```crystal
> File.open("x.log", :write | :create | :exclusive)
> ```

Can that happen when the type is specified as an enum? Or when the symbol is capitalized to match the enum?

File.open("x.log", :Write | :Create | :Exclusive)

Or maybe drop the : for the special case of flags type?

File.open("x.log", Write | Create | Exclusive)
> * enums with POSIX abbreviations (CREAT, RDWR) are easier to search for help, but they impact readability, and I foresee lots of typos.

Those unfamiliar with POSIX may make lots of typos; for me, CREAT is almost automatic. (Don't weigh this too heavily, IDK much.)

Detailed documentation stating what each enum maps to would help with searching.

It seems like everyone likes moving to either POSIX or a custom named enum. Considering POSIX is already used behind the scenes to provide the current file modes should I make a PR or wait?

Could do File.open("x.log", :write, :create, :exclusive) with what we have now, but that would incur the cost of a runtime reduce on these *args.

Hmm, can't be worse than parsing a string, right?

I say I like POSIX, but then again, I couldn't stand seeing CREAT everywhere 😂

The flags enum method is my preferred method, then deprecating the old fopen-like API.

So create a PR with custom named posix flags?
Read Write Create Exclusive etc.

I think someone should list all possible combinations and then we can aim for a better API. I actually really like Ruby's way with a String: it's not type-safe, but it's short and easy to write. It's also a bit intuitive ("r" for read, "w" for write, "a" for append, etc.).

I wouldn't mind having an API with enums or similar, but first:

Here's a partial list. I may have missed some. Also, this doesn't include any platform specific options.

  • RDONLY
  • WRONLY
  • RDWR
  • WRONLY | CREAT
  • RDWR | CREAT
  • WRONLY | CREAT | EXCL
  • RDWR | CREAT | EXCL
  • WRONLY | APPEND
  • RDWR | APPEND
  • WRONLY | APPEND | CREAT
  • RDWR | APPEND | CREAT
  • WRONLY | APPEND | CREAT | EXCL
  • RDWR | APPEND | CREAT | EXCL
  • WRONLY | CREAT | TRUNC
  • RDWR | CREAT | TRUNC
  • WRONLY | CREAT | EXCL | TRUNC
  • RDWR | CREAT | EXCL | TRUNC
  • WRONLY | APPEND | TRUNC
  • RDWR | APPEND | TRUNC
  • WRONLY | APPEND | CREAT | TRUNC
  • RDWR | APPEND | CREAT | TRUNC
  • WRONLY | APPEND | CREAT | EXCL | TRUNC
  • RDWR | APPEND | CREAT | EXCL | TRUNC
  • RDONLY | SHLOCK
  • WRONLY | SHLOCK
  • RDWR | SHLOCK
  • WRONLY | CREAT | SHLOCK
  • RDWR | CREAT | SHLOCK
  • WRONLY | CREAT | EXCL | SHLOCK
  • RDWR | CREAT | EXCL | SHLOCK
  • WRONLY | APPEND | SHLOCK
  • RDWR | APPEND | SHLOCK
  • WRONLY | APPEND | CREAT | SHLOCK
  • RDWR | APPEND | CREAT | SHLOCK
  • WRONLY | APPEND | CREAT | EXCL | SHLOCK
  • RDWR | APPEND | CREAT | EXCL | SHLOCK
  • WRONLY | CREAT | TRUNC | SHLOCK
  • RDWR | CREAT | TRUNC | SHLOCK
  • WRONLY | CREAT | EXCL | TRUNC | SHLOCK
  • RDWR | CREAT | EXCL | TRUNC | SHLOCK
  • WRONLY | APPEND | TRUNC | SHLOCK
  • RDWR | APPEND | TRUNC | SHLOCK
  • WRONLY | APPEND | CREAT | TRUNC | SHLOCK
  • RDWR | APPEND | CREAT | TRUNC | SHLOCK
  • WRONLY | APPEND | CREAT | EXCL | TRUNC | SHLOCK
  • RDWR | APPEND | CREAT | EXCL | TRUNC | SHLOCK
  • RDONLY | EXLOCK
  • WRONLY | EXLOCK
  • RDWR | EXLOCK
  • WRONLY | CREAT | EXLOCK
  • RDWR | CREAT | EXLOCK
  • WRONLY | CREAT | EXCL | EXLOCK
  • RDWR | CREAT | EXCL | EXLOCK
  • WRONLY | APPEND | EXLOCK
  • RDWR | APPEND | EXLOCK
  • WRONLY | APPEND | CREAT | EXLOCK
  • RDWR | APPEND | CREAT | EXLOCK
  • WRONLY | APPEND | CREAT | EXCL | EXLOCK
  • RDWR | APPEND | CREAT | EXCL | EXLOCK
  • WRONLY | CREAT | TRUNC | EXLOCK
  • RDWR | CREAT | TRUNC | EXLOCK
  • WRONLY | CREAT | EXCL | TRUNC | EXLOCK
  • RDWR | CREAT | EXCL | TRUNC | EXLOCK
  • WRONLY | APPEND | TRUNC | EXLOCK
  • RDWR | APPEND | TRUNC | EXLOCK
  • WRONLY | APPEND | CREAT | TRUNC | EXLOCK
  • RDWR | APPEND | CREAT | TRUNC | EXLOCK
  • WRONLY | APPEND | CREAT | EXCL | TRUNC | EXLOCK
  • RDWR | APPEND | CREAT | EXCL | TRUNC | EXLOCK
  • RDONLY | NOFOLLOW
  • WRONLY | NOFOLLOW
  • RDWR | NOFOLLOW
  • WRONLY | CREAT | NOFOLLOW
  • RDWR | CREAT | NOFOLLOW
  • WRONLY | CREAT | EXCL | NOFOLLOW
  • RDWR | CREAT | EXCL | NOFOLLOW
  • WRONLY | APPEND | NOFOLLOW
  • RDWR | APPEND | NOFOLLOW
  • WRONLY | APPEND | CREAT | NOFOLLOW
  • RDWR | APPEND | CREAT | NOFOLLOW
  • WRONLY | APPEND | CREAT | EXCL | NOFOLLOW
  • RDWR | APPEND | CREAT | EXCL | NOFOLLOW
  • WRONLY | CREAT | TRUNC | NOFOLLOW
  • RDWR | CREAT | TRUNC | NOFOLLOW
  • WRONLY | CREAT | EXCL | TRUNC | NOFOLLOW
  • RDWR | CREAT | EXCL | TRUNC | NOFOLLOW
  • WRONLY | APPEND | TRUNC | NOFOLLOW
  • RDWR | APPEND | TRUNC | NOFOLLOW
  • WRONLY | APPEND | CREAT | TRUNC | NOFOLLOW
  • RDWR | APPEND | CREAT | TRUNC | NOFOLLOW
  • WRONLY | APPEND | CREAT | EXCL | TRUNC | NOFOLLOW
  • RDWR | APPEND | CREAT | EXCL | TRUNC | NOFOLLOW
  • RDONLY | SHLOCK | NOFOLLOW
  • WRONLY | SHLOCK | NOFOLLOW
  • RDWR | SHLOCK | NOFOLLOW
  • WRONLY | CREAT | SHLOCK | NOFOLLOW
  • RDWR | CREAT | SHLOCK | NOFOLLOW
  • WRONLY | CREAT | EXCL | SHLOCK | NOFOLLOW
  • RDWR | CREAT | EXCL | SHLOCK | NOFOLLOW
  • WRONLY | APPEND | SHLOCK | NOFOLLOW
  • RDWR | APPEND | SHLOCK | NOFOLLOW
  • WRONLY | APPEND | CREAT | SHLOCK | NOFOLLOW
  • RDWR | APPEND | CREAT | SHLOCK | NOFOLLOW
  • WRONLY | APPEND | CREAT | EXCL | SHLOCK | NOFOLLOW
  • RDWR | APPEND | CREAT | EXCL | SHLOCK | NOFOLLOW
  • WRONLY | CREAT | TRUNC | SHLOCK | NOFOLLOW
  • RDWR | CREAT | TRUNC | SHLOCK | NOFOLLOW
  • WRONLY | CREAT | EXCL | TRUNC | SHLOCK | NOFOLLOW
  • RDWR | CREAT | EXCL | TRUNC | SHLOCK | NOFOLLOW
  • WRONLY | APPEND | TRUNC | SHLOCK | NOFOLLOW
  • RDWR | APPEND | TRUNC | SHLOCK | NOFOLLOW
  • WRONLY | APPEND | CREAT | TRUNC | SHLOCK | NOFOLLOW
  • RDWR | APPEND | CREAT | TRUNC | SHLOCK | NOFOLLOW
  • WRONLY | APPEND | CREAT | EXCL | TRUNC | SHLOCK | NOFOLLOW
  • RDWR | APPEND | CREAT | EXCL | TRUNC | SHLOCK | NOFOLLOW
  • RDONLY | EXLOCK | NOFOLLOW
  • WRONLY | EXLOCK | NOFOLLOW
  • RDWR | EXLOCK | NOFOLLOW
  • WRONLY | CREAT | EXLOCK | NOFOLLOW
  • RDWR | CREAT | EXLOCK | NOFOLLOW
  • WRONLY | CREAT | EXCL | EXLOCK | NOFOLLOW
  • RDWR | CREAT | EXCL | EXLOCK | NOFOLLOW
  • WRONLY | APPEND | EXLOCK | NOFOLLOW
  • RDWR | APPEND | EXLOCK | NOFOLLOW
  • WRONLY | APPEND | CREAT | EXLOCK | NOFOLLOW
  • RDWR | APPEND | CREAT | EXLOCK | NOFOLLOW
  • WRONLY | APPEND | CREAT | EXCL | EXLOCK | NOFOLLOW
  • RDWR | APPEND | CREAT | EXCL | EXLOCK | NOFOLLOW
  • WRONLY | CREAT | TRUNC | EXLOCK | NOFOLLOW
  • RDWR | CREAT | TRUNC | EXLOCK | NOFOLLOW
  • WRONLY | CREAT | EXCL | TRUNC | EXLOCK | NOFOLLOW
  • RDWR | CREAT | EXCL | TRUNC | EXLOCK | NOFOLLOW
  • WRONLY | APPEND | TRUNC | EXLOCK | NOFOLLOW
  • RDWR | APPEND | TRUNC | EXLOCK | NOFOLLOW
  • WRONLY | APPEND | CREAT | TRUNC | EXLOCK | NOFOLLOW
  • RDWR | APPEND | CREAT | TRUNC | EXLOCK | NOFOLLOW
  • WRONLY | APPEND | CREAT | EXCL | TRUNC | EXLOCK | NOFOLLOW
  • RDWR | APPEND | CREAT | EXCL | TRUNC | EXLOCK | NOFOLLOW

Uhh I was gonna suggest a decision tree on these but that's unlikely to clarify things.

I really like the suggestion of "three enums". Or, again, perhaps just the clarity that they might bring in understanding the possible combinations.

@didactic-drunk Thank you!

Another thing worth looking at is like how it's done in Go: https://golang.org/pkg/os/#pkg-constants

It seems only one of read, write or read-write can be specified, and then the rest are or-ed. I can't see that working with an enum. Three enums might be good...

But thanks much for the list. I'm sure there's at least some way to shape it in a way that can be insightful to look at.
I'm thinking at least like "flags that never appear together" and "flags that are completely independent"

Here are some additional platform specific optimization options that can be safely ignored on other platforms that don't support them. This would provide a performance boost on the supported platforms and gracefully degrade on all others by defining the flag as 0.

| Flag | Platforms | Notes |
| --- | --- | --- |
| DIRECT | e.g. Linux, FreeBSD (not part of POSIX) | Useful for databases or applications that do their own caching. |
| NOATIME | Linux | Fewer writes. |
| SEQUENTIAL | Windows | Optimize buffering for sequential reading. It may be possible to emulate this on some other platforms, e.g. with posix_fadvise(POSIX_FADV_SEQUENTIAL). |
| Several more | | |

Platform specific flags that may possibly be emulated or provide useful performance improvements over fsync when supported.

  • SYNC
  • DSYNC
  • FSYNC
  • RSYNC

All of the flags above may be combined with any of the flags listed in the prior post.

There are a variety of other platform specific flags that may be useful but were not mentioned, some of which I already plan to use. With POSIX flags, either extensions could provide them or they could be marked as :nodoc:. Checking for support is as easy as a macro that checks whether the flag is defined or equal to 0.
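A C sketch of the "define the flag as 0" approach, assuming O_NOATIME and O_DIRECT as the example optional flags (the OPT_* macros and helper names are hypothetical). O_DIRECT is only defined and checked here, not used to open anything, since it imposes alignment requirements and some filesystems reject it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_NOATIME and O_DIRECT with this */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Optional platform-specific flags: where the platform doesn't define one,
 * it becomes 0, so OR-ing it in is a no-op and open() still works. */
#ifdef O_NOATIME
#define OPT_NOATIME O_NOATIME
#else
#define OPT_NOATIME 0
#endif

#ifdef O_DIRECT
#define OPT_DIRECT O_DIRECT
#else
#define OPT_DIRECT 0
#endif

/* Checking for support is then just a comparison with 0. */
int supports_noatime(void) { return OPT_NOATIME != 0; }
int supports_direct(void) { return OPT_DIRECT != 0; }

/* Open for reading without updating the access time where supported.
 * (Linux only allows O_NOATIME for the file's owner or a privileged process.) */
int open_read_noatime(const char *path) {
    return open(path, O_RDONLY | OPT_NOATIME);
}
```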

Basically every call requires one of READ | WRITE | RDWR except when it doesn't for more rare uses.

CREAT APPEND TRUNC may be used with any other options.

EXCL is only meaningful together with CREAT. This is the only real dependency between flags.

SHLOCK and EXLOCK are mutually exclusive but may be combined with any other flags including RDONLY by itself.

All other flags can be combined with any other combinations with the exception of the sync options. Normally you would choose one. I have no idea what happens if you supply more than one.

The number of combinations is:

  • (RD WR RDWR APPEND CREATE TRUNC EXCL combinations) * (3 for locking) * 2**n where n is the number of additional options. It's well over 100 before options like DIRECT SEQUENTIAL etc are added, not including platform specific options.
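The formula can be made concrete with a small C function. The rules it encodes are assumptions drawn from the description above: RDONLY stands alone, EXCL only appears with CREAT, locking has 3 choices, and every extra independent option doubles the total.

```c
#include <assert.h>

/* Count open-flag combinations:
 *  - RDONLY stands alone (no APPEND/CREAT/EXCL/TRUNC);
 *  - WRONLY and RDWR each combine with APPEND (on/off), TRUNC (on/off) and
 *    one of {none, CREAT, CREAT|EXCL}, since EXCL is only used with CREAT;
 *  - each base combination takes one of 3 locking choices
 *    (none, SHLOCK, EXLOCK);
 *  - each of `n_extra` independent options (NOFOLLOW, ...) doubles the total. */
long count_combinations(int n_extra) {
    long base = 1;              /* RDONLY */
    base += 2 * (2 * 2 * 3);    /* {WRONLY, RDWR} x APPEND x TRUNC x create choice */
    long total = base * 3;      /* locking */
    for (int i = 0; i < n_extra; i++) total *= 2;
    return total;
}
```

With NOFOLLOW as the only extra option (`n_extra = 1`) this gives 150. The hand-written list above has 138 entries because it omits WRONLY | TRUNC and RDWR | TRUNC without CREAT (2 × 3 × 2 = 12 entries).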

The C# open options look like they were created to support Windows CreateFile* functions which specify additional parameters. (Created for Microsoft's platform).

The go constants are straight POSIX.

I have a ruby application I'm trying to convert that uses a no READ | WRITE open option which is the reason I'm pushing for POSIX flags. That way I can define flag if the main crystal project won't accept extra or hidden flags to get my application working. Otherwise I need to implement my own open method digging in to some crystal internals along the way.

@asterite You suggested 3 arguments. There may be a need for 4 based on reading the mono source code. Or maybe only 2. Or 3. Or 1.

# `Open()` contains 3 arguments.
FileMode [Append, Create, CreateNew, Open, OpenOrCreate, Truncate]
FileAccess [Read, Write, ReadWrite]
FileShare [None, Read, Write, ReadWrite] # Windows specific. 

# `Create` has a different argument and seems like it contains the optional extras.
# It's a subset of the file options available to `CreateFile`.
FileOptions [Sequential, RandomAccess, WriteThrough, ...] # Some of these map to POSIX functions.

Most of the options above are subsets of CreateFile arguments.

FileMode and FileAccess

C# has separate arguments for Create/Append/Truncate/etc and Read/Write/ReadWrite. In C#'s POSIX implementation the 2 arguments are OR'd and passed to open. If you want to split the arguments it would make File.new calls longer and slightly less readable.

# 1 argument
File.new "foo", File::Options.flags(Write, Create, Exclusive)
# 2 arguments
File.new "foo", File::Mode::Write, File::Access.flags(Create, Exclusive)
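On POSIX, the 2-argument form is still one integer by the time it reaches open(2). A C sketch of that OR-ing, with hypothetical enum names loosely modeled on C#'s FileAccess and FileMode. The mode-to-flag mapping is an assumption for illustration, not taken from the .NET or Mono source (for example, a real Append mode may seek to the end rather than use O_APPEND).

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical C#-style split into an access enum and a mode enum. */
enum file_access { ACCESS_READ, ACCESS_WRITE, ACCESS_READ_WRITE };
enum file_mode {
    MODE_OPEN, MODE_OPEN_OR_CREATE, MODE_CREATE,
    MODE_CREATE_NEW, MODE_TRUNCATE, MODE_APPEND
};

static int access_bits(enum file_access a) {
    switch (a) {
    case ACCESS_READ:  return O_RDONLY;
    case ACCESS_WRITE: return O_WRONLY;
    default:           return O_RDWR;
    }
}

static int mode_bits(enum file_mode m) {
    switch (m) {
    case MODE_OPEN:           return 0;                   /* must already exist */
    case MODE_OPEN_OR_CREATE: return O_CREAT;
    case MODE_CREATE:         return O_CREAT | O_TRUNC;   /* create or overwrite */
    case MODE_CREATE_NEW:     return O_CREAT | O_EXCL;    /* fail if it exists */
    case MODE_TRUNCATE:       return O_TRUNC;             /* must already exist */
    default:                  return O_CREAT | O_APPEND;  /* MODE_APPEND (assumed) */
    }
}

/* Both arguments end up OR'd into the single value open(2) takes. */
int to_open_flags(enum file_access a, enum file_mode m) {
    return access_bits(a) | mode_bits(m);
}
```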

FileShare

FileShare is an optional argument for mandatory locking inherited from DOS (see "The Tar Pit: Backwards Compatibility").

It has no equivalent on POSIX. C# uses a default FileShare based on the passed in FileAccess to provide the expected application behavior on windows. It does nothing on POSIX. A similar approach could be used in crystal by providing an optional FileShare argument for applications that require specific Windows compatibility almost if not identical to the C# implementation with the C# defaults when the argument is missing.

This would not affect any of the file flags listed in prior posts or help split them into multiple arguments, as FileShare is its own completely separate beast.

What choice you make for the default behavior without an explicit share mode comes down to two main choices:

  1. Should the program work the same way regardless of operating system.
  2. Should the program conform to the norms of the current operating system.

C# chose 2. I think that goes against Crystal's goal of a unified abstract API across operating systems.

C# uses Windows mandatory locking defaults on Windows and POSIX behavior on POSIX, as POSIX doesn't implement mandatory file locking. They could have used one of the 3-4 advisory locking APIs as an approximation but instead chose OS norms.

Consistent and saner behavior would be to never lock a file by default unless requested. Numerous articles outline the insanity of mandatory file locking and its problems.

Ultimately I don't care which approach is used since I'm developing POSIX software using features that Windows doesn't have and has no means to emulate.

How many enums and how many arguments? (@asterite section)

If FileMode and FileAccess are split and FileShare is its own thing, then where do the other options go?

  • NoFollow, Sequential, RandomAccess, Sync, etc.

If you think they fit with Read/Write or Create/Truncate you're done and can stop reading. Otherwise a 4th option is needed.

# 4 arguments
File.new "foo", File::Mode.flags(Read, Write), File::Access.flags(Create, Exclusive), File::Options.flags(Sequential, SymNoFollow), File::Share::LockExclusive
# 1 argument
File.new "foo", File::Options.flags(Read, Write, Create, Exclusive, Sequential, SymNoFollow, LockExclusive)

Feel free to mix and match enums in order to make two or three argument version and see how they look.

What's easier for me to read is 1 argument.

Splitting enums

For C# they created an API with 3 arguments for Windows compatibility and still don't have a place to put extra options. One argument is mandatory; the other 2 have defaults or are derived from the first if not supplied.

On POSIX they end up OR'd and passed as a single value to open().

Naming

Adding enums adds noise and naming problems.

  • Mode, Access, Options, Sharing.

    • Quick: which contains Sequential? Don't look up.

    • Read?

    • CreateNew?

You could try to come up with more descriptive names, but that comes down to taste. No name will meet everyone's expectations of where the enums should be split. That means mandatory documentation reading for everyone who didn't come up with the names, just to figure out what goes where.

Do you still want to copy an API designed for Windows compatibility?

Writing this issue took 20x longer than the code to implement it would. Tell me what goes where and I'll make the PR. If it were left to me I'd do POSIX flags just like Go and Ruby (which means Ruby programs would port without extra work).

Mode (Read, Write, ReadWrite) and Flags enums based off posix but with some thought towards how they map towards windows would be my preference. They also need good defaults. I'm fine with one enum too, hopefully the compiler will recognise :read | :write as a literal enum in the future.

New PR #8011.

In fact, we could have File.open(name : String, read : Bool = true, write : Bool = false, *flags : File::Flags)

What do people think of this API?

Well, when writing, am I supposed to always pass read: false, write: true like a peasant

In that case perhaps these two could work

  • File.open(name : String, *flags : File::Flags) # means read
  • File.open(name : String, *, write : Bool = false, read : Bool = false, *flags : File::Flags)

And I would be confused what read+write means alone, namely whether it would first truncate the file or not. And then the default initial position (beginning / end) can become uncertain as well.

I assumed the String modes were here to stay. Are they?

Does append get its own bool? Why or why not? What about truncate?

The most common operations in crystal are [Read] and [Write | Create | Trunc] or possibly [Write, CreateNew]. Other common modes when there are more options available are probably [Read, Write], [Read, Write, Create], [Write, Create, Append], [Write, Create, Append, Trunc]. This is not an exhaustive list.

With the above proposed API:

File.open(name, read: true)
File.open(name, write: true, File::Mode.flags(Create, Truncate))
File.open(name, write: true, File::Mode.flags(CreateNew))
File.open(name, read: true, write: true)
File.open(name, read: true, write: true, File::Mode::Create)
File.open(name, write: true, File::Mode.flags(Create, Append))
File.open(name, write: true, File::Mode.flags(Create, Append, Truncate))

With a single param API:

File.open(name, File::Mode.flags(Read))
File.open(name, File::Mode.flags(Write, Create, Truncate))
File.open(name, File::Mode.flags(Write, CreateNew))
File.open(name, File::Mode.flags(Read, Write))
File.open(name, File::Mode.flags(Read, Write, Create))
File.open(name, File::Mode.flags(Write, Create, Append))
File.open(name, File::Mode.flags(Write, Create, Append, Truncate))

If they can be shortened using a hypothetical %f:

File.open(name, %f(Read))
File.open(name, %f(Write, Create, Truncate))
File.open(name, %f(Write, CreateNew))
File.open(name, %f(Read, Write))
File.open(name, %f(Read, Write, Create))
File.open(name, %f(Write, Create, Append))
File.open(name, %f(Write, Create, Append, Truncate))

It doesn't seem like individual bool params come out ahead in terms of clarity or typing except for [Read] which is the default and often not supplied.

What are the use cases of WRONLY? It seems very very rare that anyone will use read: false. Just do:

File.open("foo", write: true, :append, :create)

for a+. I want to avoid a :readwrite flag, it's ugly.

It'd just be read and write which are named arguments, since they are the most basic permissions for the file, the rest are just optional flags.

> I assumed the String modes were here to stay. Are they?

For now.

> Does append get its own bool? Why or why not? What about truncate?

Because they're optional flags, but read, write, or both have to be specified.

> the rest are just optional flags.

Kind of. Write is often paired with Create, CreateNew and maybe Append and maybe Truncate. Since Flags will almost always be specified (Read is the default) I think the provided examples show clarity is improved by keeping them together.

Just look at the crystal code base. Compare how many open calls use Write without Create*. Create* is the common case, which means supplying flags. But is it Create or CreateNew? That seems to go back and forth. A single param wouldn't work for that either.

Even if one or both [Read, Write] was required I don't see the improvement, especially when reading another person's code or doing security auditing. I'd want them right next to each other and easily parseable for code audits.

> Because they're optional flags, but read, write, or both _have_ to be specified.

No they don't.

File.open "lockfile", File::Mode.flags(Create) do |file|
  file.flock_exclusive do
    # ...
  end
end

This works on my #8011 branch right now.
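What that lockfile call boils down to, sketched in C with hypothetical helper names: the file is only ever locked, never read or written, so the access mode is irrelevant. On Linux, macOS and FreeBSD O_RDONLY is 0, so a bare O_CREAT is effectively O_RDONLY | O_CREAT.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Take an exclusive advisory lock on a lock file, creating it if needed.
 * Returns the fd (keep it open to hold the lock) or -1 on error. */
int acquire_lockfile(const char *path) {
    int fd = open(path, O_RDONLY | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_EX) != 0) {   /* blocks until the lock is free */
        close(fd);
        return -1;
    }
    return fd;
}

/* Closing the descriptor releases the flock() lock. */
void release_lockfile(int fd) { close(fd); }
```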

From the GNU Man page:

> A file access mode of zero is permissible; it allows no operations that do input or output to the file, but does allow other operations such as fchmod.

There are also corner cases for use of O_EXCL without O_CREAT.

> What are the use cases of WRONLY? It seems very very rare that anyone will use read: false.

When I was working with secure log services WRONLY was paired with APPEND allowing multiple processes to append to a file without reading it or overwriting each other. At most a process could attempt to append extra data but couldn't erase or read anything written.

The program was setuid(loguser) to save files to an inaccessible location and used with a pipe.
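A C sketch of why WRONLY | APPEND suits this (hypothetical helper name): with O_APPEND each write(2) lands at the current end of file, so seeking back cannot overwrite earlier records. POSIX specifies the offset update and the write as one step; network filesystems such as NFS may not honor that.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Append-only, write-only handle for a log: the process can neither read
 * earlier records nor overwrite them, because every write(2) first moves
 * the offset to the current end of file. */
int open_append_only(const char *path) {
    return open(path, O_WRONLY | O_APPEND);
}
```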

> Just do:
>
> File.open("foo", write: true, :append, :create)
>
> for a+. I want to avoid a :readwrite flag, it's ugly.
>
> It'd just be read and write which are named arguments, since they are the most basic permissions for the file, the rest are just optional flags.

How is
File.open("foo", write: true, :append, :create)
Better than
File.open("foo", :write, :append, :create)

Having individual arguments for read and write is only extra typing.

To be honest, the reason is that I think read: true should be the default, and I don't want a :no_read flag. So that means that every call would have to be File.open("foo", :read). But I guess that's fine.

Append was updated to imply Write. I haven't seen a single use of Append in > 20 years without Write even for odd platform specific cases.

The API could be:

File.open("foo") # Read.
File.open("foo", :write)
File.open("foo", :readwrite) # Or :read, :write.  Don't care.
File.open("foo", :append)
File.open("foo", :read, :append) # Outlier.
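One way to read the rules above (no flags means read-only, Append implies Write) is as a normalization step applied before the flags reach open(2). The OpenFlags enum and both methods below are hypothetical, made up for illustration; they are neither Crystal's stdlib API nor the code from #8011.

```crystal
# Hypothetical open flags, for illustration only.
@[Flags]
enum OpenFlags
  Read
  Write
  Append
  Create
end

# Apply the defaults discussed in the thread:
# no flags means read-only, and Append implies Write.
def normalize(flags : OpenFlags) : OpenFlags
  flags = OpenFlags::Read if flags == OpenFlags::None
  flags |= OpenFlags::Write if flags.includes?(OpenFlags::Append)
  flags
end

# Pick the POSIX access mode that the normalized flags correspond to.
def access_mode(flags : OpenFlags) : String
  read = flags.includes?(OpenFlags::Read)
  write = flags.includes?(OpenFlags::Write)
  if read && write
    "O_RDWR"
  elsif write
    "O_WRONLY"
  else
    "O_RDONLY"
  end
end

p access_mode(normalize(OpenFlags::None))                     # File.open("foo")
p access_mode(normalize(OpenFlags::Write))                    # File.open("foo", :write)
p access_mode(normalize(OpenFlags::Append))                   # File.open("foo", :append)
p access_mode(normalize(OpenFlags::Read | OpenFlags::Append)) # File.open("foo", :read, :append)
```

Because normalization happens in one place, the "outlier" read-plus-append case needs no special flag: it simply ends up as O_RDWR with O_APPEND.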

Error checking is a reason not to grant read for everything. A file opened only for writing raises when read from. Such a read could be a programmer error (using the wrong variable name), memory corruption of a pointer or file descriptor value, or a malicious act when used with trusted programs.
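The error-checking behaviour can be seen with today's API: a handle opened with "w" is write-only, so a stray read raises instead of returning data. A minimal sketch, assuming a POSIX system and a recent Crystal release (the IO::Error comes from read(2) failing with EBADF on the write-only descriptor); the temporary file name is arbitrary.

```crystal
path = File.tempname("write_only", ".txt")

read_rejected = File.open(path, "w") do |file|
  file.print "secret"
  file.flush
  begin
    # Reading through a write-only descriptor is an error, not a silent no-op.
    file.gets_to_end
    false
  rescue IO::Error
    true
  end
end

File.delete(path)
p read_rejected
```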

I don't think :read should be implied for everything. Many of Crystal's own uses of "w" don't read from the file.

Does shorthand enum notation work with multiple enums? My examples don't use the shorthand notation because of varying requests to split open into multiple parameters.

Anticipated exponential change requests are the reason I haven't touched locking or operating-system-specific arguments. I'm trying to get the basic features through first, then add the rest when there's less to argue over.

I think at this point, the various API proposals need to be collected up, with usage examples, and then voted on.

Does shorthand enum notation work with multiple enums?

Depends on the type signature...
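A small sketch of what "depends on the type signature" means: Crystal autocasts a symbol literal to an enum member when the parameter's type restriction is that enum, so two separately typed enum parameters each accept their own shorthand. The Access and Options enums and the open_sketch method are hypothetical, for illustration only.

```crystal
# Hypothetical enums, for illustration only.
enum Access
  Read
  Write
  ReadWrite
end

@[Flags]
enum Options
  Append
  Create
end

# Each parameter has its own enum restriction, so :write autocasts to
# Access::Write and :create autocasts to Options::Create.
def open_sketch(path : String, access : Access = Access::Read, options : Options = Options::None) : {Access, Options}
  {access, options}
end

p open_sketch("foo")
p open_sketch("foo", :write)
p open_sketch("foo", :write, :create)
```

A single symbol names a single member, so combining flags in one parameter still needs an explicit expression such as Options::Append | Options::Create.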

The reason for having multiple enums is to enforce one of :read, :write or :rdwr. Since we've decided that's not necessary, one flags enum is fine.

I'll leave this for 24h before working on a more robust implementation, and I'd appreciate a :+1: on this comment to indicate people are happy with the API being:

  • a single enum
  • File.open(name : String) for read-only
  • File.open(name : String, *flags : File::Flags) for other uses
  • keep the "mode string" options but deprecate
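For the "keep the mode string options but deprecate" item, the legacy strings could be translated onto a single flags enum. The mapping below follows the usual fopen(3) meaning of each mode; OpenFlags and from_mode_string are hypothetical names, not the API that was eventually merged.

```crystal
# Hypothetical single flags enum, for illustration only.
@[Flags]
enum OpenFlags
  Read
  Write
  Append
  Create
  Truncate
end

# Translate an fopen-style mode string into flags.
# A "b" anywhere in the mode is accepted and ignored, as on POSIX systems.
def from_mode_string(mode : String) : OpenFlags
  case mode.delete('b')
  when "r"  then OpenFlags::Read
  when "r+" then OpenFlags::Read | OpenFlags::Write
  when "w"  then OpenFlags::Write | OpenFlags::Create | OpenFlags::Truncate
  when "w+" then OpenFlags::Read | OpenFlags::Write | OpenFlags::Create | OpenFlags::Truncate
  when "a"  then OpenFlags::Write | OpenFlags::Append | OpenFlags::Create
  when "a+" then OpenFlags::Read | OpenFlags::Write | OpenFlags::Append | OpenFlags::Create
  else
    # A typo in a mode string is only caught here, at runtime.
    raise ArgumentError.new("Invalid file open mode: #{mode.inspect}")
  end
end
```

This also shows the trade-off from the original issue: a misspelled mode string only fails when this method runs, whereas a misspelled enum member such as OpenFlags::Apend is rejected by the compiler.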

What do you plan to do with the existing File::Flags that serves a completely different purpose?

class File
  # Represents the various behaviour-altering flags which can be set on files.
  # Not all flags will be supported on all platforms.
  @[Flags]
  enum Flags : UInt8
    SetUser
    SetGroup
    Sticky
  end
end

@didactic-drunk whoops, it'll be called File::OpenFlags then.

May I suggest moving or removing File::Flags, as its usefulness is dubious? Maybe rename it to File::Permissions::Flags if there's a reason to keep it.

Moved to #8026.

Whilst we're deprecating things: what's the opinion on File.new and File.open? I'd rather have a protected File.new(path, fd) and then use File.open with and without a block, since File.new implies you're creating a file when often you're not.

So I'd suggest deprecating all File.new overloads except the platform-specific one, and then having File.open(filename, mode(s), *, permissions, encoding, invalid) with and without a block.

Final layout:

| Method | Current | PR |
|---------------------------------------------------------------------------------------|---------|-----------------------------------------------------------|
| File.new(path, fd, blocking, encoding, invalid) | private | public? :nodoc:? |
| File.{new,open}(filename, mode : String, perm, encoding, invalid) | public | deprecated, use File.open(filename, File::Mode) |
| File.open(filename, mode : String, perm, encoding, invalid, &block) | public | deprecated, use File.open(filename, File::Mode, &block) |
| File.open(filename, {mode,*modes} : File::Mode, *, perm, encoding, invalid) | | public |
| File.open(filename, {mode,*modes} : File::Mode, *, perm, encoding, invalid, &block) | | public |

I see Any.new as creating a new object, not creating what the object refers to.

If you change the verb syntax on one object, should the rest of the language change too? Array.alloc, Zip.open, Mysql.connect, Blockchain.idk.

Doesn't having exceptions create more cognitive load? All other objects use .new except File.

.open is generally used when opening an (OS-level) resource, and .new is used everywhere else. I'd like to see .new or .open though, not both.

Just for reference: many Socket classes have both .new and .open, where the latter is called with a block. This might not necessarily apply identically to File, but maybe this is also an option worth considering.

In any case, I'd speak against .new with a block. .new is a constructor method, but when called with a block, it doesn't return an instance; instead, the instance is yielded to the block. For this, .open seems better. The non-yielding variant could be .new, which would match existing APIs (Socket), and IMO this is a clever solution. Yet it's an additional method name, so .open for everything is fine, too.
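The .new / .open split described above, sketched on a made-up class: .new returns an instance the caller must close, while .open yields a fresh instance, always closes it, and returns the block's value. Handle is hypothetical and exists only for this illustration.

```crystal
# Hypothetical resource type illustrating the .new / .open convention.
class Handle
  getter name : String

  def initialize(@name : String)
    @closed = false
  end

  def closed? : Bool
    @closed
  end

  def close : Nil
    @closed = true
  end

  # Block form: yields a new instance, always closes it (even on exceptions),
  # and returns the block's value rather than the instance.
  def self.open(name : String, &)
    handle = new(name)
    begin
      yield handle
    ensure
      handle.close
    end
  end
end

# Non-yielding form: the caller owns the instance and must close it.
manual = Handle.new("a")
manual.close

captured = nil
result = Handle.open("b") do |h|
  captured = h
  h.name.size
end
p result
```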

Having .new without a block and .open with a block, as many Socket classes do, is the approach I've taken, mostly based on convention.

On .new with a block yielding the instance instead of returning it: isn't that what open does too? File.open("test") { 2 } => 2
