Crystal: Consider making File.join handle absolute paths

Created on 3 Apr 2018  路  22Comments  路  Source: crystal-lang/crystal

In Crystal 0.24.2, File.join doesn't handle absolute paths properly. Take the following example:

File.join("any/path", "/absolute/path")

That expression currently evaluates to "any/path/absolute/path", when the expected value would be "/absolute/path".

One argument against this is that Ruby's File.join also does not handle absolute paths - but every other high-level programming language I've used does (see Python's behavior, for example). Ruby's behaviour here is somewhat infamously cumbersome, so I'm certain it'd be an overall positive change to make.

Most helpful comment

It's funny, because I have the opposite expectation: File.join is joining many paths, not expanding paths. From your example I expect the result to be any/path/absolute/path.

Futhermore, having File.join(a, b) become an absolute path because b starts with a directory separator, but I expected the joined path to always be under a can easily lead to security issues.

I believe the current behavior is a correct, expected and safer behavior by default.

All 22 comments

It's funny, because I have the opposite expectation: File.join is joining many paths, not expanding paths. From your example I expect the result to be any/path/absolute/path.

Futhermore, having File.join(a, b) become an absolute path because b starts with a directory separator, but I expected the joined path to always be under a can easily lead to security issues.

I believe the current behavior is a correct, expected and safer behavior by default.

File.join is joining many paths, not expanding paths.

I agree - File.join("dir", "../another_dir") would still return "dir/../another_dir". This isn't about expanding paths, though: File.expand_path can't handle this, since it's a question of joining two or more paths, and not operating on an already existing one.

When running cd /x/y/z in your shell of choice, it doesn't take you to ./x/y/z. It takes you to /x/y/z, because joining a path (in that case, your CWD) with an absolute one makes the absolute path take precedent. It's expected behavior in most contexts, and Crystal not having a simple way to handle the normally expected path joining behavior is annoying at best.

I don't agree. File.join should be dumb and simple to understand. I'm certain you should be using expand here.

@RX14 So what's your recommended method to join a path with another, possibly absolute path using expand_path?

I should mention, also, that the approach I'm talking about is indeed very simple to understand - it mirrors every cd you've ever done. If you're joining with an absolute path, that absolute path is used, otherwise you get the relative behavior. No fuss.

We don't provide anything for that. You'll have to come up with your solution. Possibly iterating paths until an absolute one is found and start joining there. For example:

def join_paths(*args)
  (args.size - 1).downto(1) do |i|
    if args[i].starts_with?(File::SEPARATOR)
      return File.join(args.to_a[i..-1])
    end
  end
  File.join(args)
end

p join_paths("dir", "sub/path")                # => "dir/sub/path"
p join_paths("dir", "/absolute", "sub/path")   # => "/absolute/sub/path"
p join_paths("dir", "/whoops", "sub", "/abso") # => "/abso"

If all you need is cd like:

def cd(path)
  if path.starts_with?(File::SEPARATOR)
    Dir.cd(path)
  else
    Dir.cd(File.join(Dir.current, path))
  end
end

I.e. expecting one path to be an absolute path and act upon it in the best manner. This is better than forgetting to sanitize a File.join input that includes a maliciously introduced absolute path and silently accept it.

Shouldn't there be a way to join paths intelligently in the stdlib, then?

After all, there's a File.join that only joins strings with File.SEPARATOR, and there's an expand_path that expands special names like .. - to complement those, there should be a way to join paths like paths usually work. To only have ways to handle part of common path resolution behavior (special names but not join semantics) is mighty odd. Picture a separate method named File.coalesce or similar (forgive the silly name idea), which then also handles drive letters when Crystal supports Windows.

If you can come up with a good, reliable, well named solution, that serves a variety of use cases, then please propose. Yet, from my point of view, this is an edge case, closely tied to use cases (e.g. cd will join the current dir iff the path argument isn't absolute), and I don't think the stdlib should provide a facility for it.

It's absolutely not an edge case, seeing how many languages (Python, Rust, C#, C++17, Haskell...) implement it as default. It comes in handy whenever you want the user to be able to provide a path that's either relative (to CWD or another directory, e.g. an include path) or absolute in an intuitive way, which is often in my experience.

How is expand_path with dir: Dir.current not the same as cd?

$ crystal eval 'p File.expand_path("/foo", dir: Dir.current)'
"/foo"

$ crystal eval 'p File.expand_path("foo", dir: Dir.current)'
"/home/rx14/foo"

Oh hey, there's a dir argument that handles absolute paths! That's useful - thanks. However, it also makes the path both absolute and canonical, which isn't always desired.

@obskyr if you could actually provide a usecase for which expand_path doesn't work it'd help greatly.

@RX14 Sure - whenever you want a relative path if possible. Off the top of my head, this can occur in two main categories of cases:

  • You want to display a path to the user. Having an easily readable path is desirable for debug messages, generating config files, generating file lists...
  • Your paths are going to be used on another system. One typical example of this is git repos: if you're generating paths and commiting them, you definitely want paths to in-repo things to be relative.

In my particular use case that spurred this discussion, I'm generating Makefile dependencies which I'd like to be both readable and reusable on another system. While absolute dependencies (in /lib, for example) should be absolute, paths to local dependencies need to be relative. If they weren't, they would be neither readable nor reusable.

Surely a better fix there would be Path#relative_to once we get #5550.

That wouldn't help at all - the entire point is you don't want a relative path if the path you're joining with is absolute.

Well then you're "stuck" using the 5 line solution @ysbaddaden posted above. It seems a lot nicer than making a complex method to do exactly what you want inside the stdlib.

Except it isn't complex. Both readable and reusable paths are common use cases, and the path joining behavior is commonly used and easy to understand. The Python stdlib's implementation is 17 lines, and that's with boilerplate and error handling - that's not complex at all.

I have to stress, once again: Python, Rust, C#, C++17, and Haskell all do this. It's not complex, it's not uncommon, and its definitely not just "exactly what I want".

Well, we disagree. We don't want to change the current behavior of File.join.

I'm probably stupid and confused but your use cases seem like a perfect fit for File.expand_path. Please try to deal with this constraint in your design for the time being 鈥攐r reimplement a custom method that suits you.

Welp. If I have the time and inclination, I'll put together a proper proposal for a new path joining function, with details on expected behavior and examples where neither File.join nor File.expand_path can be used. I'll close the issue of changing the existing method, though.

I don't think it makes sense to change the behaviour of File.join or add an additional method for this. File.join is very simple, it lexically concats individual components to a normalized path. A component starting with a separator doesn't mean it's an absolute path, because it is just interpreted as a component. This is useful for joining paths which start with a separator to designate them as being relative to a project root.

You propose a behaviour similar to calling cd multiple times where an argument as relative path traverses the path from the current directory and an absolute path changes the current directory to that path.

This can be implemented quite easily by calling File.expand_path for each component:

components = ["any/path", "/absolute/path"]
components.reduce(".") { |path, component| File.expand_path(component, dir: path) } # => "/absolute/path"

File.join is very simple, it lexically concats individual components to a normalized path.

That's not true - File.join does no normalization at all beyond eliminating multiple separators between components. Except that one thing, it's just like calling String#join with File.SEPARATOR.

You propose a behaviour similar to calling cd multiple times where an argument as relative path traverses the path from the current directory and an absolute path changes the current directory to that path.

This can be implemented quite easily by calling File.expand_path for each component:

There seems to be a bit of a misunderstanding - that's not a solution, as that'll always result in an absolute path. Joining any/path and relative/path should result in a relative path. If you haven't yet, have a look at the docs for the path joining behavior in any of the languages I mentioned before, and I think it might become clearer.

That's not true - File.join does no normalization at all beyond eliminating multiple separators between components.

Well, it does normalize then, doesn't it?

Turning a resulting absolute path into a relative one shouldn't be an issue, really. In my above example you can just use a custom, unique path as starting point instead of "." and lchop it off at the end.

Was this page helpful?
0 / 5 - 0 ratings