Nim: [RFC] Walking through a file system

Created on 13 Jan 2018 · 13 Comments · Source: nim-lang/Nim

import os

type
  WalkFlags* = enum
    noErrors # Do not raise exceptions

  SpanMode* = enum
    shallow, # Only spans one directory.

    depth,   # Spans the directory in depth-first post-order, i.e. the content of any
             # subdirectory is spanned before that subdirectory itself.
             # Useful e.g. when recursively deleting files.

    breadth  # Spans the directory in depth-first pre-order, i.e. the content of any
             # subdirectory is spanned right after that subdirectory itself.
             # Note that SpanMode.breadth will not result in all directory members occurring
             # before any subdirectory members, i.e. it is not true breadth-first traversal.

  DirEntryAttributes* = enum
    ...

  DirEntry* = object
    path*: string
    name*: string
    SysDirEntry: Stat or WIN32_FIND_DATA

proc extension*(dirEntry: DirEntry): string
proc size*(dirEntry: DirEntry): BiggestInt

proc created*(dirEntry: DirEntry): Time
proc modified*(dirEntry: DirEntry): Time
proc accessed*(dirEntry: DirEntry): Time
proc attributes*(dirEntry: DirEntry): DirEntryAttributes

proc isDir*(dirEntry: DirEntry): bool
proc isFile*(dirEntry: DirEntry): bool
proc isSymlink*(dirEntry: DirEntry): bool

iterator walk(dirStart: string,
              dirIncludeMask: string = "*",
              fileIncludeMask: string = "*",
              dirExcludeMask: string = "",
              fileExcludeMask: string = "",
              yieldFilter = {pcFile},
              followFilter = {pcDir},
              spanMode: SpanMode = depth,
              flags = {}): DirEntry =
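
To make the two recursive span modes concrete, here is a minimal sketch built on the existing os.walkDir rather than on the proposed walk iterator. The collect proc and the local SpanMode enum are hypothetical names used only for illustration, and shallow is left out because it is simply a single walkDir pass.

```nim
import os

type
  SpanMode = enum          # hypothetical: mirrors the two recursive modes of the draft
    depth, breadth

proc collect(dir: string, mode: SpanMode, acc: var seq[string]) =
  ## Appends every entry below `dir` to `acc` in the order implied by `mode`.
  for kind, path in walkDir(dir):
    if kind == pcDir:
      if mode == breadth: acc.add path   # pre-order: directory before its content
      collect(path, mode, acc)
      if mode == depth: acc.add path     # post-order: directory after its content
    else:
      acc.add path
```

With depth, a directory is only reported once everything inside it has been reported, which is the order a recursive delete needs.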
RFC

All 13 comments

Mostly inspired by D's Phobos.

How would error handling work? Would the user be able to stop the iteration at a certain point?

Would a callback style be accepted? If so, we could look at Go's filepath.Walk:

func Walk(root string, walkFn WalkFunc) error

Walk walks the file tree rooted at root, calling walkFn for each file or directory in the tree, including root. All errors that arise visiting files and directories are filtered by walkFn. The files are walked in lexical order, which makes the output deterministic but means that for very large directories Walk can be inefficient. Walk does not follow symbolic links.

type WalkFunc func(path string, info os.FileInfo, err error) error

WalkFunc is the type of the function called for each file or directory visited by Walk. The path argument contains the argument to Walk as a prefix; that is, if Walk is called with "dir", which is a directory containing the file "a", the walk function will be called with argument "dir/a". The info argument is the os.FileInfo for the named path.

The user can control error handling and the whole iteration through the value the callback walkFn returns for each file or error encountered:

If there was a problem walking to the file or directory named by path, the incoming error will describe the problem and the function can decide how to handle that error (and Walk will not descend into that directory). If an error is returned, processing stops.
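
For comparison, a callback-driven walk could look roughly like this in Nim. This is only a sketch on top of os.walkDir: WalkAction, WalkCallback and walkTree are hypothetical names, not part of os.nim, and unlike Go's WalkFunc the callback receives no error value, so only skipping a directory and stopping early are modelled.

```nim
import os

type
  WalkAction = enum        # hypothetical: what the callback asks the walker to do next
    waContinue, waSkipDir, waStop

  WalkCallback = proc (path: string, kind: PathComponent): WalkAction

proc walkTree(root: string, cb: WalkCallback): bool =
  ## Calls `cb` for every entry below `root`.
  ## Returns false if the callback asked to stop, true if the walk completed.
  for kind, path in walkDir(root):
    case cb(path, kind)
    of waStop:
      return false
    of waSkipDir:
      continue               # do not descend into this entry
    of waContinue:
      if kind == pcDir and not walkTree(path, cb):
        return false         # propagate a stop request from deeper levels
  result = true
```

A callback that returns waSkipDir for a directory named .git keeps the walker out of that subtree, and returning waStop ends the whole walk.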

@oskca
This is a draft, let's see what the others think.

Would the callback style be accepted?

Yes, like this: https://github.com/nim-lang/Nim/blob/master/lib/pure/os.nim#L655

You don't even need a template for that :)

@Yardanico
I know.
I meant the principle itself.

I don't see the point. Learn os.nim instead?

@Araq
Frequent usage pattern:

for path in walkXXX(...):
  if getLastAccessTime(path) > x or
      getFilePermissions(path) == y and
      getFileSize(path) > z:
    doSomething(path)

This often leads to unnecessary calls to the file system, because each of these procs queries the same path again.
On Windows it also means converting the path string for every call.
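
With today's os.nim the repeated queries can already be avoided by calling getFileInfo once and reading all three values from the returned FileInfo. A minimal sketch of the check above follows; the proc name wanted and the parameters x, y and z are placeholders taken from the pseudocode, not an existing API.

```nim
import os, times

proc wanted(path: string; x: Time; y: set[FilePermission]; z: BiggestInt): bool =
  ## Same condition as the loop above, but with a single file system query.
  let info = getFileInfo(path)   # one lookup instead of three
  info.lastAccessTime > x or (info.permissions == y and info.size > z)
```

The path is still looked up once per entry, so this does not reuse the data the directory listing itself may already have returned, which is what the proposed DirEntry is meant to do.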

"Frequent"? I've never used anything like that. There are always the low-level wrappers like posix.stat if you want more performance.

@Araq

I've never used anything like that.

Because the compiler doesn't need it? :-) Of course, I exaggerated "a bit".
Well, here is a task: the user has a large number of folders with images. They need to find the jpg, png and tiff images dated between 2000 and 2010 that are larger than 2 megabytes, skipping the .git, .svn and .hg folders with all their contents.

There are always the low-level wrappers like posix.stat if you want more performance.

Yes, I can use stat, and so can you.
But a new user, call him RJPL, cannot, because he came from Ruby/Java/Python/Lua.
He came to Nim and also wants more performance.
Why should he have to learn a low-level API?
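
For reference, the task above can be written against the current os.nim roughly as follows. This is a sketch under assumptions: findImages is a hypothetical name, "between 2000 and 2010" is read as the year of the last modification time in UTC, "2 megabytes" is read as 2 MiB, and only the three listed extensions are matched.

```nim
import os, times, strutils

const
  skipDirs = [".git", ".svn", ".hg"]
  imageExts = [".jpg", ".png", ".tiff"]
  minSize: BiggestInt = 2 * 1024 * 1024   # assumption: 2 MiB

proc findImages(root: string, found: var seq[string]) =
  ## Recursively collects matching image files below `root`.
  for kind, path in walkDir(root):
    case kind
    of pcDir:
      # Skipped directories are never entered, so their content is never queried.
      if extractFilename(path) notin skipDirs:
        findImages(path, found)
    of pcFile:
      if splitFile(path).ext.toLowerAscii in imageExts:
        let info = getFileInfo(path)             # one query for size and time
        let year = info.lastWriteTime.utc.year
        if info.size > minSize and year >= 2000 and year <= 2010:
          found.add path
    else:
      discard   # symlinks are ignored in this sketch
```

The recursion is written by hand because the directories to skip have to be rejected before the walker descends into them.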

@dom96
Why did you assign the RFC label to this issue? :-)

The title of this issue is...

[RFC] Walking through a file system

Because the compiler doesn't need it?

No, I iterate over directory contents quite a bit within Nim's and my own tools, yet I have not needed more than os.nim provides. If you need more, first create a Nimble package, and once enough people use it, we can add it to the standard library.

OK, I will do that.
