I'm pretty sure I get a different ordering of results on occasion when I use the app in the regular manner versus when I pipe it to some other app. (I had a concrete example of this but lost it and now can't find.) This would be the case perhaps if you collected results in parallel but did not force a deterministic output except when gathering it to deliver to a pipe (or vice versa).
If this is the case, it's bad for me since sometimes I pipe the results to other commands, for example, | line 2 to extract the second result, but now the desired item is in a different position.
Please add an option to lock it to deterministic output or enable determinism by default.
If I'm mistaken, please close.
Thank you for the feedback.
I'm pretty sure I get a different ordering of results on occasion when I use the app in the regular manner versus when I pipe it to some other app.
That is entirely possible. The current situation is the following:
--max-buffer-time option. The default value for T is 100 ms (I chose this value such that a human would not notice the short lag).Without any changes, there are two options right now:
--threads 1 or -j1. In this case, the filesystem traversal is done in a depth-first fashion by a single directory-tree walker. Note that you are still left with potential non-determinism that might be coming from the OS/filesystem.Also, would it be an option to simply pipe to sort first?
Consider this:
If you are spawning DFSearches s1, s2, ..., sn, then you can report the results of s1...sj so long as all searches strictly below j have finished returning. For thej-th item you can report intermediate progress as it happens. (This assumes your threads partition the space in some kind of non-interlocking, ordered way.)
--threads 1 well that's no fun
| sort will always block on full result set (for obvious reasons)
--max-buffer-time useful but yes v. hacky
Thank you for your suggestion.
If you are spawning DFSearches s1, s2, ..., sn, then you can report the results of s1...sj so long as all searches strictly below j have finished returning.
Correct me if I'm mistaken, but I don't think that this will help a lot.
Let's say it takes a time T to walk the whole search space sequentially. If we parallelize the search across n threads, each of these small searches will (on average) take the same time t. Ideally, we would have t = T/n, but in practice we will have t > T/n of course. Since we will not be able to divide the search space evenly, some of these searches will be faster than t and some will be slower than t.
So the output sequence will look like this:
s1 -- here, t_1 is the time for s1 to finish (approximately equal to t).This might be a viable solution for searches that have a huge number of results. However, if there are only a few results (on the order of the number of threads or smaller), we will typically have to wait for a time t_1 until we see the first result.
The latter point is why I would not like to make this the default behavior. I'm open to the idea of implementing this as an option flag, but I'm afraid we won't gain too much over fd .. | sort (we would be able to see 1/n-th of the search results earlier).
A solution could be dividing the whole work into M chunks (could be more than N) with each chunk to be sufficiently small, and these chunks are assigned to the N threads in a sequential manner. Although looking difficult, there could be an adaptive algorithm to do this.
What about the following algorithm?
Since we are not doing the parallelism ourselves but are using the Ignore crate to handle it our hands may be relatively tied. The walker will create n worker threads and each worker will pull the directory that it will work on from a queue. The order of directory traversal is non-deterministic like @sharkdp said, and the only way it looks like we can fix that is by processing the data somehow before printing it.
I understand that it may be difficult to implement, but this could be a useful idea for future reference.
but I'm afraid we won't gain too much over
fd .. | sort(we would be able to see 1/n-th of the search results earlier).
Even | sort needs to get the last input to decide the first line to print. So what a parallel algorithm can optimize is to do more (meaningful) sorting while waiting for more input.
Further, the directory traversal order does help to determine what can be printed before the rest are sorted. | sort cannot acquire this information.
@sharkdp I went back and reread the code for the Ignore package and that buffer I spoke about in my last comment is filled with paths that come from either the initial constructor of Walker or from the .add method. That means that when more than one thread is specified but only one path is provided then only one thread will actually search that path and the rest are unused. That means the buffer in the output thread should be filled in a deterministic way assuming we in fact use it like a fifo queue.
See source code for run:
for path in self.paths {
let dent =
if path == Path::new("-") {
DirEntry::new_stdin()
} else {
match DirEntryRaw::from_link(0, path) {
Ok(dent) => DirEntry::new_raw(dent, None),
Err(err) => {
if f(Err(err)).is_quit() {
return;
}
continue;
}
}
};
queue.push(Message::Work(Work {
dent: dent,
ignore: self.ig_root.clone(),
}));
any_work = true;
}
where paths is a vector with each element coming from either the initial walker constructor or from the .add method.
Each thread corresponds to one worker
for _ in 0..threads {
let worker = Worker {
f: mkf(),
queue: queue.clone(),
quit_now: quit_now.clone(),
is_waiting: false,
is_quitting: false,
num_waiting: num_waiting.clone(),
num_quitting: num_quitting.clone(),
threads: threads,
parents: self.parents,
max_depth: self.max_depth,
max_filesize: self.max_filesize,
follow_links: self.follow_links,
};
handles.push(thread::spawn(|| worker.run()));
}
And each worker pops one item off the queue to work on. Since only one item should be in the queue in the current implementation then only one thread ever does any work no matter how many threads we specify.
Since only one item should be in the queue in the current implementation then only one thread ever does any work no matter how many threads we specify.
No. Notice that workers will constantly push to the shared queue while traversing their current directory:
You're right the workers push directories back to the queue but when only one directory is given there can only be a maximum of one item on the queue at a time.
run_one pushes at most one item back onto the queueAt most one thread is ever doing anything but multiple threads will be spawned waiting for work.
My hunch is that with the right data and processor the computation time teeters on the max buffer time and when output is piped, making run time a bit shorter, the data is sorted and when printed to the screen then it takes a bit longer and output is not sorted.
@sharkdp, Okay I concede threads don't just push the current directory they are working on onto the queue they push the contents of the directory so if a thread has a directory with n items in it n items will be pushed to the queue for other workers to work on.
Disregard my previous comments
@Doxterpepper Sorry for the late reply
they push the contents of the directory so if a thread has a directory with n items in it n items will be pushed to the queue for other workers to work on.
Yes, that's what I also thought :+1:. Should have been more clear.
I'm going to close this (for now), as:
If somebody feels that I'm mistaking, please tell me and we can reopen this ticket.
This has now been implemented with the new -l/--list-details option of fd which uses ls -l in the background to get a deterministic sort order.
This has now been released in fd v8.0.