Edit: Apologies, never mind; it seems this doesn't improve the fetch performance of the first entry like I had thought.
It does not improve the fetch performance of the first entry; that was never the intention. It improves the performance of subsequent FindNextFile() calls, possibly by a lot, e.g. when reading over a network (i.e. a UNC path or a mapped network share).
However, it is only available in Windows 7 and later, and Git for Windows still supports Vista (even if we officially dropped support for XP due to lack of active contributors).
And the most benefit would probably come from adding this to the FSCache feature. So what would make the most sense is to have a static version check at the beginning of fsentry_create_list() and then continue with either FindFirstFileW() or FindFirstFileExW(), depending on that version test.
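To illustrate, the version gate itself reduces to a one-time major/minor comparison: FIND_FIRST_EX_LARGE_FETCH needs Windows 7, i.e. version 6.1 or later. This is only a sketch of the decision logic, not the actual fscache patch; the call-site shape is shown as a comment since it only compiles on Windows.

```c
#include <stdint.h>

/* Hypothetical helper for the version gate: FIND_FIRST_EX_LARGE_FETCH
 * is only supported on Windows 7 (version 6.1) and later, so the
 * decision reduces to a major/minor comparison done once at startup. */
int can_use_large_fetch(uint32_t major, uint32_t minor)
{
	return major > 6 || (major == 6 && minor >= 1);
}

/* Intended call site (Windows-only, shown as a comment):
 *
 *	if (can_use_large_fetch(major, minor))
 *		h = FindFirstFileExW(pattern, FindExInfoBasic, &data,
 *				     FindExSearchNameMatch, NULL,
 *				     FIND_FIRST_EX_LARGE_FETCH);
 *	else
 *		h = FindFirstFileW(pattern, &data);
 */
```

(FindExInfoBasic is likewise Windows 7+, so the same check covers both.)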
Note: this would make for an excellent first-time contribution.
More information on when to use FIND_FIRST_EX_LARGE_FETCH: https://blogs.msdn.microsoft.com/oldnewthing/20131024-00/?p=2843/
Well, the reason I posted this originally was that I thought it would reduce the number of syscalls by at least one, since currently FindFirstFile only fetches one entry (and that entry is ".", which is useless) no matter what. It's not as important to me for subsequent fetches since I'm dealing with large trees but not large directories per se. But if it seems useful and I get a chance to implement this I'll let you know! Currently I'm leaning toward calling NtQueryDirectoryFile directly and skipping the Win32 API altogether so I can reduce the number of syscalls, but we'll see.
if it seems useful and I get a chance to implement this I'll let you know!
Great! I still think this has merit, in particular for compat/win32/fscache.c.
Okay so I just did some tests and it seems this wouldn't make a huge difference like I had expected. Depending on the repo the difference for git status was anywhere from completely unobservable (small/normal repo) to 3% (large-ish repo with a few directories that have a few dozen files) to 8% (kotlin) to 18% (artificial repo with 30k files in a single directory). So it doesn't seem to really pay off enough to be worth it. Also note that my test used NtQueryDirectoryFile to get back a large batch on the first call too, so the improvement would be even less for FIND_FIRST_EX_LARGE_FETCH.
I'll take the 8%... :-) thank you for sharing your research!
You know, I reimplemented this last night and it seems it could be more like 10% if you just use NtQueryDirectoryFile's output directly and skip the step where I converted the results into WIN32_FIND_DATA format. Happy to share the code if you'd like to take copyright/ownership of it. :-) Although I didn't quite do the part regarding the reparse-point tags (I think you need to call FSCTL_GET_REPARSE_POINT for files that have reparse points so you can tell if they're symlinks) so that would need to be added too.
Oh, apparently Windows uses a hack where it puts the ReparseTag into the EaSize field, so no need for FSCTL_GET_REPARSE_POINT... go figure.
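In other words, the symlink check can be done from the directory listing alone. A minimal sketch of that check, with the relevant SDK constants reproduced so it is self-contained (on Windows you'd read these fields straight out of FILE_FULL_DIR_INFORMATION):

```c
#include <stdint.h>

/* Constants from the Windows SDK, reproduced here so the check is
 * self-contained. */
#define FILE_ATTRIBUTE_REPARSE_POINT 0x00000400u
#define IO_REPARSE_TAG_MOUNT_POINT   0xA0000003u
#define IO_REPARSE_TAG_SYMLINK       0xA000000Cu

/* For entries with FILE_ATTRIBUTE_REPARSE_POINT set,
 * NtQueryDirectoryFile reuses the EaSize field of
 * FILE_FULL_DIR_INFORMATION to carry the reparse tag, so symlinks can
 * be recognized without an extra FSCTL_GET_REPARSE_POINT round trip. */
int entry_is_symlink(uint32_t file_attributes, uint32_t ea_size)
{
	return (file_attributes & FILE_ATTRIBUTE_REPARSE_POINT)
		&& ea_size == IO_REPARSE_TAG_SYMLINK;
}
```

A junction (mount point) would carry IO_REPARSE_TAG_MOUNT_POINT in the same field, so the same trick distinguishes the two.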
/remind me to take a stab at this on Thursday.
@dscho set a reminder for Mar 22nd 2018
Tip for whenever you take a stab at this: I would recommend starting with a reasonably large buffer and only increasing its size when a call actually fills it. In my tests that meant starting with enough space for 16 FILE_FULL_DIR_INFORMATION structs, each accommodating 256 characters in its file name (I think that's 9312 bytes or so). I then increased the buffer size only if there wasn't enough room left at the end of the buffer for another file with 256 characters in its name. This makes sure you're optimizing both the memory usage and the number of calls.
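For reference, the sizing arithmetic can be sketched like this. The struct below is a local mirror of FILE_FULL_DIR_INFORMATION's documented layout (an assumption made so the math runs anywhere; on Windows you'd use the real type from winternl.h). With 8-byte entry alignment this lands around 9.3 KB, in the same ballpark as the figure above.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of FILE_FULL_DIR_INFORMATION's fixed-size prefix, per
 * the documented layout: times and sizes are 64-bit, everything else
 * 32-bit, FileName is a WCHAR array. */
struct dir_info_mirror {
	uint32_t NextEntryOffset;
	uint32_t FileIndex;
	int64_t  CreationTime;
	int64_t  LastAccessTime;
	int64_t  LastWriteTime;
	int64_t  ChangeTime;
	int64_t  EndOfFile;
	int64_t  AllocationSize;
	uint32_t FileAttributes;
	uint32_t FileNameLength;
	uint32_t EaSize;
	uint16_t FileName[1]; /* WCHAR */
};

/* Bytes needed for one entry whose name is name_chars WCHARs long,
 * rounded up to the 8-byte alignment used for NextEntryOffset. */
size_t dir_entry_size(size_t name_chars)
{
	size_t raw = offsetof(struct dir_info_mirror, FileName)
		+ name_chars * sizeof(uint16_t);
	return (raw + 7) & ~(size_t)7;
}

/* Initial buffer: room for 16 entries with 256-character names. */
size_t initial_buffer_size(void)
{
	return 16 * dir_entry_size(256);
}
```

The "grow only when needed" rule then becomes: after each call, if the unused tail of the buffer is smaller than dir_entry_size(256), double the buffer for the next call.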
:wave: @dscho, take a stab at this.
I had a discovery today, not specifically related to this issue, but perhaps pertinent:
You can get a nice speed boost (> 2×) by listing directories in a multithreaded fashion. As you might expect, it's not a linear speedup, but with 8 threads on a 4×2 system I got speedups of around ~4×, especially when the system already had the directory entries in its cache.
However, I suspect this may depend strongly on the access characteristics of the storage medium... specifically my testing was on a recent SSD. I imagine on a hard drive it may end up worse (or the same, or better)... I really have no idea.
But it's one possible way to dramatically improve I/O speed in certain scenarios, if that becomes necessary later.
Addressed via https://github.com/git-for-windows/git/pull/1908.