Rustup: Optimize disk access during install

Created on 6 Jan 2017  路  10Comments  路  Source: rust-lang/rustup

Installation is noticeably slow. I believe there is redundant disk work during installation, associated with the transaction machinery.

enhancement help wanted performance

All 10 comments

I don't specifically know what is inefficient here, but somebody could do some stracing on Linux and see if there's anything obvious.

stat appears to get called 3 times for each file in set_file_perms:

This could probably be reduced to just one stat call.

Given that the majority of filesystem operations which Rustup performs are reading and writing large (10s of MBs) datasets, I think the stat() inefficiency is by no means the most important thing to consider for speed.

I do, however, think that we should look at consider if there's anything we can do to improve install performance on Windows.

Installation on Linux filesystems is fast even on HDD so measuring performance there won't help much for Windows but still it'd be nice to make it even faster.

I don't know which one is more limiting NTFS or AV software but shipping less files would be the way to go here.
Nightly on my Linux machine consists of 17957 files, 16617 of them docs. It'd be significant improvement for Windows if they could be fused together into few bigger files.

If I understand correctly, the set_file_perms function walks the entire tree of files after they have been extracted to ensure their permissions are correct. This is especially costly in the docs, since there are a tremendous number of files there as @mati865 points out. I would expect tens of thousands of fewer system calls in an installation operation to be noticeable, especially e.g. on NFS.

I've done some basic analysis on Windows 10 Surface; since OS specific code paths can make Linux traces (while convenient to grab) misleading :). My test setup had Windows Defender running normally; controlled folder access was off, no defender exclusions defined.

The high level view is that for docs we spend:
2m20s extracting ~16K files
30 seconds copying from staging to final dir
26 seconds deleting from staging dir

sidebar: We are quite chatty in the OS syscalls we make - many extra opens than needed, but looking at the timing, that should be considered a microoptimisation, not a first-pass target. See the trace excerpts below for examples.

What should we be able to expect without getting into hairy levels of OS specific integration?
A trivial Python program can write a similar sized data set in 60seconds, and the archive should be in hot cache, and is tiny compared to RAM these days so - that should be a decent target.


Python benchmark

import os
import shutil
import time

now = time.time()

os.mkdir("root")
os.chdir("root")
written = 0
for dirname in range(80):
    os.mkdir("%s" % dirname)
    for fname in range(200):
        with open("%s/%s"%(dirname, fname), "wt") as f:
            written += f.write(("this is a small file %s\n" % fname)*512)

finish = time.time()
os.chdir("..")
shutil.rmtree("root")

print("created 16000 files in 80 directories in %0.2fs with %s bytes" % (finish-now, written))

Total time right now is 3m20s, of which 60 seconds is entirely dealing with making an unused copy of the content: the staging process is:

  1. unpack the archive to a temp directory
  2. copy from the temp directory into the final directory
  3. delete the temp directory contents

So we can shave a minute off if we unpack directly into the final directory. Would that be safe? If the archive has been validated during download, so we know its contents are intact, then there are no additional IO errors that can occur unpacking directly:

  • the archive, just like the temp storage, is a file on local disk
  • the output directories are being made and validated as we go
  • because we are making copies, we can already run out of disk space mid-operation

Alternatively, we could unpack to a staging directory but then rename into the final directory - that would require double-touching each file, but ensure we could fit all the files on disk before placing any of them into the target location (if thats important - I don't think it is :)).

That would still leave 2m20s to unpack vs 60s for a trivial driver; some of that may be decompression overhead, IO from the source archive etc- I don't have a breakdown for that yet.


Trace excerpts

7:23:52.10: extraction starts
7:24:52.8898672 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc   SUCCESS Desired Access: Read Attributes, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
7:24:52.8898941 AM  rustup.exe  4316    QueryInformationVolume  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc   BUFFER OVERFLOW VolumeCreationTime: 15/04/2017 2:56:45 PM, VolumeSerialNumber: 92EB-4520, SupportsObjects: True, VolumeLabel: Loc艢
7:24:52.8899057 AM  rustup.exe  4316    QueryAllInformationFile C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc   BUFFER OVERFLOW CreationTime: 9/04/2019 7:23:59 AM, LastAccessTime: 9/04/2019 7:24:31 AM, LastWriteTime: 9/04/2019 7:24:31 AM, ChangeTime: 9/04/2019 7:24:31 AM, FileAttributes: D, AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: True, IndexNumber: 0x23000000062484, EaSize: 0, Access: Read Attributes, Synchronize, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Long
7:24:52.8899204 AM  rustup.exe  4316    CloseFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc   SUCCESS 
7:24:52.8900168 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  NAME NOT FOUND  Desired Access: Read Attributes, Delete, Disposition: Open, Options: Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a
7:24:52.8901162 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: 0, OpenResult: Created
7:24:52.8903866 AM  rustup.exe  4316    WriteFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Offset: 0, Length: 8,192, Priority: Normal
7:24:52.8905696 AM  rustup.exe  4316    WriteFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Offset: 8,192, Length: 7,309, Priority: Normal
7:24:52.8906167 AM  rustup.exe  4316    CloseFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS 
7:24:52.8973734 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
7:24:52.8974230 AM  rustup.exe  4316    SetBasicInformationFile C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS CreationTime: 1/01/1601 12:00:00 PM, LastAccessTime: 8/04/2019 7:34:21 AM, LastWriteTime: 8/04/2019 7:34:21 AM, ChangeTime: 1/01/1601 12:00:00 PM, FileAttributes: n/a
7:24:52.8974922 AM  rustup.exe  4316    CloseFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS 
7:24:52.8976601 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Desired Access: Read Attributes, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
7:24:52.8976873 AM  rustup.exe  4316    QueryInformationVolume  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  BUFFER OVERFLOW VolumeCreationTime: 15/04/2017 2:56:45 PM, VolumeSerialNumber: 92EB-4520, SupportsObjects: True, VolumeLabel: Loc艢
7:24:52.8977007 AM  rustup.exe  4316    QueryAllInformationFile C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  BUFFER OVERFLOW CreationTime: 9/04/2019 7:24:52 AM, LastAccessTime: 8/04/2019 7:34:21 AM, LastWriteTime: 8/04/2019 7:34:21 AM, ChangeTime: 9/04/2019 7:24:52 AM, FileAttributes: A, AllocationSize: 16,384, EndOfFile: 15,501, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x14000000064f02, EaSize: 0, Access: Read Attributes, Synchronize, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Long
7:24:52.8977145 AM  rustup.exe  4316    CloseFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS 
7:24:52.8978130 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Desired Access: Write Attributes, Synchronize, Disposition: Open, Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
7:24:52.8978458 AM  rustup.exe  4316    SetBasicInformationFile C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS CreationTime: 1/01/1601 12:00:00 PM, LastAccessTime: 1/01/1601 12:00:00 PM, LastWriteTime: 1/01/1601 12:00:00 PM, ChangeTime: 1/01/1601 12:00:00 PM, FileAttributes: AN
7:24:52.8978630 AM  rustup.exe  4316    CloseFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS 

7:26:12.18 extraction finishes
-> 30 seconds of copy from staging dir to final dir:
7:26:12.1927218 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc    SUCCESS Desired Access: Read Data/List Directory, Synchronize, Disposition: Create, Options: Directory, Synchronous IO Non-Alert, Open Reparse Point, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
7:26:12.1928832 AM  rustup.exe  4316    CloseFile   C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc    SUCCESS 
7:26:12.1930491 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc   SUCCESS Desired Access: Read Data/List Directory, Synchronize, Disposition: Open, Options: Directory, Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
7:26:12.1930788 AM  rustup.exe  4316    QueryDirectory  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\* SUCCESS Filter: *, 1: .
7:26:12.1931110 AM  rustup.exe  4316    QueryDirectory  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc   SUCCESS 0: .., 1: all.html, 2: alloc, 3: borrow, 4: boxed, 5: collections, 6: fmt, 7: format.m.html, 8: index.html, 9: macro.format!.html, 10: macro.format.html, 11: macro.vec!.html, 12: macro.vec.html, 13: prelude, 14: rc, 15: sidebar-items.js, 16: slice, 17: str, 18: string, 19: sync, 20: vec, 21: vec.m.html
7:26:12.1932216 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Desired Access: Generic Read, Disposition: Open, Options: Sequential Access, Synchronous IO Non-Alert, Non-Directory File, Open Reparse Point, Attributes: n/a, ShareMode: Read, Delete, AllocationSize: n/a, OpenResult: Opened
7:26:12.1932645 AM  rustup.exe  4316    QueryAttributeTagFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Attributes: A, ReparseTag: 0x0
7:26:12.1932783 AM  rustup.exe  4316    QueryStandardInformationFile    C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS AllocationSize: 16,384, EndOfFile: 15,501, NumberOfLinks: 1, DeletePending: False, Directory: False
7:26:12.1932892 AM  rustup.exe  4316    QueryBasicInformationFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS CreationTime: 9/04/2019 7:24:52 AM, LastAccessTime: 8/04/2019 7:34:21 AM, LastWriteTime: 8/04/2019 7:34:21 AM, ChangeTime: 9/04/2019 7:24:52 AM, FileAttributes: A
7:26:12.1933018 AM  rustup.exe  4316    QueryStreamInformationFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS 0: ::$DATA
7:26:12.1933412 AM  rustup.exe  4316    QueryBasicInformationFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS CreationTime: 9/04/2019 7:24:52 AM, LastAccessTime: 8/04/2019 7:34:21 AM, LastWriteTime: 8/04/2019 7:34:21 AM, ChangeTime: 9/04/2019 7:24:52 AM, FileAttributes: A
7:26:12.1933611 AM  rustup.exe  4316    QueryEaInformationFile  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS EaSize: 0
7:26:12.1934607 AM  rustup.exe  4316    CreateFile  C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc\all.html   SUCCESS Desired Access: Generic Read/Write, Delete, Write DAC, Disposition: OverwriteIf, Options: Sequential Access, Synchronous IO Non-Alert, Non-Directory File, Attributes: A, ShareMode: None, AllocationSize: 15,501, OpenResult: Created
7:26:12.1936583 AM  rustup.exe  4316    QueryAttributeInformationVolume C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc\all.html   SUCCESS FileSystemAttributes: Case Preserved, Case Sensitive, Unicode, ACLs, Compression, Named Streams, EFS, Object IDs, Reparse Points, Sparse Files, Quotas, Transactions, 0x3c00600, MaximumComponentNameLength: 255, FileSystemName: NTFS
7:26:12.1936750 AM  rustup.exe  4316    QueryBasicInformationFile   C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc\all.html   SUCCESS CreationTime: 9/04/2019 7:26:12 AM, LastAccessTime: 9/04/2019 7:26:12 AM, LastWriteTime: 9/04/2019 7:26:12 AM, ChangeTime: 9/04/2019 7:26:12 AM, FileAttributes: A
7:26:12.1937029 AM  rustup.exe  4316    QueryAttributeInformationVolume C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS FileSystemAttributes: Case Preserved, Case Sensitive, Unicode, ACLs, Compression, Named Streams, EFS, Object IDs, Reparse Points, Sparse Files, Quotas, Transactions, 0x3c00600, MaximumComponentNameLength: 255, FileSystemName: NTFS
7:26:12.1937204 AM  rustup.exe  4316    QueryRemoteProtocolInformation  C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  INVALID PARAMETER   
7:26:12.1937353 AM  rustup.exe  4316    QuerySecurityFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Information: Attribute
7:26:12.1937534 AM  rustup.exe  4316    SetEndOfFileInformationFile C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc\all.html   SUCCESS EndOfFile: 15,501
7:26:12.1938157 AM  rustup.exe  4316    ReadFile    C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS Offset: 0, Length: 15,501, Priority: Normal
7:26:12.1938545 AM  rustup.exe  4316    WriteFile   C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc\all.html   SUCCESS Offset: 0, Length: 15,501, Priority: Normal
7:26:12.1939425 AM  rustup.exe  4316    SetBasicInformationFile C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc\all.html   SUCCESS CreationTime: 1/01/1601 12:00:00 PM, LastAccessTime: 1/01/1601 12:00:00 PM, LastWriteTime: 8/04/2019 7:34:21 AM, ChangeTime: 9/04/2019 7:24:52 AM, FileAttributes: n/a
7:26:12.1939951 AM  rustup.exe  4316    QueryRemoteProtocolInformation  C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc\all.html   INVALID PARAMETER   
7:26:12.1940088 AM  rustup.exe  4316    CloseFile   C:\Users\robertc\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\share\doc\rust\html\alloc\all.html   SUCCESS 
7:26:12.1949163 AM  rustup.exe  4316    CloseFile   C:\Users\robertc\.rustup\tmp\mvrji9wpubejlhg2_dir\rust-docs\share\doc\rust\html\alloc\all.html  SUCCESS 

7:26:45.95 deletes start
7:27:11.35 deletes finish
-> 26 seconds deleting the staged package

(I will happily mail a tarball of the full trace in PMLformat to anyone that wants it)

Codewise this is layered like so:

  • TarXz/GzPackage on TarPackage on DirectoryPackage

    • this is where the double copy comes on: the code assumes the temporary copy of the extracted tar is sancrosanct because installing a directory shouldn't mutate the source directory.

Current performance quick metrics:
With defender on:
my perf script above: 1m8s
rustup master release build: 3m26
stat-per-dir rather than stat-per-file: 3m14s
8M IO buffer on the tar: 3m14s
less stats + buffer - 3m11

With a defender exclusion on ~/.rustup:
my perf script: 24s
rustup master release build: 1m6s
stat-per-dir: 1m3
8M IO buffer on the tar: 50s
less stats + buffer - 36s

So there is still a lot of fat we should be able to cut; I have some experiments (see above) but nothing that is as definitive as halving the IO as my first patch did :). The benchmarks are pretty noisy too :/.

A variable I have to isolate yet is the windows search index; that is also possibly double / triple working (once on initial write, once on the rename, and then removing from the index at the rename step as well). Setting the indexing service to pause reduces load while testing, but I haven't run a set of benchmarks with it enabled/off : and when it is enabled it dynamically throttles, so enables is not always enabled, adding complexity in generating robust numbers.

tl;dr: there is hope for more improvements; a lot of syscall noise still remains though changes deeper in e.g. tar and possibly rust itself will be needed to fix that (e.g. setting the mode on a file in windows should be done while it is open - chmod is sugar for fchmod).

Users wanting things to be as fast as possible today should do the following IMO:

  • set an exclusion on their .rustup directory, or exclude rustup.exe; the former is safer than excluding the binary image name, as that would provide a trivial malware vector. Excluding just the toolchains dir is not sufficient; we write to random paths in .rustup/tmp.
  • disable windows indexing service while running rustup; let the index service batch process the new files after they are installed rather than contending. Possibly there are APIs we can call to hint this to the OS; I have not investigated this yet.

I think this ticket should be closed: we're doing create, write, close now for each file, no extra copies. We do do the move-into-place thing but as docs is moved as a top level dir, its a single syscall.

I concur, thank you @rbtcollins

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Stargateur picture Stargateur  路  3Comments

lilred picture lilred  路  3Comments

ickk picture ickk  路  3Comments

fenhl picture fenhl  路  4Comments

davidawad picture davidawad  路  3Comments