Salt: archive.extracted: lots of unnecessary file transferring, copying, and hashing

Created on 26 Jan 2017 · 5 comments · Source: saltstack/salt

It looks like archives are being downloaded when they don't need to be, and duplicated in the cache. In this case I've got a large zip file to extract; I don't want the contents of this file checked every time the state is applied (there are thousands of files in there), so I'm using if_missing to indicate when the archive is already extracted.

# cat my_program_archive.sls

extract_my_program_essentials:
  archive.extracted:
    - name: C:\ProgramData\My Program\The Essentials
    - source: salt://software_files/my_program/The Essentials.zip
    - source_hash: md5=adc5ffd796615c6811f54e6eb1dc660c
    - archive_format: zip
    - enforce_toplevel: False
    - if_missing: C:\ProgramData\My Program\The Essentials\manifest.xml
I've previously applied this state, but the cache contents have now been cleared, i.e. I don't have this archive in the cache any longer:
# salt salt-test cmd.run 'dir /s C:\salt\var\cache\*Essentials*'
salt-test:
    File Not Found
ERROR: Minions returned with non-zero exit code

...but that's fine because the archive is already extracted.

Bug 1: In test mode I would expect the test to pass, since my if_missing file is already present, but the test results suggest the whole archive will be downloaded again:

# salt salt-test state.apply my_program_archive test=True

salt-test:
          ID: extract_my_program_essentials
    Function: archive.extracted
        Name: C:\ProgramData\My Program\The Essentials
      Result: None
     Comment: Archive salt://software_files/my_program/The Essentials.zip would be downloaded to cache

When actually applying the state, the comment is correct. The manifest exists, so it didn't need to do anything:

# salt salt-test state.apply my_program_archive

salt-test:
          ID: extract_my_program_essentials
    Function: archive.extracted
        Name: C:\ProgramData\My Program\The Essentials
      Result: True
     Comment: C:\ProgramData\My Program\The Essentials\manifest.xml exists

Total run time: 113.092 s

Bug 2: ...but note the run time. It's downloaded the whole thing to the cache again when it didn't need to.

# salt salt-test cmd.run 'dir C:\salt\var\cache\salt\minion\files\base\software_files\my_program\*Essentials*'

salt-test:
     Directory of C:\salt\var\cache\salt\minion\files\base\software_files\my_program   
    26/01/2017  16:46       565,283,690 The Essentials.zip

Bug 3: Worse still, it's duplicated it and hashed it, even though I've provided the hash in the state:

# salt salt-test cmd.run 'dir C:\salt\var\cache\salt\minion\files\base\*Essentials*'

salt-test:
     Directory of C:\salt\var\cache\salt\minion\files\base
    26/01/2017  16:46       565,283,690 _my_program_The Essentials.zip
    26/01/2017  16:46                37 _my_program_The Essentials.zip.hash

I don't know why the archive would have to be duplicated, why it chose to place these files in the base folder (it seems a strange location; are they meant to be temporary?), or why the archive had to be re-hashed (the hash value in the file is the same md5 value that I supplied in the state).

Tested on 2016.11.1

Labels: Bug, State Module, ZRELEASED - 2017.7.3


All 5 comments

Thanks for reporting this. I have marked this as a feature request.

It looks like we only check if_missing after checking that __opts__['test'] is True, which is why it is not reporting what you would expect. It also isn't checking if_missing before the archive is downloaded.

Thanks,
Daniel
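
To make the ordering concrete, here is a minimal sketch of the reordered flow the reporter expects. This is illustrative only, not the actual Salt source; the function is simplified and opts stands in for the injected __opts__ dictionary:

import os

def extracted(name, source, if_missing=None, opts=None):
    # Illustrative simplification: checking if_missing first would
    # short-circuit both test mode and the download.
    ret = {'name': name, 'changes': {}}
    if if_missing and os.path.exists(if_missing):
        ret.update(result=True, comment='{0} exists'.format(if_missing))
        return ret
    if opts and opts.get('test'):
        ret.update(result=None,
                   comment='Archive {0} would be downloaded to cache'.format(source))
        return ret
    # ... download to the minion cache, verify source_hash, extract ...
    ret.update(result=True, comment='{0} extracted'.format(source))
    return ret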

Is there any known workaround for this issue? For my use case the source is a 40 MB file at the other end of a VPN tunnel, so the performance hit is huge.
Could we use "unless: ls /usr/bin/somefile" or some other approach here until this bug is fixed?

I would use 'test -f /usr/bin/somefile', but yeah, that is the only workaround I know of.

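
For reference, here is a minimal sketch of that workaround applied to the case above (the state ID, paths, source URL, and source_hash value are placeholders):

extract_somefile:
  archive.extracted:
    - name: /opt/somefile
    - source: https://example.com/somefile.tar.gz
    - source_hash: md5=00000000000000000000000000000000
    - if_missing: /usr/bin/somefile
    # unless short-circuits the whole state, so nothing is downloaded
    # when the file is already in place
    - unless: test -f /usr/bin/somefile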

I just stumbled over this while investigating continued high load on some servers after a much needed Salt update.

I'm sorry, but this is not a "feature request". It's either an unwanted bug (unintended breakage of functionality which was previously working) or a breaking change (intentionally breaking backwards compatibility).

Before Salt 2016.11.*, archive.extracted would download remote archives only, and only if if_missing didn't match an existing file. As the 2016.11 documentation states, this was changed to support the use case of downloading and extracting archives without knowing in advance what's in them. However, this has now broken the previous behavior of not downloading and extracting a file that has already been downloaded and extracted.

This impacts me insofar as I have now sent thousands of unnecessary HTTP requests to HashiCorp, downloading consul, consul-template, terraform and vault from their servers, only for Salt to then realize that it didn't need them.

I would also encourage a rethink of the hashing functionality: in my opinion Salt shouldn't re-download and re-hash archives that can be hundreds of megabytes or gigabytes in size just to figure out, on every highstate run, that the hash does not differ from source_hash. Not to mention that re-hashing these archives creates massive load on the minions. That's really bad design.

I think the correct way to implement this is (a minimal sketch follows the list):

  • Salt should never download and re-hash a file just to compare it to source_hash. It can keep the last hash for the state in a cache and compare it to the contents of a hash file URL on the server, if one is provided.
  • If no remote hash file is provided but if_missing is, and the target file exists, Salt should do nothing.
  • If no remote hash file is provided, and if_missing is not provided or the target file does not exist, Salt should download the archive, compare its hash to source_hash, and extract it.
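
A minimal sketch of that decision order, with hypothetical, simplified names (fetch_text stands in for an HTTP GET of the small remote hash file; this is not Salt's actual code):

import os
import urllib.request

def fetch_text(url):
    # Stand-in for downloading the few-byte remote hash file.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode().strip()

def needs_download(remote_hash_url, cached_hash, if_missing):
    # 1. With a remote hash file, compare it against the hash cached from
    #    the last run instead of re-downloading and re-hashing the archive.
    if remote_hash_url:
        return fetch_text(remote_hash_url) != cached_hash
    # 2. Without one, trust if_missing: if the target exists, do nothing.
    if if_missing and os.path.exists(if_missing):
        return False
    # 3. Otherwise download, verify against source_hash, and extract.
    return True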

If a user doesn't know what the contents of an archive are but wants it extracted whenever the archive changes, the right way is to use a source hash file. Additionally, if Salt caches the downloaded archive's hash somewhere, it could presumably also store the relevant HTTP headers (ETag, If-Modified-Since), which would be an even better implementation. Unfortunately I caught this bug so late on my end that I assume a revert to the old behavior is now impossible in Saltstack itself, but I would love this bug to get a higher priority and be fixed with some syntax that makes archive.extracted compatible with its previous behavior.
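
To illustrate the header idea: a conditional GET with a stored ETag lets the server answer 304 Not Modified without transferring the archive. A sketch, under the assumption that the ETag was saved from a previous download (the URL and ETag value are placeholders):

import urllib.error
import urllib.request

cached_etag = '"abc123"'  # placeholder: would come from Salt's cache metadata
req = urllib.request.Request(
    'https://example.com/somefile.tar.gz',  # placeholder URL
    headers={'If-None-Match': cached_etag},
)
try:
    with urllib.request.urlopen(req) as resp:
        body = resp.read()  # 200 OK: archive changed upstream, re-extract it
except urllib.error.HTTPError as err:
    if err.code != 304:
        raise
    # 304 Not Modified: keep the cached copy and skip extraction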


Sidenote: I worked around this behavior by adding unless: test -f [if_missing path] to all archive.extracted states in my Salt repositories, as mentioned above by @gtmanfred.

@morganwillcock I've opened #43518 to fix this.
