It looks like archives are being downloaded when they don't need to be, and duplicated in the cache. In this case I've got a large zip file to extract; I don't want the contents of this file checked every time the state is applied (there are thousands of files in there), so I'm using if_missing to indicate when the archive is already extracted.
# cat my_program_archive.sls
extract_my_program_essentials:
  archive.extracted:
    - name: C:\ProgramData\My Program\The Essentials
    - source: salt://software_files/my_program/The Essentials.zip
    - source_hash: md5=adc5ffd796615c6811f54e6eb1dc660c
    - archive_format: zip
    - enforce_toplevel: False
    - if_missing: C:\ProgramData\My Program\The Essentials\manifest.xml
I've previously applied this state, but the cache contents have now been cleared, i.e. I don't have this archive in the cache any longer:
# salt salt-test cmd.run 'dir /s C:\salt\var\cache\*Essentials*'
salt-test:
File Not Found
ERROR: Minions returned with non-zero exit code
...but that's fine because the archive is already extracted.
Bug 1: In test mode I would expect the test to pass, since my if_missing file is already present, but the test results suggest the whole archive will be downloaded again:
# salt salt-test state.apply my_program_archive test=True
salt-test:
ID: extract_my_program_essentials
Function: archive.extracted
Name: C:\ProgramData\My Program\The Essentials
Result: None
Comment: Archive salt://software_files/my_program/The Essentials.zip would be downloaded to cache
When actually applying the state, the comment is correct. The manifest exists, so it didn't need to do anything:
# salt salt-test state.apply my_program_archive
salt-test:
ID: extract_my_program_essentials
Function: archive.extracted
Name: C:\ProgramData\My Program\The Essentials
Result: True
Comment: C:\ProgramData\My Program\The Essentials\manifest.xml exists
Total run time: 113.092 s
Bug 2: ...but note the run time. It's downloaded the whole thing to the cache again when it didn't need to.
# salt salt-test cmd.run 'dir C:\salt\var\cache\salt\minion\files\base\software_files\my_program\*Essentials*'
salt-test:
Directory of C:\salt\var\cache\salt\minion\files\base\software_files\my_program
26/01/2017 16:46 565,283,690 The Essentials.zip
Bug 3: Worse still, it's duplicated it and hashed it, even though I've provided the hash in the state:
# salt salt-test cmd.run 'dir C:\salt\var\cache\salt\minion\files\base\*Essentials*'
salt-test:
Directory of C:\salt\var\cache\salt\minion\files\base
26/01/2017 16:46 565,283,690 _my_program_The Essentials.zip
26/01/2017 16:46 37 _my_program_The Essentials.zip.hash
I don't know why the archive would have to be duplicated, why it chose to place these files in the base folder (it seems a strange location; are they meant to be temporary?), or why the archive had to be re-hashed (the hash value in the file is the same md5 value that I supplied in the state).
Tested on 2016.11.1
Thanks for reporting this. I have marked this as a feature request.
It looks like we only check if_missing after checking that __opts__['test'] is True, so that is the reason it is not reporting what you would expect. And it isn't checking if_missing before the archive is downloaded, either.
Thanks,
Daniel
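For readers following along, here is a simplified sketch of the check order Daniel describes. This is illustrative pseudologic, not the actual Salt source; download_to_minion_cache is a made-up stand-in for the real fetch into the minion cache (roughly what cp.cache_file does):

import os

def download_to_minion_cache(source):
    # Hypothetical stub: pretend the file was fetched into the cache.
    return '/var/cache/salt/minion/files/base/' + source.split('://', 1)[-1]

def extracted(name, source, if_missing=None, __opts__=None, **kwargs):
    """Simplified outline of the decision order; NOT the real Salt code."""
    __opts__ = __opts__ or {}

    # Bug 1: test mode returns before if_missing is ever consulted,
    # so test=True reports a download even when the marker file exists.
    if __opts__.get('test'):
        return {'name': name, 'result': None,
                'comment': 'Archive {0} would be downloaded to cache'.format(source)}

    # Bug 2: the archive is fetched unconditionally...
    cached = download_to_minion_cache(source)

    # ...and only afterwards is if_missing checked, which is why the
    # final state result is correct but the download happened anyway.
    if if_missing and os.path.exists(if_missing):
        return {'name': name, 'result': True,
                'comment': '{0} exists'.format(if_missing)}

    # ... verify source_hash against `cached`, then extract ...
    return {'name': name, 'result': True, 'comment': 'extracted'}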
Is there any known workaround for this issue? For my use-case the source is a 40 MB file at the other end of a VPN tunnel, so the performance hit is huge.
Could we use "unless: ls /usr/bin/somefile" or some other approach here until this bug is fixed?
I would use 'test -f /usr/bin/file', but yeah, that is the only workaround I know of.
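Applied to a state, that workaround might look like the sketch below. The state ID and paths here are made up, and since unless runs through the minion's shell, the test -f form assumes a Unix minion (a Windows minion would need a cmd.exe equivalent):

extract_some_archive:
  archive.extracted:
    - name: /opt/some_program
    - source: salt://software_files/some_program.zip
    - if_missing: /opt/some_program/manifest.xml
    # unless short-circuits the whole state before any download
    # happens, unlike if_missing, which is currently checked too late
    - unless: test -f /opt/some_program/manifest.xml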
I just stumbled over this while investigating continued high load on some servers after a much needed Salt update.
I'm sorry, but this is not a "feature request". It's either an unwanted bug (unintended breakage of functionality which was previously working) or a breaking change (intentionally breaking backwards compatibility).
Before Salt 2016.11.*, archive.extracted would download remote archives only if if_missing didn't match an existing file. This functionality was changed, as the documentation states for 2016.11, to support the use-case of downloading and extracting archives without having to know in advance what's in them. However, this has now broken the previous behaviour of not downloading and extracting an archive that has already been downloaded and extracted.
This impacts me insofar as I have now sent thousands of unnecessary HTTP requests to Hashicorp downloading consul, consul-template, terraform and vault from their servers only for Salt to then realize that it didn't need them.
I would also encourage a rethink of the hashing functionality: in my opinion, Salt shouldn't redownload and rehash archives that can be hundreds of megabytes or gigabytes in size just to discover, on every highstate run, that the hash does not differ from source_hash. Not to mention that rehashing these archives creates massive load on the minions. That's really bad design.
I think the correct way to implement this is:
- Salt should not download the archive just to compare its hash to source_hash. It can keep the last hash for the state in a cache and compare it to the contents of a hash file URL on the server if one is provided.
- Salt should check if_missing first, and if if_missing is provided and the target file exists, it should do nothing.
- If if_missing is not provided or the target file does not exist, Salt should download the archive, compare its hash to source_hash and extract it.
If a user doesn't know what the contents of an archive are, but wants it extracted whenever the archive changes, the right way is to use a source hash file. Additionally, if Salt caches the downloaded archive's hash somewhere, it could presumably also store the correct HTTP headers (etag, if-modified-since), which would be an even better implementation. Unfortunately I caught this bug so late on my end that I assume a revert to the old behavior is now impossible in Saltstack itself, but I would love this bug to get a higher priority and be fixed with some syntax that makes archive.extracted compatible with its previous behavior.
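To make the etag idea above concrete, here is a minimal sketch of a conditional download keyed on the server's ETag, assuming a plain HTTP(S) source. The function name and cache path are hypothetical and not part of Salt's API:

import json
import os
import shutil
import urllib.error
import urllib.request

# Hypothetical location for the cached ETag metadata; not a Salt path.
CACHE_META = '/var/cache/archive_etags.json'

def fetch_if_changed(url, dest):
    """Download url to dest only if the server reports new content."""
    meta = {}
    if os.path.exists(CACHE_META):
        with open(CACHE_META) as f:
            meta = json.load(f)

    req = urllib.request.Request(url)
    if url in meta:
        # Ask the server to skip the body if the archive is unchanged.
        req.add_header('If-None-Match', meta[url])

    try:
        with urllib.request.urlopen(req) as resp:
            with open(dest, 'wb') as f:
                shutil.copyfileobj(resp, f)  # stream; archives can be huge
            etag = resp.headers.get('ETag')
            if etag:
                meta[url] = etag
                with open(CACHE_META, 'w') as f:
                    json.dump(meta, f)
            return True   # new content was downloaded
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return False  # unchanged on the server; nothing downloaded
        raise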
Sidenote: I worked around this behavior by adding unless: test -f [if_missing path] to all archive.extracted states in my Salt repositories, as mentioned above by @gtmanfred.
@morganwillcock I've opened #43518 to fix this.