Hello there!
I have a question with regards to dependencies and whether transitive dependencies work as intended or perhaps I've stumbled into a bug.
For example, in a terragrunt.hcl creating some compute resources, I have two dependencies defined as such:
dependency "security" {
config_path = "../security"
mock_outputs = {
ecs = "sg-00000000",
access = "sg-11111111"
}
mock_outputs_allowed_terraform_commands = ["validate", "plan"]
skip_outputs = true # This is the interesting part
}
dependency "vpc" {
config_path = "../vpc"
mock_outputs = {
private_subnet_ids = ["subnet-00000000", "subnet-111111111"]
}
mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
In the "security" dependency, I have to include the skip_outputs, but only on that block. The terragrunt.hcl for ../security also depends on VPC, so it's essentially a transitive dependency. I have the mock_outputs at every single module, and anything that depends solely on VPC is fine, it's just when vpc and security both come up.
Error message is this:
/vpc/terragrunt.hcl is a dependency of /security/terragrunt.hcl but detected no outputs. Either the target module has not been applied yet, or the module has no outputs. If this is expected, set the skip_outputs flag to true on the dependency block.
So it would seem that when evaluating the dependencies, it isn't able to render the mocks of nested dependencies. By setting the skip_outputs to true on the security dependency as indicated above, it seems to work again.
Curious if this is a bug or working as designed. I couldn't find any examples that really cleared it up for me.
One of my colleagues gave me this interesting workaround that kind of has the best of both worlds:
mock_outputs_allowed_terraform_commands = ["validate", "plan"]
skip_outputs = get_terraform_command() == "validate" || get_terraform_command() == "plan" ? true : false
Because we have some regional infra, and then some infra which is partitioned and linked to the regional, I basically put the skip_outputs on everything since ultimately they all rolled up to some sort of regional infrastructure.
I'm guessing the expected usage is to limit the dependencies or somehow combine them into a more traditional root module since they're pretty tightly coupled. Curious if maybe a "skip_outputs_all" CLI flag would be appropriate, if not checking the mocks of all the dependencies.
Do you have mocks defined for the vpc module in the security module terragrunt.hcl config? Or only defined in the child module?
When security also depends on VPC, there are indeed mocks for both. It seems more like it isn't evaluating the outputs at all, rather than mocks are missing (as evidenced by the fact that skip_outputs gets around it).
I stumbled into it because I was building out the terragrunt infra one component at a time, then destroyed it to see if I could re-build a larger sub-section of the environment and that's when I ran into this problem.
I also tried playing with the ignore/include external dependency flags but that didn't seem to help either. I'm currently using the skip_outputs trick above to get the plan to work and to visually verify, and then I'm removing the OR condition and plan afterwards so future plans show the correct changes.
Oooo ok I think I know what the bug is. Not sure of the fix yet, but I am aware of the problem now:
Currently the detection of "no outputs" is based on whether or not the state file for the module exists. When the state file does not exist, terragrunt goes in to mocking mode, but if the state file exists but is empty, it is treated the same as the module having been applied and ignores the mocks.
Repro:
Indeed, that would make sense.
Given it's been a few days, the exact order of operations may be a little off, but at one point I did clean up the S3 bucket and Dynamo lock table. Entirely possible I had something else going on that may have shadowed your repro though.
I'm doing a greenfield leg of this process now (and thus no statefile should exist) and may be able to help clarify.
Thanks for digging in - really appreciate it!
What would be the preferred type of fix for something like this? Would the preference be to add another CLI flag that would preserve the current behavior but add ignoring empty outputs of deleted deployments, or just make that the default behavior. I haven't dug through terragrunt code in some time but I could perhaps take a crack at it.
After thinking through, I think the proper fix here is to merge the mocks with the outputs when it is allowed to use the mock (terraform command is mock_outputs_allowed_terraform_commands). This should work to preserve the current behavior, while always falling back to the mock if the output is not available because when there are actual outputs, the keys from the real outputs will override the mock when doing the dictionary merge.
This has the added benefit of allowing usage of the mock automatically for new outputs, which is something that is not supported in the current mock semantics.
So doing some stepping through now with my setup and I think there's two distinct issues. The first one you are correct, if there's an output block, it is used (or skipped) in its entirety. It does seem to handle empty/missing state files okay, so the act of my clearing out the S3 bucket makes a bit more sense now.
My original reported issue with transitive dependencies is actually related to the shouldReturnMockOutputs returning false when the terraform command is output, as opposed to the original user intent of a plan. So it looks like as it works its way up the dependency tree, the command is changing which is why my config above isn't working, I simply don't have "output" in the slice of acceptable mock plans.
So the question now becomes does output always become a valid mock command if there's a mock available, or do we need to express the original module's intent of a plan into a different attribute that can be added to checks for whether to use mocks.
Happy to provide a fix, just want to work on the right thing 馃憤
Curious your thoughts on this - as this explains a lot. It looks like a change was made about 17 days ago to modify the command to output when processing a dependency block to avoid triggering hooks. Fits my timeline because I just upgraded my machine and got the latest terragrunt whereas I probably had a version or two older before on my old laptop. If I remove this line, everything works again as expected:
targetOptions.TerraformCommand = "output"
As for ways to move forward, it seems one option is to remove that line and express the need to suppress hooks another way, or perhaps simply add documentation to add the output command to the mock allowed commands.
I ran into this problem as well. I rolled back to v0.23.30 and the problem went away without having to change any of my code.
Agree, that may be my short-term plan as well. I'm probably a little stretched for time until closer to the weekend but may try to figure out another way to suppress the hooks without modifying the command and see if that can be a relatively clean change. Seems like we need both, and I get why you wouldn't want to have hooks run in this situation as well.
v0.23.38 of terragrunt I am no longer able to reproduce this issue, it looks like it has now been fixed. Thank you!
Most helpful comment
After thinking through, I think the proper fix here is to merge the mocks with the outputs when it is allowed to use the mock (terraform command is
mock_outputs_allowed_terraform_commands). This should work to preserve the current behavior, while always falling back to the mock if the output is not available because when there are actual outputs, the keys from the real outputs will override the mock when doing the dictionary merge.This has the added benefit of allowing usage of the mock automatically for new outputs, which is something that is not supported in the current mock semantics.