After we refactor the manifest deployment operations to remove the need for the liveManifestCalls flag, we should take the opportunity to re-evaluate and strongly type the contracts among Deck's execution details, Orca's manifest-related tasks, and Clouddriver's manifest operations.
Kubernetes V2
Deck, Orca, Clouddriver
This is not a definitive plan, but here are some possibilities to consider while doing this cleanup:
- Is OperationResult the right model for the type of information we need to surface for every Kubernetes operation? For instance, it might be useful to surface more targeted information for Undo Rollout stages (a rough sketch follows below).
- The manifests key stores either applied manifests or applied patches depending on the stage type, which I believe is why we poll continuously for the updated manifest rather than using the context we already have when rendering the Deploy and Patch Manifest stages' execution details.

This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:
@spinnakerbot remove-label stale
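To make the first bullet above concrete, here is one possible shape for a typed result contract. This is purely a sketch: none of these type or field names exist in Clouddriver or Orca today, and the actual fields would have to be driven by what Deck's execution details really render.

// Hypothetical sketch of a typed contract for Kubernetes manifest operation results.
// These names are illustrative only; they are not existing Clouddriver/Orca classes.

data class ManifestCoordinates(
  val account: String,
  val namespace: String,
  val name: String,                                  // e.g. "deployment my-app"
  val annotations: Map<String, String> = emptyMap()
)

// One explicit outcome type per kind of manifest operation, instead of a single
// loosely typed "manifests" key whose meaning depends on the stage type.
sealed class ManifestOperationOutcome {
  data class Applied(val manifests: List<ManifestCoordinates>) : ManifestOperationOutcome()
  data class Patched(val target: ManifestCoordinates, val patchSummary: String) : ManifestOperationOutcome()
  data class RolledBack(val target: ManifestCoordinates, val toRevision: Int) : ManifestOperationOutcome()
}

With explicit outcome types like these, Deck could render Deploy, Patch, and Undo Rollout execution details from the outcome it already has rather than polling for the full manifest.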
Just wanted to throw this scenario in there as well from a previous issue: https://github.com/spinnaker/spinnaker/issues/3228
In this particular pipeline structure, the fan-in/fan-out nature leads to huge payloads for downstream stages because parentExecutions contains all the manifests (especially if the deploy stages deploy large manifests). In our case, we can even see pipeline execution context sizes approaching hundreds of megabytes in SQL (though even before that point the system is pretty much breaking down, because none of this data can be fetched or updated efficiently).
Just for some additional context, we had implemented a hacky fix in our fork as far back as 1.8 (I think) to help mitigate this issue by removing a bunch of stuff from the context:
orca-queue/src/main/kotlin/com/netflix/spinnaker/orca/q/handler/RunTaskHandler.kt:
private fun Stage.processTaskOutput(result: TaskResult) {
  // Keys we never want copied from a task's outputs into the stage outputs,
  // to keep large manifest payloads out of the execution context.
  val filterKeys: MutableList<String> = mutableListOf(
    "stageTimeoutMs",
    "outputs.createdArtifacts",
    "outputs.manifestNamesByNamespace",
    "outputs.boundArtifacts",
    "outputs.manifests",
    "artifacts"
  )
  // Only drop resolvedExpectedArtifacts when the stage itself has no expectedArtifacts.
  if (!context.contains("expectedArtifacts")) {
    filterKeys.add("resolvedExpectedArtifacts")
  }
  val filteredOutputs = result.outputs.filterKeys { !filterKeys.contains(it) }
  if (result.context.isNotEmpty() || filteredOutputs.isNotEmpty()) {
    context.putAll(result.context)
    outputs.putAll(filteredOutputs)
    repository.storeStage(this)
  }
}
This seemed to be "working" for us until we upgraded to 1.19 recently, after which this code no longer seems to take effect even though it's still in our fork. I wasn't involved in that change in our fork, so I don't know much of the context around it, but what we're doing probably does break some things, since I see outputs.manifests and outputs.manifestNamesByNamespace used by certain functionality (CleanupArtifacts, traffic management).
Anyway, just wanted to throw this out there since I've been looking into the exploding-manifests issue again recently. I'm pretty scared to touch things because, as mentioned in this issue, there isn't a well-defined contract about what needs to be in the execution context; having that contract, and getting manifest contents out of the execution context, would be great. From what I can tell from searching around the code, most functionality just needs the manifest name, namespace, and annotations.
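If that observation holds, a less destructive variant of our fork's filter might be to keep the manifests key but shrink each manifest down to just those fields, instead of dropping it entirely. This is only a sketch under that assumption; shrinkManifest is a name I made up, not existing Orca code.

// Sketch (hypothetical helper): shrink each manifest in outputs to the fields that
// downstream tasks appear to read, rather than filtering "outputs.manifests" out wholesale.
@Suppress("UNCHECKED_CAST")
private fun shrinkManifest(manifest: Map<String, Any?>): Map<String, Any?> {
  val metadata = manifest["metadata"] as? Map<String, Any?> ?: emptyMap()
  return mapOf(
    "apiVersion" to manifest["apiVersion"],
    "kind" to manifest["kind"],
    "metadata" to mapOf(
      "name" to metadata["name"],
      "namespace" to metadata["namespace"],
      "annotations" to metadata["annotations"]
    )
  )
}

// Usage idea inside the output-processing hook shown above, instead of the blanket filter:
// val manifests = result.outputs["manifests"] as? List<Map<String, Any?>> ?: emptyList()
// outputs["manifests"] = manifests.map(::shrinkManifest)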
As a side note, getting manifest contents out of the execution context would also help if you're trying to deploy Secrets from a private GitHub repo or something, since Kubernetes Secrets are effectively plaintext and will currently show up there.
Unassigning now that this is in our active tracking project.