As of #2666, it's now possible to use non-Docker storage with Dockerized agents such as KubernetesAgent. This is a super exciting feature!
However, if you are using S3 storage and KubernetesAgent, it seems that it's not possible to customize the jobs created for flow runs using KubernetesJobEnvironment.
I've observed that no errors are generated today (Prefect 0.12.0) when you use this combination:

- S3 storage
- KubernetesJobEnvironment
- KubernetesAgent

The agent I've created is happily creating jobs in the cluster for flow runs, but the manifests for those jobs all use default values and ignore anything I customize in KubernetesJobEnvironment.
After digging for a bit, I found the root cause. From the "Kubernetes Job Environment" docs:
> The KubernetesJobEnvironment accepts an argument job_spec_file which is a string representation of a path to a Kubernetes Job YAML file. On initialization that Job spec file is loaded and stored in the Environment. It will never be sent to Prefect Cloud and will only exist inside your Flow's Docker storage.
I can also see this in the code of KubernetesJobEnvironment.create_flow_run_job(), where the environment explicitly expects a Docker image containing the job_spec_file.
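To make the failure mode concrete, here is a simplified Python model (not Prefect's actual code) of an environment that keeps only the job_spec_file path and opens the file when the flow run job is created. LazySpecEnvironment is a hypothetical stand-in, and stdlib pickle substitutes for cloudpickle. Once the pickled environment is restored somewhere the YAML file doesn't exist, reading the spec raises FileNotFoundError.

```python
import os
import pickle
import tempfile


class LazySpecEnvironment:
    """Hypothetical stand-in: stores only the *path* to the job spec."""

    def __init__(self, job_spec_file: str) -> None:
        self.job_spec_file = job_spec_file  # only the path is kept

    def create_flow_run_job(self) -> str:
        # The file is opened at run time, so it must exist wherever this runs
        with open(self.job_spec_file) as f:
            return f.read()


# "Build" side: the spec file exists here
tmpdir = tempfile.mkdtemp()
spec_path = os.path.join(tmpdir, "job_spec.yaml")
with open(spec_path, "w") as f:
    f.write("kind: Job\n")

# Stand-in for what non-Docker storage (e.g. S3) would hold
payload = pickle.dumps(LazySpecEnvironment(spec_path))

# "Run" side: the environment is restored, but the spec file is not present
os.remove(spec_path)
env = pickle.loads(payload)
try:
    env.create_flow_run_job()
    error = None
except FileNotFoundError as exc:
    error = exc  # errno 2: No such file or directory
```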
I'd like to be able to store the job_spec_file from KubernetesJobEnvironment in S3 storage, so that the job details of a flow run using S3 storage and run by KubernetesAgent can be customized.
The benefits of non-docker Storage are explained in https://docs.prefect.io/orchestration/execution/storage_options.html#non-docker-storage-for-containerized-environments. Adding this proposed behavior would allow flows using KubernetesAgent to take advantage of that storage without sacrificing the ability to customize the jobs using KubernetesJobEnvironment.
Without this proposed behavior, I think users of KubernetesAgent have to choose between non-docker storage and customizing their jobs.
@jcrist Is this something you should address in your environment refactor? I can see where the custom spec overlaps with the metadata image.
I've been thinking more about this... I think it could be accomplished by changing S3 storage to upload a .tar.gz archive instead of a single file containing the cloudpickle-ed flow.
Backwards compatibility could be preserved by changing S3.get_flow(). It could inspect the key of the object in S3 and say:

- if the key ends in .tar.gz: extract the archive and load the flow from it
- otherwise: cloudpickle.load() it (the current behavior)

That would open the opportunity to bundle the job spec .yaml for KubernetesJobEnvironment and the cloudpickle-ed flow together, which I think would be enough to make it possible to use S3 storage and KubernetesJobEnvironment together :grinning:
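A rough sketch of the suffix-based dispatch described above, assuming the tarball contains members named flow.pkl and job_spec.yaml (both hypothetical names). load_flow_payload and build_tarball are illustrative helpers, not part of Prefect's S3 storage, and stdlib pickle stands in for cloudpickle so the example has no third-party dependencies.

```python
import io
import pickle
import tarfile


def build_tarball(flow_obj, job_spec: str) -> bytes:
    """Hypothetical upload-side helper: gzip-tar the pickled flow plus the job spec."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        members = (
            ("flow.pkl", pickle.dumps(flow_obj)),
            ("job_spec.yaml", job_spec.encode("utf-8")),
        )
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before addfile()
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def load_flow_payload(key: str, body: bytes) -> dict:
    """Hypothetical get_flow()-style loader that dispatches on the S3 key suffix."""
    if key.endswith(".tar.gz"):
        # New format: a gzipped tarball bundling the pickled flow and the job spec
        with tarfile.open(fileobj=io.BytesIO(body), mode="r:gz") as tar:
            flow = pickle.loads(tar.extractfile("flow.pkl").read())
            spec = None
            if "job_spec.yaml" in tar.getnames():
                spec = tar.extractfile("job_spec.yaml").read().decode("utf-8")
        return {"flow": flow, "job_spec": spec}
    # Old format: the object body is the pickled flow itself (backwards compatible)
    return {"flow": pickle.loads(body), "job_spec": None}


# Usage, with a plain dict standing in for a Flow object
legacy = load_flow_payload("flows/my-flow", pickle.dumps({"name": "my-flow"}))
bundled = load_flow_payload(
    "flows/my-flow.tar.gz", build_tarball({"name": "my-flow"}, "kind: Job\n")
)
```

Objects stored under keys without the .tar.gz suffix go through the old single-pickle path, which is what keeps already-registered flows loadable.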
Alternatively, could have the KubernetesJobEnvironment load the spec.yaml at build time rather than run time. Then the yaml file would be stored in the pickled flow, just like everything else. I can't think of any downsides of this behavior, and it should be simple enough to do.
Oh yeah, that's way cleaner, I like that! It kind of looks like the file is already stored on the environment in the flow.
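The build-time alternative can be sketched the same way. EagerSpecEnvironment is a hypothetical stand-in (not Prefect's class) that reads the YAML text in __init__, so the contents travel inside the pickled flow and nothing has to be read from disk at run time. Stdlib pickle again stands in for cloudpickle, and the spec is kept as raw text to avoid a PyYAML dependency.

```python
import os
import pickle
import tempfile


class EagerSpecEnvironment:
    """Hypothetical stand-in: reads the job spec when the environment is built."""

    def __init__(self, job_spec_file: str) -> None:
        self.job_spec_file = job_spec_file
        # Read the YAML text once, at build time, so it is pickled with the flow
        with open(job_spec_file) as f:
            self.job_spec = f.read()

    def create_flow_run_job(self) -> str:
        # No filesystem access at run time: use the stored contents
        return self.job_spec


tmpdir = tempfile.mkdtemp()
spec_path = os.path.join(tmpdir, "job_spec.yaml")
with open(spec_path, "w") as f:
    f.write("kind: Job\n")

payload = pickle.dumps(EagerSpecEnvironment(spec_path))
os.remove(spec_path)  # simulate a run-time container that lacks the file
restored = pickle.loads(payload)
spec = restored.create_flow_run_job()
```

One consequence of this design is that the spec is captured when the environment is built, so editing the YAML file afterwards has no effect until the flow is rebuilt and re-registered.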
Hi, I have a similar request with this combination: KubernetesAgent, KubernetesJobEnvironment and GitHub storage.
I expected the file specified in KubernetesJobEnvironment to be picked up, and prefect register flow validates that it can read the YAML file, but when running from KubernetesAgent I get: Failed to load and execute Flow's environment: FileNotFoundError(2, 'No such file or directory')
> Alternatively, could have the KubernetesJobEnvironment load the spec.yaml at build time rather than run time. Then the yaml file would be stored in the pickled flow, just like everything else. I can't think of any downsides of this behavior, and it should be simple enough to do.
I just attempted this in #2950
Just wondering, is this closed by #2950?
@joshmeek I think it can be closed, yes.
I haven't tried with S3 storage since #2950 was merged, but I have been using another non-Docker storage (Webhook) successfully with KubernetesJobEnvironment + a custom spec file for a few days and it's been working exactly as expected.