Describe the feature:
Given I have a package that contains an Elasticsearch transform specification, we should be able to install, update, delete, and start the transform based on the state of the package and its prior history.
Describe a specific use case for the feature:
Pinging @elastic/ingest-management (Team:Ingest Management)
Pinging @elastic/endpoint-management (Team:Endpoint Management)
We have an early checklist for adding custom/new types to the package: https://github.com/elastic/package-spec/issues/27
@ph It looks like a request to document what is needed for different assets in the package specs. I think the next step is to create some implementation tickets in the different repos.
@ruflin Can we get some implementation tickets so we can start knocking some of them off? I will add some constraints to the ticket, e.g. the source index has to exist for the transform to be successfully applied.
Constraints and Notes:
There are two main issues I see at the moment:
To get things moving here I suggest the following:
The above sidesteps a few issues:
@nnamdifrankie Could you get the above two PRs started and link them here?
@ruflin
What are we doing if no data is there yet (aka no index)?
Currently our best case is where data already exists, e.g. a 7.9 to 7.10 upgrade. We have to consider the best option for getting a document into the source index. Currently we ignore certain documents with certain attributes; we could create a similar document to use as a seed, but we have to decide who will send this document.
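For illustration, a minimal sketch of the seed idea, assuming the Elasticsearch JS client; the index name, seed fields, and the notion that EPM itself writes the seed are assumptions for the example, not an agreed design:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Hypothetical: check whether the transform's source index exists and, if it is
// missing, write a single seed document so the transform can be started.
async function seedSourceIndexIfMissing(sourceIndex: string): Promise<void> {
  const { body: exists } = await client.indices.exists({ index: sourceIndex });
  if (!exists) {
    await client.index({
      index: sourceIndex,
      body: { '@timestamp': new Date().toISOString(), event: { kind: 'seed' } },
      refresh: 'wait_for', // make the seed visible before the transform starts
    });
  }
}
```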
Create a PR with a package that contains a transform inside (probably best an updated endpoint package). I expect this PR to happen somewhere where a registry can be spun up with this package for testing.
We always test the registry locally while developing using Docker, but we can also use the code at https://github.com/elastic/endpoint-package/blob/master/Makefile#L161
@elastic/ingest-management @elastic/endpoint-management
Following my exploration of the EPM code, I have been able to install a transform but not start it. Starting it requires the source index to exist; we will explore options with the Elasticsearch team. However, I want to get your input on the installation strategy for transforms.
Transforms in a dataset in a package have the following cases that influence how we perform the installation.
A transform can be moved to another dataset (different installation name) after it has been installed under a different dataset in a prior version. If we do not properly detect this drift and clean up the old version, we could end up with multiple copies.
It is wasteful and incorrect to have multiple versions of the transform doing the same processing if the intention was to move the transform, as in the example given above.
Other attributes of the transform can change. We can always detect this using some form of hashing.
Candidate Solutions:
Assuming the current state is the desired state, we can delete the old transform with the dataset prefix and version, along with its reference in SO, after we have successfully installed the new transform, if any exists (a move case). Before installation, capture all transforms and object references with the dataset prefix; after successful installation, delete using the captured information (see the sketch after these options). The consideration is that we continue to support any rollback guarantees in the case of failure, and deleting after installing may help satisfy this requirement. Deleting may also fail, which would result in multiple versions and stale transforms.
Calculate the diffs, drift, and change cases, and apply install and delete actions. The consideration here is catching all the cases and ensuring that we maintain rollback guarantees in the case of failure. We also have to consider delete failures, which can result in an inconsistent and undesired state.
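To make option 1 a bit more concrete, here is a minimal sketch of the capture step, assuming the Elasticsearch JS client; the prefix value is a placeholder and the saved-object bookkeeping is deliberately omitted:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Capture every transform whose id starts with the dataset prefix
// (e.g. "metrics-endpoint.metadata-") before installing the new version.
// The returned ids are what would be deleted after a successful install.
async function captureTransformsByPrefix(datasetPrefix: string): Promise<string[]> {
  const { body } = await client.transform.getTransform({
    transform_id: `${datasetPrefix}*`,
    allow_no_match: true,
  });
  return body.transforms.map((t: { id: string }) => t.id);
}
```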
Your comments are welcome.
Thanks @nnamdifrankie for the spike, @skh or @neptunian can you look at this?
@nnamdifrankie I think I don't fully understand your example of a transform moving to another dataset, which unfortunately is important for all the follow-ups. Could you share an example?
To get started, I would keep it as simple as possible. Would the following flow be possible?
@nnamdifrankie Could this cause any side effects? You mention above that parts of the transform can change. My assumption is that these changes always mean a new version of the package?
If we follow the above, rollback should also be pretty straightforward, as it is just the same in reverse.
Another open question for me is where in the chain of asset installation the transform fits on install / upgrade. I assume after the templates and ingest pipelines have been loaded but before the UI elements?
Another thing I learned yesterday when talking to the ML team is that the source index can be a pattern, so logs-foo-* can be the source. It would be interesting if the target index could contain a variable, something like logs-foo-{data_stream.namespace}. It would mean a single transform would act for multiple namespaces, but I doubt this is possible at the moment. If there were multiple source indices, what would our expectation be for the target index: all data in a single target index, or one per namespace?
@ruflin Sorry I was not clear earlier.
I think I don't fully understand your example of a transform moving to another dataset, which unfortunately is important for all the follow-ups. Could you share an example?
Given we have the transform in a previous version:
Endpoint Package 0.15.0:
We have a transform in the metadata dataset: package/endpoint/dataset/metadata/elasticsearch/transform/default.json. This will create a transform metrics-endpoint.metadata-default-0.15.0-[timestamp]. We could also use metrics-endpoint.metadata-default, but this would not allow us to roll back if the transform in the current version fails.
Endpoint Package 0.16.0:
We move the transform to the metadata_current dataset: package/endpoint/dataset/metadata_current/elasticsearch/transform/default.json. This will create a transform metrics-endpoint.metadata_current-default-0.16.0-[timestamp]. This could be a new transform or one similar to the transform in the metadata dataset, but the ultimate outcome is that the metadata dataset transform should be deleted and this one installed.
@ruflin
Another thing I learned yesterday when talking to the ML team is that the source index can be a pattern, so logs-foo-* can be the source. It would be interesting if the target index could contain a variable, something like logs-foo-{data_stream.namespace}. It would mean a single transform would act for multiple namespaces, but I doubt this is possible at the moment. If there were multiple source indices, what would our expectation be for the target index: all data in a single target index, or one per namespace?
It is possible that the wildcard picks up disjoint documents that match the query and pivot of the transform. The documents will be transferred to the destination index, but the mapping of the destination index determines the usefulness of the documents. If a document maps correctly it will be retrieved in queries; otherwise it will not.
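For reference, a sketch of what a transform with a wildcard source could look like; the ids, fields, and aggregation are invented for the example and are not the endpoint package's actual transform:

```ts
import { Client } from '@elastic/elasticsearch';

// A pivot transform whose source is an index pattern: every index matching
// logs-foo-* (all namespaces) feeds a single destination index.
async function putWildcardSourceTransform(client: Client): Promise<void> {
  await client.transform.putTransform({
    transform_id: 'logs-foo-latest-default-0.16.0',
    body: {
      source: { index: 'logs-foo-*' },
      dest: { index: 'logs-foo-latest' },
      pivot: {
        group_by: { 'host.name': { terms: { field: 'host.name' } } },
        aggregations: { last_seen: { max: { field: '@timestamp' } } },
      },
      sync: { time: { field: '@timestamp' } }, // continuous transform keyed on the timestamp
    },
  });
}
```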
@ruflin
To get started, I would keep it as simple as possible. Would the following flow be possible?
Stop transform
Delete transform
Load new transform
Start new transform
@nnamdifrankie Could this cause any side effects? You mention above that parts of the transform can change. My assumption is that these changes always mean a new version of the package?
If we follow the above, rollback should also be pretty straightforward, as it is just the same in reverse.
My plan was to do the following (a rough sketch follows this list):
Install Transform. Since we are timestamping our IDs, we install the new transform. This could also be a forced reinstall of the package, or a no-op if the transform has been removed from the dataset.
Start Transform. TBD given the option we choose.
If Success. Delete the old state using the information captured before installation. Deleting the old information can also fail in the worst-case scenario.
If Failure. Delete the current state, i.e. the resources we created for this version; technically this is a rollback. Deleting can also fail in the worst-case scenario.
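A rough sketch of that ordering, assuming the Elasticsearch JS client; the ids, the transform body, and the shape of the captured state are placeholders, and saved-object updates are left out:

```ts
import { Client } from '@elastic/elasticsearch';

// Install the new timestamped transform first, start it, and only then remove
// the previously captured transforms; on failure, delete only what this
// version created (the rollback branch).
async function installAndSwapTransform(
  client: Client,
  previousIds: string[],     // captured before installation
  newId: string,             // e.g. "metrics-endpoint.metadata-default-0.16.0-<timestamp>"
  transformBody: Record<string, unknown>
): Promise<void> {
  await client.transform.putTransform({ transform_id: newId, body: transformBody });
  try {
    await client.transform.startTransform({ transform_id: newId });
    for (const id of previousIds) {
      // Success path: deleting the old transforms can itself fail, leaving
      // both versions installed (the worst case called out above).
      await client.transform.stopTransform({ transform_id: id, wait_for_completion: true, force: true });
      await client.transform.deleteTransform({ transform_id: id });
    }
  } catch (err) {
    // Failure path: roll back the resources created for this version only.
    await client.transform.deleteTransform({ transform_id: newId, force: true });
    throw err;
  }
}
```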
The goal is to have one transform per purpose because any slight difference in the code could mean different documents in the target.
We have a transform in the metadata dataset: package/endpoint/dataset/metadata/elasticsearch/transform/default.json. This will create a transform metrics-endpoint.metadata-default-0.15.0-[timestamp]. We could also use metrics-endpoint.metadata-default, but this would not allow us to roll back if the transform in the current version fails.
@nnamdifrankie Isn't the version in the name enough without having the timestamp to allow a rollback in case of a failure? Can you explain the need for the timestamp? Thanks.
@neptunian
Isn't the version in the name enough without having the timestamp to allow a rollback in case of a failure? Can you explain the need for the timestamp? Thanks.
It is for the forced install where the versions are the same.
I think I'm also still missing the part around the force install and the timestamp. When exactly is this happening?
You mention above
Capture current transform state information
What does this exactly mean? Could you share an example?
You also mention:
The goal is to have one transform per purpose
Are you referring here to multiple transforms per dataset?
My current assumptions are the following; please let me know which ones are wrong:
@ruflin Sorry it was not clear. First, let me answer in the context of your steps here:
Stop transform
Delete transform
Load new transform
Start new transform
My proposal does something similar to your steps, with just a change in order.
I think I'm also still missing the part around the force install and the timestamp. When exactly is this happening?
You mention above
Capture current transform state information
What does this exactly mean? Could you share an example?
With my steps above, we do not plan to install over any transforms. Transforms are only removed after we have successfully installed the new transforms; hence the need for timestamps or a unique identifier, even in a forced update of the same version.
You also mention:
The goal is to have one transform per purpose
Are you referring here to multiple transforms per dataset?
Yes. If we have dataset1.transform1 and dataset2.transform2 that have the same code, update the same index, and run at different times, I believe this is not a desirable state. What do you think?
We can do exactly the same to roll back. The old version of the transform is always available in the old package.
Do you currently have a rollback handler in case of failure that tries to install the previous version?
We can stop, overwrite, start a transform. As long as we do this in a short time, no data in the target index will be missing.
I can only answer this question by testing. Everything is about timing with this setup.
If everything falls apart during upgrade / downgrade, we can wipe the target index and start the transform from scratch again, and it will be able to fully rebuild the target index. This assumes we don't throw away data in the source index.
Let's talk about this for clarity. Wiping the destination index is technically a service outage.
Looks like we are mostly on the same page. The only difference is whether the transform is overwritten (what we do for index templates) or whether we use versions (what we do for ingest pipelines). I guess both will work. If we use a version for the transform, let's use the package version.
Could you test what the maximum time is that we have to get the new transform in place?
For the rollback, @neptunian can share more here.
What is our naming convention for the case where we have multiple transforms in a single dataset? Will we postfix with the file name without the .json part? What is the final name in Elasticsearch?
What is the final name in Elasticsearch?
{
  "id": "metrics-endpoint.metadata-current-default-0.16.0-dev.0-20207319",
  "type": "transform"
}
Where 0.16.0-dev.0 is the version and 20207319 is a unique timestamp.
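A hypothetical helper showing how such an id could be assembled; the separator scheme and the use of Date.now() are assumptions for illustration, not the actual EPM implementation:

```ts
// Builds an id like "metrics-endpoint.metadata-current-default-0.16.0-dev.0-1600000000000":
// dataset, transform file name (without .json), package version, and a unique suffix
// so a forced reinstall of the same version still gets a distinct id.
function buildTransformId(dataset: string, fileName: string, pkgVersion: string): string {
  return `${dataset}-${fileName}-${pkgVersion}-${Date.now()}`;
}
```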
Two questions:
Is the default part in the name above the name of the file in the package?

@nnamdifrankie It seems you need the force install for development purposes, or is this something you expect to see in production? I'm worried we build something for dev that we should solve in a different way. For example, we could have a special method for "overwrite / force install" that does the right thing for each asset; in the case of a transform, it would delete it and install it again.
If by forced install you mean "reinstall" and the versions are the same: we don't delete the previous ingest pipeline if the version is the same or it's a reinstall. Since we PUT the ingest pipeline, it updates the existing versioned one if it exists.
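For comparison, a minimal sketch of that versioned ingest pipeline behaviour; the pipeline id and processor are illustrative only:

```ts
import { Client } from '@elastic/elasticsearch';

// Re-running this call with the same versioned id overwrites the existing
// pipeline rather than creating a second copy.
async function putVersionedPipeline(client: Client): Promise<void> {
  await client.ingest.putPipeline({
    id: 'metrics-endpoint.metadata-0.16.0',
    body: {
      description: 'illustrative versioned pipeline',
      processors: [{ set: { field: 'event.ingested', value: '{{_ingest.timestamp}}' } }],
    },
  });
}
```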
@ruflin We don't have a rollback currently for unknown errors that aren't handled. I think it was mentioned here that we'd improve on it, but I realize some default behavior should happen. Currently it will just error out, and when you refresh Kibana it will do a "package health check" and try to reinstall whatever package you were trying to install. This behaviour is mainly to handle unknown errors that cause Kibana to crash, though. We should add a case for catching any unknown error and trying to reinstall the previous version. This should be a minor change where we try to install the previous version in an else clause here. There is no special rollback handler.
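A hedged sketch of that idea; installPackage and the previous-version lookup are hypothetical stand-ins for the real EPM functions, not the actual Kibana code:

```ts
type InstallFn = (pkgName: string, version: string) => Promise<void>;

// On an unknown error during install/upgrade, fall back to reinstalling the
// version that was installed before, then rethrow so the failure is surfaced.
async function installWithFallback(
  installPackage: InstallFn,
  pkgName: string,
  newVersion: string,
  previousVersion?: string
): Promise<void> {
  try {
    await installPackage(pkgName, newVersion);
  } catch (err) {
    if (previousVersion) {
      await installPackage(pkgName, previousVersion);
    }
    throw err;
  }
}
```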
@neptunian @ruflin Let's make sure there is a ticket for this rollback so it does not fall through the cracks. Right now I have only handled the happy path, with the belief that rollback will be handled by the main handler.
@ruflin @kevinlog Are we fine to close this ticket? We can create more focused stories if needed.
@nnamdifrankie I'm good with closing it, but let's make sure we follow up with the issues, like the index problem.
@nnamdifrankie Can you create an issue for the index problem?
The source index has to exist before the transform is created and started. For data stream indices, the indices are created only when documents are added to the stream.
And close this one?
@ph I will probably have to create it in the ML issues board and link this one.