When the package manager was created, it was built on the assumption that the registry and packages are always available. The current implementation uses a local in-memory cache for package contents; whenever a package is missing from this cache, it is re-fetched from the registry. Over the last two releases, quite a few issues have shown up where it became a problem that packages are only available from the registry:
To solve all the above problems, I'm proposing to not only cache the packages in memory but also store them in a dedicated ES index. This also unifies how packages work, whether they are uploaded through a zip file, come from the registry, or arrive through any other mechanism for adding packages. Below is an image to visualise this:

One important detail here is that packages browsed from the registry without being installed should not be downloaded, see https://github.com/elastic/kibana/issues/76261. Packages which are uploaded are always installed.
How exactly packages and assets are stored in Elasticsearch should be a follow-up discussion if we decide to move forward with this.
Decision: Agreed to move forward. Follow up discussion ticket: https://github.com/elastic/kibana/issues/83426
Pinging @elastic/ingest-management (Team:Ingest Management)
I would clarify the use of the word 'local' in the initial description:
"The current implementation uses a local cache and pulls down the package again if any files are missing or a package does not exist." -- The current implementation uses a local in-memory cache [...]. If Kibana runs clustered, every Kibana instance has its own in-memory cache.
"store locally" -- I assume what is meant is to store in Elasticsearch in a dedicated index? As Kibana can be clustered, local file storage should not be used.
@skh Spot on. Can you directly update the issue?
In addition, I see two ways to store packages in ES:
- As a single zip file per package: the zip would always need to be downloaded and unpacked as a whole.
- As one document per file: single files could be queried by file type or path so that we can access single assets more quickly, but there may be many of them in some packages.
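To make the one-document-per-file option concrete, here is a minimal sketch of what such a document could look like. All field names (`package_name`, `asset_path`, etc.) and the helper function are illustrative assumptions, not a proposed final schema.

```typescript
// Sketch only: a possible shape for storing one package file per ES document.
// Field names are assumptions, not the final mapping.
interface PackageAssetDoc {
  package_name: string;
  package_version: string;
  asset_path: string;   // e.g. "aws/0.2.5/img/logo_aws.svg"
  media_type: string;   // e.g. "image/svg+xml"
  data_base64: string;  // file contents, base64-encoded for binary safety
  install_source: 'registry' | 'upload';
}

// Hypothetical helper that turns a raw package file into such a document.
function buildAssetDoc(
  pkg: { name: string; version: string; source: 'registry' | 'upload' },
  path: string,
  mediaType: string,
  contents: Buffer
): PackageAssetDoc {
  return {
    package_name: pkg.name,
    package_version: pkg.version,
    asset_path: path,
    media_type: mediaType,
    data_base64: contents.toString('base64'),
    install_source: pkg.source,
  };
}
```

With a shape like this, querying by `asset_path` or `media_type` would give the fast single-asset access mentioned above.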
I like the idea of storing each file as a document instead of the zip file. It not only allows us to query as you mentioned, but also lets us exclude certain files from storage and add metadata to each, like hash, date modified / date installed, etc.
You basically have a VFS over Elasticsearch. I like the idea @skh. One thing to consider for a future improvement/release is signing of packages / files. I am not sure of the level of risk here, but it could be possible for a user to update a file via a document.
👍 to the problem description and proposal. Two things that come to mind are:
I'm curious about the difference between making 10, 100, etc. requests to ES vs serving them from memory. The best-case scenario (few assets & fast connection to ES) might not be noticeably affected, but the more assets or the greater the latency to ES, the slower things will feel. One option is to keep the memory cache (changing to an LRU or something less naive than now) and add values in there on their way in to ES. That way we keep the durability of ES but still avoid the latency issues.
We'll have to base64 encode any binary asset. That adds about 30% to their file size. There's also a CPU cost of decoding them. Again, storing the decoded Buffer in the memory cache would help.
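The ~30% figure follows from how base64 works: every 3 input bytes become 4 output characters. A quick sanity check (the 300 KB size is just a stand-in for a typical dashboard screenshot):

```typescript
// Base64 emits 4 characters per 3 input bytes, i.e. ~33% growth.
const binary = Buffer.alloc(300_000); // stand-in for a ~300 KB image
const encoded = binary.toString('base64');
console.log(encoded.length / binary.length); // ≈ 1.33
```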
Here's a quick check of the image file sizes. Remember these will be about 30% larger after base64 encoding:
```
du -h -d1 */*/img/*
396K aws/0.2.5/img/filebeat-aws-cloudtrail.png
1.1M aws/0.2.5/img/filebeat-aws-elb-overview.png
188K aws/0.2.5/img/filebeat-aws-s3access-overview.png
2.5M aws/0.2.5/img/filebeat-aws-vpcflow-overview.png
8.0K aws/0.2.5/img/logo_aws.svg
384K aws/0.2.5/img/metricbeat-aws-billing-overview.png
260K aws/0.2.5/img/metricbeat-aws-dynamodb-overview.png
1.1M aws/0.2.5/img/metricbeat-aws-ebs-overview.png
600K aws/0.2.5/img/metricbeat-aws-ec2-overview.png
600K aws/0.2.5/img/metricbeat-aws-elb-overview.png
496K aws/0.2.5/img/metricbeat-aws-lambda-overview.png
792K aws/0.2.5/img/metricbeat-aws-overview.png
640K aws/0.2.5/img/metricbeat-aws-rds-overview.png
332K aws/0.2.5/img/metricbeat-aws-s3-overview.png
720K aws/0.2.5/img/metricbeat-aws-sns-overview.png
348K aws/0.2.5/img/metricbeat-aws-sqs-overview.png
560K aws/0.2.5/img/metricbeat-aws-usage-overview.png
396K aws/0.2.7/img/filebeat-aws-cloudtrail.png
1.1M aws/0.2.7/img/filebeat-aws-elb-overview.png
188K aws/0.2.7/img/filebeat-aws-s3access-overview.png
2.5M aws/0.2.7/img/filebeat-aws-vpcflow-overview.png
8.0K aws/0.2.7/img/logo_aws.svg
384K aws/0.2.7/img/metricbeat-aws-billing-overview.png
260K aws/0.2.7/img/metricbeat-aws-dynamodb-overview.png
1.1M aws/0.2.7/img/metricbeat-aws-ebs-overview.png
600K aws/0.2.7/img/metricbeat-aws-ec2-overview.png
600K aws/0.2.7/img/metricbeat-aws-elb-overview.png
496K aws/0.2.7/img/metricbeat-aws-lambda-overview.png
792K aws/0.2.7/img/metricbeat-aws-overview.png
640K aws/0.2.7/img/metricbeat-aws-rds-overview.png
332K aws/0.2.7/img/metricbeat-aws-s3-overview.png
720K aws/0.2.7/img/metricbeat-aws-sns-overview.png
348K aws/0.2.7/img/metricbeat-aws-sqs-overview.png
560K aws/0.2.7/img/metricbeat-aws-usage-overview.png
8.0K checkpoint/0.1.0/img/checkpoint-logo.svg
4.0K cisco/0.3.0/img/cisco.svg
796K cisco/0.3.0/img/kibana-cisco-asa.png
12K crowdstrike/0.1.2/img/logo-integrations-crowdstrike.svg
392K crowdstrike/0.1.2/img/siem-alerts-cs.jpg
512K crowdstrike/0.1.2/img/siem-events-cs.jpg
4.0K endpoint/0.14.0/img/security-logo-color-64px.svg
4.0K endpoint/0.15.0/img/security-logo-color-64px.svg
4.0K fortinet/0.1.0/img/fortinet-logo.svg
4.0K microsoft/0.1.0/img/logo.svg
424K o365/0.1.0/img/filebeat-o365-audit.png
296K o365/0.1.0/img/filebeat-o365-azure-permissions.png
16K o365/0.1.0/img/logo-integrations-microsoft-365.svg
436K okta/0.1.0/img/filebeat-okta-dashboard.png
4.0K okta/0.1.0/img/okta-logo.svg
476K panw/0.1.0/img/filebeat-panw-threat.png
1.5M panw/0.1.0/img/filebeat-panw-traffic.png
12K panw/0.1.0/img/logo-integrations-paloalto-networks.svg
```

To clarify, I'm saying we would still put assets in ES, but use a cache to store ready-to-serve values to avoid hitting ES and doing any unnecessary work. We could add TTL or any other logic to decide when to use or invalidate cache entries.
One option is to keep the memory cache (changing to an LRU or something less naive than now) and add values in there on their way in to ES. That way we keep the durability of ES but still avoid the latency issues.
Would it be an option to keep the in-memory cache, but purge some files, like ES and Kibana assets from it regularly, while keeping others, like images, for longer?
Would it be an option to keep the in-memory cache, but purge some files, like ES and Kibana assets from it regularly, while keeping others, like images, for longer?
Definitely. That's what I was getting at with
We could add TTL or any other logic to decide when use or invalidate cache entries.
We'll have to define the rules and then see if there's an existing package that does what we want out of the box or if we need to wrap one with some code to manage it.
Seems like we want support for both TTL (different by asset class) _and_ max memory size for the cache.
https://github.com/isaacs/node-lru-cache is an existing dependency and my go-to, but it doesn't support per-entry TTL. I think we'd have to create multiple caches to get different expiration policies.
I did some searching and both https://github.com/node-cache/node-cache & https://github.com/thi-ng/umbrella/tree/develop/packages/cache seem like they'd work for this case
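As a starting point for the rules discussion, here is a minimal sketch (not a proposal for the real implementation) of a cache keyed with a per-asset-class TTL, which is the feature noted above as missing from node-lru-cache. The asset class names and TTL values are illustrative assumptions only, and it has no max-size eviction.

```typescript
// Sketch: per-asset-class TTLs in one cache. Class names and TTLs are
// illustrative, not a proposed policy.
type AssetClass = 'image' | 'es-asset' | 'kibana-asset';

const TTL_MS: Record<AssetClass, number> = {
  image: 60 * 60 * 1000,          // keep images for an hour
  'es-asset': 5 * 60 * 1000,      // purge ES assets after five minutes
  'kibana-asset': 5 * 60 * 1000,  // same for Kibana assets
};

interface Entry { value: Buffer; expiresAt: number }

class AssetCache {
  private entries = new Map<string, Entry>();

  set(key: string, value: Buffer, assetClass: AssetClass, now = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + TTL_MS[assetClass] });
  }

  // Returns undefined for missing or expired entries, evicting the latter.
  get(key: string, now = Date.now()): Buffer | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Wrapping one of the packages mentioned above with a small layer like this would be an alternative to maintaining multiple caches with different expiration policies.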
Before we add a cache, we should first test if we really need it. Having a cache will speed things up but also make things more complicated.
Quite a few of the large assets are images and are only used when accessed through the browser. I assume the browser cache will also help us here to only load them once per user?
I agree we should profile. The additional work/complexity is low so we can add it later.
The browser cache will also need some work (setting headers) but we can look at that when profiling.
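As a placeholder for that header work, here is an illustrative sketch of the kind of per-media-type `Cache-Control` split that could let browsers cache package images. The values and the media-type test are assumptions for profiling, not a spec.

```typescript
// Illustrative only: response headers that could let the browser cache
// static package images while keeping other responses fresh.
function cacheHeadersFor(mediaType: string): Record<string, string> {
  if (mediaType.startsWith('image/')) {
    // Package images don't change within a version, so longer caching is safe.
    return { 'cache-control': 'public, max-age=86400' };
  }
  return { 'cache-control': 'no-cache' };
}
```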
When a package is uninstalled from the system, I'd propose it will be removed from the storage index as well.
That way the storage index doesn't silently turn into a secondary installation source that we need to check during package listings and installations.
Upgrade rollback: When an upgrade of a package fails, it is rolled back. If the older version of a package does not exist anymore, Fleet ends up in a state between two packages.
I'm not sure at what point during the package installation process we want to update the storage index, but if possible, it seems like it'd be easiest to add it to the storage index only if installation has successfully completed. Then during rollback we can perhaps use the storage index if the previous version is not available in the registry. If we update the storage index as we are installing, this probably won't be possible.
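The ordering suggested above can be sketched as follows. Everything here is hypothetical (the step names and function parameters are made up); the point is only that the storage-index write happens strictly after installation succeeds, so rollback can trust whatever it finds in the index.

```typescript
// Sketch of "write to the storage index only after successful install".
// All names are hypothetical, not the real Fleet install flow.
type Step = 'install-assets' | 'write-storage-index';

async function installPackage(
  installAssets: () => Promise<void>,
  writeToStorageIndex: () => Promise<void>
): Promise<Step[]> {
  const completed: Step[] = [];
  await installAssets();        // may throw; storage index stays untouched
  completed.push('install-assets');
  await writeToStorageIndex();  // only reached when installation succeeded
  completed.push('write-storage-index');
  return completed;
}
```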
I agree with @ruflin here, adding cache seems great but this adds a level of complexity. +1 on @jfsiii to add it later.
I just want to highlight that a) we already use a cache and b) the proposal specifically mentions:
To solve all the above problems, I'm proposing to not only cache the packages in memory, but also store them in a dedicated ES index.
I don't want to pull us into the weeds re: caching. We can discuss it in the implementation ticket(s). I'm just highlighting that it's not an alteration to the proposal.
Closing since we agreed on the proposal and are discussing further in https://github.com/elastic/kibana/issues/83426
@jfsiii Can you share what the final proposal is that was agreed on? What I put here is more a high-level proposal, and I hoped the questions around storage etc. (which are also mentioned in https://github.com/elastic/kibana/issues/83426) would be answered in a detailed proposal.