`TestEnterpriseSearchVersionUpgradeToLatest7x` is failing on OpenShift.
This test upgrades EnterpriseSearch from 7.8.1 to 7.9.0. However, deploying a brand-new 7.9.0 Enterprise Search resource (i.e. without upgrading) seems to fail the same way, so I don't think it is related to the upgrade from 7.8.1 to 7.9.0.
```
NAME                                                                       HEALTH   NODES   VERSION   PHASE   AGE
elasticsearch.elasticsearch.k8s.elastic.co/test-ent-version-upgrade-584t   green    1       7.9.0     Ready   15m

NAME                                                                             HEALTH   NODES   VERSION   AGE
enterprisesearch.enterprisesearch.k8s.elastic.co/test-ent-version-upgrade-w82m   green    2       7.9.0     15m

NAME                                                     READY   STATUS             RESTARTS   AGE   IP            NODE                                                 NOMINATED NODE   READINESS GATES
pod/test-ent-version-upgrade-584t-es-masterdata-0        1/1     Running            0          15m   10.129.2.15   local-lv4nr-w-b-zg7mh.c.elastic-cloud-dev.internal   <none>           <none>
pod/test-ent-version-upgrade-w82m-ent-6fccc7bc98-55s4k   1/1     Running            0          15m   10.128.2.18   local-lv4nr-w-d-4cgwj.c.elastic-cloud-dev.internal   <none>           <none>
pod/test-ent-version-upgrade-w82m-ent-6fccc7bc98-6m5zs   1/1     Running            0          15m   10.129.2.14   local-lv4nr-w-b-zg7mh.c.elastic-cloud-dev.internal   <none>           <none>
pod/test-ent-version-upgrade-w82m-ent-bb56cdbd6-x5p9z    0/1     CrashLoopBackOff   6          12m   10.131.0.21   local-lv4nr-w-c-4cdmr.c.elastic-cloud-dev.internal   <none>           <none>
```
```
Found java executable in PATH
Java version detected: 1.8.0_252 (major version: 8)
Enterprise Search is starting...
[2020-08-25T07:05:04.582+00:00][1][2000][app-server][INFO]: Enterprise Search version=7.9.0, JRuby version=9.2.9.0, Ruby version=2.5.7, Rails version=4.2.11.3
[2020-08-25T07:05:10.135+00:00][1][2000][app-server][INFO]: Successfully connected to Elasticsearch
[2020-08-25T07:05:20.949+00:00][1][2000][app-server][INFO]: Loaded a new read-only flag value from Elasticsearch: true
[2020-08-25T07:05:20.952+00:00][1][2000][app-server][INFO]: Disabling read-only mode for indices: .ent-search-actastic-togo_migrations_v1
[2020-08-25T07:05:21.007+00:00][1][2000][app-server][INFO]: [db_lock] [installation] Status: [Starting] Ensuring migrations tracking index exists
[2020-08-25T07:05:21.042+00:00][1][2000][app-server][INFO]: [db_lock] [installation] Status: [Finished] Ensuring migrations tracking index exists
[2020-08-25T07:05:21.612+00:00][1][2000][app-server][INFO]: Enterprise Search indices are ready
[2020-08-25T07:05:21.614+00:00][1][2000][app-server][INFO]: Disabling read-only mode for indices: .ent-search-actastic-app_search_accounts_v7,.ent-search-actastic-app_search_accounts_v7-key-unique-constraint,.ent-search-actastic-app_search_api_tokens,.ent-search-actastic-app_search_api_tokens-authentication_token-unique-constraint,.ent-search-actastic-app_search_roles_v2,.ent-search-actastic-clusters,.ent-search-actastic-clusters-name-unique-constraint,.ent-search-actastic-user_external_identities_v1,.ent-search-actastic-user_external_identities_v1-external_id-service_type-unique-constraint,.ent-search-actastic-users_v4,.ent-search-actastic-users_v4-auth_source-elasticsearch_username-unique-constraint,.ent-search-actastic-users_v4-email-unique-constraint,.ent-search-actastic-workplace_search_accounts_v10,.ent-search-actastic-workplace_search_accounts_v10-user_oid-unique-constraint,.ent-search-actastic-workplace_search_organizations_v6,.ent-search-actastic-workplace_search_organizations_v6-default_group_id-unique-constraint
[2020-08-25T07:05:21.702+00:00][1][2000][app-server][INFO]: [db_lock] [installation] Status: [Starting] Creating a default Elasticsearch cluster configuration record
[2020-08-25T07:05:21.772+00:00][1][2000][app-server][INFO]: [db_lock] [installation] Status: [Finished] Creating a default Elasticsearch cluster configuration record
[2020-08-25T07:05:22.077+00:00][1][2000][app-server][INFO]: Enabling read-only mode for the product!
Unexpected exception while running Enterprise Search:
Error: Permission denied - /usr/share/enterprise-search/lib/war/WEB-INF/web.xml at org/jruby/RubyIO.java:1239:in `sysopen'
org/jruby/RubyIO.java:3804:in `write'
/usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/cli/app_server_command.class:67:in `render_web_xml'
/usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/cli/app_server_command.class:55:in `run'
/usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/cli/command.class:10:in `run_and_exit'
/usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/cli.class:140:in `run_supported_command'
/usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/cli.class:122:in `run_command'
/usr/share/enterprise-search/lib/war/shared_togo/lib/shared_togo/cli.class:109:in `run!'
bin/enterprise-search-internal:15:in `<main>'
```
```
bash-4.2$ ls -la /usr/share/enterprise-search/lib/war/WEB-INF/web.xml
ls: cannot access /usr/share/enterprise-search/lib/war/WEB-INF/web.xml: No such file or directory
bash-4.2$ ls -la /usr/share/enterprise-search/lib/war/WEB-INF/
total 20
drwxr-xr-x.  2 enterprise-search enterprise-search  110 Aug 11 23:52 .
drwxr-xr-x. 18 enterprise-search enterprise-search 4096 Aug 11 23:52 ..
-rw-r--r--.  1 enterprise-search enterprise-search 2708 Aug 11 23:52 web.xml.erb
-rw-r--r--.  1 enterprise-search enterprise-search 3555 Aug 11 23:52 webserver-ssl-with-redirect.xml
-rw-r--r--.  1 enterprise-search enterprise-search 2080 Aug 11 23:52 webserver-ssl.xml
-rw-r--r--.  1 enterprise-search enterprise-search  505 Aug 11 23:52 webserver.xml
```
It seems to affect only OpenShift.
It seems that the Enterprise Search Docker image expects to be run as user 1000. If the container is allowed to run with UID 1000, then web.xml is created during startup:
```
bash-4.2$ id
uid=1000(enterprise-search) gid=1000(enterprise-search) groups=1000(enterprise-search)
bash-4.2$ ls -la /usr/share/enterprise-search/lib/war/WEB-INF/
total 32
drwxr-xr-x 1 enterprise-search enterprise-search 4096 Aug 25 07:37 .
drwxr-xr-x 1 enterprise-search enterprise-search 4096 Aug 11 23:52 ..
-rw-r--r-- 1 enterprise-search enterprise-search 2702 Aug 25 07:37 web.xml    <--- here
-rw-r--r-- 1 enterprise-search enterprise-search 2708 Aug 11 23:52 web.xml.erb
-rw-r--r-- 1 enterprise-search enterprise-search 3555 Aug 11 23:52 webserver-ssl-with-redirect.xml
-rw-r--r-- 1 enterprise-search enterprise-search 2080 Aug 11 23:52 webserver-ssl.xml
-rw-r--r-- 1 enterprise-search enterprise-search  505 Aug 11 23:52 webserver.xml
```
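On OpenShift, and on secured K8S clusters, containers run as an arbitrary (random) user ID. That user does not own `WEB-INF`, which belongs to `enterprise-search` (UID 1000) with mode `drwxr-xr-x`, so it cannot create `web.xml` there. This makes Enterprise Search 7.9.0 incompatible with such clusters.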
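To make the failure mode concrete, here is a minimal Go sketch of the classic POSIX owner/group/other check, applied to the `WEB-INF` listings above. It assumes that creating `web.xml` needs write and search (execute) permission on the directory. It ignores root's override, ACLs and SELinux. It uses `1000650000` as a hypothetical OpenShift-assigned UID that runs with group 0, as described later in this thread. `canCreateIn` is an illustrative helper, not part of Enterprise Search or ECK.

```go
package main

import (
	"fmt"
	"io/fs"
)

// canCreateIn reports whether a process running as uid, with the group IDs
// in gids, may create a new entry in a directory owned by ownerUID:ownerGID
// with permission bits mode. Creating an entry needs write (2) and search (1)
// permission on the directory. Only the POSIX owner/group/other check is
// modelled: root's override, ACLs and SELinux labels are ignored.
func canCreateIn(uid int, gids []int, ownerUID, ownerGID int, mode fs.FileMode) bool {
	perm := mode.Perm()
	var bits fs.FileMode
	switch {
	case uid == ownerUID:
		bits = (perm >> 6) & 0o7 // owner class wins even if group bits are wider
	case contains(gids, ownerGID):
		bits = (perm >> 3) & 0o7 // group class
	default:
		bits = perm & 0o7 // other class
	}
	return bits&0o3 == 0o3
}

func contains(xs []int, x int) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

func main() {
	const esUID, esGID = 1000, 1000
	randomUID := 1000650000 // hypothetical UID assigned by OpenShift

	// WEB-INF as shipped: drwxr-xr-x enterprise-search:enterprise-search
	fmt.Println(canCreateIn(esUID, []int{esGID}, esUID, esGID, 0o755)) // true
	fmt.Println(canCreateIn(randomUID, []int{0}, esUID, esGID, 0o755)) // false

	// Same directory, but group-owned by root and group-writable (0775)
	fmt.Println(canCreateIn(randomUID, []int{0}, esUID, 0, 0o775)) // true
}
```

The third case shows why making the directory group-owned by `root` and group-writable is enough for an arbitrary UID that runs with group 0.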
Great catch! This is definitely worth fixing, we'll file a bug for the solution and look into fixing it for the next release.
@barkbay do you think there's something we can do in ECK for 7.9.0 to work on OpenShift? We may be able to do some filesystem tricks through an init container, but I'm not sure it is worth it.
From the user's point of view, I guess the workaround right now is to manipulate OpenShift SCCs so that the container runs with UID 1000.
Doing nothing on the ECK side and just considering that Enterprise Search 7.9.0 and OpenShift do not play well together is also an option, especially if this is fixed in 7.9.1.
@kovyrin in general this is something we should have caught earlier (before the 7.9.0 release) with our e2e tests. It turns out our OpenShift tests were failing for other reasons, which hid this bug :(
> @barkbay do you think there's something we can do in ECK for 7.9.0 to work on OpenShift?
Unfortunately, no. I think any filesystem trick would require privileges we don't have in this context. The only options I can think of are the ones you mentioned: document that Enterprise Search 7.9.0 is not compatible with OpenShift, and/or document how to create a service account and add it to the `anyuid` SCC.
I opened #3664 to detect this kind of problem when testing snapshots.
I've started looking into this, and I'm not sure what the recommended way of dealing with this issue is (we do need to be able to create files on disk when starting the product, after all). Where can we write in those secured K8S clusters?
Some options that come to mind @kovyrin:
- `ENTERPRISE_SEARCH_DATA_DIR` (name TBD) env var/flag, specifying a directory where all files will be written at runtime. It's up to the user (e.g. ECK) to specify something that makes sense there (likely a folder in `/tmp` or an additional `emptyDir` volume).
- Follow Red Hat recommendations and make the directory where files are written owned by the `root` group. This is a bit weird outside the scope of OpenShift in my opinion 🤷‍♂️
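A minimal Go sketch of how the first option could resolve the output path. It assumes the variable keeps the placeholder name `ENTERPRISE_SEARCH_DATA_DIR` (the name was still TBD) and that the app server can load a `web.xml` rendered outside the WAR. Both are assumptions, and `webXMLPath` is a hypothetical helper. The environment lookup is injected so that the fallback can be tested without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Hypothetical: placeholder name proposed in this thread, not a documented
// Enterprise Search setting.
const dataDirEnv = "ENTERPRISE_SEARCH_DATA_DIR"

// Directory where 7.9.0 tries to render web.xml today.
const defaultWebInf = "/usr/share/enterprise-search/lib/war/WEB-INF"

// webXMLPath returns where the rendered web.xml should be written: inside the
// directory named by dataDirEnv when it is set and non-empty, otherwise in the
// shipped WEB-INF directory (the 7.9.0 behaviour).
func webXMLPath(lookup func(string) (string, bool)) string {
	if dir, ok := lookup(dataDirEnv); ok && dir != "" {
		return filepath.Join(dir, "web.xml")
	}
	return filepath.Join(defaultWebInf, "web.xml")
}

func main() {
	fmt.Println(webXMLPath(os.LookupEnv))
}
```

With ECK, the variable could point at an `emptyDir` volume mounted at a path of the operator's choosing, which avoids both the random-UID and the read-only-rootfs problems.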
@sebgl
> follow Red Hat recommendations and make the directory where files are written owned by the `root` group. This is a bit weird outside the scope of OpenShift in my opinion 🤷‍♂️
Well, it's not that weird actually :) When a container starts with a UID that is not listed in /etc/passwd, the user automatically joins the root group. This holds not only for OpenShift, but for Docker and vanilla Kubernetes as well: even though running with an arbitrary UID is not enforced there by default, an end user might decide to enforce it.
If the UID is listed in /etc/passwd, the user will only be part of the groups defined in /etc/group.
So, to make the behavior more consistent and predictable, IMO it's better not only to make the directory writable for GID 0 (`chown -R enterprise-search:root /usr/share/enterprise-search/lib/war/WEB-INF/ && chmod -R g+w /usr/share/enterprise-search/lib/war/WEB-INF/`), but also to add the default user (1000 in this case) to the root group.
I've used this trick for other images, and it has always worked well on OCP 3.11.
Using /tmp should be fine as well, though personally I don't like it, because it forces you to spread the files around.
None of the above will help if the root filesystem is mounted read-only. In that case, we'll need an additional mount point (an emptyDir) configured through the manifests. As a consequence, if the directory you mount over already contains files (as in our case), they will no longer be visible. So having an additional env var to make the path configurable is a good idea.
Just merged a set of changes that should make it possible to run Enterprise Search 7.11.0+ on OpenShift. Thank you for reporting the issue and for all the insights into community best practices around secured Docker environments 🙇
Thanks @kovyrin!
Checklist before closing that issue:
- 7.11.0-SNAPSHOT works fine on OpenShift

```
oc get es,ent,pod
NAME                                                              HEALTH   NODES   VERSION   PHASE   AGE
elasticsearch.elasticsearch.k8s.elastic.co/elasticsearch-sample   green    1       7.11.0    Ready   3m

NAME                                                          HEALTH   NODES   VERSION   AGE
enterprisesearch.enterprisesearch.k8s.elastic.co/ent-sample   green    1       7.11.0    3m

NAME                                    READY   STATUS    RESTARTS   AGE
pod/elasticsearch-sample-es-default-0   1/1     Running   0          3m
pod/ent-sample-ent-6465fb45f5-xmw4p     1/1     Running   0          3m
....
[2021-02-08T18:02:02.822+00:00][8][2150][app-server][INFO]: Enterprise Search version=7.11.0, JRuby version=9.2.13.0, Ruby version=2.5.7, Rails version=4.2.11.3
....
```
Thanks all for your help!