Cloud-on-k8s: Plugin s3 not working

Created on 2 Dec 2020 · 7 Comments · Source: elastic/cloud-on-k8s

Bug Report

I tried to install the repository-s3 plugin, but I got the following error when starting the Elasticsearch StatefulSet.

Exception in thread "main" org.elasticsearch.bootstrap.BootstrapException: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/config/repository-s3
Likely root cause: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/config/repository-s3
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:431)
        at java.base/java.nio.file.Files.newDirectoryStream(Files.java:476)
        at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:300)
        at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
        at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
        at org.elasticsearch.common.logging.LogConfigurator.configure(LogConfigurator.java:220)
        at org.elasticsearch.common.logging.LogConfigurator.configure(LogConfigurator.java:129)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:354)
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170)
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161)
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
        at org.elasticsearch.cli.Command.main(Command.java:90)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
For complete error details, refer to the log at /usr/share/elasticsearch/logs/elastic-cluster.log

The manifest used for plugin installation is:

initContainers:
        - command:
          - sh
          - -c
          - sysctl -w vm.max_map_count=262144
          name: sysctl
          securityContext:
            privileged: true
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch repository-s3
          name: s3
          securityContext:
            privileged: true

What did you expect to see?
The Elasticsearch Pod should start with the plugin installed.

What did you see instead? Under which circumstances?
The Elasticsearch Pod doesn't start and reports the aforementioned exception.

Environment

ECK version:

1.3.0

Kubernetes information:

- On premise? yes
- Cloud (GKE / EKS / AKS)? no
- Kubernetes distribution (OpenShift / Rancher / PKS)? k3d

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:30:33Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.9+k3s1", GitCommit:"630bebf94b9dce6b8cd3d402644ed023b3af8f90", GitTreeState:"clean", BuildDate:"2020-09-17T19:05:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Resource definition:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  annotations: {}
  labels:
    name: elastic-cluster
  name: elastic-cluster
spec:
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - config:
      node.roles:
      - master
      - data
      - ingest
    count: 3
    name: master
    podTemplate:
      metadata:
        annotations:
          traffic.sidecar.istio.io/excludeInboundPorts: 9300,9400
          traffic.sidecar.istio.io/excludeOutboundPorts: "9300"
          traffic.sidecar.istio.io/includeInboundPorts: '*'
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: elastic-cluster
                topologyKey: failure-domain.beta.kubernetes.io/zone
              weight: 50
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: elastic-cluster
                topologyKey: kubernetes.io/hostname
              weight: 100
        containers:
        - env:
          - name: METRICS_PORT
            value: "9400"
          name: elasticsearch
        initContainers:
        - command:
          - sh
          - -c
          - sysctl -w vm.max_map_count=262144
          name: sysctl
          securityContext:
            privileged: true
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch repository-s3
          name: s3
          securityContext:
            privileged: true
        topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              elasticsearch.k8s.elastic.co/cluster-name: elastic-cluster
          maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: local-path
  podDisruptionBudget:
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          elasticsearch.k8s.elastic.co/cluster-name: elastic-cluster
  version: 7.10.0

>non-issue

Source

irizzant

👍1

Most helpful comment

I had the same issue with operator 1.3.1 and Istio 1.8.2. The issue turns out to be caused by a Kubernetes bug, which was fixed in v1.19. The following resource definition can be used as a workaround for older clusters:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-cluster
  namespace: monitoring
spec:
  version: 7.10.2
  nodeSets:
    - name: default
      count: 3
      podTemplate:
        spec:
          initContainers:
            - name: elastic-internal-init-keystore
              securityContext:
                runAsUser: 1000
                runAsGroup: 1000
            - name: plugins-install
              command:
                - sh
                - -c
                - |
                  bin/elasticsearch-plugin install --batch repository-s3 && chown -R 1000:0 /usr/share/elasticsearch/config/repository-s3
            - name: sysctl
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
              securityContext:
                privileged: true
          containers:
            - name: elasticsearch
              env:
                - name: ES_JAVA_OPTS
                  value: -Xms2g -Xmx2g
                - name: READINESS_PROBE_TIMEOUT
                  value: '10'
              readinessProbe:
                exec:
                  command:
                    - bash
                    - -c
                    - /mnt/elastic-internal/scripts/readiness-probe-script.sh
                failureThreshold: 3
                initialDelaySeconds: 10
                periodSeconds: 12
                successThreshold: 1
                timeoutSeconds: 12
              resources:
                limits:
                  cpu: 2
                  memory: 4Gi
                requests:
                  cpu: 2
                  memory: 4Gi
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 50Gi
            storageClassName: gp3
  secureSettings:
    - secretName: s3-credentials
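
Reduced to the part that targets the failing path, the workaround above might look like the fragment below. This is only a sketch: it assumes that the chown on /usr/share/elasticsearch/config/repository-s3 is what resolves the AccessDeniedException, and it leaves out the runAsUser/runAsGroup override on elastic-internal-init-keystore, which this thread does not establish as either required or optional.

```yaml
# Sketch only: merge into one nodeSet of an existing Elasticsearch resource.
# Assumption: the directory created by the plugin installer ends up unreadable
# for the Elasticsearch user (uid 1000), so ownership is fixed right after install.
podTemplate:
  spec:
    initContainers:
      - name: plugins-install
        command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-s3 && chown -R 1000:0 /usr/share/elasticsearch/config/repository-s3
```

If the pod still fails to start with this fragment alone, the full definition above is the variant that was reported to work.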

thorion3006 on 15 Jan 2021

🚀1 👍1

All 7 comments

I applied the exact same manifest on a k3d/k3s cluster and it is working as expected:

$ k version

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3+k3s3", GitCommit:"0e4fbfefe1dd8734756dfa4f9ab4fc89665cece4", GitTreeState:"clean", BuildDate:"2020-11-13T07:19:02Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}

$ k get es,pods
NAME                                                         HEALTH   NODES   VERSION   PHASE   AGE
elasticsearch.elasticsearch.k8s.elastic.co/elastic-cluster   green    3       7.10.0    Ready   3m30s

NAME                              READY   STATUS    RESTARTS   AGE
pod/elastic-cluster-es-master-0   1/1     Running   0          3m7s
pod/elastic-cluster-es-master-2   1/1     Running   0          3m7s
pod/elastic-cluster-es-master-1   1/1     Running   0          3m7s

$ kbash pod/elastic-cluster-es-master-0
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@elastic-cluster-es-master-0 elasticsearch]# elasticsearch-plugin list
repository-s3

Is there any PodSecurityPolicy (PSP) on your cluster that would change the securityContext of your containers?

barkbay on 3 Dec 2020

My manifest has been deployed with Istio 1.8 enabled in the cluster, following these instructions:
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-service-mesh-istio.html

Please note from the manifest above that all of the following points have been fulfilled except 3), which is not needed in my case since my cluster supports third-party issued tokens.

1) Disable the default self-signed certificate generated by the operator and allow TLS to be managed by Istio. Disabling the self-signed certificate might interfere with some features such as Kibana Alerting and Actions.
2) Exclude the transport port (port 9300) from being proxied. Currently ECK does not support switching off X-Pack security and TLS for the Elasticsearch transport port. If Istio is allowed to proxy the transport port, the traffic is encrypted twice and communication between Elasticsearch nodes is disrupted.
3) Optional. Only set automountServiceAccountToken to true if your Kubernetes cluster does not have support for issuing third-party security tokens.
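
As a rough mapping of these three points onto an Elasticsearch resource, the fragment below sets one group of fields per point. The first two groups are copied from the manifest in the bug report. The placement of automountServiceAccountToken under the pod template spec is an assumption based on the standard Kubernetes Pod spec field, and it only applies to clusters without third-party token support.

```yaml
# Sketch only: fields not related to the three Istio points are omitted.
spec:
  http:
    tls:
      selfSignedCertificate:
        disabled: true        # first point: let Istio manage TLS
  nodeSets:
  - name: master
    count: 3
    podTemplate:
      metadata:
        annotations:
          # second point: keep the transport port (9300) out of the proxy
          traffic.sidecar.istio.io/excludeInboundPorts: "9300,9400"
          traffic.sidecar.istio.io/excludeOutboundPorts: "9300"
          traffic.sidecar.istio.io/includeInboundPorts: '*'
      spec:
        # third point (optional, assumed placement): only for clusters
        # that cannot issue third-party security tokens
        automountServiceAccountToken: true
```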

Please also note that the very same manifest worked perfectly fine with the previous ECK version 1.2.1 and Istio 1.8.

irizzant on 3 Dec 2020

👍1

I'm also seeing this error with 1.3.1, and I'm also using Istio.

patrickleet on 12 Jan 2021

@thorion3006 thanks a lot for the workaround. Out of curiosity, do you by any chance have a link to this Kubernetes issue?

barkbay on 28 Jan 2021

@barkbay istio/istio#26882 kubernetes/kubernetes#57923

thorion3006 on 8 Feb 2021

Thanks for providing the workaround and links, @thorion3006. This problem seems to be fixed with Kubernetes 1.19, so I am closing the issue.