Cloud-on-k8s: ECK on ARM architectures

Created on 22 Jul 2020 · 12 Comments · Source: elastic/cloud-on-k8s

Support running ECK on ARM architectures

Elastic is starting to provide multi-architecture Docker images to support ARM architectures:
https://www.elastic.co/blog/elasticsearch-on-arm

It would be useful to be able to run Elastic Stack on ARM hardware via ECK.

Use cases:
Home Lab - Raspberry Pi Kubernetes clusters
Enterprise - ARM servers are gaining popularity in public clouds. https://docs.aws.amazon.com/eks/latest/userguide/arm-support.html

>feature

All 12 comments

@charith-elastic I wanted to pull your comments on CI testing for ARM into this open issue.

https://github.com/elastic/cloud-on-k8s/issues/922#issuecomment-691962564

I had a brief look at this out of interest. From the point of view of ECK, the main issue is how to run the automated test suites on ARM architectures. ~AFAIK, there are no major cloud providers that fully support ARM, so it is not currently possible to set up the test infrastructure to automatically test ECK. (Testing on RPi's is not an automated option, unfortunately.)~

I'm guessing you came across the link above for ARM nodes on EKS in AWS, but I figured it'd be helpful to be a bit more explicit.

AWS seems to be investing pretty heavily in arm64 instance types:

These GA'd on June 11, 2020 and they are already seeing production use:

I was hoping that some of the road is paved for us already based on what Elasticsearch is doing around arm64 support.

Yes, AWS ARM machines are now GA and EKS seems to support them so that's one hurdle down.

There are still some logistical problems to solve before we can provide _official_ ARM support. The CI system we use to build and publish ECK does not have any ARM workers. Both Go and Docker support cross-compiling, so we don't need actual ARM machines to do the build. However, it seems that we still need to make some invasive modifications to the CI nodes to get the Docker multi-arch build working:

  • The host must be running a newer version of Docker with support for the buildx extension.
  • The host must have qemu-arm and binfmt-misc installed to support ARM architectures.
  • The host must have a builder configured that supports multi-arch (the docker-container driver instead of the default docker driver).
  • Docker needs to be configured to use binfmt.
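The host requirements listed above could be checked with a small preflight script. This is only a sketch under stated assumptions: the function names are hypothetical, and the handler name `qemu-aarch64` is an assumption about how QEMU was registered with binfmt_misc on the host (it can differ between distributions). `BINFMT_DIR` is made overridable purely so the check can be exercised against a fake directory.

```shell
# Preflight sketch for multi-arch Docker builds (hypothetical helper names).
# BINFMT_DIR can be overridden to point at a fake tree for testing.
BINFMT_DIR="${BINFMT_DIR:-/proc/sys/fs/binfmt_misc}"

# Succeeds if a binfmt_misc handler with the given name exists and its
# first line reads "enabled". The handler name is an assumption.
binfmt_enabled() {
  [ -r "$BINFMT_DIR/$1" ] && [ "$(head -n 1 "$BINFMT_DIR/$1")" = "enabled" ]
}

# Succeeds if the docker CLI is installed and the buildx plugin responds.
buildx_available() {
  command -v docker >/dev/null 2>&1 && docker buildx version >/dev/null 2>&1
}

# Prints one status line per check; returns non-zero if any check fails.
preflight() {
  preflight_status=0
  if buildx_available; then
    echo "buildx: ok"
  else
    echo "buildx: missing"
    preflight_status=1
  fi
  if binfmt_enabled qemu-aarch64; then
    echo "binfmt qemu-aarch64: ok"
  else
    echo "binfmt qemu-aarch64: missing"
    preflight_status=1
  fi
  return "$preflight_status"
}
```

Calling `preflight` on a build host prints the status of each check and returns non-zero if something is missing, so it could gate a CI step. It does not verify that the active builder uses the docker-container driver.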

We share the CI system with other teams at Elastic, so this needs some coordination and extra work on the infrastructure and security sides to roll out, which can take some time.

Other minor annoyances (not strictly blockers) are:

If you are interested in running ECK in non-production scenarios like RPi's, you should be able to build the operator image from source after applying the following patch:

diff --git Dockerfile Dockerfile
index 5d904ec4..e2bd9293 100644
--- Dockerfile
+++ Dockerfile
@@ -1,8 +1,11 @@
 # Build the operator binary
-FROM golang:1.15.2 as builder
+FROM --platform=$BUILDPLATFORM golang:1.15.2 as builder

+ARG TARGETARCH
+ARG BUILDPLATFORM
 ARG GO_LDFLAGS
 ARG GO_TAGS
+
 WORKDIR /go/src/github.com/elastic/cloud-on-k8s

 # cache deps before building and copying source so that we don't need to re-download as much
@@ -15,7 +18,7 @@ COPY pkg/    pkg/
 COPY cmd/    cmd/

 # Build
-RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
+RUN CGO_ENABLED=0 GOOS=linux GOARCH=$TARGETARCH \
        go build \
             -mod readonly \
            -ldflags "$GO_LDFLAGS" -tags="$GO_TAGS" -a \
diff --git Makefile Makefile
index f8a12b77..68d0d5ef 100644
--- Makefile
+++ Makefile
@@ -341,10 +341,12 @@ switch-eks:
 #################################

 docker-build: go-generate generate-config-file
-   docker build . \
+   docker buildx build . \
        --build-arg GO_LDFLAGS='$(GO_LDFLAGS)' \
        --build-arg GO_TAGS='$(GO_TAGS)' \
        --build-arg VERSION='$(VERSION)' \
+       --platform linux/amd64,linux/arm64 \
+       --push \
        -t $(OPERATOR_IMAGE)

 docker-push:

I had to install qemu-user-static and qemu-system-arm and do the following to prepare my system first:

docker buildx create --driver docker-container --name multi-arch --platform linux/amd64,linux/arm64
docker buildx use multi-arch
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

Then, run:

make docker-build REGISTRY=my.docker.registry REGISTRY_NAMESPACE=myns

Obviously, the operator hasn't been properly tested on ARM, so don't try this in production or on systems you care about.

Thanks @charith-elastic. That was very helpful!

I was originally trying to avoid buildx since I figured that might not be readily available or desired on your build systems. I was following the docker manifest approach outlined here. I was successful with the build image since I could use FROM arm64v8/golang:1.15.2. I was unsuccessful with the ubi-minimal image because they don't publish separate images for each architecture. The buildx approach seems a lot more elegant, and it sounds like that's the direction you'd take anyway.

My build was successful with the approach you outlined above.

$ docker buildx imagetools inspect ghcr.io/jeffspahr/eck-operator-jspahr:1.3.0-SNAPSHOT-681c8953
Name:      ghcr.io/jeffspahr/eck-operator-jspahr:1.3.0-SNAPSHOT-681c8953
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest:    sha256:99a196f4db96174b956142da155434bf7f548060b192eed0ea9dfa035aa68cc0

Manifests: 
  Name:      ghcr.io/jeffspahr/eck-operator-jspahr:1.3.0-SNAPSHOT-681c8953@sha256:89a3d9259f284dc3bd90ab22c3a822f775aafcad8eed828a9eef14abadf6713d
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64

  Name:      ghcr.io/jeffspahr/eck-operator-jspahr:1.3.0-SNAPSHOT-681c8953@sha256:4a6fe64ef084b018b4fc12539e121b963ebda2491ada8bc600ef5edf967ba617
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm64

I'll test running the image on ARM later this week.

> We share the CI system with other teams at Elastic, so this needs some coordination and extra work on the infrastructure and security sides to roll-out -- which can take some time.

Does Elasticsearch use the same build systems? If so, they may have already solved this problem when they introduced multi-arch builds.

I'm interested in contributing to this, but it's not obvious where I could assist now that the next step is digging into your build systems. I was going to submit a PR, but you've already got the Dockerfile and Makefile changes figured out. It also can't be merged until the build dependencies are worked through. Let me know if there's something I can do to keep this issue moving.

Linking similar efforts from other components of Elastic Stack:
https://github.com/elastic/beats/issues/18334
https://github.com/elastic/kibana/issues/72884

I made some changes and got this working. I submitted a PR for it: https://github.com/elastic/cloud-on-k8s/pull/3849.

See the diff here:
https://github.com/elastic/cloud-on-k8s/pull/3849/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557

I tested this and confirmed that it works on a 3-node Raspberry Pi 4 Model B cluster running K3s v1.19.3.

$ make docker-build
$ make deploy

$ kubectl get no -o wide
NAME                STATUS   ROLES         AGE     VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
k3s-01a.spahr.dev   Ready    etcd,master   6m37s   v1.19.3+k3s1   192.168.2.201   <none>        Ubuntu 20.04.1 LTS   5.4.0-1019-raspi   containerd://1.4.0-k3s1
k3s-01b.spahr.dev   Ready    etcd,master   5m50s   v1.19.3+k3s1   192.168.2.202   <none>        Ubuntu 20.04.1 LTS   5.4.0-1019-raspi   containerd://1.4.0-k3s1
k3s-01c.spahr.dev   Ready    etcd,master   4m44s   v1.19.3+k3s1   192.168.2.203   <none>        Ubuntu 20.04.1 LTS   5.4.0-1019-raspi   containerd://1.4.0-k3s1   

$ kubectl get po -o wide -n elastic-system
NAME                 READY   STATUS    RESTARTS   AGE   IP          NODE                NOMINATED NODE   READINESS GATES
elastic-operator-0   1/1     Running   1          99s   10.42.2.3   k3s-01c.spahr.dev   <none>           <none>

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.9.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
EOF

$ oc get Elasticsearch -A
NAMESPACE   NAME         HEALTH   NODES   VERSION   PHASE   AGE
default     quickstart   green    1       7.9.2     Ready   4m9s

Thanks for the help @charith-elastic! Let me know what else I can do to get this PR accepted.

Bumping this issue to see how this support is progressing. This would yield significant cost savings for us on EKS.
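On a mixed-architecture cluster, for example EKS with both x86 and ARM node groups, Elasticsearch pods would presumably need to be pinned to the ARM nodes. That setup is an assumption, since the thread only tests an all-ARM cluster. A minimal sketch follows: it uses the standard `kubernetes.io/arch` node label in the nodeSet's `podTemplate`, and the helper name `render_arm64_es` is hypothetical.

```shell
# Print an Elasticsearch manifest whose pods are scheduled only on arm64
# nodes. Arguments: $1 = cluster name, $2 = Elasticsearch version.
# The function name and both arguments are illustrative.
render_arm64_es() {
  cat <<EOF
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: $1
spec:
  version: $2
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        nodeSelector:
          kubernetes.io/arch: arm64
EOF
}
```

It could be applied with `render_arm64_es quickstart 7.9.2 | kubectl apply -f -`. The chosen version must have an ARM image available.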

Looking forward to this. I have been able to run an Elasticsearch instance across 5-6 Pis, but it requires oversight of the hardware, which I wanted to abstract away. It is tedious to inspect and monitor every Pi individually.

To improve on that, I thought it would be nicer to scale using k8s, which my workplace does currently, but I wanted to implement it on a smaller scale. It would let me better monitor the host systems as well as the Elastic instance itself.

I am looking forward to this because I have a whole cluster on standby to run it.

Any idea when this will be available?

We published ARM-64 images for the latest ECK 1.4.0 release.

Please note that at the moment you can only run a subset of the Elastic Stack applications on ARM platforms. Check for availability of ARM images for the applications before attempting to use ECK to deploy them.
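One way to check whether an image offers an ARM variant is to look for a `linux/arm64` entry in the `docker buildx imagetools inspect` output, which has the same shape as the manifest listing shown earlier in this thread. The helper below is a hypothetical sketch, not an official tool. It reads that output on stdin, so it makes no assumptions about registry access.

```shell
# Reads `docker buildx imagetools inspect` output on stdin and succeeds
# if the requested platform (e.g. linux/arm64) appears on a "Platform:" line.
# The function name is illustrative.
has_platform() {
  awk -v want="$1" '
    $1 == "Platform:" && $2 == want { found = 1 }
    END { exit !found }
  '
}
```

For example, `docker buildx imagetools inspect docker.elastic.co/elasticsearch/elasticsearch:7.9.2 | has_platform linux/arm64 && echo "arm64 image available"`. The image reference here is only an example; substitute the application and version you plan to deploy.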

Thanks for the work on this, @charith-elastic!

Hi @charith-elastic, I tried to run eck-operator 1.5.0 and it did not work properly. Is the operator still not supported on ARM? https://download.elastic.co/downloads/eck/1.5.0/all-in-one.yaml

@AlaaAttya can you share more details about what's not working? Any log or error message?
