Kops: AWS ENA Driver Not Enabled On Default AMI

Created on 20 Jan 2017 · 29 Comments · Source: kubernetes/kops

Hello,

I noticed that the ENA (Elastic Network Adapter) driver isn't enabled in the default AMIs (1.4) that kops uses:

root@ip-172-21-35-87:~# cat /etc/debian_version
8.6
root@ip-172-21-35-87:~# ethtool -i eth0
driver: vif
$ kops version
Version 1.5.0-alpha3 (git-51b7644)

AMI Version: k8s-1.4-debian-jessie-amd64-hvm-ebs-2016-12-05 (ami-03fdf814)

References:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html
https://wiki.debian.org/Cloud/AmazonEC2Image/Jessie
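The `ethtool -i eth0` output is the quickest way to tell which networking path an instance is actually using. Below is a minimal Python sketch of that check. It assumes the captured output is available as text. The mapping covers only the three drivers that appear in this thread (vif, ixgbevf, ena), and the function names are hypothetical:

```python
"""Classify the driver reported by `ethtool -i eth0` on an EC2 instance."""

# Driver name -> what it indicates on EC2; limited to the drivers seen in this thread.
DRIVER_MODES = {
    "vif": "no enhanced networking (Xen netfront)",
    "ixgbevf": "enhanced networking via Intel 82599 VF",
    "ena": "enhanced networking via ENA",
}


def parse_ethtool_info(text: str) -> dict:
    """Parse 'key: value' lines; split on the first colon so bus-info values keep theirs."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            info[key.strip()] = value.strip()
    return info


def networking_mode(ethtool_output: str) -> str:
    """Map the 'driver' field to a networking mode, or report it as unknown."""
    driver = parse_ethtool_info(ethtool_output).get("driver", "")
    return DRIVER_MODES.get(driver, f"unknown driver {driver!r}")


if __name__ == "__main__":
    # Output like the one in the original report (r4.xlarge on the 1.4 AMI).
    print(networking_mode("driver: vif\nversion:\nfirmware-version:\n"))
```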



Which instance size was this on? Did it have enhanced networking available?

It was an R4.xlarge. Should be available.


I see now - this is a separate driver from the ixgbevf driver - my mistake.

We'll have to add a module to the base image:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html

And we'll also have to enable EnaSupport on the base image
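The two steps above (the driver in the image and the EnaSupport flag) can be checked separately on a running node. Below is a minimal sketch. It assumes you have captured /proc/modules and the JSON printed by `aws ec2 describe-instance-attribute --instance-id <id> --attribute enaSupport`, and the helper names are hypothetical. A driver built into the kernel rather than loaded as a module will not appear in /proc/modules, so a miss there only means "check further":

```python
"""Report which ENA prerequisites are missing on a running node."""
import json


def ena_module_loaded(proc_modules: str) -> bool:
    # The first field of each /proc/modules line is the module name.
    # A driver compiled into the kernel (not as a module) is not listed here.
    return any(line.split()[0] == "ena" for line in proc_modules.splitlines() if line.strip())


def ena_attribute_enabled(ena_attr_json: str) -> bool:
    # Expected shape: {"InstanceId": "...", "EnaSupport": {"Value": true}}.
    # A missing or empty EnaSupport object is treated as "not enabled".
    doc = json.loads(ena_attr_json)
    return bool(doc.get("EnaSupport", {}).get("Value", False))


def missing_ena_prerequisites(proc_modules: str, ena_attr_json: str) -> list:
    """Return a list of missing prerequisites; an empty list means both look fine."""
    missing = []
    if not ena_module_loaded(proc_modules):
        missing.append("ena module not loaded")
    if not ena_attribute_enabled(ena_attr_json):
        missing.append("EnaSupport not enabled")
    return missing
```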

It's not entirely relevant, but I confirmed that the ixgbevf driver is installed and enabled on e.g. a c4.large:

> ethtool -i eth0
driver: ixgbevf
version: 2.12.1-k
firmware-version: 
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

Awesome. Definitely good to know. I wonder what the real difference is
between the ENA and the Intel one. Anyway, I really appreciate your efforts.


AWS recommends ixgbevf > 2.14 for stability and performance.
ENA driver is needed on those beefy 20Gbit/s instances (R4 family, m4.16xlarge). This driver is not available on stock Ubuntu images and has to be installed manually.
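A naive version check against that recommendation is easy to write. However, as discussed further down this thread, in-kernel `-k` versions (e.g. 2.12.1-k, 1.0.0-k0) cannot be compared with the out-of-tree numbering. The sketch below therefore declines to compare `-k` versions rather than give a misleading answer. Reading "> 2.14" as "2.14 or newer" is an assumption, and the names are hypothetical:

```python
"""Compare an ixgbevf version string with the AWS-recommended minimum."""
import re
from typing import Optional, Tuple

# Assumption: "> 2.14" in the comment above is read as "2.14 or newer".
AWS_RECOMMENDED_MIN = (2, 14)
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)*)(-k\d*)?")


def parse_version(raw: str) -> Tuple[Tuple[int, ...], bool]:
    """'2.12.1-k' -> ((2, 12, 1), True); the flag marks an in-kernel '-k' build."""
    match = _VERSION_RE.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unrecognised ixgbevf version: {raw!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return numbers, match.group(2) is not None


def meets_recommendation(raw: str) -> Optional[bool]:
    """True/False for out-of-tree versions; None for '-k' versions, which are not comparable."""
    numbers, in_kernel = parse_version(raw)
    if in_kernel:
        return None
    return numbers >= AWS_RECOMMENDED_MIN
```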

Does kops/kubernetes provide any 'official' AMIs? I always thought it used 'bare' images.

Edit: OK, I see we do, so I guess this should be added there (a bumped ixgbevf plus the ENA driver).

Does anyone know whether the ixgbevf 2.12.1-k in Debian 8.6 k8s-1.5-debian-jessie-amd64-hvm-ebs-2017-01-09 (ami-aaf84aca) is affected by the stability issues (TCP timeouts and random packet corruption)?

Edit: I ask because I understand that ixgbevf (2.12.1-k) is an out-of-tree version, and the version number does not tell you which out-of-tree ixgbevf patches were actually merged into the kernel.

I'm getting many of these as well:

[22117.455919] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[22117.489707] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

On ixgbevf:

On the "k8s AMIs" (which are Debian jessie with a 4.4 kernel), we're running the ixgbevf driver from the Linux kernel, not the out-of-tree version. The version numbering does not appear to correspond directly. We switched to this as part of the move to the 4.4 kernel; with the jessie kernel we were seeing kernel panics, particularly on m4 instances (_with_ the AWS-recommended driver): https://github.com/kubernetes/kubernetes/issues/30706

I compared the 2.12.1 driver from sourceforge with the 2.14.2 driver (I could not find the upstream version control):

2.14.2 introduced ixgbevf_check_tx_hang, added here: torvalds/linux@e08400b, and in kernel >= 4.0.
2.14.2 introduced ixgbevf_set_ivar, added in the initial commit of the driver into the kernel: torvalds/linux@92915f7. Note that a version somewhere between 2.12.1 and 2.14.2 was labeled in the kernel as 1.0.0-k0. This suggests that the -k scheme is not comparable to the non-k scheme.
2.14.2 introduced an errata check, which is in the kernel as torvalds/linux@8bae1b2.

(there are more differences, but these seemed a reasonable sample of non-trivial changes)

@Jkirsher I see you do a lot of the work on the ixgbevf driver in the kernel... Is it reasonable to run the ixgbevf driver from the 4.4 LTS kernel on AWS? Any guidance is greatly appreciated!

Using the in-kernel driver is preferred, unless you are seeing issues. At
which point, our first suggestion is to try the sourceforge.net driver, to
see if the issue goes away (which would mean we fixed a known issue and
have not pushed the fix upstream yet). For the most part, the upstream
driver is kept up-to-date on a regular basis, so there should not be a
large discrepancy between the in-kernel and out-of-kernel drivers.

Thanks @Jkirsher so much for the guidance on the ixgbevf driver :-)

FWIW, the linux-aws package in Ubuntu 16.04 is a huge performance win for a number of reasons, and it also provides the ENA/ixgbevf drivers out of the box. Other than that package, nothing is required except marking the image as "SR-IOV" ready. Perhaps kube should install this when it detects that it is installing on Ubuntu on AWS? I've seen pretty dramatic wins in the networking and IO layers using this kernel, and I believe there are fixes for T* instances as well, which are fairly common in the kube ecosystem. Additionally, it works everywhere, even on legacy servers that only support the old Xen vif driver!

https://insights.ubuntu.com/2017/04/05/ubuntu-on-aws-gets-serious-performance-boost-with-aws-tuned-kernel/

Also for reference, packer sets the two flags together when enhanced_networking is enabled: https://github.com/hashicorp/packer/blob/81522dced0b25084a824e79efda02483b12dc7cd/builder/amazon/instance/step_register_ami.go#L32-L40
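For anyone building their own image, here is a sketch of registering an AMI with both flags set, in the spirit of the packer behaviour linked above. It makes three assumptions: the two flags are the EC2 RegisterImage parameters SriovNetSupport and EnaSupport, `ec2_client` is a boto3 EC2 client, and the image is x86_64 HVM. The function names are hypothetical and are not part of kops or packer:

```python
"""Register an AMI with both enhanced-networking flags set (boto3-style client assumed)."""


def enhanced_networking_params(enabled: bool) -> dict:
    """RegisterImage keyword arguments for enhanced networking; empty when disabled."""
    if not enabled:
        return {}
    return {
        "SriovNetSupport": "simple",  # Intel 82599 VF (ixgbevf) path
        "EnaSupport": True,           # ENA path
    }


def register_enhanced_ami(ec2_client, name: str, root_device_name: str,
                          block_device_mappings: list) -> str:
    """Call register_image on an EC2 client and return the new ImageId."""
    response = ec2_client.register_image(
        Name=name,
        Architecture="x86_64",  # set explicitly; assumed to match the image
        VirtualizationType="hvm",
        RootDeviceName=root_device_name,
        BlockDeviceMappings=block_device_mappings,
        **enhanced_networking_params(True),
    )
    return response["ImageId"]
```

Splitting out `enhanced_networking_params` keeps the flag logic testable without AWS credentials. The root device name and block device mappings would come from the snapshot of the prepared instance.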

Another +1 for this.

Running kops 1.5.3, kubernetes 1.5.5 on r4.xlarge shows the vif driver in play:

# ethtool -i eth0
driver: vif
# cat /etc/debian_version
8.8

But I concur that a c4.2xlarge shows:

$ sudo ethtool -i eth0
driver: ixgbevf
$ cat /etc/debian_version
8.7

(The latter cluster will be updated later this week).

I have an m4.large instance that supports enhanced networking and thus should run the ixgbevf driver. However, it's running the vif driver. Can we get kops to set all of this up for us in the k8s debian AMI? Otherwise I'll probably just move over to Ubuntu 16.04.

From my limited experience the ENA driver is a must.

I had a gRPC service that was experiencing poor throughput on a kops-created cluster, and I narrowed it down to the fact that the default Debian image kops uses did not have the ENA driver installed. After making my own AMI from the kops default Debian image plus ENA, I have seen a ~7x improvement in throughput on i3.xlarge nodes (single-node throughput increased from 1.2 Gbps to 8.03 Gbps).

Cluster Setup:
Node Size: i3.xlarge
Topology: private
Networking: weave

After seeing this mentioned on HN I double-checked my own cluster, and sure enough my R4.XL machines running the kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28 AMI are not running with ENA enabled.

$ cat /etc/debian_version
8.9
$ sudo modinfo ena
modinfo: ERROR: Module ena not found.
$ sudo ethtool -i eth0
driver: vif
version:
firmware-version:

What was your process for building a custom image, @jmasonISP? The official Debian image claims this should already be supported, so I'm wondering what the missing piece is here.

We are on k8s 1.7.0 (with kops) and using these AMIs:

ami-b2137ea4 k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-05-02 us-east-1
ami-800803e3 k8s-1.6-debian-jessie-amd64-hvm-ebs-2017-05-02 ap-southeast-2

Driver seems to be vif and not ixgbevf:

driver: vif
version: 
firmware-version: 
bus-info: vif-0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

@bcorijn I found a k8s-1.7-debian-jessie AMI that I spun up on an EC2 instance in my k8s VPC. I then followed this guide to install and enable ENA on it. Once installed I made an AMI from that instance which is what I'm using now for my kops created nodes.

We are running kops 1.8 with k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02 (ami-06a57e7e), where ENA is reported to be enabled. However, when I run a check on that instance, it returns simple networking:
aws ec2 describe-instance-attribute --instance-id 123 --attribute sriovNetSupport
{
    "InstanceId": "123",
    "SriovNetSupport": {
        "Value": "simple"
    }
}

I also SSHed into the instance and ran lsmod | grep ixgbevf to verify that the module needed for ENA is installed, and it is not there!

@mv78 have you tried the debian stretch image?

I have not, but looking at the source code https://github.com/kubernetes/kube-deploy/blob/master/imagebuilder/templates/1.8-stretch.yml, I don't see the "ixgbevf" module included either.

@mv78 that is because the base image already has it in-kernel, so there is no need to install it on top.
Your SriovNetSupport looks fine to me, if ENA is not supported that property will be empty, while a value of simple means that enhanced networking is enabled. (see documentation)

It all depends on which instance type you are using; there are different ways to get enhanced networking, as described here. If you use one of the newer types, you need to check for the ENA driver, not the ixgbevf one.
What instance type are you using?
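One point that is easy to miss: sriovNetSupport = simple reports the Intel 82599 VF (ixgbevf) path, while ENA is tracked by a separate enaSupport instance attribute and the `ena` driver. So neither the sriovNetSupport check nor `lsmod | grep ixgbevf` says anything about ENA on an m5. The small sketch below picks what to check from the instance type. The family table is deliberately partial (only families mentioned in this thread), and the names are hypothetical:

```python
"""Pick the enhanced-networking checks that apply to an EC2 instance type."""
from typing import Optional, Tuple

# Partial, illustrative family table: only families mentioned in this thread.
INTEL_VF_FAMILIES = {"c4", "m4"}
ENA_FAMILIES = {"c5", "i3", "m5", "r4"}
ENA_EXCEPTIONS = {"m4.16xlarge"}  # the m4 size that uses ENA rather than the Intel VF

# mechanism -> (describe-instance-attribute --attribute value, driver shown by `ethtool -i`)
CHECKS = {
    "ena": ("enaSupport", "ena"),
    "ixgbevf": ("sriovNetSupport", "ixgbevf"),
}


def expected_mechanism(instance_type: str) -> str:
    """Return 'ena', 'ixgbevf', or 'unknown' for families outside the table."""
    itype = instance_type.strip().lower()
    family = itype.split(".", 1)[0]
    if itype in ENA_EXCEPTIONS or family in ENA_FAMILIES:
        return "ena"
    if family in INTEL_VF_FAMILIES:
        return "ixgbevf"
    return "unknown"


def what_to_check(instance_type: str) -> Optional[Tuple[str, str]]:
    """Return (instance attribute, expected driver), or None for families not in the table."""
    return CHECKS.get(expected_mechanism(instance_type))
```

For the m5 case above, `what_to_check("m5.large")` points at the enaSupport attribute and the `ena` driver rather than at sriovNetSupport and ixgbevf.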

I was using m5.

@veksler you probably were, but you need to be using the stretch AMI with m5s.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

I am not sure this one should still be open. As far as I am aware, the current kops default AMIs support both types of enhanced networking (ENA and ixgbevf).
The C5/M5 instances still have issues with EBS support, but ENA should not be blocking.

That is correct.


Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

/close
