On Master - private networking with HA masters
So I ran into an issue where I had an LB named "LoadBalancer/bastion.clove-weave-latest-small-noeth.dev.datapipe.io"
The LB never got created, and the task just looped. The api LB did get created, with:
Load balancer: api-clove-weave-latest-small-noe
Notice the missing letters in the name. The cluster came up, but the LB for the bastion never did.
kops eventually errored:
error running tasks: deadline exceeded executing task LoadBalancer/bastion.clove-weave-latest-small-noeth.dev.datapipe.io. Example error: subnet changes on LoadBalancer not yet implemented: actual=[subnet-0408014d subnet-3de3ba10 subnet-3f396e64] -> expected=[subnet-38104815 subnet-d92c7b82 subnet-fc070eb5]
So I am guessing that we need to limit the length of the cluster name, or use a GUID of sorts instead. We cannot use names that are too long.
This only seemed to impact the LB: instances were spun up, and even k8s started fine. The bastion LB's name was just too long.
According to the AWS docs, ELB names are limited to 32 characters, which matches up nicely with the cutoff you saw.
I wonder where else AWS has name limits that we don't check for.
So I wrote this function ages ago, designed for doing exactly what we see above.
Looks like we finally hit the default case: truncating the name because we couldn't figure out anything better to do.
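For illustration, a helper in that spirit (hypothetical names, not the actual kops function) might look like the following. A plain chop drops the distinguishing tail of the name, so this sketch reserves a few characters for a hash of the full name to keep two long names with a shared prefix from colliding:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// truncateName is a hypothetical illustration, not the actual kops helper:
// shorten s to maxLen characters, reserving the tail for a short hash of
// the full name so two long names sharing a prefix don't collide.
func truncateName(s string, maxLen int) string {
	if len(s) <= maxLen {
		return s
	}
	sum := sha256.Sum256([]byte(s))
	suffix := "-" + hex.EncodeToString(sum[:])[:6]
	return s[:maxLen-len(suffix)] + suffix
}

func main() {
	// Long name in, 32 characters out, with a stable hash tail.
	fmt.Println(truncateName("bastion-clove-weave-latest-small-noeth-dev-datapipe-io", 32))
}
```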
Option 1: Enforce a shorter cluster name, because the ELB 32-character limit is non-negotiable.
Option 2: Use some sort of smarter ELB naming scheme, rather than trying to be clever about the ELB name in the function above. This is how kops used to do things. The problem there (and the inspiration for the current naming) is that if we generate a random ID and track it in memory, the user in the console has no clean (or easy) way to determine which ELB belongs to which cluster, other than tracking it down via other resources (several clicks in AWS).
Anybody else have any other ideas?
I strongly dislike using a random GUID since, as you say, it is not user friendly.
Enforcing a short cluster name seems like the way to go. We just need to pick a size that leaves us enough room to add prefixes/extra detail (e.g. "bastion." + name in this case).
Agree @yissachar!
Once we do the math, we can put a limit in, probably in validation. Anybody interested in PRing this? I can give hints if needed, or, err, code it myself if anyone volunteers me!
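For illustration, the math here is roughly: 32 characters (the ELB limit) minus the 8 characters of "bastion." leaves at most 24 characters for the cluster name. A hypothetical validation check (assumed names and constants, not actual kops code) could be:

```go
package validation

import "fmt"

// Hypothetical sketch, not actual kops code. ELB names max out at 32
// characters, and the longest prefix in play here is "bastion." (8 chars),
// leaving 24 for the cluster name itself.
const maxELBNameLength = 32
const longestPrefix = len("bastion.")

func validateClusterName(name string) error {
	if len(name) > maxELBNameLength-longestPrefix {
		return fmt.Errorf("cluster name %q is too long: with the \"bastion.\" prefix it would exceed the %d-character ELB name limit",
			name, maxELBNameLength)
	}
	return nil
}
```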
@kris-nova, I'll take this one. Will ping you if hints are needed :)
Sounds good @robinpercy! Looking forward to it.
@kris-nova @yissachar: I think I misinterpreted what you meant by "enforcing a short cluster name". At first glance, I was thinking of just validating that the string passed with --name was under N characters long. But, obviously, that restricts the domain name for the cluster as well. Mind pointing out what I'm missing?
@robinpercy you are not missing anything :) We cannot restrict the domain name, but we need to figure out how to handle a name that is unique and will fit in the naming scheme that AWS has.
Thoughts?
Ok, thanks for the clarification @chrislovecnm. Wanted to make sure I don't take off in a completely wrong direction :)
I haven't really dug into the root problem yet, but my first instinct is to build on the truncation approach that @kris-nova posted above. If we're concerned about losing top-level domain info, we could "collapse" the domain (e.g. "bastion.foo.mysubdomain.example.com" -> "bastion.foo.m.e.c") to retain some of the information before doing a hard chop at 32 chars; see the sketch below.
My limited understanding is that most of the work would be in making the other, dependent resources/tasks aware of the truncation scheme.
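A rough sketch of that collapsing idea (hypothetical code, assuming we keep the first two labels intact and reduce the rest to initials before the hard chop):

```go
package main

import (
	"fmt"
	"strings"
)

// collapseName is a hypothetical sketch of the idea above: keep the first
// two DNS labels intact, shrink the remaining labels to their first letter,
// then hard-chop at maxLen if the result is still too long.
func collapseName(name string, maxLen int) string {
	if len(name) <= maxLen {
		return name
	}
	labels := strings.Split(name, ".")
	for i := 2; i < len(labels); i++ {
		if len(labels[i]) > 1 {
			labels[i] = labels[i][:1]
		}
	}
	collapsed := strings.Join(labels, ".")
	if len(collapsed) > maxLen {
		collapsed = collapsed[:maxLen]
	}
	return collapsed
}

func main() {
	fmt.Println(collapseName("bastion.foo.mysubdomain.example.com", 32))
	// Output: bastion.foo.m.e.c
}
```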
We are hitting the same issue. It would be good to validate that the name stays under a maximum length that is short enough even when prefixed with {api,bastion}, etc.
@ahawkins: From what I've found, the ELB issue only occurs if you spin up two clusters that share a name prefix (or, in my case, I had ELBs from an old cluster hanging around). What's happening is that the "old" ELBs are matched by name when the "new" cluster is being built, and they have an unexpected 'actual' state. If I create a cluster with a long name in a clean environment, the ELBs come up fine with truncated names (though I have noticed role names fail when the cluster name gets close to 64 chars).
Can you confirm that this is consistent with what you're seeing? I want to make sure I catch all cases.
@robinpercy so foo.bar.baz should conflict with bar.baz. Do I understand correctly?
@ahawkins Not quite, I mean foo.bar.baz would conflict with foo.anything. Both result in trying to name the ELB "bastion-foo".
Edit: misread the first time :)
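To make the collision concrete: with a plain 32-character chop, any two prefixed names that agree on their first 32 characters end up with the same ELB name. A tiny hypothetical demo (the first cluster name is from this issue; the second is made up):

```go
package main

import (
	"fmt"
	"strings"
)

// chop mimics the naive naming described above (hypothetical helper):
// dots become hyphens, then a hard cut at 32 characters.
func chop(name string) string {
	s := strings.Replace(name, ".", "-", -1)
	if len(s) > 32 {
		s = s[:32]
	}
	return s
}

func main() {
	a := chop("bastion.clove-weave-latest-small-noeth.dev.datapipe.io")
	b := chop("bastion.clove-weave-latest-small-other.dev.datapipe.io")
	fmt.Println(a, b, a == b) // both chop to "bastion-clove-weave-latest-small"
}
```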
Did we work this out? If I remember correctly, we had a couple of PRs that went through to address this problem. @brandoncole, this was one of the issues I mentioned about naming.
@chrislovecnm: Yeah, @justinsb sorted it out https://github.com/kubernetes/kops/pull/2019#issuecomment-285842762
Closing ... thanks @robinpercy