Openshift-ansible: Openshift 3.11 : DNS[CNAME] Entries not created

Created on 22 Oct 2018  ·  16 Comments  ·  Source: openshift/openshift-ansible

Description

OCP 3.11 environment with 1 master and 3 nodes.

During the installation of 3.11, following the reference at https://github.com/openshift/openshift-ansible/tree/master/playbooks/aws, step 2 (which provisions the OpenShift cluster) fails in TASK [openshift_control_plane : Wait for control plane pods to appear]:
TASK [openshift_control_plane : Report control plane errors] ************
fatal: [ec2-54-236-48-144.compute-1.amazonaws.com]: FAILED! => {"changed": false, "msg": "Control plane pods didn't come up"}

Version 3.11

$ ansible --version
ansible 2.6.5

$ git describe
v3.11.0-35-g65b6661

v.3.11
Steps To Reproduce
  1. Once the inventory and the provisioning_vars.yml file have been updated with the correct settings for the desired AWS account, we are ready to build an AMI.

$ ansible-playbook -i inventory.yml build_ami.yml -e @provisioning_vars.yml

  2. Now that we have created an AMI for our OpenShift installation, there are two ways to use it:

  • By default, the most recently created AMI is found and used.
  • The openshift_aws_ami option can be specified. This overrides the role's default behavior and uses the custom AMI named in the openshift_aws_ami variable.

We are now ready to provision and install the cluster. This can be done by running all of the following steps at once or one by one. The all-in-one playbook is called like this:

$ ansible-playbook -i inventory.yml provision_install.yml -e @provisioning_vars.yml

Expected Results

Once this playbook completes, it should create the compute and infra node scale groups. These nodes will attempt to register themselves to the cluster.

Observed Results

After the public and private subnets are created, the playbooks should create CNAME entries in Route 53 that point at the ELB endpoints, but these entries are not being created.
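A quick way to confirm this symptom from a master or from the Ansible control host is to check whether the cluster hostnames resolve at all. The following is a minimal sketch, assuming a Linux host where `getent` is available; `check_dns` is a made-up helper name and not part of openshift-ansible.

```shell
#!/usr/bin/env bash
# Minimal sketch: report which hostnames resolve through the system resolver.
# check_dns is a hypothetical helper name, not part of openshift-ansible.
check_dns() {
  local host rc=0
  for host in "$@"; do
    if getent hosts "$host" >/dev/null 2>&1; then
      echo "$host: resolves"
    else
      echo "$host: NOT FOUND"
      rc=1
    fi
  done
  # Non-zero exit status if at least one name failed to resolve.
  return "$rc"
}
```

With the hostnames from the inventory in this report, it could be called as `check_dns api.shegde.sysdeseng.com internal.api.shegde.sysdeseng.com`. Any name that prints NOT FOUND has no usable record from the point of view of that host.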

```
failed: [ec2-54-236-48-144.compute-1.amazonaws.com] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "msg": {"cmd": "/bin/oc get pod master-etcd-ip-172-31-47-147.ec2.internal -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "Unable to connect to the server: dial tcp: lookup internal.api.shegde.sysdeseng.com on 172.31.47.147:53: no such hostn", "stdout": ""}}
```

##### Additional Information

cat /etc/redhat-release

Red Hat Enterprise Linux Server release 7.5 (Maipo)

cat inventory.yml

[OSEv3:vars]
debug_level=2
osm_etcd_image=registry.access.redhat.com/rhel7/etcd:3.2.22
ansible_user=ec2-user
ansible_become=yes
openshift_deployment_type=openshift-enterprise
openshift_release='3.11'
openshift_master_api_port=443
openshift_master_console_port=443

openshift_master_api_port=80

openshift_master_console_port=80

openshift_portal_net=172.30.0.0/16

openshift_portal_net=172.0.0.0/8
os_sdn_network_plugin_name='redhat/openshift-ovs-networkpolicy'
openshift_master_cluster_method=native
openshift_node_local_quota_per_fsgroup=512Mi
osm_use_cockpit=true
openshift_hostname_check=false
openshift_builddefaults_nodeselectors="{'node-role.kubernetes.io/infra': 'true'}"
openshift_hosted_router_selector='node-role.kubernetes.io/infra=true'
openshift_hosted_router_replicas=2
openshift_install_examples=true
openshift_examples_modify_imagestreams=true
openshift_master_bootstrap_auto_approve=True
oreg_url=registry.access.redhat.com/openshift3/ose-${component}:${version}
openshift_disable_check=package_version,memory_availability,disk_availability,docker_image_availability

openshift_master_default_subdomain=apps.shegde.sysdeseng.com
openshift_master_cluster_hostname=internal.api.shegde.sysdeseng.com
openshift_master_cluster_public_hostname=api.shegde.sysdeseng.com

# Cloud Provider

openshift_cloudprovider_kind=aws
openshift_clusterid=shegde
openshift_cloudprovider_aws_access_key=****
openshift_cloudprovider_aws_secret_key=*******

# Hosted registry

openshift_hosted_manage_registry=true
openshift_hosted_registry_storage_kind=object
openshift_hosted_registry_storage_provider=s3
openshift_hosted_registry_storage_s3_accesskey=***
openshift_hosted_registry_storage_s3_secretkey=**
openshift_hosted_registry_storage_s3_bucket=*
openshift_hosted_registry_storage_s3_region=us-east-1
openshift_hosted_registry_storage_s3_chunksize=26214400
openshift_hosted_registry_storage_s3_rootdirectory=/registry
openshift_hosted_registry_pullthrough=true
openshift_hosted_registry_acceptschema2=true
openshift_hosted_registry_enforcequota=true
openshift_hosted_registry_replicas=2
#

# Aggregated logging

openshift_logging_install_logging=True
openshift_logging_storage_kind=dynamic
openshift_logging_storage_volume_size=25Gi
openshift_logging_es_cluster_size=3
openshift_logging_kibana_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_curator_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

# Metrics

openshift_metrics_install_metrics=True
openshift_metrics_storage_kind=dynamic
openshift_metrics_storage_volume_size=25Gi
openshift_metrics_hawkular_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_cassandra_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_heapster_nodeselector={"node-role.kubernetes.io/infra": "true"}

openshift_enable_service_catalog=True
template_service_broker_install=True

#

# cluster specific settings maybe be placed here

[masters]

[masters:vars]
openshift_node_group_name=node-config-master

[etcd]

[etcd:children]
masters

[infra]

[infra:vars]
openshift_node_group_name=node-config-infra

[nodes]
[nodes:children]
masters

[nodes:vars]
openshift_node_group_name=node-config-compute

All 16 comments

You should remove your AWS Credentials from your post.

You still have AWS credentials uncensored (and they'd be in the edit history anyway). https://help.github.com/articles/tracking-changes-in-a-comment/#deleting-sensitive-information-from-a-comments-history

The S3 credentials are still here in plaintext as well. At this point, you should probably refresh these credentials even if you can manage to hide them here. They've been up for almost a full day.

I see that you've edited the post to remove the credentials but they're still available in plaintext from the Edit history. You should follow @noramtkane's post but, more importantly, you need to change out those keys in your environment ASAP.

All the sensitive keys are hidden and also changed in the environment.

On Tue, Oct 23, 2018 at 2:25 PM Scott Williams notifications@github.com
wrote:

I see that you've edited the post to remove the credentials but they're
still available in plaintext from the Edit history. You should follow
@noramtkane https://github.com/noramtkane's post but, more importantly,
you need to change out those keys in your environment ASAP.



@DirectedSoul1 FYI - they're not hidden if you look at the edit history of your post. It doesn't really matter at this point if you've expired the credentials, but you should follow @noramtkane 's link if you want to remove it from the edit history as well.

We were seeing much the same thing and the problem exists in 3.10 also.
A recent backport from 3.11 to 3.10 introduced a change which dispenses with the use of openshift_hostname in favour of openshift_kubelet_name_override.

In our case the master API service was not starting because the etcd URL written into /etc/origin/master/master-config.yaml was being extracted from hostvars and was actually incorrect: it was a combination of the host part of the AWS private_dns_name and the domain name as returned by the DNS server.

I went backwards and forwards with this attempting to get it working without modifying the openshift-ansible playbooks.

The solution for us was to change the old openshift_hostname references in the inventory to openshift_kubelet_name_override.

After that, everything worked fine in a 3.10 installation. I have yet to test the upgrade to 3.11; I will let you know what I find.
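The inventory change described in this comment amounts to renaming a key. The following is a minimal sketch, assuming an INI-style inventory in which the old variable appears as `openshift_hostname=...`; `rename_hostname_key` is a made-up helper name. It only renames the key. Whether the values also need to change is not covered in this thread.

```shell
#!/usr/bin/env bash
# Minimal sketch: rename openshift_hostname= keys to
# openshift_kubelet_name_override= in inventory text read from stdin.
# rename_hostname_key is a hypothetical helper name.
# Keys such as openshift_hostname_check are left alone because the match
# requires "=" directly after the key name.
rename_hostname_key() {
  sed -E 's/(^|[[:space:]])openshift_hostname=/\1openshift_kubelet_name_override=/g'
}
```

For example, `rename_hostname_key < inventory.yml > inventory.new` writes a modified copy that can be compared against the original with `diff` before it is used.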

@vwbusguy and @malagant thanks for the heads up. The creds were removed from IAM after this was posted.

1) You should be using the 3.11 rpm or git tag upstream/release-3.11.
2) Please provide the contents of extra_vars (provisioning_vars.yml).
3) Please run git status. If /roles/openshift_aws/defaults/main.yml was modified, please provide its contents.
4) Re inventory file: osm_etcd_image is not necessary. Please remove this override.
5) Please log into the masters and provide the output of mount and systemctl status atomic-openshift-node.

Also ... is the * shown in the inventory file a bad copy/paste or is it actually in the file? If so please remove it.

1. [root@localhost aws]# git tag | grep v3.11.

v3.11
v3.11.0

2. cat provisioning_vars.yml
openshift_deployment_type: 'openshift-enterprise'
openshift_version: '3.11'
openshift_master_api_port: 80

openshift_aws_clusterid: *
openshift_aws_region: us-east-1

openshift_aws_region: us-east-2

openshift_aws_create_launch_config: true
openshift_aws_create_scale_group: true
openshift_aws_create_vpc: true
openshift_aws_vpc:
  name: "{{ openshift_aws_vpc_name }}"
  cidr: 172.31.0.0/16
  subnets:
    us-east-1:
    - cidr: 172.31.48.0/20
      az: "us-east-1e"
      # default_az: true
    - cidr: 172.31.32.0/20
      az: "us-east-1a"
    - cidr: 172.31.16.0/20
      az: "us-east-1c"
# Name of the vpc. Needs to be set if using a pre-existing vpc.
openshift_aws_vpc_name: "{{ openshift_aws_clusterid }}"
openshift_aws_ssh_key_name: *
openshift_aws_build_ami_ssh_user: ec2-user
container_runtime_docker_storage_type: overlay2
container_runtime_docker_storage_setup_device: /dev/xvdb
# must specify a base_ami when building an AMI
openshift_aws_base_ami: ami-0d70a070
# when creating an encrypted AMI please specify use_encryption
openshift_aws_ami_encrypt: False

# Create an s3 bucket.
openshift_aws_create_s3: True

# openshift_aws_elb_name will be the base-name of the ELBs.
openshift_aws_elb_name: "{{ openshift_aws_clusterid }}"

# custom certificates are required for the ELB
openshift_aws_create_iam_cert: false
openshift_cluster_autoscaler_install: false
openshift_cluster_autoscaler_node_groups:
- name: "{{ openshift_aws_clusterid }} compute group 1"
  min: "5"
  max: "20"
- name: "{{ openshift_aws_clusterid }} dummy group 1"
  min: "4"
  max: "20"

# Red Hat Subscription Manager #

rhsub_user: '----------'
rhsub_pass: '----------'
rhsub_pool: '----------'

3. Modified Contents of main.yml
[root@localhost aws]# git status
HEAD detached at upstream/release-3.11
# cat main.yml
openshift_aws_ami_build_set_gquota_on_slashfs: False

openshift_aws_launch_config_bootstrap_token: ''

openshift_aws_users: []

openshift_aws_copy_base_ami_tags: False

openshift_aws_ami_tags:
  bootstrap: "true"
  openshift-created: "true"
  parent: "{{ openshift_aws_base_ami | default('unknown') }}"
openshift_aws_s3_mode: create
openshift_aws_s3_bucket_name: "{{ openshift_aws_clusterid }}-docker-registry"

openshift_aws_vpc_tags:
  Name: "{{ openshift_aws_vpc_name }}"

openshift_aws_vpc:
  name: "{{ openshift_aws_vpc_name }}"
  cidr: 172.31.0.0/16
  subnets:
    us-east-1:
    - cidr: 172.31.48.0/20
      az: "us-east-1c"

openshift_aws_create_dns: False
openshift_aws_dns_provider: "route53"

openshift_aws_elb_names:
- "{{ openshift_aws_elb_master_internal_name }}"
- "{{ openshift_aws_elb_master_external_name }}"
- "{{ openshift_aws_elb_infra_name }}"

openshift_aws_dns_records:
  # Pertains to inventory file key: openshift_master_cluster_public_hostname
  'api':
    type: 'CNAME'
    # A public or private vpc attached Route53 zone will be created based on
    # private_zone boolean. Split-tier dns is supported.

    private_zone: False
    value: "{{ l_openshift_aws_elb_facts[openshift_aws_elb_master_external_name].dns_name }}"
  # Pertains to inventory file key: openshift_master_cluster_hostname
  'internal.api':
    type: 'CNAME'
    private_zone: False
    value: "{{ l_openshift_aws_elb_facts[openshift_aws_elb_master_internal_name].dns_name }}"
  # Pertains to inventory file key: openshift_master_default_subdomain
  '*.apps':
    type: "CNAME"
    private_zone: False
    value: "{{ l_openshift_aws_elb_facts[openshift_aws_elb_infra_name].dns_name }}"
  'logs':
    type: "CNAME"
    private_zone: False
    value: "{{ l_openshift_aws_elb_facts[openshift_aws_elb_infra_name].dns_name }}"
  'metrics':
    type: "CNAME"
    private_zone: False
    value: "{{ l_openshift_aws_elb_facts[openshift_aws_elb_infra_name].dns_name }}"
  'registry':
    type: "CNAME"
    private_zone: False
    value: "{{ l_openshift_aws_elb_facts[openshift_aws_elb_infra_name].dns_name }}"

4. re inventory file...
osm_etcd_image is not necessary. Please remove this override.
------Removed this value from inventory.yml

5. mount and systemctl status atomic-openshift-node
[root@ip-172-31-20-135 master]# df -khT
Filesystem Type Size Used Avail Use% Mounted on
/dev/xvda2 xfs 100G 3.0G 98G 3% /
devtmpfs devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs tmpfs 7.8G 1.6M 7.8G 1% /run
tmpfs tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/docker_vg-docker--root--lv xfs 100G 1.8G 99G 2% /var/lib/docker
overlay overlay 40G 8.0K 40G 1% /var/lib/docker/overlay2/84030a8a04f73c13f2a45da2fc99d0e11cd3c3c00d5a0e60f9217ad81688858d/merged
shm tmpfs 64M 0 64M 0% /var/lib/docker/containers/5ddab84598476114b375ec0826197919e794aab62311b0415482d064eeaa1772/shm
overlay overlay 40G 8.0K 40G 1% /var/lib/docker/overlay2/c3014b5f2ace59f999e35f572416cd026e8c183d3624b3dcd5cd1f2d7c279b8b/merged
shm tmpfs 64M 0 64M 0% /var/lib/docker/containers/c99a0e8d01876b55a11afe553ef1e60e7b9275ce50d2360ecb417f7354b98328/shm
overlay overlay 40G 8.0K 40G 1% /var/lib/docker/overlay2/d233a711188116313f42693ec0b7186785972510ddad2a2644f74b17cbc5610e/merged
shm tmpfs 64M 0 64M 0% /var/lib/docker/containers/255d09ae68b3701f23980513e49368676995afcade683f507154401d7940be32/shm
overlay overlay 40G 8.0K 40G 1% /var/lib/docker/overlay2/9bbedac4cc09bd190a6d877290f32159bf1d924a97933890048f552cd71b32bf/merged
overlay overlay 40G 8.0K 40G 1% /var/lib/docker/overlay2/c2772576cae3dfe86f8370120d8912c2c15ae546e6a65bf3c5a0b99045d04d8b/merged
overlay overlay 40G 20K 40G 1% /var/lib/docker/overlay2/a37681c5a5ff4f601fae17f99c3ec53f0b8acc1d57ecbab16b3b79990e0186ab/merged
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/1000
overlay overlay 40G 8.0K 40G 1% /var/lib/docker/overlay2/289bb5c73a057053e06b879a98c23f4b2ba2ea9b39f3b45afbfee76c5dbe7dbe/merged
shm tmpfs 64M 0 64M 0% /var/lib/docker/containers/80e8e644148fc819f7d688b1ae900b7a9009a8e93d96c126bdad086f9570c31b/shm
#systemctl status atomic-openshift-node

[root@ip-172-31-20-135 master]# systemctl status atomic-openshift-node
● atomic-openshift-node.service - OpenShift Node
Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/atomic-openshift-node.service.d
└─override.conf
Active: active (running) since Tue 2018-10-30 15:18:39 UTC; 13min ago
Docs: https://github.com/openshift/origin
Main PID: 21767 (hyperkube)
Memory: 52.6M
CGroup: /system.slice/atomic-openshift-node.service
└─21767 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --an.


Please edit the extra_vars content using GitHub code markdown so we can all view it sanely.

Number 1 issue ... openshift_aws_ami_build_set_gquota_on_slashfs: False. This is manifested in /dev/xvda2 xfs 100G 3.0G 98G 3% /. Grpquota option is not available for OpenShift emptydir storage on / mountpoint. Frankly I'm surprised atomic-openshift-node.service is in running state.
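The df output does not show mount options, so the missing group quota has to be read from the mount options of /. The following is a minimal sketch of such a check, assuming the root filesystem is XFS, where group quota is enabled by the `gquota` or `grpquota` mount option; `has_group_quota` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Minimal sketch: decide whether a comma-separated mount-option string
# enables XFS group quota (gquota or grpquota).
# has_group_quota is a hypothetical helper name.
has_group_quota() {
  case ",$1," in
    *,gquota,*|*,grpquota,*) return 0 ;;
    *) return 1 ;;
  esac
}
```

On a master it could be used as `has_group_quota "$(findmnt -no OPTIONS /)" && echo "group quota enabled on /"`, assuming `findmnt` from util-linux is installed.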

Number 2 issue ... openshift_aws_create_dns: False. Did you create all the dns CNAMES manually? You would be running the installer then bouncing between web browser tabs to query the elb names and creating the resources in Route53. I would set this to True and automate that.

RECAP...

  • Set openshift_aws_ami_build_set_gquota_on_slashfs: True
  • Set openshift_aws_create_dns: True
  • Run playbooks/aws/openshift-cluster/uninstall.yml to terminate everything (or do it manually in the dashboard but be sure you get everything)
  • Recreate the ami via build_ami.yml per procedure
  • Recreate the cluster via provision_install.yml per procedure
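Before tearing everything down, it can help to confirm that both settings from the recap are really in effect. The following is a minimal sketch, assuming the two variables are overridden in provisioning_vars.yml (values passed with -e take precedence over role defaults) rather than edited in the role's defaults/main.yml; `var_is_true`, `preflight` and `VARS_FILE` are made-up names, and the check is a plain line match, not a YAML parser.

```shell
#!/usr/bin/env bash
# Minimal sketch: confirm both recap settings are true in the vars file.
# var_is_true, preflight and VARS_FILE are hypothetical names.

# Succeeds if FILE has a top-level "KEY: true" (or yes) line, any letter case.
var_is_true() {
  local file="$1" key="$2"
  grep -Eiq "^${key}:[[:space:]]*(true|yes)[[:space:]]*(#.*)?\$" "$file"
}

VARS_FILE="${VARS_FILE:-provisioning_vars.yml}"

# Reports each setting and fails if either one is not set to true.
preflight() {
  local key rc=0
  for key in openshift_aws_ami_build_set_gquota_on_slashfs openshift_aws_create_dns; do
    if var_is_true "$VARS_FILE" "$key"; then
      echo "$key: true"
    else
      echo "$key: not set to true in $VARS_FILE"
      rc=1
    fi
  done
  return "$rc"
}
```

If `preflight` succeeds, the remaining recap steps are the playbook runs already shown in this thread: uninstall.yml, then build_ami.yml, then provision_install.yml, each invoked with `-i inventory.yml` and `-e @provisioning_vars.yml`.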

The above modifications in the playbooks worked and I was able to deploy OCP 3.11 successfully. Thanks for all the valuable suggestions.
# oc get pods
NAME                       READY   STATUS    RESTARTS   AGE
docker-registry-1-5ntg9    1/1     Running   0          35m
docker-registry-1-gd84t    1/1     Running   0          35m
registry-console-1-f476q   1/1     Running   0          35m
router-1-gj9kz             1/1     Running   0          36m
router-1-rsjzh             1/1     Running   0          36m
# oc get nodes

NAME                            STATUS   ROLES     AGE   VERSION
ip-172-31-17-199.ec2.internal   Ready    compute   37m   v1.11.0+d4cacc0
ip-172-31-26-34.ec2.internal    Ready    master    16h   v1.11.0+d4cacc0
ip-172-31-29-222.ec2.internal   Ready    infra     37m   v1.11.0+d4cacc0
ip-172-31-34-93.ec2.internal    Ready    master    16h   v1.11.0+d4cacc0
ip-172-31-45-134.ec2.internal   Ready    compute   37m   v1.11.0+d4cacc0
ip-172-31-46-54.ec2.internal    Ready    infra     37m   v1.11.0+d4cacc0
ip-172-31-49-134.ec2.internal   Ready    master    16h   v1.11.0+d4cacc0
ip-172-31-53-195.ec2.internal   Ready    compute   37m   v1.11.0+d4cacc0

Was this issue resolved by turning on openshift_aws_ami_build_set_gquota_on_slashfs and openshift_aws_create_dns ?

Yes, I managed to get further by setting openshift_aws_ami_build_set_gquota_on_slashfs to true, tearing everything down and bringing it up again.

I find it a bit obscure: openshift_aws_ami_build_set_gquota_on_slashfs is set to false by default and apparently not very well documented, yet things just don't work without it.
