The iptables templating logic contains unreliable conditional checks that _may_ result in an undefined variable being referenced, which causes the provisioning run to fail hard.
Note that I was running on 0.7.0 in Tails, so it's possible the bug described cannot be reproduced on release/0.8.0 and/or develop.
em1 (rather than eth0 / eth1 used in our test VMs).
./securedrop-admin install against hardware machines.
Playbook run completes, machines reboot, everything is copacetic.
Ansible errors out with a cryptic message about undefined dict attributes. Machines are incompletely configured.
TASK [restrict-direct-access : Determine admin network - next compute admin network cidr] **************************************************
TASK [restrict-direct-access : Copy IPv4 iptables rules.] **********************************************************************************
fatal: [app]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'ipv4'"}
fatal: [mon]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'ipv4'"}
There's a history to the logic referred to here.
While testing against hardware, I applied this patch _against the 0.7.0 source_ and it resolved the problem:
diff --git a/install_files/ansible-base/roles/restrict-direct-access/templates/rules_v4 b/install_files/ansible-base/roles/restrict-direct-access/templates/rules_v4
index a7b78557..5c2f8082 100644
--- a/install_files/ansible-base/roles/restrict-direct-access/templates/rules_v4
+++ b/install_files/ansible-base/roles/restrict-direct-access/templates/rules_v4
@@ -95,7 +95,7 @@
# Permit direct SSH access.
# Allowed for staging and optionally for production (disables ssh over tor)
{% for interface in ansible_interfaces -%}
- {%- if hostvars[inventory_hostname]['ansible_'+interface].ipv4 -%}
+ {%- if 'ipv4' in hostvars[inventory_hostname]['ansible_'+interface] -%}
{%- set int_details = hostvars[inventory_hostname]['ansible_'+interface].ipv4 -%}
{%- set net_string = '/'.join([int_details.network,int_details.netmask]) -%}
{%- if ssh_ip|ipaddr(net_string) -%}
So next step should be to evaluate whether this bug occurs against the same hardware on the 0.8.0 release branch.
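To illustrate why the membership test in the patch is safer, here is a minimal plain-Python sketch of the same loop. It is not the template itself: the `HOST_FACTS` dict, the interface names other than em1, and all addresses are made up for illustration, and it assumes the failure comes from an interface whose facts carry no `ipv4` key at all. The standard-library `ipaddress` module stands in for the `ipaddr` filter.

```python
import ipaddress

# Hypothetical stand-in for the per-interface Ansible facts on a hardware host.
# Interface names and addresses are invented; "em2" has no "ipv4" key.
HOST_FACTS = {
    "ansible_interfaces": ["lo", "em1", "em2"],
    "ansible_lo": {"ipv4": {"network": "127.0.0.0", "netmask": "255.0.0.0"}},
    "ansible_em1": {"ipv4": {"network": "10.20.2.0", "netmask": "255.255.255.0"}},
    "ansible_em2": {"active": False},  # unconfigured NIC: no "ipv4" facts
}


def ssh_interfaces(facts, ssh_ip):
    """Return the interfaces whose IPv4 network contains ssh_ip.

    Interfaces without IPv4 facts are skipped instead of raising.
    """
    matches = []
    for interface in facts["ansible_interfaces"]:
        details = facts["ansible_" + interface]
        # Membership test first, mirroring the patched template condition.
        if "ipv4" not in details:
            continue
        ipv4 = details["ipv4"]
        net_string = "/".join([ipv4["network"], ipv4["netmask"]])
        if ipaddress.ip_address(ssh_ip) in ipaddress.ip_network(net_string):
            matches.append(interface)
    return matches


if __name__ == "__main__":
    print(ssh_interfaces(HOST_FACTS, "10.20.2.2"))
```

Looking up `HOST_FACTS["ansible_em2"]["ipv4"]` directly raises `KeyError`, which is loosely analogous to the undefined-attribute failure in the template; checking for the key first avoids touching it.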
If this problem arose with enable_ssh_over_tor = true, then it should be resolved by #3466.
Re-ran playbook on 0.8.0~rc2 and encountered no problems. As @redshiftzero mentioned, @dachary's changes in #3466 appear to resolve the problem. Still, I'm leaving this ticket open to be absolutely sure. We'll need to perform a fresh install of 0.8.0~rc2 on these hardware machines prior to release, so if another member of the team agrees the issue is resolved, let's close.
Yes, this ticket should be left open because it will show up in situations where enable_ssh_over_tor = false.
Fair. Let's see if anyone can reproduce with enable_ssh_over_tor = false. If we can't, I advocate that we do not make late-stage changes, as this particular code path requires a fair bit of manual testing to ensure no regressions were introduced.
I also saw this issue when enable_ssh_over_tor = false was configured (SSH only over LAN). @conorsch's fix worked for me.
On which branch?
@redshiftzero sorry, I tested 0.8.0-rc2.
Confirming: you ran ./securedrop-admin install on the 0.8.0-rc2 tag and hit this bug with enable_ssh_over_tor = false on the HP Proliant? You then made the change suggested in the issue description here, and you were able to successfully install and SSH into both servers?
@redshiftzero yes, that is absolutely correct. I did a fresh install on the 0.8.0-rc2 tag. I had enable_ssh_over_tor = false configured when I was installing on the Proliant server. It threw this error:
TASK [restrict-direct-access : Copy IPv4 iptables rules.] **********************************************************************************
fatal: [app]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'ipv4'"}
fatal: [mon]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'ipv4'"}
I applied Conor's patch, the install continued, and I was able to SSH into both the app and the mon server.
OK excellent, thanks for the clarity. We should indeed make this suggested change.
I reran a clean install of the 0.8 rc2 debs using the Ansible logic found in the release/0.8 branch. I set SSH_over_tor = no and was able to complete the clean install without any issues. This was using an HP Proliant DL 368 G7 as the app server and one Dell PowerEdge 620 as the mon server. I can confirm that this issue is resolved. On to full QA in #3512.