Ubuntu 16.04
DigitalOcean
Devices are sometimes stuck "connecting" randomly. When one device is experiencing the problem other devices are still working fine. The device is usually stuck for about 3-5 minutes and then everything returns to normal.
This might be a problem with your router. See #520 and #727 for suggestions on how to fix it.
Update: The problem occurs with three different Wi-Fi networks as well as cellular.
I'm not sure if this is related, just posting FYI: I have noticed this issue on my iPhone when switching from Wi-Fi to 4G (when I go just out of reach of the Wi-Fi network).
My iPhone then sometimes goes completely offline (I have Connect On Demand enabled), and if I go to Settings -> VPN I see Status = Connecting...
It stays stuck like this forever, until I toggle the "Status" switch to "Off" and back to "On"; then it says "Connected" and everything works again.
(I should note that the Algo server I am running was deployed a long time ago, in December 2017, on AWS EC2.)
@notDavid I'm having the same problem. I deployed the ansible2.5 branch, but no improvement.
I've even had the disconnect / reconnect loop on my MBA recently. I'm not sure what the problem is. It almost feels like some type of session timeout issue. If I leave the VPN disconnected for a few hours it will allow me to reconnect.
Is it possible that when the device disconnects, the session is not properly terminated on the VPN server?
@notDavid I disabled charon.dos_protection and I haven't had the problem all weekend.
@QuentinMoss thanks for sharing that! I've disabled charon.dos_protection also; let's see if the problem recurs in the next week...
@notDavid @QuentinMoss I also have a test server using the ansible2.5 branch. Where's the charon.dos_protection setting in order to disable it? strongswan.conf? then restart?
@digeratus in file /etc/strongswan.d/charon.conf search for dos_protection
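For anyone following along, here is a minimal sketch of that edit. It assumes the option appears in charon.conf either commented out (for example `# dos_protection = yes`) or as an active line; the function name `disable_dos_protection` is made up for illustration. It only prints the modified file, so nothing is changed until you review the result and copy it into place yourself.

```shell
# Hypothetical helper: print the given charon.conf with dos_protection set to "no".
# Matches both a commented-out line and an already active setting,
# and keeps the original indentation.
disable_dos_protection() {
    sed -E 's/^([[:space:]]*)#?[[:space:]]*dos_protection[[:space:]]*=.*/\1dos_protection = no/' "$1"
}
```

Something like `disable_dos_protection /etc/strongswan.d/charon.conf > /tmp/charon.conf` lets you diff the result against the original first. strongSwan has to be restarted for the change to apply; `sudo ipsec restart` is the command used elsewhere in this thread.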
Thanks. Then do I run "ipsec reload" or "ipsec restart" for it to take effect? Which command?
I disabled charon.dos_protection and I haven't had the problem all weekend.
@QuentinMoss This solved the connection issues for me as well... great find!
Like @QuentinMoss I've had connect/disconnect loops on a deployment using the ansible2.5 branch, in my case with an iPad. I've also not seen the problem with dos_protection disabled.
Thanks folks. Need to get it covered in the docs
Just a follow-up: even with dos_protection disabled, I still get the connect/disconnect loops on occasion. I can post a log if someone gives me an idea of what to grep for; charon seems to post about 200 entries for each connect/disconnect loop.
I also had a connect/disconnect loop with dos_protection off. When it happens I see log entries like this:
Aug 14 13:53:29 vpn4 charon[13590]: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 943, the same policy for reqid 16 exists
Aug 14 13:53:29 vpn4 charon[13590]: 08[IKE] unable to install IPsec policies (SPD) in kernel
The problem wouldn't clear until I restarted strongswan. The problem server is on DO. The iOS device was still able to connect to an older Algo server I have on EC2.
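To see when these episodes happen, one option is to count the "unable to install policy" lines per minute. This is only a sketch: it assumes the classic syslog timestamp layout shown above (month, day, HH:MM:SS as the first three fields), and the function name `policy_errors_per_minute` is made up for illustration.

```shell
# Hypothetical helper: count "unable to install policy" log lines per minute.
# Reads the named log files, or stdin when no file is given.
# Output lines look like "Aug 14 13:53 <count>".
policy_errors_per_minute() {
    grep -hF 'unable to install policy' "$@" |
        awk '{ n[$1 " " $2 " " substr($3, 1, 5)]++ }
             END { for (m in n) print m, n[m] }' |
        LC_ALL=C sort
}
```

Running `policy_errors_per_minute /var/log/syslog` should make a reconnect loop stand out as a minute with a large count. The sort is plain text order, so a log spanning several months will not come out strictly chronological.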
I configure my servers with uniqueids=yes in /etc/ipsec.conf so my setup is not quite the same as other Algo users. I also use the DO firewall.
Edited to add: I've seen other odd iOS networking problems since iOS 11.4.1 was released on 2018-07-09. Maybe the connect/disconnect loop is related.
@TC1977 and @davidemyers email me please! dan trailofbits
@davidemyers I scanned my logs and also found similar entries for the period of time in question:
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.2/32 out for reqid 26, the same policy for reqid 24 exists
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[IKE] unable to install IPsec policies (SPD) in kernel
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[IKE] failed to establish CHILD_SA, keeping IKE_SA
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] deleting policy 0.0.0.0/0 === 10.19.48.2/32 out
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] policy still used by another CHILD_SA, not removed
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] not updating policy 0.0.0.0/0 === 10.19.48.2/32 out [priority 383615, refcount 1]
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] deleting policy 10.19.48.2/32 === 0.0.0.0/0 in
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] deleting policy 10.19.48.2/32 === 0.0.0.0/0 fwd
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] deleting policy ::/0 === fd9d:bc11:4020::2/128 out
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] policy still used by another CHILD_SA, not removed
Aug 14 16:58:53 ip-172-31-39-82 charon: 16[KNL] not updating policy ::/0 === fd9d:bc11:4020::2/128 out [priority 334463, refcount 1]
Maybe this is more of a strongswan problem than an Algo install problem? Do they have similar reports in the strongswan docs? I'll try googling.
Check out this issue which sounds similar: https://wiki.strongswan.org/issues/431
Going by the first solution suggested, I inserted the line reauth=no in /etc/ipsec.conf after rekey=no. I have no idea if this will work, and no idea if it will completely screw up security. Could someone who knows what they're doing please comment? Anyway, I'll try it like this and see if I get the loop again. At the very least, I can still connect to the server and I can still browse the web from the client.
I think this issue sounds more like what we're seeing: https://wiki.strongswan.org/issues/2607
I'm trying auto=route instead of auto=add as suggested by the issue reporter to see if that helps.
@davidemyers Interesting that in that issue, though, the guy had rekey=yes and reauth=yes. Also I don't have any XFRMA messages or "trap not found" messages in my error logs. Just to compare notes, I'm running a version of Algo from late May (not the ansible2.5 branch), and 'ipsec version' gives me 'Linux strongSwan U5.6.2/K4.15.0-1019-aws'.
The next thing I'll try, if "reauth=no" doesn't work, will be to enable "make_before_break" in /etc/strongswan.d/strongswan.conf. Of course it might then mess up non-macOS or non-iOS devices that have a different IKEv2 implementation. (maybe that explains why you've seen networking errors since iOS 11.4.1?)
When I had a device in a connect loop, I ran ip xfrm policy list and found entries for the conflicting reqid that should have been deleted (there were out entries with no matching in or fwd entries). I don't really understand what this means, but I thought it could be the same symptom being reported in issue 2607. I made sure the bad entries were cleaned out by running ipsec stop; ip xfrm policy flush; ipsec start.
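The check described above (out entries with no matching in or fwd entries) can be roughly automated. The sketch below assumes the `ip xfrm policy list` layout where each policy starts with a `src ... dst ...` line followed by a `dir in|out|fwd ...` line; the function name `orphaned_out_policies` is made up for illustration, and "matching" is taken to mean an in or fwd policy whose source equals the out policy's destination.

```shell
# Hypothetical helper: read `ip xfrm policy list` output on stdin and print the
# destination of every "dir out" policy that has no "dir in" or "dir fwd"
# policy with that same address as its source. Socket policies are ignored.
orphaned_out_policies() {
    awk '
        $1 == "src" && $3 == "dst" { src = $2; dst = $4; next }
        $1 == "dir" && $2 == "out" { out[dst] = 1 }
        $1 == "dir" && ($2 == "in" || $2 == "fwd") { inbound[src] = 1 }
        END { for (d in out) if (!(d in inbound)) print d }
    ' | LC_ALL=C sort
}
```

Usage would be `sudo ip xfrm policy list | orphaned_out_policies`. An empty result means every out policy has a counterpart; any address printed is a candidate for the kind of leftover entry described here.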
If your server was deployed in May it's using the old cipher suite, and you appear to be on AWS. I'm testing on DO with an ansible2.5 branch server with the new cipher suite. I assume you're using the Algo default of uniqueids=never while I'm using the strongSwan default of uniqueids=yes. So that gives us a few things we can rule out.
I also have an old cipher suite server on EC2 deployed in late May and I've never seen this problem there. Weird.
You're correct; I'm on the old cipher suite, on AWS. Also on checking my logs tonight, it appears I continued to have "unable to install policy" errors, and old policies upon checking sudo ip xfrm policy list. I didn't notice any problems connecting, though. I'll delete the "reauth=no" line in /etc/ipsec.conf, and try the "auto=route" line.
So after two days of running with "auto=route" in ipsec.conf and reviewing syslog, I continue to get "unable to install policy" errors, and sudo ip xfrm policy list continues to show dead policies. Here's the thing, though. I'm not sure this is causing any problems that I've noticed, and I definitely haven't seen any "Connecting..." "Disconnecting..." loops. My wife had a lot of problems connecting with her iPhone, but I just reinstalled the mobileconfig via Airdrop, and now it works fine. So I'm at a loss as to whether these errors actually correspond to something the end user will notice. Just for the hell of it, I've added reauth=no back into ipsec.conf, as well as lifetime=1h to see if it makes a difference.
Edit: changed filename above.
@TC1977 so which charon.conf settings do you feel show the best results or the most promise? I don't mind setting up a few droplets with different settings to get to the bottom of this.
I, on the other hand, have had none of those error messages, have no orphaned policies, and have yet to have a reconnect loop. But I don't think it's been long enough yet to declare auto=route a fix.
Did you flush the policies before testing auto=route like I mentioned in my previous message?
Sometimes when I've had problems getting an iOS device to connect (but not when it's looping) I find it helps to toggle Wi-Fi off and on.
@davidemyers toggling wifi is the best way to reconnect, but I hate having to do it constantly throughout the day.
Yes, I've been doing sudo ip xfrm policy flush and sudo ipsec restart after every change to ipsec.conf and charon.conf.
OK, thanks for confirming we're both testing starting with the policies flushed. I'm disappointed that you're still getting orphaned policies.
@digeratus If you want to test the potential fix we're currently testing, edit /etc/ipsec.conf, change auto=add to auto=route, then run ipsec stop; ip xfrm policy flush; ipsec start (or just reboot).
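The stop, flush, start sequence above can be wrapped in a small function so it is run the same way after every config change. This is a sketch only: the function name `reset_ipsec_policies` and the `RUN` variable are made up for illustration. Setting `RUN=echo` prints the commands instead of executing them.

```shell
# Hypothetical wrapper around the stop / flush / start sequence.
# RUN defaults to sudo; RUN=echo gives a dry run that only prints the commands.
# Each step runs only if the previous one succeeded.
reset_ipsec_policies() {
    ${RUN:-sudo} ipsec stop &&
    ${RUN:-sudo} ip xfrm policy flush &&
    ${RUN:-sudo} ipsec start
}
```

Try `RUN=echo reset_ipsec_policies` first to see what would be executed. Keep in mind that this disconnects every client, and that `ip xfrm policy flush` removes all IPsec policies on the host, not just the stale ones.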
@davidemyers @digeratus I was able to get a connect/disconnect loop on my iPhone connected to LTE this afternoon. My ipsec.conf has rekey=no, reauth=no, and auto=route. I also have lifetime=1h, but I think that's an implicit default, going by https://wiki.strongswan.org/projects/strongswan/wiki/ConnSection. Interestingly, the loop went on for about a minute, and just as I was thinking of trying to take a screenshot or live video, it resolved and I was connected just fine.
sudo ip xfrm pol list results in this output:
sudo ip xfrm pol list
src ::/0 dst fd9d:bc11:4020::4/128
dir out priority 334463
tmpl src 172.31.39.82 dst xxx.xxx.xxx.94
proto esp spi 0x0697e047 reqid 41 mode tunnel
src 0.0.0.0/0 dst 10.19.48.4/32
dir out priority 383615
tmpl src 172.31.39.82 dst xxx.xxx.xxx.94
proto esp spi 0x0697e047 reqid 41 mode tunnel
src ::/0 dst fd9d:bc11:4020::3/128
dir out priority 334463
tmpl src 172.31.39.82 dst xxx.xxx.xxx.112
proto esp spi 0x0ba89523 reqid 19 mode tunnel
src 0.0.0.0/0 dst 10.19.48.3/32
dir out priority 383615
tmpl src 172.31.39.82 dst xxx.xxx.xxx.112
proto esp spi 0x0ba89523 reqid 19 mode tunnel
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0
src ::/0 dst ::/0
socket in priority 0
src ::/0 dst ::/0
socket out priority 0
src ::/0 dst ::/0
socket in priority 0
src ::/0 dst ::/0
socket out priority 0
Here's the result of grep "unable to install policy" /var/log/syslog. And I'm attaching the result of grep charon /var/log/syslog for that one minute, 15:14, where all hell broke loose. ("xxx.xxx.xxx.xxx" is the Algo server IP, and "yyy.yyy.yyy.123" is the cell tower IP for the two users.)
Aug 21 15:14:14 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.4/32 out for reqid 85, the same policy for reqid 41 exists
Aug 21 15:14:16 ip-172-31-39-82 charon: 09[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 87, the same policy for reqid 19 exists
Aug 21 15:14:16 ip-172-31-39-82 charon: 15[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 88, the same policy for reqid 19 exists
Aug 21 15:14:19 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 89, the same policy for reqid 19 exists
Aug 21 15:14:19 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 90, the same policy for reqid 19 exists
Aug 21 15:14:20 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 91, the same policy for reqid 19 exists
Aug 21 15:14:20 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 92, the same policy for reqid 19 exists
Aug 21 15:14:21 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 93, the same policy for reqid 19 exists
Aug 21 15:14:21 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 94, the same policy for reqid 19 exists
Aug 21 15:14:22 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 95, the same policy for reqid 19 exists
Aug 21 15:14:22 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 96, the same policy for reqid 19 exists
Aug 21 15:14:22 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 97, the same policy for reqid 19 exists
Aug 21 15:14:23 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 98, the same policy for reqid 19 exists
Aug 21 15:14:24 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 99, the same policy for reqid 19 exists
Aug 21 15:14:24 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 100, the same policy for reqid 19 exists
Aug 21 15:14:24 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 101, the same policy for reqid 19 exists
Aug 21 15:14:25 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 102, the same policy for reqid 19 exists
Aug 21 15:14:25 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 103, the same policy for reqid 19 exists
Aug 21 15:14:25 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 104, the same policy for reqid 19 exists
Aug 21 15:14:26 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 105, the same policy for reqid 19 exists
Aug 21 15:14:26 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 106, the same policy for reqid 19 exists
Aug 21 15:14:26 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 107, the same policy for reqid 19 exists
Aug 21 15:14:27 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 108, the same policy for reqid 19 exists
Aug 21 15:14:27 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 109, the same policy for reqid 19 exists
Aug 21 15:14:28 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 110, the same policy for reqid 19 exists
Aug 21 15:14:28 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 111, the same policy for reqid 19 exists
Aug 21 15:14:28 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 112, the same policy for reqid 19 exists
Aug 21 15:14:29 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 113, the same policy for reqid 19 exists
Aug 21 15:14:29 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 114, the same policy for reqid 19 exists
Aug 21 15:14:30 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 115, the same policy for reqid 19 exists
Aug 21 15:14:30 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 116, the same policy for reqid 19 exists
Aug 21 15:14:30 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 117, the same policy for reqid 19 exists
Aug 21 15:14:31 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 118, the same policy for reqid 19 exists
Aug 21 15:14:31 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 119, the same policy for reqid 19 exists
Aug 21 15:14:31 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 120, the same policy for reqid 19 exists
Aug 21 15:14:32 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 121, the same policy for reqid 19 exists
Aug 21 15:14:32 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 122, the same policy for reqid 19 exists
Aug 21 15:14:32 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 123, the same policy for reqid 19 exists
Aug 21 15:14:33 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 124, the same policy for reqid 19 exists
Aug 21 15:14:33 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 125, the same policy for reqid 19 exists
Aug 21 15:14:33 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 126, the same policy for reqid 19 exists
Aug 21 15:14:34 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 127, the same policy for reqid 19 exists
Aug 21 15:14:34 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 128, the same policy for reqid 19 exists
Aug 21 15:14:35 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 129, the same policy for reqid 19 exists
Aug 21 15:14:35 ip-172-31-39-82 charon: 11[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 130, the same policy for reqid 19 exists
Aug 21 15:14:36 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 131, the same policy for reqid 19 exists
Aug 21 15:14:36 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 132, the same policy for reqid 19 exists
Aug 21 15:14:37 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 133, the same policy for reqid 19 exists
Aug 21 15:14:37 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 134, the same policy for reqid 19 exists
Aug 21 15:14:38 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 135, the same policy for reqid 19 exists
Aug 21 15:14:38 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 136, the same policy for reqid 19 exists
Aug 21 15:14:39 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 137, the same policy for reqid 19 exists
Aug 21 15:14:39 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 138, the same policy for reqid 19 exists
Aug 21 15:14:40 ip-172-31-39-82 charon: 15[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 139, the same policy for reqid 19 exists
Aug 21 15:14:40 ip-172-31-39-82 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 140, the same policy for reqid 19 exists
Aug 21 15:14:41 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 141, the same policy for reqid 19 exists
Aug 21 15:14:41 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 142, the same policy for reqid 19 exists
Aug 21 15:14:42 ip-172-31-39-82 charon: 09[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 143, the same policy for reqid 19 exists
Aug 21 15:14:42 ip-172-31-39-82 charon: 15[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 144, the same policy for reqid 19 exists
Aug 21 15:14:42 ip-172-31-39-82 charon: 11[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 145, the same policy for reqid 19 exists
Aug 21 15:14:43 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 146, the same policy for reqid 19 exists
Aug 21 15:14:43 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 147, the same policy for reqid 19 exists
Aug 21 15:14:44 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 148, the same policy for reqid 19 exists
Aug 21 15:14:44 ip-172-31-39-82 charon: 15[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 149, the same policy for reqid 19 exists
Aug 21 15:14:44 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 150, the same policy for reqid 19 exists
Aug 21 15:14:45 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 151, the same policy for reqid 19 exists
Aug 21 15:14:45 ip-172-31-39-82 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 152, the same policy for reqid 19 exists
Aug 21 15:14:45 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 153, the same policy for reqid 19 exists
Aug 21 15:14:46 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 154, the same policy for reqid 19 exists
Aug 21 15:14:47 ip-172-31-39-82 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 155, the same policy for reqid 19 exists
Aug 21 15:14:47 ip-172-31-39-82 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 156, the same policy for reqid 19 exists
Aug 21 15:14:48 ip-172-31-39-82 charon: 14[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 157, the same policy for reqid 19 exists
Aug 21 15:14:48 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 158, the same policy for reqid 19 exists
Aug 21 15:14:49 ip-172-31-39-82 charon: 09[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 159, the same policy for reqid 19 exists
Aug 21 15:14:49 ip-172-31-39-82 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 160, the same policy for reqid 19 exists
Aug 21 15:14:50 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 161, the same policy for reqid 19 exists
Aug 21 15:14:50 ip-172-31-39-82 charon: 09[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 162, the same policy for reqid 19 exists
Aug 21 15:14:51 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 163, the same policy for reqid 19 exists
Aug 21 15:14:51 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 164, the same policy for reqid 19 exists
Aug 21 15:14:51 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 165, the same policy for reqid 19 exists
Aug 21 15:14:52 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 166, the same policy for reqid 19 exists
Aug 21 15:14:52 ip-172-31-39-82 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 167, the same policy for reqid 19 exists
Aug 21 15:14:52 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 168, the same policy for reqid 19 exists
Aug 21 15:14:53 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 169, the same policy for reqid 19 exists
Aug 21 15:14:53 ip-172-31-39-82 charon: 09[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 170, the same policy for reqid 19 exists
Aug 21 15:14:53 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 171, the same policy for reqid 19 exists
Aug 21 15:14:54 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 172, the same policy for reqid 19 exists
Aug 21 15:14:54 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 173, the same policy for reqid 19 exists
Aug 21 15:14:55 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 174, the same policy for reqid 19 exists
Aug 21 15:14:55 ip-172-31-39-82 charon: 09[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 175, the same policy for reqid 19 exists
Aug 21 15:14:55 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 176, the same policy for reqid 19 exists
Aug 21 15:14:56 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 177, the same policy for reqid 19 exists
Aug 21 15:14:56 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 178, the same policy for reqid 19 exists
Aug 21 15:14:57 ip-172-31-39-82 charon: 14[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 179, the same policy for reqid 19 exists
Aug 21 15:14:57 ip-172-31-39-82 charon: 11[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 180, the same policy for reqid 19 exists
Aug 21 15:14:57 ip-172-31-39-82 charon: 14[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 181, the same policy for reqid 19 exists
Aug 21 15:14:58 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 182, the same policy for reqid 19 exists
Aug 21 15:14:59 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 183, the same policy for reqid 19 exists
Aug 21 15:14:59 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 184, the same policy for reqid 19 exists
Aug 21 15:14:59 ip-172-31-39-82 charon: 11[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 185, the same policy for reqid 19 exists
Aug 21 15:15:00 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 186, the same policy for reqid 19 exists
Aug 21 15:15:00 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 187, the same policy for reqid 19 exists
Aug 21 15:15:01 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 188, the same policy for reqid 19 exists
Aug 21 15:15:02 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 189, the same policy for reqid 19 exists
Aug 21 15:15:02 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 190, the same policy for reqid 19 exists
Aug 21 15:15:03 ip-172-31-39-82 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 191, the same policy for reqid 19 exists
Aug 21 15:15:04 ip-172-31-39-82 charon: 14[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 192, the same policy for reqid 19 exists
Aug 21 15:15:04 ip-172-31-39-82 charon: 11[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 193, the same policy for reqid 19 exists
Aug 21 15:15:05 ip-172-31-39-82 charon: 14[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 194, the same policy for reqid 19 exists
Aug 21 15:15:05 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 195, the same policy for reqid 19 exists
Aug 21 15:15:06 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 196, the same policy for reqid 19 exists
Aug 21 15:15:06 ip-172-31-39-82 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 197, the same policy for reqid 19 exists
Aug 21 15:15:06 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 198, the same policy for reqid 19 exists
Aug 21 15:15:07 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 199, the same policy for reqid 19 exists
Aug 21 15:15:08 ip-172-31-39-82 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 200, the same policy for reqid 19 exists
Aug 21 15:15:08 ip-172-31-39-82 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 201, the same policy for reqid 19 exists
Aug 21 15:15:09 ip-172-31-39-82 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 202, the same policy for reqid 19 exists
Aug 21 15:15:10 ip-172-31-39-82 charon: 13[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 203, the same policy for reqid 19 exists
Aug 21 15:15:10 ip-172-31-39-82 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 204, the same policy for reqid 19 exists
Aug 21 15:15:11 ip-172-31-39-82 charon: 11[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 205, the same policy for reqid 19 exists
Aug 21 15:15:12 ip-172-31-39-82 charon: 14[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 206, the same policy for reqid 19 exists
Aug 21 15:15:12 ip-172-31-39-82 charon: 09[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 207, the same policy for reqid 19 exists
Aug 21 15:15:13 ip-172-31-39-82 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 208, the same policy for reqid 19 exists
Aug 21 15:15:13 ip-172-31-39-82 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 209, the same policy for reqid 19 exists
Aug 21 15:15:14 ip-172-31-39-82 charon: 09[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 210, the same policy for reqid 19 exists
Aug 21 15:15:15 ip-172-31-39-82 charon: 15[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 211, the same policy for reqid 19 exists
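A long run of near-identical lines like this is easier to read when collapsed. The sketch below groups the "unable to install policy" messages by client address and by the reqid that already holds the policy; the function name `policy_conflict_summary` is made up for illustration, and it assumes the message wording shown in the log above.

```shell
# Hypothetical helper: summarize "unable to install policy" lines as
# "<client address> <existing reqid> <count>".
# Reads the named log files, or stdin when no file is given.
policy_conflict_summary() {
    sed -nE 's/.*unable to install policy .* === ([^ ]+) out for reqid [0-9]+, the same policy for reqid ([0-9]+) exists.*/\1 \2/p' "$@" |
        awk '{ n[$0]++ } END { for (k in n) print k, n[k] }' |
        LC_ALL=C sort
}
```

Usage would be `policy_conflict_summary /var/log/syslog`. One line per client and existing reqid shows at a glance which device was looping and which old policy it kept colliding with.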
Well, so much for auto=route.
Now I've modified strongswan.conf to include charon.delete_rekeyed_delay=10, and modified ipsec.conf to include lifetime=2h under reauth=no and rekey=no. Then I rebooted the whole server for the heck of it. Let's see what happens tomorrow.
No connect/reconnect loops today. grep "unable to install policy" /var/log/syslog shows no errors, and ip xfrm pol list shows no dead policies. Of course that doesn't prove anything. @digeratus Here's my /etc/ipsec.conf if you want to do some testing. The key changes are rekey=no, reauth=no, lifetime=2h, and auto=route. Who knows which of these actually make a difference.
config setup
    uniqueids=never # allow multiple connections per user
    charondebug="ike 2, knl 2, cfg 2, net 2, esp 2, dmn 2, mgr 2"

conn %default
    fragmentation=yes
    rekey=no
    reauth=no
    lifetime=2h
    dpdaction=clear
    keyexchange=ikev2
    compress=yes
    dpddelay=35s
    ike=aes128gcm16-prfsha512-ecp256!
    esp=aes128gcm16-ecp256!
    left=%any
    leftauth=pubkey
    leftid=35.174.56.117
    leftcert=35.174.56.117.crt
    leftsendcert=always
    leftsubnet=0.0.0.0/0,::/0
    right=%any
    rightauth=pubkey
    rightsourceip=10.19.48.0/24,fd9d:bc11:4020::/48
    rightdns=172.16.0.1

conn ikev2-pubkey
    auto=route
Here's my /etc/strongswan.d/charon.conf. The only parameter that's been changed is charon.delete_rekeyed_delay=10.
# Options for the charon IKE daemon.
charon {
# Accept unencrypted ID and HASH payloads in IKEv1 Main Mode.
# accept_unencrypted_mainmode_messages = no
# Maximum number of half-open IKE_SAs for a single peer IP.
# block_threshold = 5
# Whether Certificate Revocation Lists (CRLs) fetched via HTTP or LDAP
# should be saved under a unique file name derived from the public key of
# the Certification Authority (CA) to /etc/ipsec.d/crls (stroke) or
# /etc/swanctl/x509crl (vici), respectively.
# cache_crls = no
# Whether relations in validated certificate chains should be cached in
# memory.
# cert_cache = yes
# Send Cisco Unity vendor ID payload (IKEv1 only).
# cisco_unity = no
# Close the IKE_SA if setup of the CHILD_SA along with IKE_AUTH failed.
# close_ike_on_child_failure = no
# Number of half-open IKE_SAs that activate the cookie mechanism.
# cookie_threshold = 10
# Delete CHILD_SAs right after they got successfully rekeyed (IKEv1 only).
# delete_rekeyed = no
# Delay in seconds until inbound IPsec SAs are deleted after rekeyings
# (IKEv2 only).
delete_rekeyed_delay = 10
# Use ANSI X9.42 DH exponent size or optimum size matched to cryptographic
# strength.
# dh_exponent_ansi_x9_42 = yes
# Use RTLD_NOW with dlopen when loading plugins and IMV/IMCs to reveal
# missing symbols immediately.
# dlopen_use_rtld_now = no
# DNS server assigned to peer via configuration payload (CP).
# dns1 =
# DNS server assigned to peer via configuration payload (CP).
# dns2 =
# Enable Denial of Service protection using cookies and aggressiveness
# checks.
# dos_protection = no
# Compliance with the errata for RFC 4753.
# ecp_x_coordinate_only = yes
# Free objects during authentication (might conflict with plugins).
# flush_auth_cfg = no
# Whether to follow IKEv2 redirects (RFC 5685).
# follow_redirects = yes
# Maximum size (complete IP datagram size in bytes) of a sent IKE fragment
# when using proprietary IKEv1 or standardized IKEv2 fragmentation, defaults
# to 1280 (use 0 for address family specific default values, which uses a
# lower value for IPv4). If specified this limit is used for both IPv4 and
# IPv6.
# fragment_size = 1280
# Name of the group the daemon changes to after startup.
# group =
# Timeout in seconds for connecting IKE_SAs (also see IKE_SA_INIT DROPPING).
# half_open_timeout = 30
# Enable hash and URL support.
# hash_and_url = no
# Allow IKEv1 Aggressive Mode with pre-shared keys as responder.
# i_dont_care_about_security_and_use_aggressive_mode_psk = no
# Whether to ignore the traffic selectors from the kernel's acquire events
# for IKEv2 connections (they are not used for IKEv1).
# ignore_acquire_ts = no
# A space-separated list of routing tables to be excluded from route
# lookups.
# ignore_routing_tables =
# Maximum number of IKE_SAs that can be established at the same time before
# new connection attempts are blocked.
# ikesa_limit = 0
# Number of exclusively locked segments in the hash table.
# ikesa_table_segments = 1
# Size of the IKE_SA hash table.
# ikesa_table_size = 1
# Whether to close IKE_SA if the only CHILD_SA closed due to inactivity.
# inactivity_close_ike = no
# Limit new connections based on the current number of half open IKE_SAs,
# see IKE_SA_INIT DROPPING in strongswan.conf(5).
# init_limit_half_open = 0
# Limit new connections based on the number of queued jobs.
# init_limit_job_load = 0
# Causes charon daemon to ignore IKE initiation requests.
# initiator_only = no
# Install routes into a separate routing table for established IPsec
# tunnels.
# install_routes = yes
# Install virtual IP addresses.
# install_virtual_ip = yes
# The name of the interface on which virtual IP addresses should be
# installed.
# install_virtual_ip_on =
# Check daemon, libstrongswan and plugin integrity at startup.
# integrity_test = no
# A comma-separated list of network interfaces that should be ignored, if
# interfaces_use is specified this option has no effect.
# interfaces_ignore =
# A comma-separated list of network interfaces that should be used by
# charon. All other interfaces are ignored.
# interfaces_use =
# NAT keep alive interval.
# keep_alive = 20s
# Plugins to load in the IKE daemon charon.
# load =
# Determine plugins to load via each plugin's load option.
# load_modular = no
# Initiate IKEv2 reauthentication with a make-before-break scheme.
# make_before_break = yes
# Maximum number of IKEv1 phase 2 exchanges per IKE_SA to keep state about
# and track concurrently.
# max_ikev1_exchanges = 3
# Maximum packet size accepted by charon.
# max_packet = 10000
# Enable multiple authentication exchanges (RFC 4739).
# multiple_authentication = yes
# WINS servers assigned to peer via configuration payload (CP).
# nbns1 =
# WINS servers assigned to peer via configuration payload (CP).
# nbns2 =
# UDP port used locally. If set to 0 a random port will be allocated.
# port = 500
# UDP port used locally in case of NAT-T. If set to 0 a random port will be
# allocated. Has to be different from charon.port, otherwise a random port
# will be allocated.
# port_nat_t = 4500
# Whether to prefer updating SAs to the path with the best route.
# prefer_best_path = no
# Prefer locally configured proposals for IKE/IPsec over supplied ones as
# responder (disabling this can avoid keying retries due to
# INVALID_KE_PAYLOAD notifies).
# prefer_configured_proposals = yes
# By default public IPv6 addresses are preferred over temporary ones (RFC
# 4941), to make connections more stable. Enable this option to reverse
# this.
# prefer_temporary_addrs = no
# Process RTM_NEWROUTE and RTM_DELROUTE events.
# process_route = yes
# Delay in ms for receiving packets, to simulate larger RTT.
# receive_delay = 0
# Delay request messages.
# receive_delay_request = yes
# Delay response messages.
# receive_delay_response = yes
# Specific IKEv2 message type to delay, 0 for any.
# receive_delay_type = 0
# Size of the AH/ESP replay window, in packets.
# replay_window = 32
# Base to use for calculating exponential back off, see IKEv2 RETRANSMISSION
# in strongswan.conf(5).
# retransmit_base = 1.8
# Maximum jitter in percent to apply randomly to calculated retransmission
# timeout (0 to disable).
# retransmit_jitter = 0
# Upper limit in seconds for calculated retransmission timeout (0 to
# disable).
# retransmit_limit = 0
# Timeout in seconds before sending first retransmit.
# retransmit_timeout = 4.0
# Number of times to retransmit a packet before giving up.
# retransmit_tries = 5
# Interval in seconds to use when retrying to initiate an IKE_SA (e.g. if
# DNS resolution failed), 0 to disable retries.
# retry_initiate_interval = 0
# Initiate CHILD_SA within existing IKE_SAs (always enabled for IKEv1).
# reuse_ikesa = yes
# Numerical routing table to install routes to.
# routing_table =
# Priority of the routing table.
# routing_table_prio =
# Whether to use RSA with PSS padding instead of PKCS#1 padding by default.
# rsa_pss = no
# Delay in ms for sending packets, to simulate larger RTT.
# send_delay = 0
# Delay request messages.
# send_delay_request = yes
# Delay response messages.
# send_delay_response = yes
# Specific IKEv2 message type to delay, 0 for any.
# send_delay_type = 0
# Send strongSwan vendor ID payload
# send_vendor_id = no
# Whether to enable Signature Authentication as per RFC 7427.
# signature_authentication = yes
# Whether to enable constraints against IKEv2 signature schemes.
# signature_authentication_constraints = yes
# The upper limit for SPIs requested from the kernel for IPsec SAs.
# spi_max = 0xcfffffff
# The lower limit for SPIs requested from the kernel for IPsec SAs.
# spi_min = 0xc0000000
# Number of worker threads in charon.
# threads = 16
# Name of the user the daemon changes to after startup.
# user =
crypto_test {
# Benchmark crypto algorithms and order them by efficiency.
# bench = no
# Buffer size used for crypto benchmark.
# bench_size = 1024
# Number of iterations to test each algorithm.
# bench_time = 50
# Test crypto algorithms during registration (requires test vectors
# provided by the test-vectors plugin).
# on_add = no
# Test crypto algorithms on each crypto primitive instantiation.
# on_create = no
# Strictly require at least one test vector to enable an algorithm.
# required = no
# Whether to test RNG with TRUE quality; requires a lot of entropy.
# rng_true = no
}
host_resolver {
# Maximum number of concurrent resolver threads (they are terminated if
# unused).
# max_threads = 3
# Minimum number of resolver threads to keep around.
# min_threads = 0
}
leak_detective {
# Includes source file names and line numbers in leak detective output.
# detailed = yes
# Threshold in bytes for leaks to be reported (0 to report all).
# usage_threshold = 10240
# Threshold in number of allocations for leaks to be reported (0 to
# report all).
# usage_threshold_count = 0
}
processor {
# Section to configure the number of reserved threads per priority class
# see JOB PRIORITY MANAGEMENT in strongswan.conf(5).
priority_threads {
}
}
# Section containing a list of scripts (name = path) that are executed when
# the daemon is started.
start-scripts {
}
# Section containing a list of scripts (name = path) that are executed when
# the daemon is terminated.
stop-scripts {
}
tls {
# List of TLS encryption ciphers.
# cipher =
# List of TLS key exchange methods.
# key_exchange =
# List of TLS MAC algorithms.
# mac =
# List of TLS cipher suites.
# suites =
}
x509 {
# Discard certificates with unsupported or unknown critical extensions.
# enforce_critical = yes
}
}
@TC1977 thanks. Like you, I had no success with the previous settings. Let's see if these make a difference. Give me 24hrs.
Yep. Nope. Still getting "unable to install policy" errors and orphaned outgoing policies. I didn't see any loops myself, though.
sudo ip xfrm pol list
src ::/0 dst fd9d:bc11:4020::2/128
dir out priority 334463
tmpl src 172.31.39.82 dst 107.242.116.98
proto esp spi 0x0a29e4fb reqid 45 mode tunnel
src 0.0.0.0/0 dst 10.19.48.2/32
dir out priority 383615
tmpl src 172.31.39.82 dst 107.242.116.98
proto esp spi 0x0a29e4fb reqid 45 mode tunnel
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0
src ::/0 dst ::/0
socket in priority 0
src ::/0 dst ::/0
socket out priority 0
src ::/0 dst ::/0
socket in priority 0
src ::/0 dst ::/0
socket out priority 0
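For anyone who wants to spot orphaned policies without eyeballing this output, here is a minimal Python sketch. It assumes the text format shown above (a selector line, then "dir out", then a template line carrying the reqid). The function names and the ONLINE set are hypothetical; in practice the online addresses would be copied by hand from ipsec leases.

```python
import re

def outbound_policies(xfrm_text):
    """Return (dst selector, reqid) for each 'dir out' policy in the text."""
    policies = []
    dst = None
    is_out = False
    for raw in xfrm_text.splitlines():
        line = raw.strip()
        selector = re.match(r"src \S+ dst (\S+)", line)
        if selector:
            # A selector line starts a new policy entry.
            dst, is_out = selector.group(1), False
        elif line.startswith("dir out"):
            is_out = True
        else:
            reqid = re.search(r"\breqid (\d+)", line)
            if reqid and is_out and dst is not None:
                policies.append((dst, int(reqid.group(1))))
                is_out = False
    return policies

def orphaned(policies, online_ips):
    """Keep policies whose destination address has no online lease."""
    return [(dst, reqid) for dst, reqid in policies
            if dst.split("/")[0] not in online_ips]

# SAMPLE is copied from the output above.
SAMPLE = """
src ::/0 dst fd9d:bc11:4020::2/128
dir out priority 334463
tmpl src 172.31.39.82 dst 107.242.116.98
proto esp spi 0x0a29e4fb reqid 45 mode tunnel
src 0.0.0.0/0 dst 10.19.48.2/32
dir out priority 383615
tmpl src 172.31.39.82 dst 107.242.116.98
proto esp spi 0x0a29e4fb reqid 45 mode tunnel
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
"""

# ONLINE is a made-up set for illustration only, not taken from the thread.
ONLINE = {"10.19.48.2"}

print(orphaned(outbound_policies(SAMPLE), ONLINE))
```

With the made-up ONLINE set, only the IPv6 policy is reported, because its address has no matching lease. This only flags candidates; it does not prove a policy is dead.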
My charon has now been running for 15 days with 3 devices connected without any "unable to install policy" errors. I wonder if it's because I'm using both uniqueids=yes and auto=route?
It might actually be because my devices don't switch networks very often and I don't usually use the VPN over LTE.
Meanwhile the "connecting.../disconnecting..." loop has gotten so bad for me, on all devices, that my VPN is basically unusable. The problem is most noticeable for me when moving from LTE (or weak 4G) to a public wifi with captive portal screen, or back. It seemed to either go away after a few seconds, or last for hours.
At this point maybe my server was just too old. I've installed a new server, using the ansible2.5 branch and new cipher suite, and I'm changing my ipsec.conf to auto=route. I'm going to leave uniqueids=never as per default config.
auto=route is still giving me orphaned policies, but maybe there's an improvement. What I notice in today's logs is that after a few tries the client (an iPhone on LTE) will get assigned a different virtual IP other than 10.19.48.3 (the orphaned lease), and then connect just fine. So the errors are occurring every few minutes, and not more than twice in a row. Therefore there aren't any connect/reconnect loops to speak of, and the user didn't notice anything.
ubuntu@ip-172-16-254-145:~$ grep "unable to install policy" /var/log/syslog
Sep 6 15:04:49 ip-172-16-254-145 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 106, the same policy for reqid 92 exists
Sep 6 15:04:49 ip-172-16-254-145 ipsec[23716]: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 106, the same policy for reqid 92 exists
Sep 6 16:24:16 ip-172-16-254-145 ipsec[23716]: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 109, the same policy for reqid 92 exists
Sep 6 16:24:16 ip-172-16-254-145 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 109, the same policy for reqid 92 exists
Sep 6 16:36:52 ip-172-16-254-145 charon: 14[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 113, the same policy for reqid 92 exists
Sep 6 16:36:52 ip-172-16-254-145 ipsec[23716]: 14[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 113, the same policy for reqid 92 exists
Sep 6 18:15:41 ip-172-16-254-145 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 116, the same policy for reqid 92 exists
Sep 6 18:15:41 ip-172-16-254-145 ipsec[23716]: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 116, the same policy for reqid 92 exists
Sep 6 19:17:12 ip-172-16-254-145 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 119, the same policy for reqid 92 exists
Sep 6 19:17:12 ip-172-16-254-145 ipsec[23716]: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 119, the same policy for reqid 92 exists
Sep 6 19:37:23 ip-172-16-254-145 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 122, the same policy for reqid 92 exists
Sep 6 19:37:23 ip-172-16-254-145 ipsec[23716]: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 122, the same policy for reqid 92 exists
Sep 6 19:52:51 ip-172-16-254-145 ipsec[23716]: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 124, the same policy for reqid 92 exists
Sep 6 19:52:51 ip-172-16-254-145 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 124, the same policy for reqid 92 exists
Sep 6 20:02:45 ip-172-16-254-145 charon: 15[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 127, the same policy for reqid 92 exists
Sep 6 20:02:46 ip-172-16-254-145 ipsec[23716]: 15[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 127, the same policy for reqid 92 exists
Sep 6 20:06:55 ip-172-16-254-145 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 129, the same policy for reqid 92 exists
Sep 6 20:07:01 ip-172-16-254-145 ipsec[23716]: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 129, the same policy for reqid 92 exists
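The grep output above is easier to read once it is grouped by the reqid that is holding the stale policy. A rough Python sketch follows, assuming the exact message wording shown in these logs. The function name is hypothetical, and the de-duplication relies on the assumption (true in the output above) that every failed attempt gets a fresh reqid while charon and ipsec[pid] each log it once.

```python
import re
from collections import Counter

PATTERN = re.compile(
    r"unable to install policy \S+ === (\S+) out for reqid (\d+), "
    r"the same policy for reqid (\d+) exists"
)

def blocked_by(lines):
    """Count failed installs per (client selector, reqid holding the policy)."""
    counts = Counter()
    seen = set()
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue
        dst = match.group(1)
        new_reqid = int(match.group(2))
        old_reqid = int(match.group(3))
        # The same event is logged by both charon and ipsec[pid]; count it once.
        if (dst, new_reqid) in seen:
            continue
        seen.add((dst, new_reqid))
        counts[(dst, old_reqid)] += 1
    return counts

# Four lines copied from the grep output above (two events, each logged twice).
SAMPLE_LOG = """
Sep 6 15:04:49 ip-172-16-254-145 charon: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 106, the same policy for reqid 92 exists
Sep 6 15:04:49 ip-172-16-254-145 ipsec[23716]: 12[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 106, the same policy for reqid 92 exists
Sep 6 16:24:16 ip-172-16-254-145 ipsec[23716]: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 109, the same policy for reqid 92 exists
Sep 6 16:24:16 ip-172-16-254-145 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 109, the same policy for reqid 92 exists
"""

for (dst, reqid), n in blocked_by(SAMPLE_LOG.splitlines()).items():
    print(f"{dst} blocked {n} time(s) by reqid {reqid}")
```

To run it against a real log, pass an open file instead of the sample, for example blocked_by(open("/var/log/syslog")).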
@davidemyers Reviewing logs from earlier today shows two big freakouts, with numerous back-to-back loops spanning more than a minute at a time. What I notice is that each involved a mobile client with two IPs that weren't working. First one lasted until 15:23:24, with multiple attempts per second.
Sep 8 15:22:28 ip-172-16-254-145 ipsec[23716]: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 176, the same policy for reqid 92 exists
Sep 8 15:22:28 ip-172-16-254-145 ipsec[23716]: 07[IKE] unable to install IPsec policies (SPD) in kernel
Sep 8 15:22:28 ip-172-16-254-145 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 176, the same policy for reqid 92 exists
Sep 8 15:22:28 ip-172-16-254-145 charon: 07[IKE] unable to install IPsec policies (SPD) in kernel
Sep 8 15:22:28 ip-172-16-254-145 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.5/32 out for reqid 177, the same policy for reqid 174 exists
Sep 8 15:22:28 ip-172-16-254-145 charon: 16[IKE] unable to install IPsec policies (SPD) in kernel
Sep 8 15:22:28 ip-172-16-254-145 ipsec[23716]: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.5/32 out for reqid 177, the same policy for reqid 174 exists
Sep 8 15:22:28 ip-172-16-254-145 ipsec[23716]: 16[IKE] unable to install IPsec policies (SPD) in kernel
There's a similar loop which started at 15:39:27, lasted until 15:45:51 (with an 11-second break in the middle).
These logs are way too large to read through all at once. I do notice that grep "reqid 174" syslog and then grep "15:06:58" syslog gives the key entries here:
Sep 8 15:06:58 ip-172-16-254-145 charon: 13[CFG] trap not found, unable to acquire reqid 174
Sep 8 15:09:18 ip-172-16-254-145 ipsec[23716]: 13[CFG] trap not found, unable to acquire reqid 174
Sep 8 15:06:58 ip-172-16-254-145 charon: 10[CFG] lease fd9d:bc11:4020::5 by '[username redacted]' went offline
Sep 8 15:06:58 ip-172-16-254-145 charon: 10[CFG] lease 10.19.48.5 by '[username redacted]' went offline
Sep 8 15:06:58 ip-172-16-254-145 charon: 10[MGR] checkin and destroy of IKE_SA successful
Sep 8 15:06:58 ip-172-16-254-145 charon: 03[NET] waiting for data on sockets
Sep 8 15:06:58 ip-172-16-254-145 charon: 15[KNL] received a XFRM_MSG_ACQUIRE
Sep 8 15:06:58 ip-172-16-254-145 charon: 15[KNL] XFRMA_TMPL
Sep 8 15:06:58 ip-172-16-254-145 charon: 15[KNL] creating acquire job for policy 40.97.222.34/32[tcp/https] === 10.19.48.5/32[tcp/49240] with reqid {174}
Sep 8 15:06:58 ip-172-16-254-145 charon: 13[CFG] trap not found, unable to acquire reqid 174
So basically, I'm back to where you were three weeks ago, with "trap not found" and "XFRMA" messages before the policy gets orphaned and starts interfering with new policies. I didn't see these before, but I do now. I was running with auto=route in ipsec.conf. Now I've also added reauth=no in ipsec.conf, plus delete_rekeyed_delay=10 and keep_alive=30s in /etc/strongswan.d/charon.conf.
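One way to test the "trap not found, then orphaned policy" pattern across a whole log is to collect the reqids that hit "trap not found" and see which of them later show up as the reqid blocking a new policy. A minimal Python sketch, assuming the message wording in the excerpts above; the function name is hypothetical and a match is only a correlation, not proof of cause.

```python
import re

TRAP = re.compile(r"trap not found, unable to acquire reqid (\d+)")
BLOCK = re.compile(r"out for reqid \d+, the same policy for reqid (\d+) exists")

def trap_then_block(lines):
    """Return reqids that logged 'trap not found' and later blocked an install."""
    trapped = set()
    result = []
    for line in lines:
        trap = TRAP.search(line)
        if trap:
            trapped.add(int(trap.group(1)))
            continue
        block = BLOCK.search(line)
        if block:
            reqid = int(block.group(1))
            # Only report reqids whose trap failure was seen earlier in the log.
            if reqid in trapped and reqid not in result:
                result.append(reqid)
    return result

# Three lines copied from the excerpts above.
SAMPLE_EVENTS = """
Sep 8 15:06:58 ip-172-16-254-145 charon: 13[CFG] trap not found, unable to acquire reqid 174
Sep 8 15:22:28 ip-172-16-254-145 charon: 07[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.3/32 out for reqid 176, the same policy for reqid 92 exists
Sep 8 15:22:28 ip-172-16-254-145 charon: 16[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.5/32 out for reqid 177, the same policy for reqid 174 exists
"""

print(trap_then_block(SAMPLE_EVENTS.splitlines()))
```

In this sample only reqid 174 is reported. Reqid 92 also blocks a policy, but its "trap not found" line (if there was one) is not part of the excerpt.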
@TC1977
Would you mind attaching the entire log as a file?
@digeratus The PDF of grep charon syslog.2 for just the period from 15:38:38 to 15:46:58 came out to 6.7MB! 3016 pages! TextEdit choked and died just after I redacted relevant IPs and user names, but before I could save it as a .txt rather than a .rtf.
Anyway, with the config listed above I haven't had any policy errors in the last 24+ hours. ipsec.conf attached below. Two iOS clients, moving between 4G/LTE/home trusted wifi all day yesterday. If I get more errors today, the next idea per the strongSwan docs would be to set dpddelay=5s in ipsec.conf.
config setup
uniqueids=never # allow multiple connections per user
charondebug="ike 2, knl 2, cfg 2, net 2, esp 2, dmn 2, mgr 2"
conn %default
fragmentation=yes
rekey=no
reauth=no
forceencaps=yes
dpdaction=clear
keyexchange=ikev2
compress=yes
dpddelay=35s
ike=aes256gcm16-prfsha512-ecp384,aes256-sha2_512-prfsha512-ecp384,aes256-sha2_384-prfsha384-ecp384!
esp=aes256gcm16-ecp384,aes256-sha2_512-prfsha512-ecp384!
left=%any
leftauth=pubkey
leftid=[redactedIP]
leftcert=[redactedIP].crt
leftsendcert=always
leftsubnet=0.0.0.0/0,::/0
right=%any
rightauth=pubkey
rightsourceip=10.19.48.0/24,fd9d:bc11:4020::/48
rightdns=172.16.0.1
conn ikev2-pubkey
auto=route
My test system keeps humming along, though I'm not sure that proves anything. I really think we're seeing the strongSwan bug mentioned above and haven't yet found a way to avoid it.
root@vpn4:~# ipsec statusall | head -2
Status of IKE charon daemon (strongSwan 5.6.2, Linux 4.15.0-32-generic, x86_64):
uptime: 23 days, since Aug 18 08:22:00 2018
root@vpn4:~# journalctl | grep unable | tail -1
Aug 18 07:35:14 vpn4 ipsec[768]: 15[IKE] unable to install IPsec policies (SPD) in kernel
root@vpn4:~# ipsec leases
Leases in pool '10.19.48.0/24', usage: 3/254, 3 online
10.19.48.2 online 'daves-iphone'
10.19.48.1 online 'daves-mbp'
10.19.48.3 online 'daves-ipad'
Leases in pool 'fd9d:bc11:4020::/48', usage: 3/2147483646, 3 online
fd9d:bc11:4020::2 online 'daves-iphone'
fd9d:bc11:4020::1 online 'daves-mbp'
fd9d:bc11:4020::3 online 'daves-ipad'
root@vpn4:~# diff /etc/ipsec.conf.orig /etc/ipsec.conf
2c2
< uniqueids=never # allow multiple connections per user
---
> uniqueids=yes
29c29
< auto=add
---
> auto=route
@davidemyers Wanna try driving around with your iPhone? Or running into and out of your home Wifi's range? 😄
The interesting thing for me is, the user that keeps getting the connection loops is a single iPhone, no shared certificate - therefore uniqueids shouldn't make a difference. The user that doesn't get as many loops is a shared certificate between an iPhone and several others on excluded home Wifi, so uniqueids shouldn't come into play there either. The crazy loops seem to be in conjunction with two clients connecting nearly simultaneously. Both iPhones are on 11.4.1, as is yours I believe?
I did try switching my iPhone back and forth between LTE and Wi-Fi a few times but that got boring after a while. 😄
I'm assuming uniqueids changes how charon reuses internal data structures, and therefore might be relevant.
My iOS devices are all at 11.4.1.
@digeratus Wanna try this config? Only one orphaned policy in two days.
ubuntu@ip-172-16-254-145:~$ cat /etc/ipsec.conf
config setup
uniqueids=never # allow multiple connections per user
charondebug="ike 2, knl 2, cfg 2, net 2, esp 2, dmn 2, mgr 2"
conn %default
fragmentation=yes
rekey=no
reauth=no
forceencaps=yes
dpdaction=clear
keyexchange=ikev2
compress=yes
dpddelay=35s
ike=aes256gcm16-prfsha512-ecp384,aes256-sha2_512-prfsha512-ecp384,aes256-sha2_384-prfsha384-ecp384!
esp=aes256gcm16-ecp384,aes256-sha2_512-prfsha512-ecp384!
left=%any
leftauth=pubkey
leftid=[algo server IP here]
leftcert=[algo server IP here].crt
leftsendcert=always
leftsubnet=0.0.0.0/0,::/0
right=%any
rightauth=pubkey
rightsourceip=10.19.48.0/24,fd9d:bc11:4020::/48
rightdns=172.16.0.1
conn ikev2-pubkey
auto=route
Here's charon.conf. Only changes are delete_rekeyed_delay=10 and keep_alive=30s.
ubuntu@ip-172-16-254-145:~$ cat /etc/strongswan.d/charon.conf
# Options for the charon IKE daemon.
charon {
# Accept unencrypted ID and HASH payloads in IKEv1 Main Mode.
# accept_unencrypted_mainmode_messages = no
# Maximum number of half-open IKE_SAs for a single peer IP.
# block_threshold = 5
# Whether Certificate Revocation Lists (CRLs) fetched via HTTP or LDAP
# should be saved under a unique file name derived from the public key of
# the Certification Authority (CA) to /etc/ipsec.d/crls (stroke) or
# /etc/swanctl/x509crl (vici), respectively.
# cache_crls = no
# Whether relations in validated certificate chains should be cached in
# memory.
# cert_cache = yes
# Send Cisco Unity vendor ID payload (IKEv1 only).
# cisco_unity = no
# Close the IKE_SA if setup of the CHILD_SA along with IKE_AUTH failed.
# close_ike_on_child_failure = no
# Number of half-open IKE_SAs that activate the cookie mechanism.
# cookie_threshold = 10
# Delete CHILD_SAs right after they got successfully rekeyed (IKEv1 only).
# delete_rekeyed = no
# Delay in seconds until inbound IPsec SAs are deleted after rekeyings
# (IKEv2 only).
delete_rekeyed_delay = 10
# Use ANSI X9.42 DH exponent size or optimum size matched to cryptographic
# strength.
# dh_exponent_ansi_x9_42 = yes
# Use RTLD_NOW with dlopen when loading plugins and IMV/IMCs to reveal
# missing symbols immediately.
# dlopen_use_rtld_now = no
# DNS server assigned to peer via configuration payload (CP).
# dns1 =
# DNS server assigned to peer via configuration payload (CP).
# dns2 =
# Enable Denial of Service protection using cookies and aggressiveness
# checks.
# dos_protection = yes
# Compliance with the errata for RFC 4753.
# ecp_x_coordinate_only = yes
# Free objects during authentication (might conflict with plugins).
# flush_auth_cfg = no
# Whether to follow IKEv2 redirects (RFC 5685).
# follow_redirects = yes
# Maximum size (complete IP datagram size in bytes) of a sent IKE fragment
# when using proprietary IKEv1 or standardized IKEv2 fragmentation, defaults
# to 1280 (use 0 for address family specific default values, which uses a
# lower value for IPv4). If specified this limit is used for both IPv4 and
# IPv6.
# fragment_size = 1280
# Name of the group the daemon changes to after startup.
# group =
# Timeout in seconds for connecting IKE_SAs (also see IKE_SA_INIT DROPPING).
# half_open_timeout = 30
# Enable hash and URL support.
# hash_and_url = no
# Allow IKEv1 Aggressive Mode with pre-shared keys as responder.
# i_dont_care_about_security_and_use_aggressive_mode_psk = no
# Whether to ignore the traffic selectors from the kernel's acquire events
# for IKEv2 connections (they are not used for IKEv1).
# ignore_acquire_ts = no
# A space-separated list of routing tables to be excluded from route
# lookups.
# ignore_routing_tables =
# Maximum number of IKE_SAs that can be established at the same time before
# new connection attempts are blocked.
# ikesa_limit = 0
# Number of exclusively locked segments in the hash table.
# ikesa_table_segments = 1
# Size of the IKE_SA hash table.
# ikesa_table_size = 1
# Whether to close IKE_SA if the only CHILD_SA closed due to inactivity.
# inactivity_close_ike = no
# Limit new connections based on the current number of half open IKE_SAs,
# see IKE_SA_INIT DROPPING in strongswan.conf(5).
# init_limit_half_open = 0
# Limit new connections based on the number of queued jobs.
# init_limit_job_load = 0
# Causes charon daemon to ignore IKE initiation requests.
# initiator_only = no
# Install routes into a separate routing table for established IPsec
# tunnels.
# install_routes = yes
# Install virtual IP addresses.
# install_virtual_ip = yes
# The name of the interface on which virtual IP addresses should be
# installed.
# install_virtual_ip_on =
# Check daemon, libstrongswan and plugin integrity at startup.
# integrity_test = no
# A comma-separated list of network interfaces that should be ignored, if
# interfaces_use is specified this option has no effect.
# interfaces_ignore =
# A comma-separated list of network interfaces that should be used by
# charon. All other interfaces are ignored.
# interfaces_use =
# NAT keep alive interval.
keep_alive = 30s
# Plugins to load in the IKE daemon charon.
# load =
# Determine plugins to load via each plugin's load option.
# load_modular = no
# Initiate IKEv2 reauthentication with a make-before-break scheme.
# make_before_break = yes
# Maximum number of IKEv1 phase 2 exchanges per IKE_SA to keep state about
# and track concurrently.
# max_ikev1_exchanges = 3
# Maximum packet size accepted by charon.
# max_packet = 10000
# Enable multiple authentication exchanges (RFC 4739).
# multiple_authentication = yes
# WINS servers assigned to peer via configuration payload (CP).
# nbns1 =
# WINS servers assigned to peer via configuration payload (CP).
# nbns2 =
# UDP port used locally. If set to 0 a random port will be allocated.
# port = 500
# UDP port used locally in case of NAT-T. If set to 0 a random port will be
# allocated. Has to be different from charon.port, otherwise a random port
# will be allocated.
# port_nat_t = 4500
# Whether to prefer updating SAs to the path with the best route.
# prefer_best_path = no
# Prefer locally configured proposals for IKE/IPsec over supplied ones as
# responder (disabling this can avoid keying retries due to
# INVALID_KE_PAYLOAD notifies).
# prefer_configured_proposals = yes
# By default public IPv6 addresses are preferred over temporary ones (RFC
# 4941), to make connections more stable. Enable this option to reverse
# this.
# prefer_temporary_addrs = no
# Process RTM_NEWROUTE and RTM_DELROUTE events.
# process_route = yes
# Delay in ms for receiving packets, to simulate larger RTT.
# receive_delay = 0
# Delay request messages.
# receive_delay_request = yes
# Delay response messages.
# receive_delay_response = yes
# Specific IKEv2 message type to delay, 0 for any.
# receive_delay_type = 0
# Size of the AH/ESP replay window, in packets.
# replay_window = 32
# Base to use for calculating exponential back off, see IKEv2 RETRANSMISSION
# in strongswan.conf(5).
# retransmit_base = 1.8
# Maximum jitter in percent to apply randomly to calculated retransmission
# timeout (0 to disable).
# retransmit_jitter = 0
# Upper limit in seconds for calculated retransmission timeout (0 to
# disable).
# retransmit_limit = 0
# Timeout in seconds before sending first retransmit.
# retransmit_timeout = 4.0
# Number of times to retransmit a packet before giving up.
# retransmit_tries = 5
# Interval in seconds to use when retrying to initiate an IKE_SA (e.g. if
# DNS resolution failed), 0 to disable retries.
# retry_initiate_interval = 0
# Initiate CHILD_SA within existing IKE_SAs (always enabled for IKEv1).
# reuse_ikesa = yes
# Numerical routing table to install routes to.
# routing_table =
# Priority of the routing table.
# routing_table_prio =
# Whether to use RSA with PSS padding instead of PKCS#1 padding by default.
# rsa_pss = no
# Delay in ms for sending packets, to simulate larger RTT.
# send_delay = 0
# Delay request messages.
# send_delay_request = yes
# Delay response messages.
# send_delay_response = yes
# Specific IKEv2 message type to delay, 0 for any.
# send_delay_type = 0
# Send strongSwan vendor ID payload
# send_vendor_id = no
# Whether to enable Signature Authentication as per RFC 7427.
# signature_authentication = yes
# Whether to enable constraints against IKEv2 signature schemes.
# signature_authentication_constraints = yes
# The upper limit for SPIs requested from the kernel for IPsec SAs.
# spi_max = 0xcfffffff
# The lower limit for SPIs requested from the kernel for IPsec SAs.
# spi_min = 0xc0000000
# Number of worker threads in charon.
# threads = 16
# Name of the user the daemon changes to after startup.
# user =
crypto_test {
# Benchmark crypto algorithms and order them by efficiency.
# bench = no
# Buffer size used for crypto benchmark.
# bench_size = 1024
# Number of iterations to test each algorithm.
# bench_time = 50
# Test crypto algorithms during registration (requires test vectors
# provided by the test-vectors plugin).
# on_add = no
# Test crypto algorithms on each crypto primitive instantiation.
# on_create = no
# Strictly require at least one test vector to enable an algorithm.
# required = no
# Whether to test RNG with TRUE quality; requires a lot of entropy.
# rng_true = no
}
host_resolver {
# Maximum number of concurrent resolver threads (they are terminated if
# unused).
# max_threads = 3
# Minimum number of resolver threads to keep around.
# min_threads = 0
}
leak_detective {
# Includes source file names and line numbers in leak detective output.
# detailed = yes
# Threshold in bytes for leaks to be reported (0 to report all).
# usage_threshold = 10240
# Threshold in number of allocations for leaks to be reported (0 to
# report all).
# usage_threshold_count = 0
}
processor {
# Section to configure the number of reserved threads per priority class
# see JOB PRIORITY MANAGEMENT in strongswan.conf(5).
priority_threads {
}
}
# Section containing a list of scripts (name = path) that are executed when
# the daemon is started.
start-scripts {
}
# Section containing a list of scripts (name = path) that are executed when
# the daemon is terminated.
stop-scripts {
}
tls {
# List of TLS encryption ciphers.
# cipher =
# List of TLS key exchange methods.
# key_exchange =
# List of TLS MAC algorithms.
# mac =
# List of TLS cipher suites.
# suites =
}
x509 {
# Discard certificates with unsupported or unknown critical extensions.
# enforce_critical = yes
}
}
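Since nearly every line in a charon.conf dump like the one above is a commented-out default, it can be hard to spot what was actually changed. Here is a rough Python sketch (not something anyone in this thread posted) that lists only the uncommented settings. The sample text is a made-up excerpt, and the parser assumes the simple `key = value` / `section {` layout shown above; it does not handle `include` statements or trailing inline comments.

```python
import re

# Hypothetical excerpt in the style of /etc/strongswan.d/charon.conf.
SAMPLE = """\
charon {
    # Close the IKE_SA if setup of the CHILD_SA along with IKE_AUTH failed.
    close_ike_on_child_failure = yes
    # NAT keep alive interval.
    keep_alive = 25s
    # Number of worker threads in charon.
    # threads = 16
    crypto_test {
        # bench = no
    }
}
"""

SECTION_OPEN = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*\{\s*$")
SETTING = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*=\s*(.*?)\s*$")


def active_settings(text):
    """Return {dotted.section.key: value} for every uncommented setting."""
    path, found = [], {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or commented-out default
        opened = SECTION_OPEN.match(line)
        if opened:
            path.append(opened.group(1))
            continue
        if stripped == "}":
            if path:
                path.pop()
            continue
        setting = SETTING.match(line)
        if setting:
            found[".".join(path + [setting.group(1)])] = setting.group(2)
    return found


for key, value in active_settings(SAMPLE).items():
    print(f"{key} = {value}")
```

To run it against a real server, read the file contents into a string and pass that to `active_settings` instead of `SAMPLE`.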
@TC1977 Trying it out as we speak.
@TC1977 @davidemyers have you had any luck with the latest configs? I realized that my test device was running on iOS 12 beta and now on the GM version. Might not have realistic results.
My configuration mentioned above continues to run without issues. Someone else should really give it a try.
root@vpn4:~# ipsec statusall | head -2
Status of IKE charon daemon (strongSwan 5.6.2, Linux 4.15.0-32-generic, x86_64):
uptime: 30 days, since Aug 18 08:20:33 2018
root@vpn4:~# journalctl | grep unable | tail -1
Aug 18 07:35:14 vpn4 ipsec[768]: 15[IKE] unable to install IPsec policies (SPD) in kernel
@davidemyers
With these settings?:
> My charon has now been running for 15 days with 3 devices connected without any "unable to install policy" errors. I wonder if it's because I'm using both uniqueids=yes and auto=route?
Yes, I'm only using those two changes to the default configuration.
@davidemyers I'm not about to reinstall Algo using uniqueids=yes, and auto=route appears to only run once on ipsec startup. It seems what happens is that the outgoing policy doesn't get deleted, because it's still used by another CHILD SA. Then, when that virtual IP gets assigned to the client again, we get the "unable to install policy, reqid x already exists" error. I've been trying to get it to delete sooner by using rekey=no, reauth=no, and dpddelay=5s but I'm still getting problems.
So I'm going to try another direction. Perhaps your success is because your clients are consistently getting the same virtual IP addresses? I'm going to set ikelifetime=28800s and inactivity=3600s and see if that helps. Note that the default iOS config only gives a rekeying interval of 20m, so this doesn't actually change rekeying behavior. But maybe it will let the client get the same virtual IP for longer.
New ipsec.conf:
config setup
uniqueids=never # allow multiple connections per user
charondebug="ike 1, knl 1, cfg 1, net 1, esp 1, dmn 1, mgr 1"
conn %default
fragmentation=yes
rekey=no
forceencaps=yes
dpdaction=hold
keyexchange=ikev2
compress=yes
dpddelay=35s
inactivity=3600s
ikelifetime=28800s
ike=aes256gcm16-prfsha512-ecp384,aes256-sha2_512-prfsha512-ecp384,aes256-sha2_384-prfsha384-ecp384!
esp=aes256gcm16-ecp384,aes256-sha2_512-prfsha512-ecp384!
left=%any
leftauth=pubkey
leftid=[algo IP]
leftcert=[algo IP].crt
leftsendcert=always
leftsubnet=0.0.0.0/0,::/0
right=%any
rightauth=pubkey
rightsourceip=10.19.48.0/24,fd9d:bc11:4020::/48
rightdns=172.16.0.1
conn ikev2-pubkey
auto=route
@davidemyers, question for you: I'm doing some reading and it seems that auto=route isn't recommended with right=%any. See this link, this link, and this link. My current config has been working well, without any drops and switching from LTE to Wifi without problems, but I'm getting a ton of these messages:
Sep 18 22:20:36 ip-172-16-254-145 charon: 05[CFG] installing trap failed, remote address unknown
Sep 18 22:41:14 ip-172-16-254-145 charon: 16[CFG] installing trap failed, remote address unknown
Sep 18 22:54:07 ip-172-16-254-145 charon: 11[CFG] installing trap failed, remote address unknown
Sep 18 23:08:23 ip-172-16-254-145 charon: 14[CFG] installing trap failed, remote address unknown
Sep 18 23:43:43 ip-172-16-254-145 charon: 14[CFG] installing trap failed, remote address unknown
Sep 18 23:54:06 ip-172-16-254-145 charon: 13[CFG] installing trap failed, remote address unknown
Sep 19 00:01:58 ip-172-16-254-145 charon: 10[CFG] installing trap failed, remote address unknown
Sep 19 00:07:17 ip-172-16-254-145 charon: 13[CFG] installing trap failed, remote address unknown
Sep 19 00:20:40 ip-172-16-254-145 charon: 11[CFG] installing trap failed, remote address unknown
Sep 19 00:28:21 ip-172-16-254-145 charon: 11[CFG] installing trap failed, remote address unknown
Sep 19 00:42:53 ip-172-16-254-145 charon: 11[CFG] installing trap failed, remote address unknown
Sep 19 00:54:10 ip-172-16-254-145 charon: 07[CFG] installing trap failed, remote address unknown
Sep 19 01:08:20 ip-172-16-254-145 charon: 05[CFG] installing trap failed, remote address unknown
Sep 19 01:20:17 ip-172-16-254-145 charon: 14[CFG] installing trap failed, remote address unknown
Sep 19 01:32:27 ip-172-16-254-145 charon: 09[CFG] installing trap failed, remote address unknown
Sep 19 01:39:01 ip-172-16-254-145 charon: 14[CFG] installing trap failed, remote address unknown
Sep 19 01:45:34 ip-172-16-254-145 charon: 12[CFG] installing trap failed, remote address unknown
Sep 19 02:05:27 ip-172-16-254-145 charon: 12[CFG] installing trap failed, remote address unknown
Are you getting any similar messages if you grep trap /var/log/syslog?
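For anyone who wants more than a raw grep, here is a small Python sketch that buckets the "installing trap failed" messages by hour, which makes it easier to see whether they arrive at a steady rate. It assumes the classic syslog timestamp format seen in the lines above; the sample is the first four of those lines.

```python
import re
from collections import Counter

# First four log lines from the excerpt above.
SAMPLE = """\
Sep 18 22:20:36 ip-172-16-254-145 charon: 05[CFG] installing trap failed, remote address unknown
Sep 18 22:41:14 ip-172-16-254-145 charon: 16[CFG] installing trap failed, remote address unknown
Sep 18 22:54:07 ip-172-16-254-145 charon: 11[CFG] installing trap failed, remote address unknown
Sep 18 23:08:23 ip-172-16-254-145 charon: 14[CFG] installing trap failed, remote address unknown
"""

# "Mon DD HH:MM:SS " at the start of a syslog line.
STAMP = re.compile(r"^(\w{3}\s+\d+) (\d{2}):\d{2}:\d{2} ")


def trap_failures_per_hour(log_text):
    """Count 'installing trap failed' messages per (day, hour) bucket."""
    counts = Counter()
    for line in log_text.splitlines():
        if "installing trap failed" not in line:
            continue
        stamp = STAMP.match(line)
        if stamp:
            counts[f"{stamp.group(1)} {stamp.group(2)}h"] += 1
    return counts


for bucket, count in trap_failures_per_hour(SAMPLE).items():
    print(bucket, count)
```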
I don't have any of those messages in syslog. I don't know what to make of those strongSwan issues but I'm not having any issues with my configuration, now at 32 days of uptime. The two iOS devices on my test server are now on iOS 12 and working fine.
You don't have to deploy a new server to try uniqueids=yes as long as all of your devices are already using different mobileconfigs.
Ok, I just caught it! I had ssh open and was running tail -f /var/log/syslog|grep charon, watching for messages. I had dpdaction=hold as above, and auto=add, but in an effort to get rid of the installing trap failed messages, I switched it back to dpdaction=clear, and ran sudo ipsec reload. I had two iOS clients active on LTE, using two separate mobileconfigs, sending out DPD requests every few seconds. Then I got this:
Sep 19 14:20:30 ip-172-16-254-145 charon: 07[NET] sending packet: from 172.16.254.145[4500] to xxx.xxx.xxx.xxx[4500] (113 bytes)
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[NET] received packet: from xxx.xxx.xxx.xxx[4500] to 172.16.254.145[4500] (72 bytes)
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[ENC] parsed INFORMATIONAL request 80 [ D ]
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[IKE] received DELETE for IKE_SA ikev2-pubkey[19]
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[IKE] deleting IKE_SA ikev2-pubkey[19] between 172.16.254.145[52.22.108.80]... xxx.xxx.xxx.xxx[user1]
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[IKE] IKE_SA deleted
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[ENC] generating INFORMATIONAL response 80 [ ]
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[NET] sending packet: from 172.16.254.145[4500] to xxx.xxx.xxx.xxx[4500] (57 bytes)
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[CFG] lease fd9d:bc11:4020::4 by 'user1' went offline
Sep 19 14:20:30 ip-172-16-254-145 charon: 08[CFG] lease 10.19.48.4 by 'user1' went offline
Sep 19 14:20:31 ip-172-16-254-145 charon: 15[KNL] creating acquire job for policy 40.97.145.146/32[tcp/https] === 10.19.48.4/32[tcp/54237] with reqid {14}
Sep 19 14:20:31 ip-172-16-254-145 charon: 15[CFG] trap not found, unable to acquire reqid 14
And sudo ip xfrm pol list shows this:
src ::/0 dst fd9d:bc11:4020::4/128
dir out priority 334463
tmpl src 172.16.254.145 dst xxx.xxx.xxx.xxx
proto esp spi 0x0633e4b6 reqid 14 mode tunnel
src 0.0.0.0/0 dst 10.19.48.4/32
dir out priority 383615
tmpl src 172.16.254.145 dst xxx.xxx.xxx.xxx
proto esp spi 0x0633e4b6 reqid 14 mode tunnel
src ::/0 dst fd9d:bc11:4020::1/128
dir out priority 334463
tmpl src 172.16.254.145 dst yyy.yyy.yyy.yyy
proto esp spi 0x08a2ae61 reqid 15 mode tunnel
src fd9d:bc11:4020::1/128 dst ::/0
dir fwd priority 334463
tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
proto esp reqid 15 mode tunnel
src fd9d:bc11:4020::1/128 dst ::/0
dir in priority 334463
tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
proto esp reqid 15 mode tunnel
src 0.0.0.0/0 dst 10.19.48.1/32
dir out priority 383615
tmpl src 172.16.254.145 dst yyy.yyy.yyy.yyy
proto esp spi 0x08a2ae61 reqid 15 mode tunnel
src 10.19.48.1/32 dst 0.0.0.0/0
dir fwd priority 383615
tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
proto esp reqid 15 mode tunnel
src 10.19.48.1/32 dst 0.0.0.0/0
dir in priority 383615
tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
proto esp reqid 15 mode tunnel
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0
src ::/0 dst ::/0
socket in priority 0
src ::/0 dst ::/0
socket out priority 0
src ::/0 dst ::/0
socket in priority 0
src ::/0 dst ::/0
socket out priority 0
where xxx.xxx.xxx.xxx is the IP of the phone that just got orphaned, and yyy.yyy.yyy.yyy is the IP of the other phone. So I think it's an issue where dpdaction=clear isn't working properly. I'm on auto=add btw. I'm going to switch back to dpdaction=hold and see if that helps.
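The orphaned phone in the listing above stands out because reqid 14 only has "out" policies left, while reqid 15 still has the full out/fwd/in set. A minimal Python sketch of that check follows. The "only out policies remain" rule is my own heuristic taken from this one listing, not a condition defined by strongSwan, and the sample is a shortened copy of the output above.

```python
import re
from collections import defaultdict

# Shortened copy of the `ip xfrm pol list` output above.
SAMPLE = """\
src 0.0.0.0/0 dst 10.19.48.4/32
dir out priority 383615
tmpl src 172.16.254.145 dst xxx.xxx.xxx.xxx
proto esp spi 0x0633e4b6 reqid 14 mode tunnel
src 0.0.0.0/0 dst 10.19.48.1/32
dir out priority 383615
tmpl src 172.16.254.145 dst yyy.yyy.yyy.yyy
proto esp spi 0x08a2ae61 reqid 15 mode tunnel
src 10.19.48.1/32 dst 0.0.0.0/0
dir fwd priority 383615
tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
proto esp reqid 15 mode tunnel
src 10.19.48.1/32 dst 0.0.0.0/0
dir in priority 383615
tmpl src yyy.yyy.yyy.yyy dst 172.16.254.145
proto esp reqid 15 mode tunnel
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
"""


def parse_policies(text):
    """Split the listing into one dict per policy (selector, dir, reqid)."""
    policies, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        selector = re.match(r"src (\S+) dst (\S+)", line)
        if selector:  # a line starting with "src" opens a new policy
            current = {"src": selector.group(1), "dst": selector.group(2),
                       "dir": None, "reqid": None}
            policies.append(current)
            continue
        if current is None:
            continue
        direction = re.match(r"dir (\S+)", line)
        if direction:
            current["dir"] = direction.group(1)
        reqid = re.search(r"\breqid (\d+)", line)
        if reqid:
            current["reqid"] = int(reqid.group(1))
    return policies


def orphaned_reqids(policies):
    """Return reqids that only have 'out' policies left (heuristic)."""
    dirs = defaultdict(set)
    for policy in policies:
        if policy["reqid"] is not None and policy["dir"]:
            dirs[policy["reqid"]].add(policy["dir"])
    return sorted(r for r, seen in dirs.items() if seen == {"out"})


print(orphaned_reqids(parse_policies(SAMPLE)))
```

Socket policies carry no reqid, so they are ignored by the check.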
@digeratus @davidemyers I've been running with the current setup for four days now without any drops or reconnection loops. Anecdotally, I notice that the VPN stays connected much longer, and I don't get any "leaks" where the phones are checking mail from their regular (cell tower) IPs rather than the VPN. The only issue now is a whole ton of installing trap failed, remote address unknown messages, but I can live with that. The two iPhones are on iOS 12.0 and 11.4.1.
ubuntu@ip-172-16-254-145:~$ sudo ipsec statusall | head -2
Status of IKE charon daemon (strongSwan 5.6.2, Linux 4.15.0-1021-aws, x86_64):
uptime: 4 days, since Sep 19 21:42:41 2018
ubuntu@ip-172-16-254-145:~$ journalctl | grep unable | tail -1
Sep 19 20:26:22 ip-172-16-254-145 ipsec[834]: 13[CFG] trap not found, unable to acquire reqid 29
Current /etc/ipsec.conf:
config setup
uniqueids=never # allow multiple connections per user
charondebug="ike 1, knl 1, cfg 1, net 1, esp 1, dmn 1, mgr 1"
conn %default
fragmentation=yes
rekey=no
forceencaps=yes
dpdaction=hold
keyexchange=ikev2
compress=yes
dpddelay=35s
inactivity=3600s
ikelifetime=28800s
ike=aes256gcm16-prfsha512-ecp384,aes256-sha2_512-prfsha512-ecp384,aes256-sha2_384-prfsha384-ecp384!
esp=aes256gcm16-ecp384,aes256-sha2_512-prfsha512-ecp384!
left=%any
leftauth=pubkey
leftid=[my.algo.ip]
leftcert=[my.algo.ip].crt
leftsendcert=always
leftsubnet=0.0.0.0/0,::/0
right=%any
rightauth=pubkey
rightsourceip=10.19.48.0/24,fd9d:bc11:4020::/48
rightdns=172.16.0.1
conn ikev2-pubkey
auto=add
Current /etc/strongswan.d/charon.conf settings are delete_rekeyed_delay = 10 and keep_alive = 25s.
I've now gone 45 days without a reconnection loop. Since we now know that the dos_protection change isn't enough to solve the problem, I propose replacing the section of the Troubleshooting document added by @QuentinMoss with the following:
If you're using 'Connect on Demand' on iOS or macOS and your client device appears stuck in a reconnection loop while trying to connect to the VPN, the following changes to the default IPsec configuration might help.
PLEASE NOTE: In order to use this particular configuration, every device must connect as a unique Algo user (as defined by users in config.cfg).
Make the following changes on the Algo server:
In /etc/ipsec.conf, change:
uniqueids=never to uniqueids=yes (near the top of the file)
auto=add to auto=route (near the bottom of the file)
Restart IPsec after flushing the xfrm policies:
sudo ipsec stop
sudo ip xfrm policy flush
sudo ipsec start
Here are the changes above as shell commands:
# This Perl command will create a backup copy of /etc/ipsec.conf named
# /etc/ipsec.conf.orig
sudo perl -p -i.orig -e 's/uniqueids=never.*$/uniqueids=yes/;' \
-e 's/auto=add/auto=route/;' /etc/ipsec.conf
# Restart IPsec after flushing the xfrm policies
sudo ipsec stop; sudo ip xfrm policy flush; sudo ipsec start
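If you'd like to preview what the Perl substitutions will do before touching the real file, here is a dry-run sketch in Python that applies the same two replacements to a string and prints a unified diff. The sample config is a trimmed, illustrative ipsec.conf, and nothing is written to disk. Note that, like the Perl version, the first substitution also drops the trailing comment on the uniqueids line.

```python
import difflib
import re

# Trimmed, illustrative ipsec.conf; a real file has many more lines.
SAMPLE = """\
config setup
    uniqueids=never # allow multiple connections per user
conn %default
    keyexchange=ikev2
conn ikev2-pubkey
    auto=add
"""


def apply_changes(text):
    """Mirror the two Perl substitutions from the commands above."""
    text = re.sub(r"uniqueids=never.*$", "uniqueids=yes", text,
                  flags=re.MULTILINE)
    return text.replace("auto=add", "auto=route")


def preview(text):
    """Return a unified diff of the config before and after the changes."""
    before = text.splitlines(keepends=True)
    after = apply_changes(text).splitlines(keepends=True)
    return "".join(difflib.unified_diff(before, after,
                                        "ipsec.conf", "ipsec.conf.new"))


print(preview(SAMPLE))
```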
@davidemyers Sounds good to me. I've been traveling for the last few days and although the loops aren't as bad as before, I've had to go in and restart the server a couple of times. It seems the biggest problems are associated with hotel and other public Wi-Fis with captive portal login pages. The first connection will go through fine, but after the iPhone goes to sleep and disconnects, it has a hell of a time logging back in and connecting to the Algo server. I'm not sure if this is the same problem, though. I'll check out the logs when I get back home.
Aaargh! I just had a reconnect loop. So in regards to my previous post:
Never mind.
Ok @davidemyers, so I finally have time to try it your way. I've downloaded the latest Algo commit 399d472 and installed onto a brand new AWS instance, encrypted, connect on demand Wi-Fi and LTE, Wireguard disabled, dnscrypt-proxy and dnsmasq on. I created a separate .mobileconfig for each device, changed /etc/ipsec.conf to match your config of uniqueids=yes, auto=route, and ran sudo ipsec reload. One thing I noticed off the bat was this error message:
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG] reusing virtual IP address pool 10.19.48.0/24
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG] virtual IP pool too large, limiting to fd9d:bc11:4020::/97
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG] reusing virtual IP address pool fd9d:bc11:4020::/48
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG] loaded certificate "CN=54.82.89.174" from '54.82.89.174.crt'
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 06[CFG] added configuration 'ikev2-pubkey'
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 08[CFG] received stroke: route 'ikev2-pubkey'
Nov 01 22:46:28 ip-172-16-254-163 charon[6812]: 08[CFG] installing trap failed, remote address unknown
Nov 01 22:46:28 ip-172-16-254-163 ipsec_starter[6788]: routing 'ikev2-pubkey' failed
Nov 01 22:46:28 ip-172-16-254-163 ipsec_starter[6788]:
This is the error message I've usually received when trying auto=route. We'll see how it goes.
Well, that didn't take long. The problem is even worse in a way, because with uniqueids=yes the strongSwan server repeatedly tries to assign the same (stale) IP address to the client, with continuing errors. Check this out:
ubuntu@ip-172-16-254-163:~$ grep unable /var/log/syslog|grep charon
Nov 3 09:07:50 ip-172-16-254-163 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 46, the same policy for reqid 45 exists
Nov 3 09:07:50 ip-172-16-254-163 charon: 10[IKE] unable to install IPsec policies (SPD) in kernel
Nov 3 09:07:51 ip-172-16-254-163 charon: 06[CFG] trap not found, unable to acquire reqid 45
Nov 3 09:07:58 ip-172-16-254-163 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 47, the same policy for reqid 45 exists
Nov 3 09:07:58 ip-172-16-254-163 charon: 05[IKE] unable to install IPsec policies (SPD) in kernel
Nov 3 09:07:59 ip-172-16-254-163 charon: 11[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 48, the same policy for reqid 45 exists
Nov 3 09:07:59 ip-172-16-254-163 charon: 11[IKE] unable to install IPsec policies (SPD) in kernel
...
Nov 3 09:15:48 ip-172-16-254-163 charon: 06[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 366, the same policy for reqid 45 exists
Nov 3 09:15:48 ip-172-16-254-163 charon: 06[IKE] unable to install IPsec policies (SPD) in kernel
Nov 3 09:15:49 ip-172-16-254-163 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 367, the same policy for reqid 45 exists
Nov 3 09:15:49 ip-172-16-254-163 charon: 08[IKE] unable to install IPsec policies (SPD) in kernel
ubuntu@ip-172-16-254-163:~$ sudo ip xfrm pol list
src ::/0 dst fd9d:bc11:4020::1/128
dir out priority 334463
tmpl src 172.16.254.163 dst xxx.xxx.xxx.xxx
proto esp spi 0x0163ef20 reqid 45 mode tunnel
src 0.0.0.0/0 dst 10.19.48.1/32
dir out priority 383615
tmpl src 172.16.254.163 dst xxx.xxx.xxx.xxx
proto esp spi 0x0163ef20 reqid 45 mode tunnel
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket in priority 0
src 0.0.0.0/0 dst 0.0.0.0/0
socket out priority 0
src ::/0 dst ::/0
socket in priority 0
src ::/0 dst ::/0
socket out priority 0
src ::/0 dst ::/0
socket in priority 0
src ::/0 dst ::/0
socket out priority 0
I've rebooted the server, loaded higher level logging settings in /etc/ipsec.conf, and will post logs if and when it happens again.
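To make the pattern in the log above easier to see, here is a short Python sketch that counts how often each existing policy blocked a new install, keyed by the policy and the reqid that still holds it. The regular expression is written against the exact message wording in the lines above, so treat it as an assumption that other strongSwan versions phrase it the same way.

```python
import re
from collections import Counter

# A few lines copied from the syslog excerpt above.
SAMPLE = """\
Nov 3 09:07:50 ip-172-16-254-163 charon: 10[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 46, the same policy for reqid 45 exists
Nov 3 09:07:50 ip-172-16-254-163 charon: 10[IKE] unable to install IPsec policies (SPD) in kernel
Nov 3 09:07:51 ip-172-16-254-163 charon: 06[CFG] trap not found, unable to acquire reqid 45
Nov 3 09:07:58 ip-172-16-254-163 charon: 05[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 47, the same policy for reqid 45 exists
Nov 3 09:07:59 ip-172-16-254-163 charon: 11[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 48, the same policy for reqid 45 exists
"""

CONFLICT = re.compile(
    r"unable to install policy (\S+) === (\S+) (in|out|fwd) "
    r"for reqid (\d+), the same policy for reqid (\d+) exists"
)


def count_conflicts(log_text):
    """Count failed installs per (policy, reqid that still holds it)."""
    counts = Counter()
    for match in CONFLICT.finditer(log_text):
        src, dst, direction, _new_reqid, holder = match.groups()
        counts[(f"{src} === {dst} {direction}", int(holder))] += 1
    return counts


for (policy, holder), count in count_conflicts(SAMPLE).items():
    print(f"{policy}: blocked {count} times by stale reqid {holder}")
```

A single holder reqid with a steadily growing count, as with reqid 45 here, is what the reconnect loop looks like from the server side.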
As expected, it failed overnight, and therefore freaked out this morning. Nothing useful in the logs that we haven't seen before.
I've restarted the server with a couple of other options enabled in /etc/strongswan.d/charon.conf to try to get those CHILD_SAs closed.
close_ike_on_child_failure = yes
keep_alive = 25s
make_before_break = yes
I was hesitant to use make_before_break before, even though it was specifically mentioned in strongSwan issue #2607, because I noticed a performance hit right after enabling it. But at this point I care more about stability than anything else.
So after having the above config for a few hours, plus inactivity=3600s and ikelifetime=28800s in ipsec.conf, I've already run into another failure to delete a policy, with the resulting connecting/reconnecting loops. But this was yet another error message I hadn't seen before. You can also see that close_ike_on_child_failure didn't actually close the duplicate outgoing policy either. Googling "not enough input to parse rule 0 U_INT_8" led me to strongSwan issue #2438, which seems related to lifetime issues. So I've deleted inactivity=3600s and ikelifetime=28800s and restarted.
Nov 4 17:37:52 ip-172-16-254-163 charon: 12[NET] received unencrypted informational: from xxx.xxx.xxx.103[4500] to 172.16.254.163[4500]
Nov 4 17:37:52 ip-172-16-254-163 charon: 12[ENC] not enough input to parse rule 0 U_INT_8
Nov 4 17:37:52 ip-172-16-254-163 charon: 12[ENC] payload type DELETE could not be parsed
Nov 4 17:37:52 ip-172-16-254-163 charon: 12[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 4 17:37:55 ip-172-16-254-163 charon: 05[NET] received unencrypted informational: from xxx.xxx.xxx.103[4500] to 172.16.254.163[4500]
Nov 4 17:37:55 ip-172-16-254-163 charon: 05[ENC] not enough input to parse rule 0 U_INT_8
Nov 4 17:37:55 ip-172-16-254-163 charon: 05[ENC] payload type DELETE could not be parsed
Nov 4 17:37:55 ip-172-16-254-163 charon: 05[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 4 17:37:58 ip-172-16-254-163 charon: 15[NET] received unencrypted informational: from xxx.xxx.xxx.103[4500] to 172.16.254.163[4500]
Nov 4 17:37:58 ip-172-16-254-163 charon: 15[ENC] not enough input to parse rule 0 U_INT_8
Nov 4 17:37:58 ip-172-16-254-163 charon: 15[ENC] payload type DELETE could not be parsed
Nov 4 17:37:58 ip-172-16-254-163 charon: 15[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 4 17:38:01 ip-172-16-254-163 charon: 13[NET] received unencrypted informational: from xxx.xxx.xxx.103[4500] to 172.16.254.163[4500]
Nov 4 17:38:01 ip-172-16-254-163 charon: 13[ENC] not enough input to parse rule 0 U_INT_8
Nov 4 17:38:01 ip-172-16-254-163 charon: 13[ENC] payload type DELETE could not be parsed
Nov 4 17:38:01 ip-172-16-254-163 charon: 13[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 4 17:38:05 ip-172-16-254-163 charon: 10[NET] received packet: from xxx.xxx.xxx.87[500] to 172.16.254.163[500] (272 bytes)
Nov 4 17:38:05 ip-172-16-254-163 charon: 10[ENC] parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) ]
Nov 4 17:38:05 ip-172-16-254-163 charon: 10[CFG] looking for an ike config for 172.16.254.163...xxx.xxx.xxx.87
Nov 4 17:38:05 ip-172-16-254-163 charon: 10[CFG] candidate: %any...%any, prio 28
Nov 4 17:38:05 ip-172-16-254-163 charon: 10[CFG] found matching ike config: %any...%any with prio 28
Nov 4 17:38:05 ip-172-16-254-163 charon: 10[IKE] xxx.xxx.xxx.87 is initiating an IKE_SA
[...]
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[CFG] unable to install policy 0.0.0.0/0 === 10.19.48.1/32 out for reqid 9, the same policy for reqid 7 exists
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[IKE] unable to install IPsec policies (SPD) in kernel
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[IKE] closing IKE_SA due CHILD_SA setup failure
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[KNL] deleting policy 0.0.0.0/0 === 10.19.48.1/32 out
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[KNL] policy still used by another CHILD_SA, not removed
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[KNL] not updating policy 0.0.0.0/0 === 10.19.48.1/32 out [priority 383615, refcount 1]
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[KNL] deleting policy 10.19.48.1/32 === 0.0.0.0/0 in
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[KNL] deleting policy 10.19.48.1/32 === 0.0.0.0/0 fwd
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[KNL] deleting policy ::/0 === fd9d:bc11:4020::1/128 out
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[KNL] policy still used by another CHILD_SA, not removed
Nov 4 17:38:05 ip-172-16-254-163 charon: 08[KNL] not updating policy ::/0 === fd9d:bc11:4020::1/128 out [priority 334463, refcount 1]
Similar failure today, after no failures yesterday. I'm now getting rid of forceencaps=yes in ipsec.conf, and close_ike_on_child_failure = yes and keep_alive = 25s in charon.conf. The next idea would be to change the client .mobileconfig, but I'd really rather not go down that route if possible.
Nov 6 15:09:32 ip-172-16-254-163 charon: 14[NET] received unencrypted informational: from xx.xx.xx.31[4500] to 172.16.254.163[4500]
Nov 6 15:09:32 ip-172-16-254-163 charon: 14[ENC] not enough input to parse rule 0 U_INT_8
Nov 6 15:09:32 ip-172-16-254-163 charon: 14[ENC] payload type DELETE could not be parsed
Nov 6 15:09:32 ip-172-16-254-163 charon: 14[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 6 15:09:35 ip-172-16-254-163 charon: 05[NET] received unencrypted informational: from xx.xx.xx.31[4500] to 172.16.254.163[4500]
Nov 6 15:09:35 ip-172-16-254-163 charon: 05[ENC] not enough input to parse rule 0 U_INT_8
Nov 6 15:09:35 ip-172-16-254-163 charon: 05[ENC] payload type DELETE could not be parsed
Nov 6 15:09:35 ip-172-16-254-163 charon: 05[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 6 15:09:38 ip-172-16-254-163 charon: 06[NET] received unencrypted informational: from xx.xx.xx.31[4500] to 172.16.254.163[4500]
Nov 6 15:09:38 ip-172-16-254-163 charon: 06[ENC] not enough input to parse rule 0 U_INT_8
Nov 6 15:09:38 ip-172-16-254-163 charon: 06[ENC] payload type DELETE could not be parsed
Nov 6 15:09:38 ip-172-16-254-163 charon: 06[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 6 15:09:41 ip-172-16-254-163 charon: 12[NET] received unencrypted informational: from xx.xx.xx.31[4500] to 172.16.254.163[4500]
Nov 6 15:09:41 ip-172-16-254-163 charon: 12[ENC] not enough input to parse rule 0 U_INT_8
Nov 6 15:09:41 ip-172-16-254-163 charon: 12[ENC] payload type DELETE could not be parsed
Nov 6 15:09:41 ip-172-16-254-163 charon: 12[IKE] INFORMATIONAL request with message ID 0 processing failed
Has anyone tried configuring rekeying properly? That seems to be the cause of everything here.
I'd also like to file an issue on the strongSwan bug tracker, or has someone done that already?
The symptoms we're seeing are similar to those already reported here: https://wiki.strongswan.org/issues/2607
I've not added to that issue as I don't feel I really understand what's going on.
I don't understand what's going on at all. I had another hard connect/reconnect loop 2 days ago, but didn't see any payload type DELETE could not be parsed messages. I'm trying rekey=yes and disabled make_before_break now. Also went back to uniqueids=never, as uniqueids=yes only seemed to make it impossible to recover from a stale policy (since it would loop back to the same virtual IP again and again).
Ultimately I think solving the problem will also require changing the settings in the client config, which means editing the .mobileconfig in Apple Configurator and reinstalling it with every iteration.
Is this problem only seen with Apple devices, and only when "Connect on Demand" is enabled?
> Sep 19 14:20:31 ip-172-16-254-145 charon: 15[KNL] creating acquire job for policy 40.97.145.146/32[tcp/https] === 10.19.48.4/32[tcp/54237] with reqid {14}
> Sep 19 14:20:31 ip-172-16-254-145 charon: 15[CFG] trap not found, unable to acquire reqid 14
If anyone is still following this, please try the following. Next time you get a reconnect loop, run grep "creating acquire job" /var/log/syslog and post the output here. (Make sure your strongSwan logging settings are at level 2, at least for 'knl'.) Looking back at this log, I noticed that 40.97.145.146 is an IP owned by Microsoft, I'd guess on Azure, and I had another log around here somewhere which also implicated an IP owned by Microsoft. I wonder if we're all having problems with one specific provider blocking IPsec traffic. That might explain why some of us are having problems nearly every day, while others aren't.
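If you end up with a lot of these lines, here is a Python sketch that pulls the addresses and reqid out of each "creating acquire job" message, so the addresses can be compared across reports. The pattern is based only on the one message quoted above; the field names (`src`, `dst`, and so on) are mine, and other strongSwan versions may format the message differently.

```python
import re
from collections import Counter

# The two log lines quoted above.
SAMPLE = """\
Sep 19 14:20:31 ip-172-16-254-145 charon: 15[KNL] creating acquire job for policy 40.97.145.146/32[tcp/https] === 10.19.48.4/32[tcp/54237] with reqid {14}
Sep 19 14:20:31 ip-172-16-254-145 charon: 15[CFG] trap not found, unable to acquire reqid 14
"""

ACQUIRE = re.compile(
    r"creating acquire job for policy "
    r"(\S+?)/\d+\[([^\]]*)\] === (\S+?)/\d+\[([^\]]*)\] with reqid \{(\d+)\}"
)


def acquire_jobs(log_text):
    """Return one dict per 'creating acquire job' message."""
    jobs = []
    for match in ACQUIRE.finditer(log_text):
        src, src_proto, dst, dst_proto, reqid = match.groups()
        jobs.append({"src": src, "src_proto": src_proto,
                     "dst": dst, "dst_proto": dst_proto,
                     "reqid": int(reqid)})
    return jobs


def count_by_src(jobs):
    """Count acquire jobs per left-hand (src) address."""
    return Counter(job["src"] for job in jobs)


for address, count in count_by_src(acquire_jobs(SAMPLE)).items():
    print(address, count)
```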
Meanwhile I've modified my ipsec.conf and the mobileconfigs further, with some success, but maybe at this point I'm better off putting these into a separate branch to track the changes more easily.
I've been running the same config for one week at this point with no stale policies and no connect/reconnect loops. Here's my /etc/ipsec.conf:
config setup
uniqueids=yes # do not allow multiple connections per user
charondebug="ike 2, knl 2, cfg 2, net 1, esp 1, enc 1, dmn 1, mgr 1"
conn %default
fragmentation=yes
rekey=yes
reauth=no
dpdaction=clear
keyexchange=ikev2
compress=yes
dpddelay=35s
lifetime=3h
ikelifetime=12h
ike=aes256gcm16-prfsha512-ecp384,aes256-sha2_512-prfsha512-ecp384,aes256-sha2_384-prfsha384-ecp384!
esp=aes256gcm16-ecp384,aes256-sha2_512-prfsha512-ecp384!
left=%any
leftauth=pubkey
leftid=[redacted IP]
leftcert=[redacted IP].crt
leftsendcert=always
leftsubnet=0.0.0.0/0,::/0
right=%any
rightauth=pubkey
rightsourceip=10.19.48.0/24,fd9d:bc11:4020::/48
rightdns=172.16.0.1
conn ikev2-pubkey
auto=add
Here's my /etc/strongswan.d/charon.conf. Rationale behind the changes is to try to minimize a pause I observe when switching from Wi-Fi to LTE. It seems the iPhone opens a new IKE_SA, which hangs, and then another one which succeeds. Going from LTE to Wi-Fi is seamless.
# Options for the charon IKE daemon.
charon {
# Accept unencrypted ID and HASH payloads in IKEv1 Main Mode.
# accept_unencrypted_mainmode_messages = no
# Maximum number of half-open IKE_SAs for a single peer IP.
# block_threshold = 5
# Whether Certificate Revocation Lists (CRLs) fetched via HTTP or LDAP
# should be saved under a unique file name derived from the public key of
# the Certification Authority (CA) to /etc/ipsec.d/crls (stroke) or
# /etc/swanctl/x509crl (vici), respectively.
# cache_crls = no
# Whether relations in validated certificate chains should be cached in
# memory.
# cert_cache = yes
# Send Cisco Unity vendor ID payload (IKEv1 only).
# cisco_unity = no
# Close the IKE_SA if setup of the CHILD_SA along with IKE_AUTH failed.
close_ike_on_child_failure = yes
# Number of half-open IKE_SAs that activate the cookie mechanism.
# cookie_threshold = 10
# Delete CHILD_SAs right after they got successfully rekeyed (IKEv1 only).
# delete_rekeyed = no
# Delay in seconds until inbound IPsec SAs are deleted after rekeyings
# (IKEv2 only).
# delete_rekeyed_delay = 10
# Use ANSI X9.42 DH exponent size or optimum size matched to cryptographic
# strength.
# dh_exponent_ansi_x9_42 = yes
# Use RTLD_NOW with dlopen when loading plugins and IMV/IMCs to reveal
# missing symbols immediately.
# dlopen_use_rtld_now = no
# DNS server assigned to peer via configuration payload (CP).
# dns1 =
# DNS server assigned to peer via configuration payload (CP).
# dns2 =
# Enable Denial of Service protection using cookies and aggressiveness
# checks.
# dos_protection = yes
# Compliance with the errata for RFC 4753.
# ecp_x_coordinate_only = yes
# Free objects during authentication (might conflict with plugins).
# flush_auth_cfg = no
# Whether to follow IKEv2 redirects (RFC 5685).
# follow_redirects = yes
# Maximum size (complete IP datagram size in bytes) of a sent IKE fragment
# when using proprietary IKEv1 or standardized IKEv2 fragmentation, defaults
# to 1280 (use 0 for address family specific default values, which uses a
# lower value for IPv4). If specified this limit is used for both IPv4 and
# IPv6.
# fragment_size = 1280
# Name of the group the daemon changes to after startup.
# group =
# Timeout in seconds for connecting IKE_SAs (also see IKE_SA_INIT DROPPING).
half_open_timeout = 5
# Enable hash and URL support.
# hash_and_url = no
# Allow IKEv1 Aggressive Mode with pre-shared keys as responder.
# i_dont_care_about_security_and_use_aggressive_mode_psk = no
# Whether to ignore the traffic selectors from the kernel's acquire events
# for IKEv2 connections (they are not used for IKEv1).
# ignore_acquire_ts = no
# A space-separated list of routing tables to be excluded from route
# lookups.
# ignore_routing_tables =
# Maximum number of IKE_SAs that can be established at the same time before
# new connection attempts are blocked.
# ikesa_limit = 0
# Number of exclusively locked segments in the hash table.
# ikesa_table_segments = 1
# Size of the IKE_SA hash table.
# ikesa_table_size = 1
# Whether to close IKE_SA if the only CHILD_SA closed due to inactivity.
inactivity_close_ike = yes
# Limit new connections based on the current number of half open IKE_SAs,
# see IKE_SA_INIT DROPPING in strongswan.conf(5).
# init_limit_half_open = 0
# Limit new connections based on the number of queued jobs.
# init_limit_job_load = 0
# Causes charon daemon to ignore IKE initiation requests.
# initiator_only = no
# Install routes into a separate routing table for established IPsec
# tunnels.
# install_routes = yes
# Install virtual IP addresses.
# install_virtual_ip = yes
# The name of the interface on which virtual IP addresses should be
# installed.
# install_virtual_ip_on =
# Check daemon, libstrongswan and plugin integrity at startup.
# integrity_test = no
# A comma-separated list of network interfaces that should be ignored, if
# interfaces_use is specified this option has no effect.
# interfaces_ignore =
# A comma-separated list of network interfaces that should be used by
# charon. All other interfaces are ignored.
# interfaces_use =
# NAT keep alive interval.
keep_alive = 25s
# Plugins to load in the IKE daemon charon.
# load =
# Determine plugins to load via each plugin's load option.
# load_modular = no
# Initiate IKEv2 reauthentication with a make-before-break scheme.
# make_before_break = yes
# Maximum number of IKEv1 phase 2 exchanges per IKE_SA to keep state about
# and track concurrently.
# max_ikev1_exchanges = 3
# Maximum packet size accepted by charon.
# max_packet = 10000
# Enable multiple authentication exchanges (RFC 4739).
# multiple_authentication = yes
# WINS servers assigned to peer via configuration payload (CP).
# nbns1 =
# WINS servers assigned to peer via configuration payload (CP).
# nbns2 =
# UDP port used locally. If set to 0 a random port will be allocated.
# port = 500
# UDP port used locally in case of NAT-T. If set to 0 a random port will be
# allocated. Has to be different from charon.port, otherwise a random port
# will be allocated.
# port_nat_t = 4500
# Whether to prefer updating SAs to the path with the best route.
# prefer_best_path = no
# Prefer locally configured proposals for IKE/IPsec over supplied ones as
# responder (disabling this can avoid keying retries due to
# INVALID_KE_PAYLOAD notifies).
# prefer_configured_proposals = yes
# By default public IPv6 addresses are preferred over temporary ones (RFC
# 4941), to make connections more stable. Enable this option to reverse
# this.
# prefer_temporary_addrs = no
# Process RTM_NEWROUTE and RTM_DELROUTE events.
# process_route = yes
# Delay in ms for receiving packets, to simulate larger RTT.
# receive_delay = 0
# Delay request messages.
# receive_delay_request = yes
# Delay response messages.
# receive_delay_response = yes
# Specific IKEv2 message type to delay, 0 for any.
# receive_delay_type = 0
# Size of the AH/ESP replay window, in packets.
# replay_window = 32
# Base to use for calculating exponential back off, see IKEv2 RETRANSMISSION
# in strongswan.conf(5).
# retransmit_base = 1.8
# Maximum jitter in percent to apply randomly to calculated retransmission
# timeout (0 to disable).
# retransmit_jitter = 0
# Upper limit in seconds for calculated retransmission timeout (0 to
# disable).
# retransmit_limit = 0
# Timeout in seconds before sending first retransmit.
# retransmit_timeout = 4.0
# Number of times to retransmit a packet before giving up.
# retransmit_tries = 5
# Interval in seconds to use when retrying to initiate an IKE_SA (e.g. if
# DNS resolution failed), 0 to disable retries.
# retry_initiate_interval = 0
# Initiate CHILD_SA within existing IKE_SAs (always enabled for IKEv1).
reuse_ikesa = yes
# Numerical routing table to install routes to.
# routing_table =
# Priority of the routing table.
# routing_table_prio =
# Whether to use RSA with PSS padding instead of PKCS#1 padding by default.
# rsa_pss = no
# Delay in ms for sending packets, to simulate larger RTT.
# send_delay = 0
# Delay request messages.
# send_delay_request = yes
# Delay response messages.
# send_delay_response = yes
# Specific IKEv2 message type to delay, 0 for any.
# send_delay_type = 0
# Send strongSwan vendor ID payload
# send_vendor_id = no
# Whether to enable Signature Authentication as per RFC 7427.
# signature_authentication = yes
# Whether to enable constraints against IKEv2 signature schemes.
# signature_authentication_constraints = yes
# The upper limit for SPIs requested from the kernel for IPsec SAs.
# spi_max = 0xcfffffff
# The lower limit for SPIs requested from the kernel for IPsec SAs.
# spi_min = 0xc0000000
# Number of worker threads in charon.
# threads = 16
# Name of the user the daemon changes to after startup.
# user =
crypto_test {
# Benchmark crypto algorithms and order them by efficiency.
# bench = no
# Buffer size used for crypto benchmark.
# bench_size = 1024
# Number of iterations to test each algorithm.
# bench_time = 50
# Test crypto algorithms during registration (requires test vectors
# provided by the test-vectors plugin).
# on_add = no
# Test crypto algorithms on each crypto primitive instantiation.
# on_create = no
# Strictly require at least one test vector to enable an algorithm.
# required = no
# Whether to test RNG with TRUE quality; requires a lot of entropy.
# rng_true = no
}
host_resolver {
# Maximum number of concurrent resolver threads (they are terminated if
# unused).
# max_threads = 3
# Minimum number of resolver threads to keep around.
# min_threads = 0
}
leak_detective {
# Includes source file names and line numbers in leak detective output.
# detailed = yes
# Threshold in bytes for leaks to be reported (0 to report all).
# usage_threshold = 10240
# Threshold in number of allocations for leaks to be reported (0 to
# report all).
# usage_threshold_count = 0
}
processor {
# Section to configure the number of reserved threads per priority class
# see JOB PRIORITY MANAGEMENT in strongswan.conf(5).
priority_threads {
}
}
# Section containing a list of scripts (name = path) that are executed when
# the daemon is started.
start-scripts {
}
# Section containing a list of scripts (name = path) that are executed when
# the daemon is terminated.
stop-scripts {
}
tls {
# List of TLS encryption ciphers.
# cipher =
# List of TLS key exchange methods.
# key_exchange =
# List of TLS MAC algorithms.
# mac =
# List of TLS cipher suites.
# suites =
}
x509 {
# Discard certificates with unsupported or unknown critical extensions.
# enforce_critical = yes
}
}
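Almost everything in the file above is a commented-out default, so the actual changes are easy to miss. A minimal sketch for printing only the settings that are in effect, assuming the file is /etc/strongswan.d/charon.conf as mentioned earlier in the thread (`active_settings` is a made-up helper name, not a strongSwan tool):

```shell
# Print only uncommented "key = value" lines from a strongswan.conf-style file.
# active_settings is a hypothetical helper name, not part of strongSwan.
active_settings() {
    # Drop comment lines first, then keep lines that contain an assignment.
    grep -v '^[[:space:]]*#' "$1" | grep '='
}

# Example (path as mentioned earlier in the thread):
# active_settings /etc/strongswan.d/charon.conf
```

For the portion of the file shown above, that should leave only the half_open_timeout, inactivity_close_ike, keep_alive and reuse_ikesa lines.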
In addition to the charon.conf changes above, I've also edited my mobileconfig, changing <key>LifeTimeInMinutes</key> from <integer>20</integer> to <integer>1440</integer> in both places where the field appears.
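A hedged sketch of that mobileconfig edit with sed, assuming each `<integer>` sits on the line directly after its `<key>` (a common plist layout, but worth checking in your own profile). `bump_lifetime` and the file name are placeholders:

```shell
# Change LifeTimeInMinutes from 20 to 1440 wherever the key appears.
# Assumes the <integer> value is on the line right after the <key> line.
# bump_lifetime is a hypothetical helper name.
bump_lifetime() {
    # -i.bak edits in place and keeps a backup copy next to the original.
    sed -i.bak '/<key>LifeTimeInMinutes<\/key>/{n;s|<integer>20</integer>|<integer>1440</integer>|;}' "$1"
}

# Example (placeholder file name):
# bump_lifetime user.mobileconfig
# grep -A1 LifeTimeInMinutes user.mobileconfig
```

Other `<integer>20</integer>` values in the profile are left alone, because the substitution only runs on the line following a LifeTimeInMinutes key.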
@digeratus Maybe you want to test this config out? I'd like to know if this config plays well with non-Apple clients.
@TC1977 I just got back. Will test today
Still no further "policy failed" errors after another week, running with rekey=yes, uniqueids=yes, and increased lifetimes on the iOS mobileconfig side. The "received unencrypted informational" messages still come through occasionally but don't cause any orphaned policies.
The changes are available at TC1977/algo, with one exception. The /etc/strongswan.d/charon.conf setting changes aren't included because I have no idea how to create that file and refer to it during installation. (I really don't know what I'm doing here.)
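One way the charon.conf changes could be scripted, as a rough sketch and not how Algo itself handles configuration: a small shell function that uncomments an option and sets its value in place. `set_charon_option` is a hypothetical helper, and the four options in the example are just the uncommented ones visible in the file posted above.

```shell
# Set "key = value" in a strongswan.conf-style file, whether the option is
# currently commented out or already set. Assumes the option line exists and
# that key and value contain no regex metacharacters or "|" characters.
# set_charon_option is a hypothetical helper name.
set_charon_option() {
    file=$1 key=$2 value=$3
    # Keep the original indentation (\1), drop a leading "#", rewrite the value.
    sed -i.bak -E "s|^([[:space:]]*)#?[[:space:]]*${key}[[:space:]]*=.*|\1${key} = ${value}|" "$file"
}

# Example, using the values from the file posted above:
# conf=/etc/strongswan.d/charon.conf
# set_charon_option "$conf" half_open_timeout 5
# set_charon_option "$conf" inactivity_close_ike yes
# set_charon_option "$conf" keep_alive 25s
# set_charon_option "$conf" reuse_ikesa yes
```

strongSwan still has to re-read the file afterwards before the new values take effect.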
ubuntu@ip-172-16-254-163:~$ journalctl -u strongswan|grep failed
Nov 18 21:24:55 ip-172-16-254-163 charon[871]: 10[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:24:58 ip-172-16-254-163 charon[871]: 06[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:01 ip-172-16-254-163 charon[871]: 08[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:04 ip-172-16-254-163 charon[871]: 14[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:08 ip-172-16-254-163 ipsec[782]: 10[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:08 ip-172-16-254-163 ipsec[782]: 06[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:08 ip-172-16-254-163 ipsec[782]: 08[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 18 21:25:08 ip-172-16-254-163 ipsec[782]: 14[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 25 17:00:46 ip-172-16-254-163 charon[871]: 05[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 26 09:21:36 ip-172-16-254-163 ipsec[782]: 05[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 29 09:30:23 ip-172-16-254-163 charon[871]: 15[IKE] INFORMATIONAL request with message ID 0 processing failed
Nov 29 09:31:30 ip-172-16-254-163 ipsec[782]: 15[IKE] INFORMATIONAL request with message ID 0 processing failed
ubuntu@ip-172-16-254-163:~$ journalctl -u strongswan|grep unable
@TC1977 Seems like things are ok on this front. Has anyone else tested?
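For anyone else testing, the journalctl output above can be condensed into a per-day count. This is only a sketch: `failures_per_day` is a made-up name, and it counts charon[...] lines only, on the assumption that the ipsec[...] lines are the same messages logged a second time (the thread IDs and text match).

```shell
# Count "processing failed" log lines per day, reading journalctl-style text on stdin.
# Only charon[...] lines are counted; the ipsec[...] lines are assumed to be
# duplicates of the same messages.
# failures_per_day is a hypothetical helper name.
failures_per_day() {
    awk '/charon\[[0-9]+\]:/ && /processing failed/ { n[$1 " " $2]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Example:
# journalctl -u strongswan | failures_per_day
```

Fed the lines pasted above, it should report 4 failures on Nov 18 and 1 each on Nov 25 and Nov 29.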
@QuentinMoss This solved the connection issues for me as well... great find!