Core: PPPOE reconnect loops due to buggy LCP packet bypass

Created on 18 Mar 2018  路  55Comments  路  Source: opnsense/core

I have an issue with a stable DSL connection using opnsense 18.1.x. After a reboot everything works fine. In case of reconnection (e.g. forced connection drop by my ISP after 24h, errors etc.) my system runs into a reconnection loop. Depending on the opnsense version (18.1.2 is the first release I ever used) the system behaviour looks differently:

  • v.18.1.2: the connection comes up again and runs fine for 90 seconds and drops again
  • v.18.1.4. the connection does not come up again

The problem is that after re-establishing the PPPoE connection PPP LCP packets sent out on netgraph hook _mpd32168-lso:b0_ (received on _mpd32168-wan:bypass_) are not (or only partly) forwarded via hook _mpd32168-wan:link0_ (see attached netgraph diagram).

In case of version 18.1.2 after re-establishing the PPPoE connection PPP LCP configuration packets are successfully exchanged, but responses to periodically incoming PPP LCP Echo requests (sent every 30 seconds by my ISP) are not forwarded. After 3 unanswered echo requests (3*30 = 90seconds) the ISP drops the connection. In case of version 18.1.4 even the exchange of PPP LCP configuration packets fails.

The bug report is related to the discussion in the opnsense forum: https://forum.opnsense.org/index.php?topic=7270.0

Neither manual reconnect nor restart of the mpd5 daemon works, the only solution is a reboot of the opnsense system.

netgraph_ix0

bug

All 55 comments

I am also interested in the evolution of PPPoE with OPNsense, so signing up for this one.

Thank you for reporting the issue.

I haven't tried this patch. I think, the patch will not solve the provlem because the observed issues reside on PPP and LCP layer.

Since it was mentioned that it is "stuck" and only reboot helps the following could help since it is related to the pppoe interface in the netgraph kernel module:

https://github.com/opnsense/core/commit/64057c136f4d#diff-c5f4a3581fb76adc9e03ceaee62ceb86R12

Oh sorry, I have overlooked the details of your patch 馃槷 and its relation to netgraph. I will test it tomorrow.

Much appreciated, thanks! :)

I tested the patch, but my system shows the same behaviour like before.

I think the next step should be debugging the mpd5 daemon to see whether the corresponding socket of netgraph hook _mpd32168-wan:bypass_ receive and process the LCP packets. Maybe, I can find some time the next weeks to check this.

Sometimes this issue also occurs after rebooting the machine, not only after pppoe reconnect

I'm experiencing the same issue. I have to reboot my OPNsense multiple times (2 - 3x) per day.
mpd5 stops answering LCP Echo Requests and the connection is subsequently terminated:
https://forum.opnsense.org/index.php?topic=7270.msg35642#msg35642

This is reliable reproduceable by forcing a PPPoE reconnect, by either side.

Can I do anything to help you fix this issue?

Edit: @fichtner 64057c1 did not fix this issue for me either.
My initial installation (18.1.3) was running fine, I think the issue was introduced in either .4 or .5 and was not fixed with .6

Your help is highly appreciated. The next steps I like to try when I have some free time:

  1. get more reference logs from working installations (for comparison)
  2. check whether _LCP echo response_ packets sent out by _mpd32168-lso:b0_ are successfully received by hook _mpd32168-wan:bypass_ (see netgraph above)
  3. check the obscure memory allocator within _mpd5_
  4. debug the mpd daemon (could be hard without test env and real debugger)

Problem identified :-)

As already mentioned my ISP sends LCP echo request every 10 seconds and terminates the ppp/pppoe session after three unanswered packets. Since the ISP does not receive LCP echo replies my opnsense enters an infinite reconnection loop until I reboot the system. The cause is a timing issue in OpnSense (mpd daemon in conjunction with scripts) which affects all users applying to pppoe dialin in OpnSense.

After negotiating and establishing the ppp session the mpd daemon spawns a new child process for executing the ppp-linkup script by calling the blocking system() function call. This results in blocking the single-threaded mpd instance until the ppp-linkup script finishes. That time all incoming packets for managing the ppp session are queued and not processed.

On my system the ppp-linkup script consumes too much time, so that the mpd daemon is not able to reply the LCP echo request packets in time. This is also the cause why the system runs fine after reboot but not after reestablishing the pppoe session (after reboot the script consumes around 25 seconds to execute in other ways much more time).

In summary, it can be the stated that the problem results in a design flaw. It is not a good idea to process time critical actions within the mpd5 daemon while running a blocking system call without knowing how much time it consumes. To my mind in the long term the architecture of the mpd daemon needs a change by spawning the child process in a separate thread (or all time critical actions).

As a short term solution all actions which are not related to setting up the interface (all but ip addresses, dns servers and routes) should be sourced out to synchronized parallel processes.

@somova thanks for this analysis, we can try a thing or two with this info :)

@fichtner: thanks for the fast reply. Currently I am testing a dirty work around as short term solution (see here)

@somova I was thinking something similar, but needed to check the code first to see if this was a very early regression when we removed pfSctl, but it does not seem so. How about a54cf79 (untested, extra branch for now).

@fichtner: I applied the patch, the ppp-linkup works fine after reboot. But after initiating a reconnect I don't have internet access anymore. It seems that not all necessary services are restarted and the default routes are not setup properly.

Log after reboot (excerpt):
May 6 15:28:46 opnsense: /usr/local/etc/rc.bootup: Dynamic DNS: (Success) IP Address Updated Successfully!
May 6 15:28:46 opnsense: /usr/local/etc/rc.bootup: Dynamic DNS: updating cache file /var/cache/dyndns[...]
May 6 15:28:45 opnsense: /usr/local/etc/rc.bootup: Dynamic DNS: (Success) IP Address Updated Successfully!
May 6 15:28:45 opnsense: /usr/local/etc/rc.bootup: Dynamic DNS: updating cache file /var/cache/dyndns[...]
[...]
May 6 15:27:46 opnsense: /usr/local/etc/rc.bootup: ROUTING: setting IPv6 default route to fe80::9203:25ff:fe74:4dfe
May 6 15:27:46 opnsense: /usr/local/etc/rc.bootup: ROUTING: setting IPv4 default route to 62.214.63.89
May 6 15:27:46 opnsense: /usr/local/etc/rc.bootup: ROUTING: no IPv6 default gateway set, trying wan on 'pppoe0' (fe80::9203:25ff:fe74:4dfe)
May 6 15:27:46 opnsense: /usr/local/etc/rc.bootup: ROUTING: no IPv4 default gateway set, trying wan on 'pppoe0' (62.214.63.89)

Log after manually initiated reconnect (excerpt):
May 6 15:33:24 kernel: ix0: promiscuous mode disabled
May 6 15:32:37 kernel: ng_pppoe[10]: no matching session
May 6 15:32:34 kernel: ng_pppoe[10]: no matching session
May 6 15:32:34 kernel: ng_pppoe[10]: no matching session
May 6 15:32:33 opnsense: /usr/local/etc/rc.newwanip: ROUTING: skipping IPv4 default route to wan
May 6 15:32:33 opnsense: /usr/local/etc/rc.newwanip: ROUTING: skipping IPv4 default route to wan
May 6 15:32:33 opnsense: /usr/local/etc/rc.newwanip: ROUTING: no IPv6 default gateway set, trying wan on 'pppoe0' ()
May 6 15:32:33 opnsense: /usr/local/etc/rc.newwanip: ROUTING: no IPv4 default gateway set, trying wan on 'pppoe0' ()
May 6 15:32:33 opnsense: /usr/local/etc/rc.newwanip: ROUTING: removing /tmp/pppoe0_defaultgwv6
May 6 15:32:33 opnsense: /usr/local/etc/rc.newwanip: ROUTING: removing /tmp/pppoe0_defaultgw
May 6 15:32:33 opnsense: /usr/local/etc/rc.newwanip: ROUTING: entering configure using 'opt3'
May 6 15:32:33 opnsense: /usr/local/etc/rc.newwanip: On (IP address: 192.168.100.2) (interface: WAN_GUESTNET[opt3]) (real interface: ix0).
May 6 15:32:33 opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ix0'
May 6 15:32:33 opnsense: /usr/local/etc/rc.linkup: Hotplug event detected for WAN_GUESTNET(opt3) but ignoring since interface is configured with static IP (192.168.100.2 ::)
May 6 15:32:33 kernel: ix0_vlan7: link state changed to UP
May 6 15:32:33 kernel: ix0: link state changed to UP
May 6 15:32:24 opnsense: /usr/local/etc/rc.linkup: Hotplug event detected for WAN_GUESTNET(opt3) but ignoring since interface is configured with static IP (192.168.100.2 ::)
May 6 15:32:24 kernel: ng0: changing name to 'pppoe0'
May 6 15:32:23 kernel: ix0_vlan7: link state changed to DOWN
May 6 15:32:23 kernel: ix0: link state changed to DOWN
May 6 15:32:23 opnsense: /status_interfaces.php: The command '/sbin/ifconfig 'pppoe0' inet6 -accept_rtadv' returned exit code '1', the output was 'ifconfig: interface pppoe0 does not exist'
May 6 15:32:23 opnsense: /status_interfaces.php: The command `/sbin/ifconfig -m 'pppoe0'' failed to execute ifconfig: interface pppoe0 does not exist
May 6 15:32:19 kernel: ix0: promiscuous mode enabled
May 6 15:32:16 kernel: ix0: promiscuous mode disabled
May 6 15:32:09 sshlockout[83567]: sshlockout/webConfigurator v3.0 starting up
May 6 15:32:09 ppp-linkdown: Removing stale PPPoE gateway 62.214.63.89 on inet

looks like ppp-linkup is started but pppoe0 is not configured? oO

@somova could you also post the interfaces: ppp: log file output ?

@somova and one more question... when the manual reconnect is done, will this fix it?

# configctl interface newip pppoe0

It looks like there is something wrong with the config.xml ... reading this pppoe0 is on top of ix0_vlan7 is on top of ix0 ? where would a static IP of 192.168.100.2 on ix0 come from if not from the config.xml ?!

Due to problems with the patch I switched back to my work around. But this morning again no internet connection :cry:. The process list contained some hanging python processes which looked like remains from yesterday's tests.

root 47440 100.0 0.1 43644 10004 - Rs 15:32 860:46.39 /usr/local/bin/python2.7 /usr/local/opnsense/service/configd_ctl.py -m 'dns reload' 'interface newip pppoe0'
root 40075 83.2 0.1 43644 9876 - Rs 15:27 829:08.92 /usr/local/bin/python2.7 /usr/local/opnsense/service/configd_ctl.py -m 'dns reload' 'interface newip pppoe0'
root 36610 80.0 0.1 43644 9868 - Rs 15:27 827:26.13 /usr/local/bin/python2.7 /usr/local/opnsense/service/configd_ctl.py -m 'dns reload' 'interface newip pppoe0'
root 44841 76.2 0.1 43644 10016 - Rs 15:32 801:07.49 /usr/local/bin/python2.7 /usr/local/opnsense/service/configd_ctl.py -m 'dns reload' 'interface newip pppoe0'
root 20349 0.0 0.1 1110628 6468 - Ss 05:53 0:00.02 /usr/local/sbin/mpd5 -b -k -d /var/etc -f mpd_wan.conf -p /var/run/pppoe_wan.pid -s ppp pppoeclient

@somova could you also post the interfaces: ppp: log file output ?
@somova and one more question... when the manual reconnect is done, will this fix it?

Ok, I will test the patch again and post the results.

It looks like there is something wrong with the config.xml ... reading this pppoe0 is on top of ix0_vlan7 is on top of ix0 ? where would a static IP of 192.168.100.2 on ix0 come from if not from the config.xml ?!

  1. The pppoe0 is on top of ix0
  2. The ix0_vlan7 is on top of ix0 (This one was only for testing purposes with fritzbox but not in use anymore).

The static ip on ix0 is for the IP related communication with my fritzbox 7412 (192.168.100.1). This box is not configured as a real bridge modem but forwards incoming pppoe packets to the DSL interface and vice versa. In this configuration the box is still able to offer all its standard functionality to me (e.g. WLAN [as my guest WLAN] etc.)

After applying the patch again and performing a reconnect the two spawned daemon processes (one for IPv6 and IPv4) hang with 100 percent cpu load.

root 94073 100.0 0.1 43644 10012 - Rs 21:21 3:00.70 /usr/local/bin/python2.7 /usr/local/opnsense/service/configd_ctl.py -m 'dns reload' 'interface newip pppoe0'
root 97704 100.0 0.1 43644 10008 - Rs 21:21 3:00.65 /usr/local/bin/python2.7 /usr/local/opnsense/service/configd_ctl.py -m 'dns reload' 'interface newip pppoe0'

# configctl interface newip pppoe0

This cmd line has no effect.

_Edit:_
I forgot to add the ppp log :open_mouth:
@fichtner: If you still need the ppp log? I can reapply the patch for testing. Please let me know.

Will this issue be solved in milestone 18.7 or do need some more beta testers?

I have had a similar (or this issue, i am not sure).
I use a VDSL 50 (German Telekom).
This happens since the upgrade to 19.1 (and now 19.1.1 also). pppoe stops working, interface is red and gateway is down also (of course). Reconnect does not work. I have done a reboot, this does not work also, rebooted after that the modem (vigor130) , then a new connection is established. It seems this lasts around 3-7 days and happens again, will keep an eye on it, to try to get a really better detailed view on whats going on.
My ppps.log is somewhat suspicios (the last 10-20 lines contain older log entries (4 hours back) and the log is somewhat crippeld at the end (many @@@@)

Feb 12 20:38:04 OPNsense ppp: [wan] IPCP: state change Ack-Sent --> Opened
Feb 12 20:38:04 OPNsense ppp: [wan] IPCP: LayerUp
Feb 12 20:38:04 OPNsense ppp: [wan]   xxx.xxx.xxx.xxx -> yyy.yyy.yyy.yyy
ection timeout after 9 seconds
Feb 12 16:50:19 OPNsense ppp: [wan_link0] Link: DOWN event
Feb 12 16:50:19 OPNsense ppp: [wan_link0] LCP: Down event
Feb 12 16:50:19 OPNsense ppp: [wan_link0] Link: reconnection attempt 2212 in 4 seconds
Feb 12 16:50:23 OPNsense ppp: [wan_link0] Link: reconnection attempt 2212
Feb 12 16:50:23 OPNsense ppp: [wan_link0] PPPoE: Connecting to ''
Feb 12 16:50:32 OPNsense ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Feb 12 16:50:32 OPNsense ppp: [wan_link0] Link: DOWN event
Feb 12 16:50:32 OPNsense ppp: [wan_link0] LCP: Down event
Feb 12 16:50:32 OPNsense ppp: [wan_link0] Link: reconnection attempt 2213 in 4 seconds
Feb 12 16:50:36 OPNsense ppp: [wan_link0] Link: reconnection attempt 2213
Feb 12 16:50:36 OPNsense ppp: [wan_link0] PPPoE: Connecting to ''
Feb 12 16:50:45 OPNsense ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Feb 12 16:50:45 OPNsense ppp: [wan_link0] Link: DOWN event
Feb 12 16:50:45 OPNsense ppp: [wan_link0] LCP: Down event
Feb 12 16:50:45 OPNsense ppp: [wan_link0] Link: reconnection attempt 2214 in 2 seconds
Feb 12 16:50:47 OPNsense ppp: [wan_link0] Link: reconnection attempt 2214
Feb 12 16:50:47 OPNsense ppp: [wan_link0] PPPoE: Connecting to ''
Feb 12 16:50:56 OPNsense ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Feb 12 16:50:56 OPNsense ppp: [wan_link0] Link: DOWN event
Feb 12 16:50:56 OPNsense ppp: [wan_link0] LCP: Down event
Feb 12 16:50:56 OPNsense ppp: [wan_link0] Link: reconnection attempt 2215 in 1 seconds
Feb 12 16:50:57 OPNsense ppp: [wan_link0] Link: reconnection attempt 2215
Feb 12 16:50:57 OPNsense ppp: [wan_link0] PPPoE: Connecting to ''
Feb 12 16:51:06 OPNsense ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Feb 12 16:51:06 OPNsense ppp: [wan_link0] Link: DOWN event
[...]

Feb 12 16:58:47 OPNsense ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Feb 12 16:58:47 OPNsense ppp: [wan_link0] Link: DOWN event
Feb 12 16:58:47 OPNsense ppp: [wan_link0] LCP: Down event
Feb 12 16:58:47 OPNsense ppp: [wan_link0] Link: reconnection attempt 2256 in 1 seconds
Feb 12 16:58:48 OPNsense ppp: [wan_link0] Link: reconnection attempt 2256
Feb 12 16:58:48 OPNsense ppp: [wan_link0] PPPoE: Connecting to ''
Feb 12 16:58:57 OPNsense ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Feb 12 16:58:57 OPNsense ppp: [wan_link0] Link: DOWN event
Feb 12 16:58:57 OPNsense ppp: [wan_link0] LCP: Down event
Feb 12 16:58:57 OPNsense ppp: [wan_link0] Link: reconnection attempt 2257 in 4 seconds
Feb 12 16:59:01 OPNsense ppp: [wan_link0] Link: reconnection attempt 2257
Feb 12 16:59:01 OPNsense ppp: [wan_link0] PPPoE: Connecting to ''
Feb 12 16:59:10 OPNsense ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Feb 12 16:59:10 OPNsense ppp: [wan_link0] Link: DOWN event
Feb 12 16:59:10 OPNsense ppp: [wan_link0] LCP: Down event
Feb 12 16:59:10 OPNsense ppp: [wan_link0] Link: reconnection attempt 2258 in 2 seconds
Feb 12 16:59:12 OPNsense ppp: [wan_link0] Link: reconnection attempt 2258
Feb 12 16:59:12 OPNsense ppp: [wan_link0] PPPoE: Connecting to ''
Feb 12 16:59:21 OPNsense ppp: [wan_link0] PPPoE connection timeout after 9 seconds
Feb 12 16:59:21 OPNsense ppp: [wan_link0] Link: DOWN event
Feb 12 16:59:21 OPNsense ppp: [wan_link0] LCP: Down event
Feb 12 16:59:21 OPNsense ppp: [wan_link0] Link: reconnection attempt 2259 in 1 seconds
Feb 12 16:59:22 OPNsense ppp: [wan_link0] Link: reconnection attempt 2259
Feb 12 16:59:22 OPNsense ppCLOG^A^@^@^@7~^G^@<EC><CD>^G^@^@^@^@^@

I am still experiencing this issue. After an update of opnsense I have manually to modify some files to get the system running again. This is a little bit annoying as regular updates (especially security related) are important.

How can I help to get this issue solved? Thanks.

Which files did you edit?

After every update, I have to do two steps:

  1. I copy the file "/usr/local/sbin/ppp-linkup" to "/usr/local/sbin/ppp-linkup2"
  2. I replace the file "/usr/local/sbin/ppp-linkup" with the following content:
#! /bin/sh

nohup /usr/local/sbin/ppp-linkup2 "${1}" "${2}" "${3}" "${4}" "${5}" "${6}" "${7}" "${8}" "${9}" "${10}" &
exit 0

This moves all linkup-stuff to a background task and works fine.

But, this is a workaround, later integration in the production series need possibly some kind of locks to prevent interferences between linkup and linkdown scripts.

I鈥檒l merge most of
https://github.com/opnsense/core/tree/ppp-timeout next week and we can start on this fresh. I don鈥檛 think we need an explicit lock, configd is what will serialise it anyway although it was reported to not work properly on the first try.

In any case we need user updates and prodding to push through this. No core dev uses a PPPoE connection unfortunately.

That sounds good. I was not aware that the core developers do not use PPPoE.

But, I can help with tests and reports to get PPPoE running flawlessly again. Do I have to migrate to version 19.1 before I can start testing the next time (I am still using v.18.7.10_4-amd64)?

Wait for 19.1.2 next week, then update if you can do it. But we could also work on 18.7 for the time being, although it won鈥檛 receive official updates so 19.1.x would be our target in any case...

Today, I have migrated my system to opnsense version 19.1.2. So we can go ahead with this version.

I am now on 19.1.4 , today it happend again, that pppoe was down, i did not noticed it (as it is my 2nd internet line), this time i rebooted the dsl modem and it worked again. I think the magic trigger ist, that the lan port on the opnsense gets also down/up with this, will check next time this.

I am a bit lost on the shell for debugging this more clearly, any hints on this, whats the best way for pppoe interfaces ? The /var/log/ppps1.log or(or similar) file, did not show anything valuable, syslog also nothing, i am a bit lost here .. the box was up and running since the 19.1.4 release, so it does not happen that often. I have a daily dsl reconnect job at 2:15 (but the issue today was early in the morning around 7:25, so that does not match with the reconnect.

Happend again, i pulled the cable from the link to the dsl modem and waited some seconds and pluged it in, but that does not help, rebooted the modem and it worked again. The modem was up and synced, no special error in the modem log.

Here is the logfile /var/log/ppps.log from today (ips and stuff is redacted, hope thats ok)

ppps.log.txt

You need to use "clog" to read the log files. Can you give the clear dump?

Basic question while we're here and it was never mentioned as far as I can see: are you guys are using hardware or VMs?

https://www.reddit.com/r/vmware/comments/8nznva/pfsense_pppoe_vswitch_unable_to_establish/

I have edited my last post and removed the loglines and added the logfile.

I use an APU2D4 System and an Draytek Vigor 130 VDSL Modem (this modems works flawlessly on my linux server / router, but it will seperate the firewall / router part from it :)

@rcmcronny: This issue is about LCP echo response messages which cannot sent out via the interface in time due to a script which blocks the mpd daemon. So, the ISP thinks the client is not responding anymore and the connection goes on-hook.

It seems that your issue is another one. Your logfile shows that reconnection timed out. So, I recommend you to track this down using wireshark packet analysis. And create a new bug report in case the issue is not the same.

@fichtner: I am now on version 19.1.4. How can I support the development team with tests to solve the issue?

I've added a few commits in the last weeks and thinking about how to debug this further, but I'm not entirely sure which part hangs other than "everything". The best I can come up with is log process ID date and operation to be performed and do a mini log file for ppp linkup/linkdown?

It's a pitty that mpd is single-threaded and blocks until the script is completed. What is about moving the scripts to the background (like in my dirty workaround)? In the latter case we should consider what happens if the ppp-linkdown script is executed while the ppp-linkup script is still executing.

Anybody? I have patches to try for the willing on top of 19.1.7...

@fichtner Yes please!

You can try a development lock removal in DNS code:

# opnsense-patch 5f4315c

You can try the old experimental patch for doing all reloading in the background:

# opnsense-patch 2806a0c

On top of this we could then try removing DNS reload on ppp-linkdown or having a log file writing actions and timestamps to see where the linkup blocks...

@Alphakilo sorry, missed you on IRC yesterday... should be around all week to discuss if you want :)

@fichtner # opnsense-patch 5f4315c is applied. 2806a0c is prefetched and cached, in case things go south.

patches are always cached nowadays, but good thinking nevertheless 馃憤

So, now I have switched to 19.1.7 and can test the patches. Before I start, I need to rollback my workaround before applying the patches. In case something goes wrong and opnsense disconnects from the internet, it's better to have physical access to the machine. Thus, I postpone testing to next monday.

Okay, let's see. I've installed the patch May 13th. Since then I had 8 reconnects. I'd have expected 4 reconnects, one every 24 hours.

# grep "LCP: rec'd Terminate Request #\|IPCP: state change Ack-Sent --> Opened" /var/log/ppps.log | grep -v "May 13"
May 14 19:12:27 gateway ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Opened)
May 14 19:12:42 gateway ppp: [wan] IPCP: state change Ack-Sent --> Opened
May 14 19:13:21 gateway ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Opened)
May 14 19:13:34 gateway ppp: [wan] IPCP: state change Ack-Sent --> Opened
May 15 19:13:32 gateway ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Opened)
May 15 19:13:45 gateway ppp: [wan] IPCP: state change Ack-Sent --> Opened
May 15 19:14:55 gateway ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Opened)
May 15 19:15:11 gateway ppp: [wan] IPCP: state change Ack-Sent --> Opened
May 15 19:15:50 gateway ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Opened)
May 15 19:16:05 gateway ppp: [wan] IPCP: state change Ack-Sent --> Opened
May 16 19:16:03 gateway ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Opened)
May 16 19:16:18 gateway ppp: [wan] IPCP: state change Ack-Sent --> Opened
May 17 19:16:17 gateway ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Opened)
May 17 19:16:32 gateway ppp: [wan] IPCP: state change Ack-Sent --> Opened
May 17 19:17:42 gateway ppp: [wan_link0] LCP: rec'd Terminate Request #3 (Opened)
May 17 19:18:00 gateway ppp: [wan] IPCP: state change Ack-Sent --> Opened

The problem is still there, just not as prevalent as it used to be.

So, I have performed the tests with Opnsense 19.1.7 and the following steps:

  1. Dropped the running PPPoE session
  2. Did a rollback of my workaround
  3. Dumped the network traffic while establishing a new PPPoE session and saw delayed answers to "PPP LCP Echo Requests" (as seen in the past)
  4. Applied both patches (5f4315c and 2806a0c)
  5. Dropped the running PPPoE session
  6. Dumped the network traffic while establishing a new PPPoE session

Now "PPP LCP Echo Requests" are immediately answered and my ISP does not drop the PPP connection anymore :-). So, the patches look good, but I need some more time for testing the stability.

:+1:

@Alphakilo @somova thanks for testing... more or less on the right track. there's still a little contention around linkup and linkdown we could try to sidestep by refusing to reconfigure dns on network linkdown. thoughts? :)

reconfigure dns on network linkdown

DynDNS? If OPNsense doesn't reconfigure that, and there is a failover, the DynDNS records won't be updated, right? That'd be.. not good.

More context: configd could block there to reconfigure DNS while doing something else, but reconfiguring DNS (resolv.conf really) serves no purpose if a link goes down as connectivity is broken anyway.

2806a0c tip-toes around this by doing fire and forget, but it doesn't seem reasonable to call it in the first place.

@fichtner : Is it safe to update to opnsense v.19.1.8 after applied the patches in v.19.1.7?

@Alphakilo : I am not sure your issue is the same like mine. Can you please check (with wireshark) whether unanswered LCP echo request packets are the cause that your ISP terminates the connection.

After updating my opnsense to version19.1.9 and applying patch 2806a0c PPP LCP echo requests are still immediately answered :smile:

@somova thanks for bearing with us 鉂わ笍 2806a0c is now on master as e9f2862

Was this page helpful?
0 / 5 - 0 ratings