Meshcentral: Agent not reconnecting after reboot automatically in Windows

Created on 29 Sep 2019 · 13 Comments · Source: Ylianst/MeshCentral

I've been testing a MeshCentral server with agents on 3 machines and it has all been working perfectly. However, I've tried adding a 4th system, and after rebooting, the agent doesn't automatically reconnect to the server unless I restart the service manually.

The 4th system is a Windows 10 system (as are the other 3) and I tried to enable the agent logging but this is all I see:

[2019-09-29 03:38:12 PM] Attempting to connect to Server...
[2019-09-29 03:38:20 PM] Attempting to connect to Server...
[2019-09-29 03:38:33 PM] Attempting to connect to Server...
[2019-09-29 03:38:47 PM] Attempting to connect to Server...
[2019-09-29 03:39:08 PM] Attempting to connect to Server...
[2019-09-29 03:39:31 PM] Attempting to connect to Server...

It just does this over and over until I restart the service and then it instantly reconnects to the server again.
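To see how the agent spaces its retries, the timestamps in the excerpt above can be turned into gaps between attempts. This is only a minimal sketch: the helper names are made up for illustration, and it assumes every log line starts with a bracketed timestamp in the format shown above.

```python
from datetime import datetime

# The six log lines quoted above, copied verbatim.
LOG_LINES = """\
[2019-09-29 03:38:12 PM] Attempting to connect to Server...
[2019-09-29 03:38:20 PM] Attempting to connect to Server...
[2019-09-29 03:38:33 PM] Attempting to connect to Server...
[2019-09-29 03:38:47 PM] Attempting to connect to Server...
[2019-09-29 03:39:08 PM] Attempting to connect to Server...
[2019-09-29 03:39:31 PM] Attempting to connect to Server...
""".splitlines()


def parse_timestamp(line):
    """Return the datetime found between the first '[' and ']' of a log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d %I:%M:%S %p")


def retry_gaps(lines):
    """Return the number of seconds between each pair of consecutive log lines."""
    times = [parse_timestamp(line) for line in lines if line.strip()]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    print(retry_gaps(LOG_LINES))
```

For this excerpt the gaps come out as 8, 13, 14, 21 and 23 seconds, so the agent is retrying at slowly growing intervals rather than sitting idle.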

This is all running in LAN mode and 3 other systems are working perfectly through reboots.

If I run the agent the status window tells me the current agent status is "Running" and Windows also says the service is running after reboot, plus I see it in task manager.

Is there any more advanced logging I can enable, or how can I check what could be stopping it connecting?

Thanks

bug

All 13 comments

Agent connection logging
I unfortunately don't use the Windows agents. That said, as per your request, if you suspect it's a connection issue, you may want to enable connection logging as described here: https://github.com/Ylianst/MeshCentral/issues/474 (controlChannelDebug=1 in the .msh file) to see if that yields any additional information in the log file. You will probably want to turn this off again (controlChannelDebug=) once you are done.

User Permissions
Other than connection debugging, you may want to verify who the service runs as (In the "Log On" tab of the service) in the working agents, and compare that to the non working agents to see if there's a discrepancy. (I suspect both would be running in the same way, but you never know)

OS
I'm assuming that these are all running on Windows 10, with the same service patch, if not, it might be something to note in the unlikely situation where the patch version might matter. (For example, maybe you observe that this only affects machines without the Creators update?)

If nothing else, providing these additional bits of data might give the devs some more data points to help resolve the issue.

I've done the controlChannelDebug=1 config which is what gave me the log above but it just says attempting to connect to server again and again until I restart the service.

I've just checked the user and on all of the systems the agent is set to run as "Local System", I can't see any differences in how the agent services are installed.

All systems were mostly up to date, I've just run updates on all and it looks like there's a few bits recently released which are being installed now but doesn't look like a major update. I've also disabled firewalls on all systems to rule that out.

It's so strange because just a simple restart of the service has it connected instantly and working perfectly again until reboot.

I'll do some testing when I get in the office on Monday. It looks like you are running the agent in a mode where it doesn't know the server URL and must multicast to find it. I'll check the retry logic to see how often it retries the multicast.

But it looks like based on your log, it says it's attempting to connect, but it doesn't yet have a valid url to connect to.

Yea in the MeshAgent.msh it has MeshServer=local so I assume it has to find the server itself.

If I want to set it statically I assume I could just edit that file to be:

MeshServer=wss://servername:443/agent.ashx

Is there a setting to change in the server somewhere so that all agents downloaded automatically have the IP fixed like that, or should you manually edit every agents configuration file?

I've attached the MeshAgent.log, though I'm not sure if it's useful. All the times in the log where it suddenly connects are where I restarted the service in Windows manually. One thing I did notice is that sometimes it connects to the IPv4 address of the server and sometimes to the IPv6 address. Could that make any difference?

I also restarted the server and all other agents connected back up automatically apart from this one problem system.

MeshAgent.log

The problem system isn't connected via wireless is it?

Ok I went ahead and changed the MeshServer setting to the second line and specified the IP and now it's working perfectly through reboots. I then put it back to local and the issue comes back.

So the agent service is definitely starting every time, there just seems to be some issue specific to this PC that means if the agent starts at Windows boot up it's unable to find the server by itself even if you leave it running for hours, but if you then later manually restart the service it starts working instantly.

So it's not really fixed, I can work around it this way but someone else may run into the same issue I guess. It would also be nice to specify each downloaded agent should be configured with this fixed IP so I don't have to manually configure them.
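For anyone who has to apply this workaround to several agents, the edit can be scripted. The sketch below is an illustration only: `set_msh_value` is a made-up helper, and it assumes the .msh file is plain text with one key=value setting per line, as the lines quoted in this thread suggest. The file location and encoding on a real install are not covered here.

```python
def set_msh_value(text, key, value):
    """Return the config text with key=value set.

    An existing line starting with "key=" is replaced; if there is none,
    a new line is appended. All other lines are left untouched.
    """
    prefix = key + "="
    lines = text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        if line.startswith(prefix):
            lines[index] = prefix + value
            replaced = True
    if not replaced:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example content only; a real MeshAgent.msh holds more settings.
    original = "MeshServer=local\nignoreProxyFile=1\n"
    print(set_msh_value(original, "MeshServer",
                        "wss://192.168.1.22:443/agent.ashx"), end="")
```

The function works on text rather than on the file directly, so the result can be reviewed before anything is written back. The agent service would then need a restart to pick up the change.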

No it's not wireless, all the agent systems are wired and connected to the same switch.

So I now see that in WAN mode it probably does what I was asking as you specify a server name/IP so I assume the agents then have it coded in automatically, I'll probably use that going forward as it makes more sense for me.

However FYI in LAN mode I did some more testing and changed the agent service from Automatic to Automatic (Delayed Start) which adds a 120 second delay and now it works when set to MeshServer=local.

I can only assume that on this problem system the network connection takes longer to come up and the agent was starting before there was a connection available. I'm not sure why it then wouldn't subsequently connect once the connection was available until a manual restart of the agent though. But it seems to be related to the agent starting too early anyway.
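The same startup-type change can be made from a script instead of the Services console, using the Windows `sc.exe config` command with `start= delayed-auto`. This is a sketch under assumptions: the service name "Mesh Agent" is a guess and should be checked against the name shown in the service's properties, and the command has to be run from an elevated prompt.

```python
import subprocess

# Assumption: the name the agent service is registered under. Check the
# "Service name" field in the service's properties before using this.
SERVICE_NAME = "Mesh Agent"


def delayed_start_command(service_name=SERVICE_NAME):
    """Build the sc.exe command that sets a service to Automatic (Delayed Start).

    sc.exe expects "start=" and its value as separate arguments.
    """
    return ["sc.exe", "config", service_name, "start=", "delayed-auto"]


def apply_delayed_start(service_name=SERVICE_NAME):
    """Run the command; raises CalledProcessError if sc.exe reports a failure."""
    subprocess.run(delayed_start_command(service_name), check=True)


if __name__ == "__main__":
    # Only print the command here; call apply_delayed_start() on the
    # affected Windows machine once the service name is confirmed.
    print(" ".join(delayed_start_command()))
```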

@pbo10 Does the problem system have multiple network interfaces, and, is it possible to reorder them or disable the unused ones as a troubleshooting step to see if that impacts the behavior?

Reorder NICs in Windows 10
https://www.windowscentral.com/how-change-priority-order-network-adapters-windows-10

Nope, it only has one network interface.

It definitely seems to be around the timing of that network connection being active.

I think probably the multicast message being sent to search for available servers must be sent before the network connection is active and it's not set to send again for a very long time or at all?

I've deleted the rest of the log now, but the one I uploaded shows it was running from midnight until 3pm without being able to connect, so a second multicast search for the server didn't happen at least in all those hours.
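One way to test the timing theory would be to run a small probe at boot that records how many tries it takes before the server becomes reachable at all. The sketch below is a hypothetical helper, not part of MeshCentral. It only checks plain TCP reachability of a known address, which is not the same thing as the agent's multicast search, so it can confirm a slow network start but not a discovery bug.

```python
import socket
import time


def wait_for_server(host, port, attempts=30, delay=2.0, sleep=time.sleep):
    """Try to open a TCP connection until it succeeds.

    Returns the 1-based number of the attempt that connected, or None if
    every attempt failed. `sleep` can be replaced to skip waiting in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=3):
                return attempt
        except OSError:
            # Network not up yet, or nothing listening: wait and retry.
            if attempt < attempts:
                sleep(delay)
    return None


if __name__ == "__main__":
    # Address and port taken from the MeshServer line quoted in this thread.
    result = wait_for_server("192.168.1.22", 443)
    if result is None:
        print("server never became reachable")
    else:
        print("connected on attempt", result)
```

If the probe only connects after several attempts when started at boot, that would support the idea that the agent's first search goes out before the network connection is ready.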

Just to confirm, when looking at the meshagent.msh with a text editor, does the agent that is not reconnecting have:

MeshServer=local

or

MeshServer=wss://servername:443/agent.ashx

This makes a huge difference in how we debug this. If the server is set to "local", the agent will multicast to find the server, which is completely different code. Let us know which one you have, thanks.

Please check whether you have a proxy setting, and whether that proxy is actually needed to access MeshCentral. MeshAgent is now very smart and checks the user environment for a potential proxy, which it will then use to connect to MeshCentral. I had a case where my Linux agent could not connect to my internal MeshCentral because it tried to connect via a proxy (which will never work). Remove that proxy setting and see if the agent now connects correctly.
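A quick way to look for proxy settings in the environment is sketched below. This is a guess at what to check: the thread does not say which sources the agent actually inspects, the variable names are just the conventional ones, and on Windows a system-wide proxy can also be configured outside environment variables, which this does not cover.

```python
import os

# Conventional proxy variable names; whether the agent reads these is an
# assumption, not something confirmed in this thread.
PROXY_KEYS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def proxy_settings(env):
    """Return the non-empty proxy-related entries of an environment mapping.

    Names are matched case-insensitively, so HTTP_PROXY and http_proxy
    are both reported.
    """
    return {name: value for name, value in env.items()
            if name.lower() in PROXY_KEYS and value}


if __name__ == "__main__":
    found = proxy_settings(os.environ)
    if found:
        for name, value in sorted(found.items()):
            print(f"{name}={value}")
    else:
        print("no proxy variables set in this environment")
```

Note that a service running as "Local System" does not necessarily see the same environment as a logged-in user, so the script is most informative when run under the same account as the agent.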

The issue occurs when the line is set to:

MeshServer=local

It works perfectly on reboot if I set it to:

MeshServer=wss://192.168.1.22:443/agent.ashx

At the moment I've just left it set to local but left the service set to Automatic (Delayed Start) and it still works fine that way, albeit with a 2 minute delay before reconnecting on every reboot.

I don't think I have a proxy set anywhere, there's no proxy specified in meshagent.msh and it has a line saying:

ignoreProxyFile=1
