Hi, I'll preface this with saying that I haven't determined if this is a Traefik issue or a Meshcentral issue. I've requested a response on the Traefik slack as well, but no response from them as of yet.
Traefik is a Docker- and cluster-aware reverse proxy that can dynamically create and monitor Let's Encrypt certificates when a container loads, without needing changes to the config files.
It appears that Traefik forwards traffic to the MeshCentral web interface just fine, but when agents connect via websockets it does not get forwarded. Traefik provides the following error code whenever an agent tries connecting:
time="2019-02-10T00:50:54Z" level=error msg="vulcand/oxy/forward/websocket: Error when copying from client to backend: websocket: incorrect mask flag"
Traefik does support websockets from other products. Does this indicate that MeshCentral is providing an incorrect flag when initiating a websocket connection?
As an optimization, I think MeshCentral doesn't mask websockets when it's carried over TLS. I wonder if that's causing an issue with your reverse proxy. In the morning I can force it to always mask websockets to see if that resolves your issue.
It sounds like doing so will resolve the problem, I'll give it a test when you update and see how it goes.
It would be nice if Traefik supported web sockets that do not use masks. Masks are useful with HTTP, but with HTTPS I don't see any upside and it just cuts down a little on server performance. That said, I guess we could set a mask of all zero bits, which should get around the issue.
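To make the all-zero mask idea concrete, here is a minimal Python sketch of client-side frame masking as described in RFC 6455. It is not MeshAgent code; the function names `apply_mask` and `build_masked_frame` are made up for illustration, and only short payloads (up to 125 bytes) are handled.

```python
def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the 4-byte masking key (RFC 6455, section 5.3)."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def build_masked_frame(payload: bytes, key: bytes, opcode: int = 0x2) -> bytes:
    """Build a single, final client frame with the MASK bit set.

    Hypothetical helper for illustration: only the 7-bit length form is
    supported, so the payload must be at most 125 bytes.
    """
    if len(payload) > 125:
        raise ValueError("this sketch only handles payloads up to 125 bytes")
    first = 0x80 | opcode          # FIN bit plus opcode (0x2 = binary frame)
    second = 0x80 | len(payload)   # MASK bit plus payload length
    return bytes([first, second]) + key + apply_mask(payload, key)


# With an all-zero key the MASK bit is still set, which is what a strict
# proxy checks for, but the payload bytes are left unchanged by the XOR.
zero_key = bytes(4)
frame = build_masked_frame(b"hello", zero_key)
```

The frame still carries the mask flag and a 4-byte key, so it should satisfy a proxy that rejects unmasked client frames, while the sender skips any real XOR work.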
Hi Ylianst. Would you be willing to submit an issue on traefik's github? If not I'll be happy to do it, however my knowledge of websockets is quite basic and I might not correctly describe the issue.
I can do it. Doing it now, hold on...
Progress report on this one. Bryan (krayon007) added support in the latest agent to perform full websocket masking, but it's not enabled by default. I will be adding a server-side flag to enable this. Once enabled, when you download the agent, the server will give you an agent that has full masking enabled. So, I will report back when it's done.
Just published MeshCentral v0.2.8-v with support for adding extra configuration parameters that will be inserted into the MeshAgent and .msh file when someone downloads it. So you can now do this:
{
  "settings": {
    "port": 443
  },
  "domains": {
    "": {
      "Title": "MyServer",
      "AgentConfig": [ "webSocketMaskOverride=1" ]
    }
  }
}
Note that the string "webSocketMaskOverride=1" is case sensitive, so type it exactly. You need to add "AgentConfig" to the domain where you want it to take effect (often the default "" domain). After that, restart the server and each agent downloaded will have websocket masking enabled.
Hopefully that will fix the Traefik problem. Hope it helps - Ylian
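If you would rather script the change than edit config.json by hand, something like the following Python sketch could work. It only manipulates the JSON structure shown above; the helper name `enable_mask_override` is made up for illustration and is not part of MeshCentral.

```python
import json

# The exact, case-sensitive string from the comment above.
MASK_FLAG = "webSocketMaskOverride=1"


def enable_mask_override(config: dict, domain: str = "") -> dict:
    """Add the mask override to the AgentConfig list of one domain.

    Hypothetical helper: creates the "domains" section, the domain entry
    and the "AgentConfig" list if they are missing, and avoids adding
    the flag twice.
    """
    domains = config.setdefault("domains", {})
    entry = domains.setdefault(domain, {})
    agent_config = entry.setdefault("AgentConfig", [])
    if MASK_FLAG not in agent_config:
        agent_config.append(MASK_FLAG)
    return config


example = {"settings": {"port": 443}, "domains": {"": {"Title": "MyServer"}}}
updated_text = json.dumps(enable_mask_override(example), indent=2)
```

To apply it to a real file you would load config.json with `json.load`, call the helper, and write the result back. The server still needs a restart, and only agents downloaded afterwards pick up the setting.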
Hi Ylian,
Thanks for the update. Good News: I can now route through traefik, and the agent gets detected properly. Hooray!
Bad news: I don't seem to actually gain control. Remote desktop/remote console do not appear as options, and commands sent to the remote computer (wake up, shutdown, reset) do not seem to get a response. It correctly detects when the computer is connected, but that's about as far as it goes. Neither the Traefik nor the MeshCentral console (via Docker) reports any issues.
That being said, this may just be a configuration issue on my end. I'll see if I can set up a similar config using nginx later and see if it has the same issues with masking on.
@Ylianst I think your issue on the Traefik GitHub has gone unanswered because of the missing issue template. I just created a new issue that follows their template: https://github.com/containous/traefik/issues/4513
Apologies for the delay, I will get on this in the next few days. We just have to give you a sample web socket client that does not do masking.
The issue is now discussed at: https://github.com/containous/traefik/issues/4487. The "webSocketMaskOverride=1" workaround does fix it, however, once the agent is installed without masking, there is no way for IT to switch to using Traefik without reinstalling all agents.
Got the same problem as @routerino . Clients are showing up but Remote Desktop does not appear. When installing a new client I got the following error in my Traefik logs:
vulcand/oxy/forward/websocket: Error when copying from backend to client: websocket: close 1006 (abnormal closure): unexpected EOF
So I think it's not a configuration problem.
If you are using Traefik, did you put this line in your config.json domain section?
"AgentConfig": [ "webSocketMaskOverride=1" ]
Also, you may want to try again with the latest server. I made a fix for an issue where user notification on Linux and macOS would cause a black screen.
Yes, at first I got no connection at all between my server and clients. After putting this line in my config.json there is a connection, but console, remote desktop etc. aren't showing up.
Can confirm, I am still experiencing the same issue (issue didn't go away, just had to put meshcentral testing on hiatus for a while). Config.json can be found here:
Error within traefik is as follows:
time="2019-05-13T01:21:46Z" level=error msg="vulcand/oxy/forward/websocket: Error when copying from client to backend: websocket: close 1006 (abnormal closure): unexpected EOF"
How it is viewed in the interface:
https://imgur.com/YLC1Rnz
Note how the remote viewing tab doesn't exist at all.
Arg. This is not good. This may not be related to the web socket masking at all and would explain why the config.json line has no effect. The reported error does not look like the masking problem. I will need to install Traefik and try it myself.
Hi ylianst.
You'll have to set up for your own instance, but you can find my docker-compose for traefik here:
https://pastebin.com/YSmyRYAU
Here is a corresponding toml config file for traefik
https://pastebin.com/v3HH3SDe
Finally, docker-compose for meshconnect that I'm using (you already have the config file)
https://pastebin.com/j0bHLC3B
Depending on what distro you are running Docker on, cockpit with cockpit-docker is great for checking errors in Docker containers and their current status. For Ubuntu Server, you can install the whole stack with "sudo apt-get install cockpit cockpit-docker docker docker-compose"
I can put my configs online as well if you like. I'm checking the logs of the containers with Portainer.
What version of Traefik are you using?
I am not familiar with Traefik at all... I just downloaded the Windows version and am trying to make a small config file that routes localhost:444 to localhost:443. So far, no luck. I have no idea what configuration is OK with that version of Traefik. If I could make this work, I could easily add this to my regular test runs. Do you know what I am doing wrong below? I am trying on both 1.7 and 2.0.0-alpha4.
When I hit "https://localhost:444" I currently see "404 page not found".
[entryPoints]
[entryPoints.http]
address = ":81"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":444"
[entryPoints.https.tls]
[[entryPoints.https.tls.certificates]]
certFile = "webserver-cert-public.crt"
keyFile = "webserver-cert-private.key"
[frontends]
[frontends.frontend1]
backend = "backend1"
[frontends.frontend1.routes.test_1]
rule = "Host:localhost"
[backends]
[backends.backend1]
[backends.backend1.servers.server1]
url = "http://127.0.0.1:443"
weight = 1
Note: I played around with "healthcheck" and added a "/health" path in the next version of MeshCentral that returns "200 OK". Traefik seems to be polling this well.
Nevermind, I got it working!! I can now start testing...
[entryPoints]
[entryPoints.http]
address = ":81"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":444"
[entryPoints.https.tls]
[[entryPoints.https.tls.certificates]]
certFile = "webserver-cert-public.crt"
keyFile = "webserver-cert-private.key"
[file]
[backends]
[backends.backend1]
[backends.backend1.healthcheck]
path = "/health"
interval = "30s"
[backends.backend1.servers.server1]
url = "http://127.0.0.1:443"
weight = 1
[frontends]
[frontends.frontend1]
entryPoints = ["https"]
backend = "backend1"
passHostHeader = true
[frontends.frontend1.routes]
[frontends.frontend1.routes.main]
rule = "Host:devbox.mesh.meshcentral.com"
#rule = "Path:/"
[api]
entryPoint = "traefik"
dashboard = true
#address = "localhost:8089"
So, I did some work with Traefik 1.7 and it worked perfectly for me. I also wrote a new section in the MeshCentral User's Guide 0.2.3 on how to set up Traefik.
I now realize that if you set up a MeshAgent and see "websocket: close 1006 (abnormal closure)", it's probably because Traefik is presenting a TLS certificate that MeshCentral is not aware of, so MeshCentral and the MeshAgent think there is a man-in-the-middle attack.
You can debug this by running "node ./node_modules/meshcentral --debug". You can fix this by adding this to the domain section of config.json:
"certUrl": "1.2.3.4"
Here, 1.2.3.4 is the IP address where MeshCentral can make an HTTPS connection and load the TLS certificate that is presented externally to the connecting agents. You can also add the following in the settings section:
"ignoreagenthashcheck": true
This is not recommended, but will tell the server not to check for TLS man-in-the-middle attacks. This is a good way to see if this is the problem, since everything will work immediately. Hope it helps.
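If you want to check for a certificate mismatch before changing any settings, a rough Python sketch like the one below can compare the certificate presented by the proxy with the one presented by MeshCentral directly. The function names are made up for illustration, and SHA-256 is used only as a convenient fingerprint here; this is not a statement about which hash MeshCentral or the MeshAgent use internally.

```python
import hashlib
import ssl


def fingerprint_from_pem(pem: str) -> str:
    """Return the SHA-256 hex digest of a PEM-encoded certificate's DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def fetch_fingerprint(host: str, port: int = 443) -> str:
    """Fetch the certificate a server presents and fingerprint it.

    ssl.get_server_certificate does not validate the certificate by
    default, so this also works with self-signed certificates.
    """
    pem = ssl.get_server_certificate((host, port))
    return fingerprint_from_pem(pem)


def same_certificate(proxy: tuple, backend: tuple) -> bool:
    """Compare the certificates seen at two (host, port) endpoints."""
    return fetch_fingerprint(*proxy) == fetch_fingerprint(*backend)
```

For example, `same_certificate(("mesh.example.com", 443), ("127.0.0.1", 4430))` would compare a proxy endpoint with a direct one; both addresses are placeholders. If the result is False, agents connecting through the proxy see a different certificate than the one the server knows about, which is the situation "certUrl" is meant to address.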

Hi again,
I agree, the issues I've been having related to traefik were cert related. I've since rebuilt again from scratch and the EOF/socket mask issues have gone away. It's safe to say that traefik isn't the cause of the below issue.
I apologise for misdiagnosing the problem, though it's good that it led to Traefik being investigated as a proxy.
That being said, I'm still encountering this issue of no remote desktop tab. Something very interesting about this fault though: If I go into the console, and upload the recovery core, I gain the "files" and "terminal" tabs. These appear to be functioning. Still no remote desktop.
So maybe this is an independent issue. It appears to be agent related. Tried on both windows 1809 and windows 1803.
I added a "dupagents" command to the server console a few days back. Go to the "My Server" tab, open the "Console" sub-tab and type "dupagents". It gives you a list of which agents cause the duplicates and from what IP address. I am noticing a lot of these "duplicates" on my server and am working on this now. To my surprise, it's all over the place, but I now have good data for investigating this.
As an update, I've since set up a new instance completely separate from docker, inside of a virtual machine. This way I could access the instance directly via LAN, or via WAN in traefik.
From my testing, the lack of a remote desktop/files/terminal tab still appears to be a traefik related problem. It may be indirectly related to the socket mask filtering setting (since you get no connectivity at all via traefik without it).
However, LAN connectivity works flawlessly.
I've dug around some open issues in traefik related to web sockets, and it appears there may be further issues with their implementation. For now, I'll probably switch to nginx or haproxy (which is unfortunate because traefik's docker integration makes it very useful).
Interesting update to this problem. I've been using meshcentral directly for a while (though in LAN only since I can't provision its own IP for it yet). I updated meshcentral and traefik and put it back in the old configuration, just in case things have improved.
All of the existing agents can connect just fine, even through the proxy (with the certurl set up on meshcentral). Any new agents I try to add have the issue described above (works without remote desktop in recovery core mode, does not respond in normal core mode).
So, what's special about the handshaking that only affects newly added clients?
EDIT: The socket mask problem still needs an override, but the fact that the existing clients work is certainly odd.
I've created a short video of the problem. I created a temporary client, and connected it directly, no proxy on. Works fine. I then set up the proxy, changed DNS, reconnected the agent (this is where the video starts). Again, no problem. I can disconnect and reconnect fine. I then (in the video) deleted the db file and the mesh file on the client side, and attempted to reconnect. I then get the above described problem. It seems that if the DB file is already there, I'm not having an issue. If the DB file needs creating, there's something missing out of the DB file that's stopping all the action.
I can also confirm identical results on ubuntu linux.
Hi @Ylianst ,
I suspect this is not going to be solved easily (or ever), but I have found a workaround.
What I have done is exposed meshcentral directly on a different port, and set up the whitelist within the settings to only allow meshcentral agents through the exposed port. I then whitelist the traefik server IP. This allows users to log in through traefik (and the benefits that provides), while allowing the agents to connect to the server directly, bypassing any problems with reverse proxies completely.
The end result is that the agents connect to www.contosso.com:1234 fine, while users get blocked if they access that port. However, users can connect via the reverse proxy on mesh.contosso.com.
@routerino: That's a good workaround, but really is a workaround. Perhaps you could compare the two meshagent.db files to see what the difference between them is.
@Ylianst / @krayon007: So that @routerino can compare the files properly, what format is the meshagent.db file in?