Telegraf: Windows: deploying via ansible returns rc: 0 while the event log shows "The service process could not connect to the service controller."

Created on 13 Sep 2016 · 13 Comments · Source: influxdata/telegraf

System info:

Win2012R2

Steps to reproduce:

Running in powershell/console works OK (both running interactively and installing the service).
Deploying via winrm (ansible) on the same box returns rc:0 with no stdout or stderr output, plus an error in the event log.

  1. execute telegraf.exe via ansible's raw module (powershell via winrm)
  2. check the output (rc:0, no stderr/stdout) + event ID 3 in the event log

Expected behavior:

Installing/running telegraf succeeds:
rc:0, stderr and/or stdout not empty, and no error in the event log.

Actual behavior:

"rc": 0, "stderr": "", "stdout": "", "stdout_lines": [] + eventID:3 "The service process could not connect to the service controller."

Use case:

Testing telegraf on Windows works fine, but mass-deploying it using ansible (or any other deployment method that uses winrm) doesn't. This is a blocker for using telegraf on Windows.

Labels: bug · help wanted · platform/windows

All 13 comments

I don't quite understand what the issue is here. Is it that telegraf is not sending anything to the event log?

@one1zero1one would you be able to try it out via the old NSSM installation method? See https://github.com/influxdata/telegraf/blob/9320a6e115b0bc2d7a832ae56ef0c8329df9db79/docs/WINDOWS_SERVICE.md

cc @butitsnotme

@sparrc - I've managed to register the service via ansible with chocolatey->nssm - but that's a lot of overhead and workarounds for registering a simple metric agent service...

Back to your first question: the issue here is that, with the same admin user, running telegraf.exe from console/powershell works (both testing and registering the service). However, when running it over WinRM, I get rc:0 and nothing on stderr/stdout, but "The service process could not connect to the service controller." in the Windows event log.

Yep, it's not meant as a permanent workaround, just a way to diagnose which part of the stack is failing.

Do you have any other debug information that may be of use? Do you know what commands ansible is running to install the service? Which user is it running under? Is it specific to ansible or to WinRM in general?

I've also opened an issue with the service installation library here: https://github.com/kardianos/service/issues/72

@sparrc ansible is using winrm and powershell. So it actually runs the exact same command using the exact same user as in console (where it works) - only it does it over winrm.
I'll look into it further during the weekend to see if I can rule ansible out for this behaviour.

@sparrc I jumped to a conclusion when I said that deploying via nssm works.
It does register the service:

C:\telegraf>nssm install Telegraf c:\telegraf\telegraf.exe -config c:\telegraf\telegraf.config
Service "Telegraf" installed successfully!

However, the service registered points to "C:\ProgramData\chocolatey\lib\NSSM\Tools\nssm-2.24\win64\nssm.exe" (http://imgur.com/gBWu0rE) instead of telegraf :/

So - currently the only way that works to install telegraf as a service on Windows 2012 R2 is to do it interactively in the console.

I've seen from the issue you opened with the service installation library that it could be a permission issue. However, my issue is that whatever path I take to automate this (other than running in the console), there is no error/output from telegraf.exe to give some kind of clue about what's up. I'm using full admin everywhere for these tests.

@sparrc I finally got it to work using sc.exe under winrm (from ansible).

EXEC sc.exe create telegraf binpath= "C:\telegraf\telegraf.exe -config c:\telegraf\telegraf.conf"
WINRM RESULT u' Response code 0, out "[SC] CreateService S", err "" '
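For anyone wrapping this workaround in a small deployment tool, here is a minimal Go sketch of the same sc.exe call. The helper name scCreateArgs is hypothetical, and the paths are the ones from the command above. It assumes sc.exe takes "binPath=" and its value as two separate arguments, which is how the space after the equals sign is normally parsed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// scCreateArgs is a hypothetical helper that builds the argument list for
// "sc.exe create". The "binPath=" key and its value are passed as two
// separate arguments, mirroring the space after "=" on the command line.
func scCreateArgs(name, exe, config string) []string {
	binPath := fmt.Sprintf("%s -config %s", exe, config)
	return []string{"create", name, "binPath=", binPath}
}

func main() {
	args := scCreateArgs("telegraf", `C:\telegraf\telegraf.exe`, `c:\telegraf\telegraf.conf`)

	// Run sc.exe and print whatever it returns. On a non-Windows machine
	// this only reports that sc.exe was not found.
	out, err := exec.Command("sc.exe", args...).CombinedOutput()
	fmt.Printf("output: %q, error: %v\n", out, err)
}
```

Unlike the -service install path, sc.exe writes a result line to stdout, so a failed registration is at least visible to the caller.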

I held back from trying sc.exe in the beginning because of the whole debate here https://github.com/influxdata/telegraf/issues/860 - however I'm happy it finally works, and we can automate its deployment now.

Not sure about this bug: -service install still won't work from winrm. However, I'm happy with sc.exe if it holds up over time. Thanks!

Glad you found a workaround. Since it's not ideal, I'll leave this open until https://github.com/kardianos/service/issues/72 is solved.

We ran into the same issue with telegraf not being installable using Ansible. The problem is the way that the code handles interactive and non-interactive sessions.

The service wrapper library and reloadLoop combine in such a way that, if you are in a non-interactive session, the code assumes you are running as a service and are performing only a start or stop.

We had to modify the code to move the handling of the -service flags, and then managed to get it working as intended. We could provide this change as a PR if desired.

The second issue is that there is no logging whatsoever when running under Windows, which is a real problem as you cannot debug a broken config.
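To illustrate the ordering problem described above, here is a minimal Go sketch. It is not the actual patch: decide and the action names are hypothetical, and the interactive value is hard-coded, whereas a real program would ask the service library for it. The point is only that the -service flag is checked before the session type, so a non-interactive caller such as WinRM can still install the service.

```go
package main

import (
	"flag"
	"fmt"
)

// action names what the process should do on startup (hypothetical type).
type action string

const (
	controlService action = "control-service" // -service install/uninstall/start/stop was requested
	runAsService   action = "run-as-service"  // no flag, non-interactive: started by the service controller
	runForeground  action = "run-foreground"  // no flag, interactive: started from a console
)

// decide is a hypothetical helper. It looks at the -service flag first and
// only falls back to the session type when no flag was given.
func decide(serviceFlag string, interactive bool) action {
	if serviceFlag != "" {
		return controlService
	}
	if !interactive {
		return runAsService
	}
	return runForeground
}

func main() {
	serviceFlag := flag.String("service", "", "install, uninstall, start or stop the service")
	flag.Parse()

	// Assumption: hard-coded to false to mimic a WinRM session.
	interactive := false

	fmt.Println(decide(*serviceFlag, interactive))
}
```

If the session check came first, the "install" case over WinRM would fall into the run-as-service branch, which matches the event log error reported in this issue.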

@peter-murray yes, please submit a PR!

Logging is another issue that I'm also not quite sure how to handle. Can you open a separate issue for that?
