I'm using 5.0.0-alpha4 and I noticed that on some hosts the service was not able to start up. The following error was in the log file:
2016-08-18T18:22:56-07:00 CRIT Exiting: yaml: control characters are not allowed
I noticed that the C:\ProgramData\winlogbeat\winlogbeat.yml file contained nothing but zero bytes.
# xxd winlogbeat.yml
0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
This is affecting tens of hosts out of a few hundred. The original forum post is here: https://discuss.elastic.co/t/corrupt-winlogbeat-yml-checkpoint-file/58417
:+1:
I have noticed this as well. It originally started with a handful of hosts but lately it seems to be spreading to more of them over time.
To remediate, I just delete the file so the service comes back up, but eventually the hosts revert to this state. I can also confirm that this affects Windows 10 as well.
I was able to reproduce this by powering off a Windows 2012 VM running in VirtualBox. It only occurred while I had lots of events being read, which causes the registry to be updated more often.
I also noticed that my log file exhibited similar behavior and was full of 0's at the end.
After some brief investigation, I think the problem is caused by the file cache in Windows. The file cache does lazy writes unless specifically configured to write-through to the disk. So I think the problem is occurring when we lose power and the cache hasn't been flushed.
So when we create the file we need to use the FILE_FLAG_WRITE_THROUGH flag, but Go doesn't expose the flag so we'll have to do our own syscall.
File Caching in Windows
StackOverflow - cause of corrupted file contents
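For anyone following along, here is a rough sketch of a related, portable mitigation. To be clear, this is not the FILE_FLAG_WRITE_THROUGH change discussed above: instead of opening the file in write-through mode, it writes to a temporary file, asks the OS to flush it with File.Sync (which, as far as I know, Go implements with FlushFileBuffers on Windows), and then renames it over the target. The function names and the registry.yml file name are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileSynced is a hypothetical helper: it writes data to a temporary
// file next to path, flushes it to disk, and then renames it over path.
func writeFileSynced(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	// Remove the temp file if we fail before the rename. After a
	// successful rename this call fails harmlessly.
	defer os.Remove(tmpName)

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	// Ask the OS to flush cached data for this file to disk instead of
	// leaving it to the lazy writer.
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// roundTrip writes data with writeFileSynced into a scratch directory
// and reads it back, so the helper can be exercised without touching
// any real configuration.
func roundTrip(data string) (string, error) {
	dir, err := os.MkdirTemp("", "registry-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "registry.yml") // hypothetical file name
	if err := writeFileSynced(path, []byte(data)); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := roundTrip("update_time: example\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(got)
}
```

The trade-off is an extra flush on every registry update, which may matter when lots of events are being read. Whether this fully prevents the zero-filled files after a power loss is not something I have verified.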
I opened PR #2434 for 5.X to add the FILE_FLAG_WRITE_THROUGH flag. I think this should address the problem, but it's hard to say with 100% confidence. Hopefully once it's merged and released you guys can test it on your fleet of machines and provide feedback on whether the problem has been resolved.
Closing this as #2434 was merged.