last_run_message in scheduled tasks appears to be encoded incorrectly. You can see the string is captured as narrow: https://github.com/osquery/osquery/blob/master/osquery/tables/system/windows/scheduled_tasks.cpp#L116
But despite being multibyte isn't converted correctly.
osquery> SELECT version, build, platform FROM os_version;
+----------+-------+----------+
| version | build | platform |
+----------+-------+----------+
| 6.3.9600 | 9600 | windows |
+----------+-------+----------+
osquery> SELECT version FROM osquery_info;
+---------+
| version |
+---------+
| 4.2.0 |
+---------+
Language pack: Simplified Chinese (zh_cn)
Run select last_run_message from scheduled_tasks LIMIT 1; on a multibyte platform
osquery> select last_run_message from scheduled_tasks LIMIT 1;
+------------------+
| last_run_message |
+------------------+
| 操作成功完成。 |
+------------------+
osquery> select last_run_message from scheduled_tasks LIMIT 1;
+------------------+
| last_run_message |
+------------------+
| ²Ù×÷³É¹¦Íê³É¡£ |
+------------------+
Heads up @farfella, I'm guessing we'll need to audit a lot of tables to make wide strings work correctly.
Looking at this a bit closer the issue is more interesting than the encoding being incorrect. It's actually just the encoding isn't what I would expect.
Running:
C:\Users\Administrator\Desktop>osquery.exe --allow_unsafe --json "select last_ru
n_message from scheduled_tasks LIMIT 1" > c:\test.txt
The contents of test.txt is:
[
{"last_run_message":"操作成功完成。"}
]
Which is correct however 操作成功完成。is encoded as GB2312 which then cannot be interpreted correctly.
I have made a change to capture the wide version and then convert it. I'll raise a pull request.
@theopolis-- We will have missed some, like this TCHAR ErrorMessage() case. It may be easier to wait for users to find these. :-)
@Breakwell-- The implementation in your PR #6310 deviates from what _com_error::ErrorMessage() can return. :-(
My recommendation* to fix this is to update the build configuration so UNICODE and _UNICODE are #defined at the project level. This way, all TCHARs will be wchar_ts instead of chars, and it will handle all these errors in one-shot.
@Breakwell Any thoughts? I can implement the project-level UNICODE/_UNICODE change, as well. But it would be sometime after next week.
@theopolis Yeah... defining UNICODE and _UNICODE pops up several cases where folks just called functions without the *A or *W suffix: Those cases translated to the *A function equivalents (since UNICODE and _UNICODE were not defined), which assume localized strings. For example, CertNameToStr, GetSystemDirectory, CM_Get_Device_ID, etc. I'll look at them after next week, unless someone fixes them before then. :-)
Good find, no rush!
@farfella defining UNICODE is definitely the way to go. Should I just close my PR?
@Breakwell Yes. If you would like to add the UNICODE and _UNICODE preprocessor definitions for cmake and update the other cpp files which also exhibit this issue in a new PR, you are welcome to it. :-) I can review your changes. Otherwise, I'll work on these next week. Up to you.
@farfella I'll leave it to you :) I'll close the PR. Thanks.
@farfella Is there an open issue for the UNICODE change? I'd like to add it as a comment in the PR I am closing.
@Breakwell I think #2569 is the main issue.... But yikes. This is from 2016, it seems.
I only joined a few months back started fixing these cases when I saw issue #6160. :-)
The relevant PRs are: #6187, #6190, and #6291.
I think this has been addressed with all of the recent UNICODE support work (thanks!). Let us know if it's still an issue.