Osquery: Host identifier discussion

Created on 2 Dec 2016 · 19 Comments · Source: osquery/osquery

After https://github.com/facebook/osquery/pull/2522/files, an osqueryd UUID is no longer unique in the case where the same host is running many containers, all of which run osqueryd. Although I assume the intention of #2522 was to make the behavior of UUID more similar to the behavior on macOS, I assert that this is actually less desirable on Linux.

RFC question

All 19 comments

Host UUIDs are unique to hosts. Containers run on the same host, so the uuid in system_info is valid and the result is equivalent regardless of the OS. To differentiate containers we should consider adding container UUIDs (through the container-specific APIs) or instance UUIDs.

I created a workaround by adding an entrypoint like this to the containers:

#!/bin/sh
# Generate a UUID once per container, then hand off to the container's command.
if [ ! -f /.osquery-uuidgen ]; then
    uuidgen > /sys/class/dmi/id/product_uuid
    touch /.osquery-uuidgen
fi
exec "$@"

but it's not great, because /sys is a read-only filesystem in a container, so you must explicitly mount product_uuid as a volume in order to override it at runtime:

-v $(pwd)/product_uuid:/sys/class/dmi/id/product_uuid

That seems dangerous. Perhaps extend the system_info table to include a machine_id column backed by https://www.freedesktop.org/software/systemd/man/machine-id.html and use that to differentiate containers? Unless /etc is bind mounted too :/
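To make the machine_id idea concrete, here is a minimal sketch of reading the systemd machine ID. It is not osquery code: the function name read_machine_id is hypothetical, and it assumes the ID lives at /etc/machine-id in the 32-character lowercase hexadecimal form that the linked man page describes.

```python
from pathlib import Path
from typing import Optional

HEX_DIGITS = set("0123456789abcdef")


def read_machine_id(path: str = "/etc/machine-id") -> Optional[str]:
    """Return the machine ID stored at `path`, or None if it is unusable.

    Hypothetical helper for illustration only; not part of osquery.
    """
    try:
        value = Path(path).read_text(encoding="ascii").strip()
    except (OSError, UnicodeDecodeError):
        # Missing file, permission problem, or unexpected bytes.
        return None
    # Accept only the documented shape: 32 lowercase hex characters.
    if len(value) == 32 and set(value) <= HEX_DIGITS:
        return value
    return None
```

As noted above, this only helps if each container gets its own file. If /etc is bind mounted from the host, every container would report the same value.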

I think that the original motivation for the UUID was to allow each instance of osqueryd to present a unique ID. I don't think having a UUID that is per host is of much value if you're running osqueryd in containers. The previous implementation of UUID as a random ID per instance was ideal; the new implementation breaks enrollment for us pretty badly.

If the "point" of UUID somehow stopped being "globally unique ID for all instances of osquery" and became "kind of unique ID that is unique per hosts, but won't be if you run more than one osquery on the same host", then #2522 represents a breaking change IMO and should be abstracted behind a feature flag like --uuid_unique_by_host or something. It should NOT be the default.

Or I could also support using /sys/class/dmi/id/product_uuid as hardware_uuid and keeping uuid as a randomly generated per-instance ID.

@marpaia and I were discussing this, and we see value in having both a hardware backed UUID as well as an ID that is unique to each installation of osquery. Maybe you want to know that a host that re-enrolled is the same host that was reformatted and had a new OS installed -- The hardware UUID can tell you that. Maybe you just want to know every unique osquery installation -- The UUID (perhaps generated_uuid?) can tell you that.
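A rough sketch of how the two identifiers could complement each other on the server side. Everything here is assumed for illustration: the enrollment records are plain (hardware_uuid, generated_uuid) pairs, the sample values are made up, and neither function exists in osquery.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_hardware(
    enrollments: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Map each hardware UUID to the distinct generated UUIDs seen for it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for hardware_uuid, generated_uuid in enrollments:
        if generated_uuid not in groups[hardware_uuid]:
            groups[hardware_uuid].append(generated_uuid)
    return dict(groups)


def shared_hardware(groups: Dict[str, List[str]]) -> List[str]:
    """Hardware UUIDs with more than one osquery installation behind them.

    That could mean a reinstalled host that re-enrolled, or several
    containers running osqueryd on one machine.
    """
    return sorted(hw for hw, ids in groups.items() if len(ids) > 1)


# Made-up sample data: two installations share hardware "HW-A".
sample = [("HW-A", "gen-1"), ("HW-A", "gen-2"), ("HW-B", "gen-3")]
groups = group_by_hardware(sample)
```

With both fields available, the hardware UUID answers "is this the same machine?" while the generated UUID answers "is this the same installation?".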

I'd support adding a new field (à la generated_uuid, as @zwass is mentioning), as our analysts make heavy use of the concept that a UUID is _unique per host_. Having an indicator of a specific host in a world of ever changing/shifting host-based identifiers is pretty nice in incident triage.

Yeah, I see the value in per-host UUIDs, but we still need a globally unique (best effort given the keyspace of course) ID as host_identifier during enrollment given that a lot of our infrastructure depends on there being a unique key per instance of osquery supplied by osquery.

@marpaia for sure. I can see the value in having a unique _per instance_ identifier, but what's wrong with adding another generated field?

Proposed flag change here: https://github.com/facebook/osquery/pull/2830

I think we need more discussion before this change can be included in a release. The current documentation suggests the --host_identifier options apply to the way you identify a host. They are hostname and host UUID. Additionally, the project has always communicated that hostname and host UUID are the methods to identify a host running osquery. The proposed change is altering the meaning of the flag when set to =uuid to become "instance" identifier, where instance applies to the running osquery process.

If we are going to change the meaning of =uuid we need a few things:

  • Documentation of the new meaning, and option, within the wiki.
  • A buffer of time between merge to communicate to infrastructures which had assumed =uuid means host UUID.

There's another "add-on" around the UUID management I'd like to append. We should stop "caching" the host UUID in the persistent storage. This only makes sense if the UUID was accessible at one time, but in the future is not. That's a weird case that we should not try to solve.

Alternatively, we can introduce a new option for --host_identifier that adopts the UUID-per-instance identification method; something like: instance or ephemeral? If we did not have to change the meaning of =uuid then a quick wiki update would make this super easy!

If we are going to change the meaning of =uuid we need a few things:

  • Documentation of the new meaning, and option, within the wiki.

That's cool, I can take a hack at drafting up some prose for the wiki about how this all works, in detail (what guarantees we make, etc.). Should I add that directly to #2830?

  • A buffer of time between merge to communicate to infrastructures which had assumed =uuid means host UUID.

That makes sense, for sure.

There's another "add-on" around the UUID management I'd like to append. We should stop "caching" the host UUID in the persistent storage. This only makes sense if the UUID was accessible at one time, but in the future is not. That's a weird case that we should not try to solve.

For vocabulary, I assume that "host UUID" is the per-device hardware identifier that is CURRENTLY being used as the host_identifier value for uuid. I think not caching it is fine if we have sufficient trust in the fact that it's not going to change. I'm not sure if the performance of retrieving it from RocksDB greatly outweighs the performance of querying it directly all the time, given that it is accessed rather often.

I do think, however, that the "new" UUID should be cached in local storage. Obviously a truly unique identifier must be generated and perhaps the specific guarantee we make is that a UUID is per-instance, for the life of the instance, unless you delete or re-create the database?
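The guarantee described here (one UUID per instance, stable until the database is deleted or re-created) can be sketched as follows. This is an assumption-laden illustration rather than osquery's implementation: a single file stands in for the RocksDB-backed store, and get_instance_uuid is a hypothetical name.

```python
import uuid
from pathlib import Path


def get_instance_uuid(store_path: str) -> str:
    """Return the cached instance UUID, generating one on first use.

    `store_path` is a stand-in for the persistent database; deleting it
    yields a new identifier on the next call.
    """
    path = Path(store_path)
    if path.exists():
        cached = path.read_text().strip()
        if cached:
            return cached
    # Nothing cached yet: generate a random (version 4) UUID and persist it.
    fresh = str(uuid.uuid4())
    path.write_text(fresh + "\n")
    return fresh
```

Calling it twice with the same path returns the same value. Removing the file and calling it again returns a different one, which matches the "unless you delete or re-create the database" caveat.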

Alternatively, we can introduce a new option for --host_identifier that adopts the UUID-per-instance identification method; something like: instance or ephemeral? If we did not have to change the meaning of =uuid then a quick wiki update would make this super easy!

Right now my PR adds a single option (hardware_id), so the choices become:

  • hostname: stays the same, obviously
  • hardware_id: the current UUID: the hardware specific identifier
  • uuid: each "instance" of osqueryd (as defined by some criteria) has its own universally unique identifier

I could see adding another one:

  • ephemeral: similar to uuid, but the UUID is never cached and thus every execution of the process presents a different id.

I could also see a "read the uuid from a file on the filesystem" host_identifier. Should the host identifier feature be a registry-powered pluggable feature? I could see that being pretty cool.

What do you think about the following for host_identifier:

  • hostname (or name) uses the machine or network assigned hostname, fetched at process start.
  • hostuuid (or uuid) uses the platform (DMTF) host UUID, fetched at process start.
  • instance uses an instance-unique UUID generated at process start, persisted in the backing store.
  • ephemeral uses an instance-unique UUID generated at process start, not persisted.

This allows us to support almost all configurations and is 100% backward compatible, no API change, only enhancement.

The plugin concept is intriguing, but there's not yet enough reason to use that hammer. I recommend waiting until someone has a real-life need for a custom identifier, via extension or something requiring significant code.

@theopolis that works for me. I would opt to keep hostname and uuid the same instead of name or hostuuid. I'm not 100% sure that we're using the term "UUID" correctly with this application, but as long as we document what each option does, I think it should be sufficient.

Agree, we could hide the legacy "name" and "uuid" (or just hide "uuid") but detect and handle the value appropriately. They'd be hidden legacy supported options essentially.

So you're saying change hostname to name with hostname being a silently supported legacy option? And leaving uuid the same?

Ah, no, just "uuid" I suppose.

I would then say:

  • hostname uses the machine or network assigned hostname, fetched at process start.
  • uuid uses the platform (DMTF) host UUID, fetched at process start.
  • instance uses an instance-unique UUID generated at process start, persisted in the backing store.
  • ephemeral uses an instance-unique UUID generated at process start, not persisted.

And opt to ditch adding name and hostuuid to the lexicon.
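For anyone skimming, the four-option scheme could be sketched roughly like this. It is an illustration under assumptions, not the osquery implementation: resolve_host_identifier is a hypothetical name, a dict-like object stands in for the backing store, and the Linux DMI path is the one mentioned earlier in the thread.

```python
import socket
import uuid
from pathlib import Path
from typing import MutableMapping

# Linux location of the platform UUID, as referenced earlier in this thread.
DMI_UUID_PATH = "/sys/class/dmi/id/product_uuid"


def resolve_host_identifier(
    mode: str,
    store: MutableMapping[str, str],
    dmi_path: str = DMI_UUID_PATH,
) -> str:
    """Resolve the identifier for one of the four proposed modes.

    Intended to be called once at process start. `store` is a stand-in
    for the persistent backing store.
    """
    if mode == "hostname":
        return socket.gethostname()
    if mode == "uuid":
        # Platform (DMTF) host UUID; shared by every container on the host.
        return Path(dmi_path).read_text().strip()
    if mode == "instance":
        # Generated once, then reused for as long as the store survives.
        if "instance_uuid" not in store:
            store["instance_uuid"] = str(uuid.uuid4())
        return store["instance_uuid"]
    if mode == "ephemeral":
        # Never persisted, so each process start presents a new identifier.
        return str(uuid.uuid4())
    raise ValueError(f"unknown host identifier mode: {mode!r}")
```

The only difference between instance and ephemeral in this sketch is whether the generated value touches the store, which is what keeps hostname and uuid backward compatible.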

This sounds like a good scheme to me.
