Currently, metrics are exposed by default on port 8888 to be scraped by Prometheus. We should instead run internal metrics through our own pipeline via a diagnostic source type, allowing the user to connect it to the output of their choice.
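As a sketch of the proposed shape (the source and sink names here are illustrative; the `internal_metrics` source discussed later in this thread ended up filling this role), the user-facing config could look like:

```toml
# Hypothetical sketch: expose Vector's own metrics as an ordinary source...
[sources.internal]
type = "internal_metrics"

# ...and route them to whatever sink the user chooses.
[sinks.metrics_out]
type = "console"
inputs = ["internal"]
encoding.codec = "json"
```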
Just noting, this will be closed once we fully define the remaining events within Vector and implement them. The remaining work is being defined in #3192.
I've changed my mind. Let's close this and represent the remaining work with other issues.
That's OK, but this issue is still referenced in https://vector.dev/docs/administration/monitoring/#metrics, and https://github.com/timberio/vector/issues/1538 mentions:

> [in-progress] Vector Observability (milestone #24)

but that milestone is a 404.
sooo? how should I check whether Vector is alive or not :) ?
@freeseacher, we're actively working on exposing internal metrics. This will likely be surfaced in two ways, including:

- a `/health` endpoint, to determine whether a Vector instance is alive.

Details/ETA TBA, but it's a priority and something we're working on now. We'll update issues with the relevant details as it happens.
but the issue remains closed, yep?
This issue is specific to running internal metrics through the Vector pipeline, which is limited in scope vs. the overarching goal of enabling fuller observability of a Vector instance.
https://github.com/timberio/vector/issues/3225 and https://github.com/timberio/vector/issues/3211 tackle it more comprehensively; I'd recommend subscribing to those issues to get better visibility into how this effort is coming along.
I think this issue is somewhat different from the work represented in #3225 and #3211. In my mind, it'd still be useful to pipe Vector's internal metrics to arbitrary sinks so that people can publish them to Datadog, Prometheus, StatsD, etc.
@binarylogic do we have this work represented somewhere else? I couldn't find it. I'd advocate for reopening this issue otherwise.
AFAIK, we currently have an undocumented `internal_metrics` source that solves this ticket.
It's undocumented because there was (and still is) quite a lot of churn w.r.t. internal events/metrics. @binarylogic created an issue ~2 weeks ago to make sure we document this, and it's currently planned to be tackled within our current sprint (next two weeks).
Note that this is basically _a third_ way to expose internal events/metrics, in addition to the ones @leebenson mentioned we're currently working on.
In essence, the plumbing exists for us to emit metrics, and an undocumented source exists to expose those metrics. Now we're working on 1) documenting that source, and 2) adding more ways to expose those metrics (`vector top` and a web UI).
Here's an example config (generated using `vector generate stdin,internal_metrics/json_parser/blackhole,console` with some manual changes):
```toml
data_dir = "/var/lib/vector/"

[sources.source0]
max_length = 102400
type = "stdin"

[sources.source1]
type = "internal_metrics"

[transforms.transform0]
inputs = ["source0"]
drop_field = true
drop_invalid = false
type = "json_parser"

[sinks.sink0]
healthcheck = true
inputs = ["transform0"]
type = "blackhole"
print_amount = 1000

[sinks.sink0.buffer]
type = "memory"
max_events = 500
when_full = "block"

[sinks.sink1]
healthcheck = true
inputs = ["source1"]
type = "console"
encoding.codec = "json"

[sinks.sink1.buffer]
type = "memory"
max_events = 500
when_full = "block"
```
```text
$ vector --config vector.toml
Sep 15 15:45:34.456 INFO vector: Log level "info" is enabled.
Sep 15 15:45:34.458 INFO vector: Loading configs. path=["vector.toml"]
Sep 15 15:45:34.460 INFO vector: Vector is starting. version="0.10.0" git_version="v0.9.0-377-g0f0311a" released="Wed, 22 Jul 2020 19:34:29 +0000" arch="x86_64"
Sep 15 15:45:34.460 INFO vector::topology: Running healthchecks.
Sep 15 15:45:34.460 INFO vector::sources::stdin: Capturing STDIN.
Sep 15 15:45:34.460 INFO vector::topology: Starting source "source1"
Sep 15 15:45:34.460 INFO vector::topology::builder: Healthcheck: Passed.
Sep 15 15:45:34.460 INFO vector::topology::builder: Healthcheck: Passed.
Sep 15 15:45:34.460 INFO vector::topology: Starting source "source0"
Sep 15 15:45:34.460 INFO vector::topology: Starting transform "transform0"
Sep 15 15:45:34.460 INFO vector::topology: Starting sink "sink1"
Sep 15 15:45:34.460 INFO vector::topology: Starting sink "sink0"
hello
Sep 15 15:45:37.273 WARN transform{name=transform0 type=json_parser}: vector::internal_events::json: Event failed to parse as JSON field=message self.error=expected value at line 1 column 1 rate_limit_secs=30
```
This outputs the following metric events (to the console sink, in this case):
{"name":"events_processed","timestamp":"2020-09-15T13:45:38.461847Z","tags":{"component_kind":"transform","component_type":"json_parser"},"kind":"absolute","counter":{"value":1.0}}
{"name":"bytes_processed","timestamp":"2020-09-15T13:45:38.461881Z","tags":{"component_kind":"sink","component_type":"blackhole"},"kind":"absolute","counter":{"value":5.0}}
{"name":"events_processed","timestamp":"2020-09-15T13:45:38.461884Z","tags":{"component_kind":"sink","component_type":"blackhole"},"kind":"absolute","counter":{"value":1.0}}
{"name":"processing_error","timestamp":"2020-09-15T13:45:38.461886Z","tags":{"component_kind":"transform","component_type":"json_parser","error_type":"failed_parse"},"kind":"absolute","counter":{"value":1.0}}
Linking to other issues that took the place of this one: