I think it's time we add instrumentation rules for Vector itself in the CONTRIBUTING.md file. We purposefully deferred this to see what kind of questions come up, but we're starting to fully instrument Vector and consistency will be important.
Additional rule: visibility of internal_events -- pub vs pub(crate)?
Also, as @fanatid mentioned in a PM, there is the question of using consts for common strings.
We need to make an official decision on naming before accepting PR #3264. Most of our "error" events are named without an Error suffix, but the name makes it obvious that it represents an error state (ElasticSearchMissingKeys, JsonFailedParse, KubernetesLogsEventAnnotationFailed, etc), but some have the suffix (LuaScriptError, UnixSocketError, etc). I think we should follow the former pattern (they force more descriptive names), but we need to decide either way.
The only reason I prefer the standardization is because we plan to document all of our errors, what they mean, etc. So I assume this will make it easier to parse these out of the code, but I'm sure there is a better way to go about this.
About #3264, Error in HTTPBadRequestError is redundant so HTTPBadRequest might be a better name.
I think we should follow the former pattern (they force more descriptive names), but we need to decide either way.
I agree and prefer the non-Error naming for the same reason. I also want to be careful of lumping everything into a generic "error" bucket. Some are clearly errors, while others are a little more context-dependent and could be expected.
I think we should also decide how to split responsibility between a component and a shared code that it uses.
For example, we have socket and vector sources that emit events_processed and delegate errors, that they can, to shared tcp code which then emits those errors as TcpConnectionError. And on the other hand #3264 makes the shared http code emit both events_processed and errors under HTTPBadRequest event.
I think it's fine delegating errors that we can to their shared code for it to emit, but I'm not so sure for events_processed. If metrics crate could collect span information, it could be fine, but otherwise we would lose information of it's real source, and by extension all events that pass through sources, backed by such shared code, would get aggregated into a single counter.
With #2660 this isn't so important.
Most helpful comment
We need to make an official decision on naming before accepting PR #3264. Most of our "error" events are named without an
Errorsuffix, but the name makes it obvious that it represents an error state (ElasticSearchMissingKeys,JsonFailedParse,KubernetesLogsEventAnnotationFailed, etc), but some have the suffix (LuaScriptError,UnixSocketError, etc). I think we should follow the former pattern (they force more descriptive names), but we need to decide either way.