Vector: Improve error type handling

Created on 26 Jun 2019  Â·  21Comments  Â·  Source: timberio/vector

Many of Vector's methods use a Result<_, String> return type. This is adequate for textual errors, but it makes it impossible for users to pattern match on the error type as well as dropping any cause information encapsulated by some error types. We should switch to proper error types (similar to std::io::Error) probably aided by the failure crate.

References:

tech debt

Most helpful comment

I would say we should take a quick inventory of components still using String errors and either add a checklist to this issue or break them out into individual issues so we can track overall progress (probably just a checklist here).

All 21 comments

We most likely want to use Box<dyn std::error::Error + Send + Sync + 'static> as the error type throughout the whole crate. This would allow us to downcast_ref. Leaf level errors would provide an enum and implement std::error::Error.

Would using a Box for every error return be too expensive?

I suggested the failure crate as it helps wrap many of the conversion issues and provides custom derives for error (Fail) types, as well as catch the back trace. This may be overkill, but it's a light library and we're already pulling it through one of the dependencies.

FTR I'm not stuck on using failure, it would just be good to avoid rolling our own solution given how many other quality solutions exist to either derive complete error types or handle error return conversions and encapsulation.

It shouldn't be the box should only happen at the leaf level and get passed up the tree. Generally, you're able to use bounds like SinkError: Into<Box<std::Error...> where when this is already a box the into is the identity transform and thus does nothing. I believe failure is actually outdated at this point. Pretty much all of rusts error handling will move to this box style in the future. With leaf level errors being an enum that impls std::error::Error.

This is definitely something we've punted on for a little too long. That said, the error handling situation is a little in flux right now (see here and here) and I'm not sure there's an obviously correct direction for us to take.

My understanding is that failure has been "downgraded" to something of an experiment, so I'm not sure we'd want to go that route. If we do bring in a crate, it seems like snafu is probably the best maintained. I think the main benefits of an external crate seem to be convenience of defining custom error types and the ability to easily attach backtraces and context.

As a first step we should probably pick a component or two currently using String errors and work through converting them to real error types as simply as possible. We use errors differently in different components (e.g. collected and presented to the user during topology building, quickly caught and logged with in each task for source/transforms/sinks, etc), so getting a representative sample would be good. Then we can try a crate or two to see what feels best for our needs. I imagine we won't need much more than a convenient way to define our types and implement std::error::Error.

So now, one error return has been converted to an actual error type. Where should this issue go from here?

I would say we should take a quick inventory of components still using String errors and either add a checklist to this issue or break them out into individual issues so we can track overall progress (probably just a checklist here).

A quick grep shows at least the following modules:

  • [ ] src/buffers/disk.rs
  • [ ] src/region.rs
  • [ ] src/sinks/aws_cloudwatch_logs/mod.rs
  • [ ] src/sinks/aws_cloudwatch_metrics.rs
  • [ ] src/sinks/aws_kinesis_streams.rs
  • [ ] src/sinks/aws_s3.rs
  • [ ] src/sinks/blackhole.rs
  • [ ] src/sinks/clickhouse.rs
  • [ ] src/sinks/console.rs
  • [ ] src/sinks/elasticsearch.rs
  • [ ] src/sinks/http.rs
  • [ ] src/sinks/kafka.rs
  • [ ] src/sinks/mod.rs
  • [ ] src/sinks/prometheus.rs
  • [ ] src/sinks/splunk_hec.rs
  • [ ] src/sinks/tcp.rs
  • [ ] src/sinks/util/http.rs
  • [ ] src/sinks/vector.rs
  • [ ] src/sources/file.rs
  • [ ] src/sources/statsd/mod.rs
  • [ ] src/sources/statsd/parser.rs
  • [ ] src/sources/stdin.rs
  • [ ] src/sources/syslog.rs
  • [ ] src/sources/tcp.rs
  • [ ] src/sources/udp.rs
  • [ ] src/sources/util/tcp.rs
  • [ ] src/sources/vector.rs
  • [ ] src/topology/config/mod.rs
  • [ ] src/transforms/add_fields.rs
  • [ ] src/transforms/coercer.rs
  • [ ] src/transforms/field_filter.rs
  • [ ] src/transforms/grok_parser.rs
  • [ ] src/transforms/json_parser.rs
  • [ ] src/transforms/log_to_metric.rs
  • [ ] src/transforms/lua.rs
  • [ ] src/transforms/regex_parser.rs
  • [ ] src/transforms/remove_fields.rs
  • [ ] src/transforms/sampler.rs
  • [ ] src/transforms/tokenizer.rs
  • [ ] tests/crash.rs

Do we want this as separate PR for each refactoring (ex all the transforms have to be done at once) or one big PR when complete?

@bruceg one commit per sink/source/transform works and then bunch them into their own PR.

I've done all the (for example) transforms as one commit. Are you saying you would rather break them into separate commits per file (which won't actually compile until the last commit) or am I misunderstanding?

@bruceg honestly, whatever is easier for you, I'm happy to review it either way.

Yeah, please feel free to break it up however is easiest for you.

AFAICT there is one last module that uses String error types, and that is the configuration parsing framework.

Do you mean just here? That one is so small and isolated I don't think it's worth the effort to do anything fancier.

No, I mean the whole stack starting (roughly) here. There are a lot of routines that return Result<..., Vec<String>> that I didn't catch before.

I see, those evaded my grep as well. I wouldn't say they're high priority, and they might be a little different since we're collecting multiple errors. I'd say we can take a quick stab at them, but be ready to punt if it becomes much of a burden. There's some value to be had there, but not a ton.

We can also change the signature to Result<..., Vec<crate::Error>>.

@bruceg just checking in on this. Do you mind letting us know what's left? So we can document it here. If nothing is left, feel free to close.

The whole configuration parsing stack still uses String error returns, mostly wrapped into Vec<String>. I have started working on this but pushed it aside to work on higher priority issues (TLS mostly). It was turning out to be less of a straightforward conversion that the sinks and sources were.

@lukesteensen @bruceg would it make sense to introduce a custom ConfigError(Vec<Box<dyn Error>>) struct which impl's Error and Display and allows pushing more errors onto its list?

That way, we can still have all config-related functions return the default boxed error trait object, but we can either a) downcast at the leaf to capture all individual errors, or b) have the Display impl print the individual errors however we want them to be represented.

I think the preferred path would be to introduce a enum ConfigError or equivalent and turn all those returns into Result<…, Vec<ConfigError>>. At some layer, they may have to be boxed, but at the point they are emitted they can be enumerated and structured. Wrapping the error vector in a newtype as you describe would be a useful additional abstraction, but since they are all config errors, they can stay enumerated there.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

LucioFranco picture LucioFranco  Â·  3Comments

jhgg picture jhgg  Â·  4Comments

LucioFranco picture LucioFranco  Â·  3Comments

binarylogic picture binarylogic  Â·  3Comments

LucioFranco picture LucioFranco  Â·  3Comments