Vector: validate command creates files, which sometimes leads to permission issues

Created on 16 Sep 2020 · 17 Comments · Source: timberio/vector

Vector Version

vector 0.10.0 (g0f0311a x86_64-unknown-linux-gnu 2020-07-22)

Vector Configuration File

[sources.test_logs]
type = "file"
include = ["/var/log/*.log"]

Expected Behavior

We're running vector validate /tmp/vector-new.toml before installing the new configuration file. This is done through Ansible, which always runs these commands as root rather than as the vector user.

We were expecting that this wouldn't be a problem.

Actual Behavior

When running vector validate, files in /var/lib/vector get messed with:

root@7351cdd950a9:/# vector validate
√ Loaded "/etc/vector/vector.toml"

Topology errors
---------------
x No sinks defined in the config.

√ Component configuration
root@7351cdd950a9:/# ls -l /var/lib/vector/
total 4
drwxr-xr-x 2 root root 4096 Sep 16 12:50 test_logs

Because the directory is now owned by root, vector (running as the vector user) cannot start. So far I have found this behaviour with the file and journald sources, but there might be others.

cli bug

All 17 comments

@anlutro This isn't a bug. By default, the validate command checks that the components can be started. For example, the file source won't be able to start if it doesn't have write permissions for the /var/lib/vector directory.

If you want to validate as much as possible without being in the runtime environment, use the --no-environment flag. It disables starting the components, so vector won't touch the files.

Validate docs https://vector.dev/docs/administration/validating/

@ktff is there not a way we can test this without placing a test file? And if not, shouldn't we remove the test file immediately afterward?

@binarylogic it's up to the components to do what they need to function, and if we want to avoid writing those files then those components would need to be changed. For example, I'm expecting that the files being created are checkpoint files of file source.

shouldn't we remove the test file immediately afterward

Yes, we should. We can take a note of what files are present in data_dir, and if it's empty we can clean the directory afterwards. This would solve the issue for the given scenario.

And if it's not empty, we should refuse to run the validate command with environment checks. Currently this case is more problematic, since we can lose messages with the validate command in some scenarios. I'll open a separate issue for this one after I think about it and confirm it some more.
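A minimal Rust sketch of the approach described in this comment, assuming validation can be wrapped in a closure. The names `is_empty_dir`, `validate_with_cleanup`, and `start_components` are hypothetical and do not come from Vector's codebase; the closure only stands in for whatever starts the components.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true if `dir` exists and has no entries.
fn is_empty_dir(dir: &Path) -> io::Result<bool> {
    Ok(fs::read_dir(dir)?.next().is_none())
}

/// Hypothetical helper (not Vector's actual code): refuses to run when
/// `data_dir` already holds files, otherwise runs `start_components` and
/// removes everything it left behind in `data_dir`.
fn validate_with_cleanup<F>(data_dir: &Path, start_components: F) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    if !is_empty_dir(data_dir)? {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "data_dir is not empty; refusing to validate with environment checks",
        ));
    }

    let result = start_components();

    // data_dir was empty before, so every entry found now was created by
    // the validation run and can be removed.
    for entry in fs::read_dir(data_dir)? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            fs::remove_dir_all(entry.path())?;
        } else {
            fs::remove_file(entry.path())?;
        }
    }

    result
}
```

With this shape, the root-owned test_logs directory from the report would be removed again before the command exits, and a data_dir that already holds checkpoints would never be touched.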

I see - to me it sounds like the components could have some sort of pre-start hook where they do some checks without doing any IO. For example, it's easy to check if a directory is writeable or not.

That being said, I understand this is a lot of overhead to compensate for what I honestly think is a flaw in Ansible, so it's perfectly fine if this is closed as wontfix.

this is a lot of overhead

@anlutro yep, that's how I see it too, but there are other things we can do on the Vector side. I'm still unsure how to tackle this, though, as all the fixes I can see come with a cost. So it will take some time to figure out.

There are a few things that can write to the filesystem:

  • file source
  • kubernetes_logs source, creates directory for file source
  • journald source
  • file sink
  • wasm transform
  • disk buffers (leveldb)

Solution

The validate command should create a new folder "validate_tmp" in the data_dir directory and change the data_dir path to "validate_tmp". After completion, the command should delete "validate_tmp".

This would solve the issue for all of the above except for the wasm transform, which uses a different directory for its artifact cache. So validate should follow the same process for artifact_cache: create a tmp directory inside it, and so on.

As with the wasm transform, the same procedure should be performed for any other outliers.
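The proposal above could be sketched in Rust as a guard that creates the directory and deletes it when dropped. This is only an illustration under that assumption: `ValidateTmpDir` and its methods are hypothetical names, and how Vector would redirect data_dir to the guard's path is not shown.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard (not Vector's actual code): owns
/// `<data_dir>/validate_tmp` for the duration of one validate run.
struct ValidateTmpDir {
    path: PathBuf,
}

impl ValidateTmpDir {
    /// Creates `validate_tmp` inside `data_dir`.
    fn create(data_dir: &Path) -> io::Result<Self> {
        let path = data_dir.join("validate_tmp");
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    /// The path components would use in place of the real data_dir.
    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ValidateTmpDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because Drop cannot
        // return them.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

Tying the cleanup to `Drop` means the directory is also removed when validation returns early with an error, at the cost of silently ignoring a failed removal.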

@Hoverbear could the artifact cache of wasm transform be moved into data_dir?

Yes it would just require knowledge of where this directory is.

@Hoverbear a few more questions. How long does an average compilation of wasm modules take, and how much disk space do they occupy?

For context, I'm trying to decide whether to leave the tmp cache directory in place or delete it between validate commands.

Depends on the module complexity, in release under a minute.

That's too long. If it isn't just a few seconds, then it's better to leave the cache in place to speed up subsequent validates and shorten the update-validate-debug loop. One drawback is that if validate is used, we would end up with two copies of the cache.

@Hoverbear what do you think? Would this be an issue?

I could add an explicit validate feature that doesn't create a cache?

It seems reasonable that validate would skip the cache creation, and just check for existing, and make sure things are set up.

If we wouldn't miss any check while skipping cache creation, it's an option, although it sounds pretty intrusive.

I don't think it would be too bad, I think making this section toggleable via like a validate: bool option would be fine:

https://github.com/timberio/vector/blob/47e709bb8243355f60deca53dac92fd07ca2c694/src/wasm/mod.rs#L74-L86

You can add it to the config: https://github.com/timberio/vector/blob/47e709bb8243355f60deca53dac92fd07ca2c694/src/wasm/mod.rs#L52-L59

@Hoverbear But we would lose on the compile validation and directory access validation.

So, what about a middle ground? Create validate_wasm_cache in the actual /tmp directory and manually check that we have adequate access to the original directory. We would cover all checks, validation could reuse the cache across multiple invocations, and the extra cache would eventually be deleted.
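A rough Rust sketch of this middle ground, with assumptions stated up front: `validate_cache_dir` and `check_dir_access` are hypothetical names, the system temp directory is taken from `std::env::temp_dir()` rather than a hard-coded /tmp, and "adequate access" is approximated by listing the directory and briefly creating a probe file in it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical: location of the separate cache used only by validate.
fn validate_cache_dir() -> PathBuf {
    std::env::temp_dir().join("validate_wasm_cache")
}

/// Hypothetical access check for the original cache directory: confirms it
/// can be listed and written to, leaving nothing behind on success.
fn check_dir_access(dir: &Path) -> io::Result<()> {
    // Read access: listing the directory must succeed.
    fs::read_dir(dir).map(|_| ())?;

    // Write access: create a uniquely named probe file, then remove it.
    let probe = dir.join(format!(".validate_probe_{}", std::process::id()));
    fs::OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&probe)?;
    fs::remove_file(&probe)
}
```

The probe file does touch the original directory for a moment, so this trades the "no IO" ideal mentioned earlier in the thread for a check that reflects what the running process can actually do.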

Okay!

The remaining issue is the scenario of validate + incompatible users + wasm transform, for which the fix isn't quite worth it, considering the lasting maintenance burden and that this will be resolved on its own once the wasm transform transitions to wasmtime.
