Regardless of what I set batch.max_bytes to, Vector writes files at the default 10 MB size to the GCP Cloud Storage (GCS) bucket. Currently I have batch.max_bytes = 104857600, which should give 100 MB batches. But when I send a 45 MB file to Vector via the syslog source, five files are written to the GCS bucket: four 10 MB files and one 5 MB file. What's more confusing is that batch.timeout_secs is honored by Vector.
My vector.toml is below:
[sources.my_source_id]
type = "syslog" # required
address = "0.0.0.0:514" # required when mode = "tcp" or mode = "udp"
mode = "tcp" # required
[sinks.gcs_id]
type = "gcp_cloud_storage" # required
inputs = ["my_source_id"] # required
bucket = "my-bucket" # required
compression = "none" # optional, default
credentials_path = "/full/path/to/creds.json" # optional, no default
healthcheck = true # optional, default
encoding.codec = "text" # required
key_prefix = "my-bucket/date=%F/" # optional
batch.max_bytes = 104857600 # max batch size in bytes (100 MiB)
batch.timeout_secs = 60 # max age of batch before flush
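For clarity, here is the same batch configuration written as its own TOML table (equivalent to the dotted keys above), with the size arithmetic spelled out in comments:
[sinks.gcs_id.batch]
# intended cap: 100 * 1024 * 1024 = 104857600 bytes (100 MiB)
max_bytes = 104857600
# the observed splits match the ~10 MB default instead: a 45 MB payload
# yields ceil(45 / 10) = 5 objects, i.e. four ~10 MB files plus a ~5 MB remainder
# flush any partial batch after 60 seconds; this setting is honored as expected
timeout_secs = 60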
Thanks for reporting @gteshome, we'll take a look and see what's going on.
Thanks for the quick action @binarylogic. I'd be glad to be available for any testing that is needed.
We've recently made some significant changes to the batch buffering part of the system. Just so we're clear, which version of Vector are you running?
We're using v0.9.2.
Make sure you upgrade to v0.10.0, which we just released last week. It includes a large change to this area that should resolve the issue for you.
I should be able to do that today. I'll post again with details.
Thanks, upgrading to v0.10.0 fixed it. I'll close the issue.