Hi,
We use Vector 0.9, and during machine re-provisioning we sometimes need to restart Vector to load the new configuration. Sometimes the process panics during shutdown:
-- Logs begin at Thu 2020-05-07 05:44:49 UTC. --
May 11 10:06:46 vector[2718]: May 11 10:06:46.440 INFO vector::topology: Removing transform "haproxy_add_fields"
May 11 10:06:46 vector[2718]: May 11 10:06:46.440 INFO vector::topology: Removing transform "ec2_metadata_haproxy"
May 11 10:06:46 vector[2718]: May 11 10:06:46.440 INFO vector::topology: Removing transform "haproxy_socket_message"
May 11 10:06:46 vector[2718]: May 11 10:06:46.440 INFO vector::topology: Removing transform "haproxy_blacklist_message"
May 11 10:06:46 vector[2718]: May 11 10:06:46.440 INFO vector::topology: Removing transform "haproxy_available_message"
May 11 10:06:46 vector[2718]: May 11 10:06:46.440 INFO vector::topology: Removing transform "haproxy_nobackend_message"
May 11 10:06:46 vector[2718]: thread 'main' panicked at 'no entry found for key', /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/collections/hash/map.rs:1025:9
May 11 10:06:46 vector[2718]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
May 11 10:06:46 systemd[1]: vector.service: Main process exited, code=exited, status=101/n/a
May 11 10:06:46 systemd[1]: vector.service: Failed with result 'exit-code'.
We got the same message on multiple nodes.
@ktff would you mind taking a look here? Thanks.
Hi @kaarolch, could you share your config file with us so we can reproduce the issue? We only need the topology information (the [sources.*], [transforms.*], and [sinks.*] sections with their type=* and inputs=* settings), so if you have sensitive information in the config, feel free to delete it. It would really help.
Sure:
data_dir = "{{ vector.datadir }}"
[sources.internal_metrics]
type = "internal_metrics"
#
# Add EC2 metadata into Nginx access logs
#
[transforms.ec2_metadata_internal_metrics]
type = "aws_ec2_metadata" # required
inputs = ["internal_metrics"] # required
[transforms.tags_internal_metrics]
# General
type = "add_tags" # required
inputs = ["ec2_metadata_internal_metrics"] # required
#
# Nginx
#
[sources.syslog_nginx_access]
# General
type = "syslog" # required
[sources.syslog_nginx_error]
# General
type = "syslog" # required
#
# Haproxy
#
[sources.syslog_haproxy_traffic]
# General
type = "syslog" # required
[sources.syslog_haproxy_internal]
# General
type = "syslog" # required
[sources.syslog_haproxy_healthchecks]
# General
type = "syslog" # required
#
# Nginx
#
[transforms.nginx_json_parse]
type = "json_parser" # required
inputs = ["syslog_nginx_access"] # required
[transforms.nginx_json_parse_message]
type = "json_parser" # required
inputs = ["nginx_json_parse"] # required
#
# Add EC2 metadata into Nginx access logs
#
[transforms.ec2_metadata_nginx_access]
type = "aws_ec2_metadata" # required
inputs = ["nginx_json_parse_message"] # required
[transforms.nginx_error_json_parse]
type = "json_parser" # required
inputs = ["syslog_nginx_error"] # required
[transforms.nginx_error_json_parse_message]
type = "json_parser" # required
inputs = ["nginx_error_json_parse"] # required
[transforms.ec2_metadata_nginx_error]
type = "aws_ec2_metadata" # required
inputs = ["nginx_error_json_parse_message"] # required
[transforms.nginx_error_add_fields]
# General
type = "add_fields" # required
inputs = ["ec2_metadata_nginx_error"] # required
# Nginx access split status string codes if any
[transforms.nginx_access_split_statuss_1]
type = "split" # required
inputs = ["ec2_metadata_nginx_access"] # required
[transforms.nginx_access_split_status]
type = "split" # required
inputs = ["nginx_access_split_statuss_1"] # required
# Nginx access add fields
[transforms.nginx_add_fields]
# General
type = "add_fields" # required
inputs = ["nginx_access_split_status"] # required
[transforms.nginx_rename_fields]
# General
type = "rename_fields" # required
inputs = ["nginx_add_fields"] # required
#
# Haproxy
#
#
# Parse traffic logs
#
[transforms.haproxy_traffic_parse_message]
type = "json_parser" # required
inputs = ["syslog_haproxy_traffic"] # required
[transforms.haproxy_traffic_parse]
type = "json_parser" # required
inputs = ["haproxy_traffic_parse_message"] # required
#
# Parse healthchecks logs
#
[transforms.haproxy_healthchecks_parse_routing]
type = "swimlanes"
inputs = ["syslog_haproxy_healthchecks"]
[transforms.haproxy_healthchecks_parse_routing.lanes.backend_status]
type="check_fields"
"message.starts_with" = "Server"
[transforms.haproxy_healthchecks_parse_routing.lanes.health_checks]
type="check_fields"
"message.starts_with" = "Health"
[transforms.haproxy_healthchecks_parse_routing.lanes.no_backend]
type="check_fields"
"message.ends_with" = "available!"
[transforms.haproxy_healthchecks_parse_message_1]
type = "regex_parser"
inputs = ["haproxy_healthchecks_parse_routing.backend_status"]
[transforms.haproxy_healthchecks_parse_message_2]
type = "regex_parser"
inputs = ["haproxy_healthchecks_parse_routing.health_checks"]
[transforms.haproxy_healthchecks_parse_message_3]
type = "regex_parser" # required
inputs = ["haproxy_healthchecks_parse_message_1", "haproxy_healthchecks_parse_message_2"] # required
[transforms.haproxy_healthchecks_parse_message_4]
type = "regex_parser"
inputs = ["haproxy_healthchecks_parse_routing.no_backend"]
[transforms.haproxy_healthchecks_route_serverity]
type = "swimlanes"
inputs = ["haproxy_healthchecks_parse_message_3"]
[transforms.haproxy_healthchecks_route_serverity.lanes.ok]
type="check_fields"
"status.regex" = "succeeded|UP"
[transforms.haproxy_healthchecks_route_serverity.lanes.warning]
type="check_fields"
"status.regex" = "failed|DOWN"
[transforms.haproxy_healthchecks_modify_serverity_1]
type = "add_fields"
inputs = ["haproxy_healthchecks_route_serverity.warning"]
[transforms.haproxy_healthchecks_modify_serverity_2]
type = "add_fields"
inputs = ["haproxy_healthchecks_parse_message_4"]
[transforms.haproxy_healthchecks_parse]
type = "json_parser" # required
inputs = ["haproxy_healthchecks_modify_serverity_1","haproxy_healthchecks_modify_serverity_2","haproxy_healthchecks_route_serverity.ok"] # required
#
# Parse internal logs
#
[transforms.haproxy_internal_parse_routing]
type = "swimlanes"
inputs = ["syslog_haproxy_internal"]
[transforms.haproxy_internal_parse_routing.lanes.parsing_problem]
type="check_fields"
"message.starts_with" = "parsing"
[transforms.haproxy_internal_parse_routing.lanes.stopping]
type="check_fields"
"message.starts_with" = "Stopping"
[transforms.haproxy_internal_parse_routing.lanes.proxy_stop]
type="check_fields"
"message.starts_with" = "Proxy"
[transforms.haproxy_internal_parse_routing.lanes.missing_backend]
type="check_fields"
"message.contains" = "use_backend"
[transforms.haproxy_internal_parse_message_1]
type = "regex_parser"
inputs = ["haproxy_internal_parse_routing.parsing_problem"]
[transforms.haproxy_internal_parse_message_2]
type = "regex_parser"
inputs = ["haproxy_internal_parse_routing.stopping"]
[transforms.haproxy_internal_parse_message_3]
type = "regex_parser"
inputs = ["haproxy_internal_parse_routing.proxy_stop"]
[transforms.haproxy_internal_parse_message_4]
type = "regex_parser"
inputs = ["haproxy_internal_parse_routing.missing_backend"]
[transforms.haproxy_internal_modify_serverity]
type = "add_fields"
inputs = ["haproxy_internal_parse_message_1","haproxy_internal_parse_message_4"]
[transforms.haproxy_internal_parse]
type = "json_parser" # required
inputs = ["haproxy_internal_parse_message_2","haproxy_internal_parse_message_3","haproxy_internal_modify_serverity"] # required
#
# Add EC2 metadata into Haproxy traffic logs
#
[transforms.haproxy_traffic_ec2_metadata]
type = "aws_ec2_metadata" # required
inputs = ["haproxy_traffic_parse"] # required
#
# Add EC2 metadata into Haproxy healthchecks logs
#
[transforms.haproxy_healthchecks_ec2_metadata]
type = "aws_ec2_metadata" # required
inputs = ["haproxy_healthchecks_parse"] # required
#
# Add EC2 metadata into Haproxy internal logs
#
[transforms.haproxy_internal_ec2_metadata]
type = "aws_ec2_metadata" # required
inputs = ["haproxy_internal_parse"] # required
#
# Split fields for traffic logs
#
[transforms.haproxy_traffic_split_fields]
type = "split" # required
inputs = ["haproxy_traffic_ec2_metadata"] # required
#
# Split & parse healthcheck fields
#
[transforms.haproxy_healthchecks_split_fields]
type = "split" # required
inputs = ["haproxy_healthchecks_ec2_metadata"]
#
# Split fields for internal haproxy logs
#
[transforms.haproxy_internal_split_fields]
type = "split" # required
inputs = ["haproxy_internal_ec2_metadata"] # required
#
# Convert time to seconds
#
[transforms.haproxy_traffic_time_conversion]
type = "lua" # required
inputs = ["haproxy_traffic_split_fields"] # required
version = "2" # required
[transforms.haproxy_healthchecks_time_conversion]
type = "lua" # required
inputs = ["haproxy_healthchecks_split_fields"] # required
version = "2" # required
[transforms.haproxy_internal_time_conversion]
type = "lua" # required
inputs = ["haproxy_internal_split_fields"] # required
version = "2" # required
#
# Add required fields
#
[transforms.haproxy_traffic_add_fields]
# General
type = "add_fields" # required
inputs = ["haproxy_traffic_time_conversion"] # required
[transforms.haproxy_healthchecks_add_fields]
# General
type = "add_fields" # required
inputs = ["haproxy_healthchecks_time_conversion"] # required
[transforms.haproxy_internal_add_fields]
# General
type = "add_fields" # required
inputs = ["haproxy_internal_time_conversion"] # required
#
# Rename fields
#
[transforms.haproxy_rename_fields]
# General
type = "rename_fields" # required
inputs = ["haproxy_internal_add_fields", "haproxy_healthchecks_add_fields"] # required_names
[transforms.haproxy_traffic_rename_fields]
# General
type = "rename_fields" # required
inputs = ["haproxy_traffic_add_fields"] # required
[sinks.s3_archives_nginx]
inputs = ["syslog_nginx_access","syslog_nginx_error"] # don't sample
type = "aws_s3"
[sinks.s3_archives_haproxy]
inputs = ["syslog_haproxy_traffic","syslog_haproxy_healthchecks","syslog_haproxy_internal"] # don't sample
type = "aws_s3"
[sinks.datadog_logs_nginx]
# General
type = "datadog_logs" # required
inputs = ["nginx_rename_fields","nginx_error_add_fields"] # required
[sinks.datadog_logs_haproxy]
# General
type = "datadog_logs" # required
inputs = ["haproxy_rename_fields","haproxy_traffic_rename_fields"] # required
Let me know if you need anything more.
Well, this was a head spinner. Here is a minimal config for reproduction:
[sources.input]
type = "stdin"
[transforms.trans1]
type = "json_parser"
inputs = ["input"]
# Delete this
drop_field = false
# Delete this
[transforms.trans2]
type = "json_parser"
inputs = ["trans1"]
[sinks.output1]
inputs = ["trans1"]
type = "blackhole"
print_amount = 1000
# Delete this
[sinks.output2]
inputs = ["trans2"]
type = "blackhole"
print_amount = 1000
Steps to reproduce: run Vector with this config and the -w watch feature enabled, then delete the lines marked "# Delete this" and save so the config gets reloaded.
The panic happens during the output wiring of a changed transform to the inputs of other transforms. There, if we find a transform in the old config that has the changed transform as one of its inputs, we assume it's still present, which this example shows is not true.
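To illustrate the class of failure, here is a minimal Rust sketch of my own (the function and names are hypothetical, this is not Vector's actual topology code): indexing a HashMap directly for a key that is missing produces exactly the "no entry found for key" panic from the log.
use std::collections::HashMap;

// Hypothetical stand-in for the rewiring step, not Vector's real code.
// `old_inputs` maps a component name to its inputs from the old config.
fn rewire_outputs(
    changed_transform: &str,
    old_inputs: &HashMap<String, Vec<String>>,
    components: &[&str],
) {
    for name in components {
        // Direct indexing assumes every component has an entry in the old
        // map; when one doesn't, this panics with "no entry found for key".
        let inputs = &old_inputs[*name];
        if inputs.iter().any(|i| i == changed_transform) {
            println!("re-attaching {} -> {}", changed_transform, name);
        }
    }
}

fn main() {
    let mut old_inputs = HashMap::new();
    old_inputs.insert("trans1".to_string(), vec!["input".to_string()]);
    // "trans2" has no entry here, so the lookup above panics for it.
    rewire_outputs("trans1", &old_inputs, &["trans1", "trans2"]);
}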
Before v0.9.1 this was only triggered by a (transform, transform) pair, but with v0.9.1 (#2449) it became a problem for (transform, sink) pairs as well, so the following config would also panic:
[sources.input]
type = "stdin"
[transforms.trans]
type = "json_parser"
inputs = ["input"]
# Delete this
drop_field = false
# Delete this
[sinks.output2]
inputs = ["trans"]
type = "blackhole"
print_amount = 1000
[sinks.output1]
inputs = ["trans"]
type = "blackhole"
print_amount = 1000
There are two ways to fix it: a simpler one, and a more complex one that tries to be more "correct" about managing the topology.
I am for the simpler version, plus documenting that the running topology doesn't correspond to the config during this period.
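For concreteness, a rough sketch of what the simpler version could look like, reusing the toy rewiring function from the sketch above (again an assumption on my part, not the actual patch): look the entry up with get and skip components that are missing, instead of indexing and panicking.
use std::collections::HashMap;

// Same hypothetical rewiring step as the sketch above, but tolerant of
// components that have no entry in the old map: they are skipped.
fn rewire_outputs(
    changed_transform: &str,
    old_inputs: &HashMap<String, Vec<String>>,
    components: &[&str],
) {
    for name in components {
        if let Some(inputs) = old_inputs.get(*name) {
            if inputs.iter().any(|i| i == changed_transform) {
                println!("re-attaching {} -> {}", changed_transform, name);
            }
        }
        // A skipped component means the running topology can briefly differ
        // from the config, which is the trade-off mentioned above.
    }
}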
I am also in favor of the simple solution for right now. Ideally, we can address this case, add a test to make sure it doesn't happen again, and then think about more "correct" management of this complexity in the future.
Also, awesome job tracking this down @ktff! Topology changes can be very difficult to wrap your head around :smile:
Yeah, I put it too strongly. It isn't more correct, just plain more complex and not needed at the moment.