Using Metricbeat's system.socket module, I would like to see the state of each connection, for example whether it's ESTABLISHED or TIME_WAIT. Ideally we could graph the box and see the state of all connections. A high TIME_WAIT count can indicate an issue.
A second part of this would be to include the keepalive timers for individual TCP connections (netstat -not).
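For reference, on Linux both the state and the pending timer are exposed per socket in /proc/net/tcp, which is also where netstat gets its timer column. The sketch below is only an illustration of what those fields contain, not Metricbeat code: the function and dictionary names are made up for this example, the timer labels paraphrase the kernel's proc_net_tcp documentation, and the address decoding assumes IPv4 on a little-endian host.

```python
# Illustrative sketch only: decode one data row of /proc/net/tcp (Linux, IPv4).
# parse_row, TCP_STATES and TIMER_KINDS are hypothetical names for this example.
import socket
import struct

# Kernel TCP state codes as printed in the "st" column.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}

# Meaning of the "tr" (timer_active) column. Code 2 covers keepalive but
# also other timers such as delayed ACK, so it is labelled accordingly.
TIMER_KINDS = {
    0: "none", 1: "retransmit", 2: "keepalive_or_other",
    3: "time_wait", 4: "zero_window_probe",
}


def _decode_addr(field):
    """Turn 'HEXIP:HEXPORT' into ('a.b.c.d', port); assumes a little-endian host."""
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_row(line):
    """Parse one data row (not the header line) of /proc/net/tcp."""
    fields = line.split()
    timer_code, timer_when = fields[5].split(":")
    return {
        "local": _decode_addr(fields[1]),
        "remote": _decode_addr(fields[2]),
        "state": TCP_STATES.get(int(fields[3], 16), "UNKNOWN"),
        "timer": TIMER_KINDS.get(int(timer_code, 16), "unknown"),
        # Time until the timer fires, in clock ticks as reported by the kernel.
        "timer_expires_ticks": int(timer_when, 16),
        # Number of retransmissions or probes already sent for this timer.
        "retransmits": int(fields[6], 16),
    }


SAMPLE_ROW = (
    "   1: 0100007F:0CEA 0100007F:D1F0 01 00000000:00000000 "
    "02:000006D5 00000000  1000        0 54321 2 0000000000000000 20 4 30 10 -1"
)

if __name__ == "__main__":
    print(parse_row(SAMPLE_ROW))
```

The sample row is invented for the example. On a real host the rows would come from reading /proc/net/tcp and skipping its header line.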
To clarify a bit more, the breakdown SNMP provides is handy here. It aggregates the counts, which would reduce the amount of data in ES. It's good to have both the aggregation (count) and the per-connection stats.
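To make the aggregated view concrete, here is a minimal sketch of a point-in-time count of connections per TCP state. It assumes the input is the text of Linux's /proc/net/tcp; `count_states` and the sample text are hypothetical and are not part of Metricbeat.

```python
# Illustrative sketch only: count sockets per TCP state from /proc/net/tcp text.
from collections import Counter

# Kernel TCP state codes as printed in the "st" column.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Return a Counter mapping state name to number of sockets in that state."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        state = TCP_STATES.get(int(fields[3], 16), "UNKNOWN")
        counts[state] += 1
    return counts


# Made-up sample: one listener, one established and two TIME_WAIT sockets.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1111
   1: 0100007F:0CEA 0100007F:D1F0 01 00000000:00000000 02:000006D5 00000000  1000        0 2222
   2: 0100007F:0CEA 0100007F:D1F2 06 00000000:00000000 03:00000A00 00000000     0        0 0
   3: 0100007F:0CEA 0100007F:D1F4 06 00000000:00000000 03:00000B00 00000000     0        0 0
"""

if __name__ == "__main__":
    for state, n in sorted(count_states(SAMPLE).items()):
        print(state, n)
```

One document per sample interval with these counts is far smaller than one document per connection, which is the data reduction described above.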
I've been reading the code:
From what I see we can get both state, keep-alive retransmits & timer.
Would that be enough?
I've added some code so we can see (or test) what we are talking about: https://github.com/elastic/beats/pull/6663
@exekias sounds reasonable yeah. The other one would be a rollup or aggregated connection counts (point in time).
I wrote some code addressing this: https://github.com/elastic/beats/issues/4474
It outputs aggregated counts for different socket types, such as total connections, connections in established state, and connections in listening state. (It's not PR ready yet :sweat_smile: )
Currently I have it as a module, but I feel it should go inside system.socket.
@exekias What are your views on it?
@agathver Sounds promising, could you share a link to your branch?
@robgil We normally create 2 metricsets if one is the summary of the other. That way you could enable socket_summary and get only the summary data, or enable socket to get all of it and do the aggregation on the Elasticsearch side. Would that work for you?
To make the call if it should be 1 or 2 metricsets I think I need to see the branch from @agathver ;-)
@ruflin Here it is: https://github.com/agathver/beats/tree/module-ss/metricbeat/module/ss/ss
@ruflin yeah, the separate metricset seems fine to me. I imagine that for general monitoring, we'll probably just run with the summarized stats, and when we want _debug_, we'll turn up the full socket metricset.
@agathver Thanks for sharing. I wonder where these values should go. If we follow the process model, they would be under system.socket.summary.*. An alternative would be to have them under system.socket.tcp.*, as we don't use these namespaces yet.
For the name of the metricset I would follow the convention to call it socket_summary.
@agathver Do you want to open a PR for the summary metricset?
@ruflin, I'll open a PR later, maybe next week. Currently busy with some academic stuff.
Hi,
Is there any option to get the total number of open TCP connections between a client and a server using Metricbeat?
I tried using properties:
system.socket.remote.ip
system.socket.remote.port
system.socket.local.ip
system.socket.local.port
I was not able to use the above to get what I intended.
Please guide
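One possible approach, sketched under assumptions: if the per-socket events are exported from Elasticsearch as flat dictionaries keyed by the four fields listed above, the connections can be deduplicated on the full four-tuple and then counted per client/server pair. The event shape and the function name `connections_per_pair` are hypothetical; this is not an existing Metricbeat feature.

```python
# Illustrative sketch only: count distinct connections per (local ip, remote ip,
# remote port) from per-socket events. The flat-dict event shape is an assumption.
from collections import Counter


def connections_per_pair(events):
    """Count distinct sockets for each (local ip, remote ip, remote port)."""
    seen = set()
    counts = Counter()
    for event in events:
        four_tuple = (
            event["system.socket.local.ip"],
            event["system.socket.local.port"],
            event["system.socket.remote.ip"],
            event["system.socket.remote.port"],
        )
        # The same socket may be reported more than once; count it only once.
        if four_tuple in seen:
            continue
        seen.add(four_tuple)
        local_ip, _local_port, remote_ip, remote_port = four_tuple
        counts[(local_ip, remote_ip, remote_port)] += 1
    return counts


# Made-up events: two distinct client sockets to the same server port,
# one of them reported twice, plus one connection to another server.
SAMPLE_EVENTS = [
    {"system.socket.local.ip": "10.0.0.5", "system.socket.local.port": 40001,
     "system.socket.remote.ip": "10.0.0.9", "system.socket.remote.port": 5432},
    {"system.socket.local.ip": "10.0.0.5", "system.socket.local.port": 40002,
     "system.socket.remote.ip": "10.0.0.9", "system.socket.remote.port": 5432},
    {"system.socket.local.ip": "10.0.0.5", "system.socket.local.port": 40001,
     "system.socket.remote.ip": "10.0.0.9", "system.socket.remote.port": 5432},
    {"system.socket.local.ip": "10.0.0.5", "system.socket.local.port": 40003,
     "system.socket.remote.ip": "10.0.0.7", "system.socket.remote.port": 443},
]

if __name__ == "__main__":
    for pair, n in connections_per_pair(SAMPLE_EVENTS).items():
        print(pair, n)
```

Without the connection state in each event, this counts sockets that were seen rather than sockets that are currently open, which is part of why the state field is being requested in this issue.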
Do we know if there is any plan to address this? Having those metrics is useful for systems that handle a lot of TCP traffic.
I would like to double down on this.
TCP state and keepalive timers per socket connection are very helpful for understanding the network state at the transport layer, and they are also a strong indicator that something bad (e.g. packet loss) is happening in lower layers (IP).
It would be a very handy tool for every SRE.
cc: @ruflin @pmoust