The 0.16.0 release is going to break a large number of metric names. Let's provide a sample metric_relabel_config so users have the option to duplicate the legacy names for a smoother dashboard transition.
That would be highly helpful indeed 👍
Update: In case there's already work done in this direction, having a pointer to a branch could be useful 👍
PS: I stumbled on this issue after finding that our existing node exporter dashboards just showed empty charts (the metric names must have changed since).
A simple mapping list in the form old_name,new_name would let users easily update their existing configs containing metric names (dashboard JSONs, including the public ones, alert rule configs, etc.). A final suggestion:
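Given such a mapping as a dict, rewriting dashboard JSON or rule files could be a single-pass, whole-identifier substitution. A minimal Python sketch; rename_metrics is a hypothetical helper, and the identifier-boundary check is what keeps node_memory_Active from also rewriting node_memory_Active_anon:

```python
import re


def rename_metrics(text, mapping):
    """Replace whole-identifier occurrences of old metric names with new ones.

    All names are matched in one pass, so a chain such as a -> b, b -> c
    is never applied twice to the same token.
    """
    if not mapping:
        return text
    # Longest names first so the alternation prefers the most specific match.
    names = sorted(mapping, key=len, reverse=True)
    # Metric names use [a-zA-Z0-9_:]; any other character counts as a boundary.
    pattern = re.compile(
        r"(?<![a-zA-Z0-9_:])(" + "|".join(map(re.escape, names)) + r")(?![a-zA-Z0-9_:])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


if __name__ == "__main__":
    mapping = {
        "node_cpu": "node_cpu_seconds_total",
        "node_memory_Active": "node_memory_Active_bytes",
    }
    expr = 'rate(node_cpu{mode="idle"}[5m]) and node_memory_Active_anon > 0'
    print(rename_metrics(expr, mapping))
```

Running it twice is harmless for suffix-style renames like node_cpu → node_cpu_seconds_total, because the new name no longer matches the old one as a whole identifier.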
I guess I was assigned to this issue as I need to do the relabel config anyway for SoundCloud. However, I'm not familiar with the changes, and it would indeed be a great help if those that changed the metrics could provide a simple old→new mapping as suggested by @lhoss above.
I can produce a diff of the end-to-end output, that should help.
does that include all the affected collectors?
On Mon, Feb 19, 2018, 17:39 Ben Kochie notifications@github.com wrote:
I can produce a diff of the end-to-end output, that should help.
@matthiasr I've sorted the list roughly by collector.
node_bcache_cache_read_races_total{uuid=""},node_bcache_cache_read_races{uuid=""}
node_boot_time,node_boot_time_seconds
node_buddyinfo_count{node=""},node_buddyinfo_blocks{node=""}
node_context_switches,node_context_switches_total
node_cpu{cpu=""},node_cpu_seconds_total{cpu=""}
node_disk_bytes_read{device=""},node_disk_read_bytes_total{device=""}
node_disk_bytes_written{device=""},node_disk_written_bytes_total{device=""}
node_disk_io_time_ms{device=""},node_disk_io_time_seconds_total{device=""}
node_disk_io_time_weighted{device=""},node_disk_io_time_weighted_seconds_total{device=""}
node_disk_reads_completed{device=""},node_disk_reads_completed_total{device=""}
node_disk_reads_merged{device=""},node_disk_reads_merged_total{device=""}
node_disk_read_time_ms{device=""},node_disk_read_time_seconds_total{device=""}
node_disk_writes_completed{device=""},node_disk_writes_completed_total{device=""}
node_disk_writes_merged{device=""},node_disk_writes_merged_total{device=""}
node_disk_write_time_ms{device=""},node_disk_write_time_seconds_total{device=""}
node_forks,node_forks_total
node_infiniband_port_data_received_bytes{device=""},node_infiniband_port_data_received_bytes_total{device=""}
node_infiniband_port_data_transmitted_bytes{device=""},node_infiniband_port_data_transmitted_bytes_total{device=""}
node_interrupts{CPU=""},node_interrupts_total{CPU=""}
node_intr,node_intr_total
node_memory_Active,node_memory_Active_bytes
node_memory_Active_anon,node_memory_Active_anon_bytes
node_memory_Active_file,node_memory_Active_file_bytes
node_memory_AnonHugePages,node_memory_AnonHugePages_bytes
node_memory_AnonPages,node_memory_AnonPages_bytes
node_memory_Bounce,node_memory_Bounce_bytes
node_memory_Buffers,node_memory_Buffers_bytes
node_memory_Cached,node_memory_Cached_bytes
node_memory_CommitLimit,node_memory_CommitLimit_bytes
node_memory_Committed_AS,node_memory_Committed_AS_bytes
node_memory_DirectMap2M,node_memory_DirectMap2M_bytes
node_memory_DirectMap4k,node_memory_DirectMap4k_bytes
node_memory_Dirty,node_memory_Dirty_bytes
node_memory_HardwareCorrupted,node_memory_HardwareCorrupted_bytes
node_memory_Hugepagesize,node_memory_Hugepagesize_bytes
node_memory_Inactive,node_memory_Inactive_bytes
node_memory_Inactive_anon,node_memory_Inactive_anon_bytes
node_memory_Inactive_file,node_memory_Inactive_file_bytes
node_memory_KernelStack,node_memory_KernelStack_bytes
node_memory_Mapped,node_memory_Mapped_bytes
node_memory_MemFree,node_memory_MemFree_bytes
node_memory_MemTotal,node_memory_MemTotal_bytes
node_memory_Mlocked,node_memory_Mlocked_bytes
node_memory_NFS_Unstable,node_memory_NFS_Unstable_bytes
node_memory_PageTables,node_memory_PageTables_bytes
node_memory_Shmem,node_memory_Shmem_bytes
node_memory_Slab,node_memory_Slab_bytes
node_memory_SReclaimable,node_memory_SReclaimable_bytes
node_memory_SUnreclaim,node_memory_SUnreclaim_bytes
node_memory_SwapCached,node_memory_SwapCached_bytes
node_memory_SwapFree,node_memory_SwapFree_bytes
node_memory_SwapTotal,node_memory_SwapTotal_bytes
node_memory_Unevictable,node_memory_Unevictable_bytes
node_memory_VmallocChunk,node_memory_VmallocChunk_bytes
node_memory_VmallocTotal,node_memory_VmallocTotal_bytes
node_memory_VmallocUsed,node_memory_VmallocUsed_bytes
node_memory_Writeback,node_memory_Writeback_bytes
node_memory_WritebackTmp,node_memory_WritebackTmp_bytes
node_network_receive_bytes{device=""},node_network_receive_bytes_total{device=""}
node_network_receive_compressed{device=""},node_network_receive_compressed_total{device=""}
node_network_receive_drop{device=""},node_network_receive_drop_total{device=""}
node_network_receive_errs{device=""},node_network_receive_errs_total{device=""}
node_network_receive_fifo{device=""},node_network_receive_fifo_total{device=""}
node_network_receive_frame{device=""},node_network_receive_frame_total{device=""}
node_network_receive_multicast{device=""},node_network_receive_multicast_total{device=""}
node_network_receive_packets{device=""},node_network_receive_packets_total{device=""}
node_network_transmit_bytes{device=""},node_network_transmit_bytes_total{device=""}
node_network_transmit_compressed{device=""},node_network_transmit_compressed_total{device=""}
node_network_transmit_drop{device=""},node_network_transmit_drop_total{device=""}
node_network_transmit_errs{device=""},node_network_transmit_errs_total{device=""}
node_network_transmit_fifo{device=""},node_network_transmit_fifo_total{device=""}
node_network_transmit_frame{device=""},node_network_transmit_frame_total{device=""}
node_network_transmit_multicast{device=""},node_network_transmit_multicast_total{device=""}
node_network_transmit_packets{device=""},node_network_transmit_packets_total{device=""}
node_nfs_net_connections{protocol=""},node_nfs_connections_total
node_nfs_net_reads{protocol=""},node_nfs_packets_total{protocol=""}
node_nfs_procedures{procedure=""},node_nfs_requests_total{method=""}
node_nfs_rpc_authentication_refreshes,node_nfs_rpc_authentication_refreshes_total
node_nfs_rpc_operations,node_nfs_rpcs_total
node_nfs_rpc_retransmissions,node_nfs_rpc_retransmissions_total
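Since the list above is machine-readable, it can be parsed into an old→new dict. The Python sketch below (parse_mapping is a hypothetical helper) strips the {label=""} hints but also reports entries whose label names change, such as node_nfs_procedures{procedure=""} → node_nfs_requests_total{method=""}, because a plain rename is not enough for those:

```python
import re

# One entry per line: old_name{labels},new_name{labels}; the label part is optional.
LINE_RE = re.compile(
    r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?,([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?$"
)
LABEL_RE = re.compile(r"([a-zA-Z_][a-zA-Z0-9_]*)=")


def label_names(hint):
    """Return the set of label names mentioned in a {a="",b=""} hint."""
    return set(LABEL_RE.findall(hint or ""))


def parse_mapping(text):
    """Parse the old,new list into (renames, label_changes).

    renames maps old metric name -> new metric name; label_changes lists
    the (old, new) pairs whose label hints differ.
    """
    renames, label_changes = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"cannot parse mapping line: {line!r}")
        old, old_labels, new, new_labels = m.groups()
        renames[old] = new
        if label_names(old_labels) != label_names(new_labels):
            label_changes.append((old, new))
    return renames, label_changes


if __name__ == "__main__":
    sample = '''
node_boot_time,node_boot_time_seconds
node_cpu{cpu=""},node_cpu_seconds_total{cpu=""}
node_nfs_procedures{procedure=""},node_nfs_requests_total{method=""}
'''
    renames, changed = parse_mapping(sample)
    print(renames)
    print(changed)
```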
@SuperQ What's the release plan for 0.16.0 or 0.15.3?
The current plan is to publish a release candidate today or tomorrow.
Foolishly, we didn't take into account that relabeling only mutates metrics rather than duplicating them. I can't see a straightforward way to create a smooth transition right now. What you can do is update the node exporters but keep the metrics at their old names via metric relabeling. You eventually have to migrate anyway, so that's not a big help; it just makes the "big bang" migration a bit more atomic (by removing the relabeling).
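The "keep the old names" workaround boils down to one metric_relabel_configs entry per renamed metric, which could be generated from the mapping. A minimal Python sketch; relabel_configs is a hypothetical helper that only prints YAML text. Relabeling rewrites labels (including __name__), never sample values, so counters whose unit changed, such as node_disk_io_time_ms (now in seconds), would carry the new unit under the old name.

```python
def relabel_configs(renames):
    """Render metric_relabel_configs entries that rename each metric.

    renames maps the name the exporter exposes -> the name to store.
    Only the metric name changes; sample values and other labels are untouched.
    """
    lines = []
    for source, target in sorted(renames.items()):
        lines += [
            "- source_labels: [__name__]",
            f"  regex: {source}",  # relabel regexes are fully anchored
            "  target_label: __name__",
            f"  replacement: {target}",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Upgraded (0.16) exporters, stored under the pre-0.16 names.
    new_to_old = {
        "node_boot_time_seconds": "node_boot_time",
        "node_cpu_seconds_total": "node_cpu",
    }
    print("metric_relabel_configs:")
    print(relabel_configs(new_to_old))
```

Feeding it the inverted mapping (old name → new name) gives the opposite setup: dashboards and alerts on the new names while older exporters are still being upgraded.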
Non-straightforward idea:
Needless to say, this is expensive and dirty. I think it makes more sense to go for another migration plan. Affected alerting rules are easy to find and can be adjusted to work with both versions. (In most cases, duplicating the alert with each metric name version will do the trick. Aggregation is a bit harder.) For dashboards, going for a big-bang migration might be the most reasonable option.
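Duplicating an alert expression for both metric names can be mechanised with PromQL's or operator. A Python sketch under stated assumptions: compat_expr and its $METRIC placeholder are hypothetical. Since or matches series by label set, this suits per-instance expressions; if the template aggregates away the instance label (a bare sum(...)), both sides reduce to the same empty label set and only the new-name side survives, which is part of why aggregation is harder. Metrics whose unit changed, such as node_disk_io_time_ms, would still need a hand-written / 1000 on the old side.

```python
def compat_expr(template, new_name, old_name):
    """Return `<template on new_name> or <template on old_name>`.

    template holds the placeholder $METRIC where the metric name goes.
    `or` has the lowest precedence in PromQL, so each side keeps its own
    comparisons and arithmetic without extra parentheses.
    """
    if "$METRIC" not in template:
        raise ValueError("template must contain $METRIC")
    new_side = template.replace("$METRIC", new_name)
    old_side = template.replace("$METRIC", old_name)
    return f"{new_side} or {old_side}"


if __name__ == "__main__":
    # Per-instance, per-CPU alert condition that works before and after the upgrade.
    print(compat_expr('rate($METRIC{mode="idle"}[5m]) < 0.1',
                      "node_cpu_seconds_total", "node_cpu"))
```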
I'll close this as it doesn't appear to be the way to go. If you have better ideas, feel free to update here or discuss on any of the community channels.
"Needless to say, this is expensive and dirty." It would also cause issues with the up metric being duplicated.
I still think it might be useful to have metric_relabel_configs, as you can have your dashboards/alerts on the new names while you are still in the process of upgrading your node exporters.
I don't see how metric_relabel_configs help here. Our main requirement is to not break our dashboards for historical data. For that we need to be able to access metrics stored in the old and new format for at least the retention time / one month.
So far I see two main ways to achieve that:
OR operators at the right place. Given we have tens of dashboards and hundreds of expressions, I don't see how to easily achieve that. For the second approach, I see two ways:
The recording rules would be my preference. This is the approach I plan to use for GitLab.
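Recording rules that publish the new names from not-yet-upgraded exporters could be generated from the mapping rather than maintained by hand. A minimal Python sketch; the helper, its default group name, and the DIVISORS table are assumptions for illustration, the divisor covering the disk counters that moved from milliseconds to seconds.

```python
# Assumed: disk counters exported in milliseconds before 0.16 and in seconds after
# (the *_ms names, plus io_time_weighted, which was also counted in milliseconds).
DIVISORS = {
    "node_disk_io_time_ms": 1000,
    "node_disk_io_time_weighted": 1000,
    "node_disk_read_time_ms": 1000,
    "node_disk_write_time_ms": 1000,
}


def recording_rules(renames, group="node-exporter-16.rules"):
    """Render a Prometheus rule file recording each new name from its old name."""
    lines = ["groups:", f"- name: {group}", "  rules:"]
    for old, new in renames.items():
        expr = old
        if old in DIVISORS:
            # Convert the old millisecond counter to the new seconds unit.
            expr = f"{old} / {DIVISORS[old]}"
        lines += [f"  - record: {new}", f"    expr: {expr}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(recording_rules({
        "node_cpu": "node_cpu_seconds_total",
        "node_disk_read_time_ms": "node_disk_read_time_seconds_total",
    }), end="")
```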
Just for the record: node_filesystem_* to ..._bytes change isn't in the list above.
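Assuming the renames in question are node_filesystem_avail, node_filesystem_free and node_filesystem_size gaining a _bytes suffix, the missing entries in the old,new format of the list above could be produced like this (a Python sketch; the helper is hypothetical):

```python
# Assumed: the three filesystem gauges that gained a _bytes suffix in 0.16.
FILESYSTEM_FIELDS = ("avail", "free", "size")


def filesystem_mapping_lines():
    """Return old,new lines for the node_filesystem_* -> *_bytes renames."""
    return [
        f"node_filesystem_{field},node_filesystem_{field}_bytes"
        for field in FILESYSTEM_FIELDS
    ]


if __name__ == "__main__":
    print("\n".join(filesystem_mapping_lines()))
```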
Just to add another annoying scenario: I'm in the process of dragging my org into the 21st century, with hopes of moving away from Graphite. We've got a blend of 2.6 and 3.4 kernels. The 2.6 kernel boxen can't run this latest exporter due to go1.1. I figured I'd run the new exporter version on our 3.4 kernels and eventually drain out all of the 2.6 boxes, but having nodes straddle both versions is pretty unworkable, given that no dashboard will work for all nodes in the interim.
I'd be happy to be able to lean on a relabel config just to make the switch once we get our act together and can kill all these ancient kernels.
@grobie Do you have any time to add the --include-old-metric-names flag?
I'm trying to get the new names for these metrics recorded now so we can build up enough history to have a seamless transition.
This is basically just copied from the list above provided by @SuperQ:
groups:
- name: node-exporter-16.rules
  rules:
  - record: node_bcache_cache_read_races
    expr: node_bcache_cache_read_races_total
  - record: node_boot_time_seconds
    expr: node_boot_time
  - record: node_buddyinfo_blocks
    expr: node_buddyinfo_count
  - record: node_context_switches_total
    expr: node_context_switches
  - record: node_cpu_seconds_total
    expr: node_cpu
  - record: node_disk_read_bytes_total
    expr: node_disk_bytes_read
  - record: node_disk_written_bytes_total
    expr: node_disk_bytes_written
  # The old disk time counters were in milliseconds; the new ones are in seconds.
  - record: node_disk_io_time_seconds_total
    expr: node_disk_io_time_ms / 1000
  - record: node_disk_io_time_weighted_seconds_total
    expr: node_disk_io_time_weighted / 1000
  - record: node_disk_reads_completed_total
    expr: node_disk_reads_completed
  - record: node_disk_reads_merged_total
    expr: node_disk_reads_merged
  - record: node_disk_read_time_seconds_total
    expr: node_disk_read_time_ms / 1000
  - record: node_disk_writes_completed_total
    expr: node_disk_writes_completed
  - record: node_disk_writes_merged_total
    expr: node_disk_writes_merged
  - record: node_disk_write_time_seconds_total
    expr: node_disk_write_time_ms / 1000
  - record: node_forks_total
    expr: node_forks
  - record: node_infiniband_port_data_received_bytes_total
    expr: node_infiniband_port_data_received_bytes
  - record: node_infiniband_port_data_transmitted_bytes_total
    expr: node_infiniband_port_data_transmitted_bytes
  - record: node_interrupts_total
    expr: node_interrupts
  - record: node_intr_total
    expr: node_intr
  - record: node_memory_Active_bytes
    expr: node_memory_Active
  - record: node_memory_Active_anon_bytes
    expr: node_memory_Active_anon
  - record: node_memory_Active_file_bytes
    expr: node_memory_Active_file
  - record: node_memory_AnonHugePages_bytes
    expr: node_memory_AnonHugePages
  - record: node_memory_AnonPages_bytes
    expr: node_memory_AnonPages
  - record: node_memory_Bounce_bytes
    expr: node_memory_Bounce
  - record: node_memory_Buffers_bytes
    expr: node_memory_Buffers
  - record: node_memory_Cached_bytes
    expr: node_memory_Cached
  - record: node_memory_CommitLimit_bytes
    expr: node_memory_CommitLimit
  - record: node_memory_Committed_AS_bytes
    expr: node_memory_Committed_AS
  - record: node_memory_DirectMap2M_bytes
    expr: node_memory_DirectMap2M
  - record: node_memory_DirectMap4k_bytes
    expr: node_memory_DirectMap4k
  - record: node_memory_Dirty_bytes
    expr: node_memory_Dirty
  - record: node_memory_HardwareCorrupted_bytes
    expr: node_memory_HardwareCorrupted
  - record: node_memory_Hugepagesize_bytes
    expr: node_memory_Hugepagesize
  - record: node_memory_Inactive_bytes
    expr: node_memory_Inactive
  - record: node_memory_Inactive_anon_bytes
    expr: node_memory_Inactive_anon
  - record: node_memory_Inactive_file_bytes
    expr: node_memory_Inactive_file
  - record: node_memory_KernelStack_bytes
    expr: node_memory_KernelStack
  - record: node_memory_Mapped_bytes
    expr: node_memory_Mapped
  - record: node_memory_MemFree_bytes
    expr: node_memory_MemFree
  - record: node_memory_MemTotal_bytes
    expr: node_memory_MemTotal
  - record: node_memory_Mlocked_bytes
    expr: node_memory_Mlocked
  - record: node_memory_NFS_Unstable_bytes
    expr: node_memory_NFS_Unstable
  - record: node_memory_PageTables_bytes
    expr: node_memory_PageTables
  - record: node_memory_Shmem_bytes
    expr: node_memory_Shmem
  - record: node_memory_Slab_bytes
    expr: node_memory_Slab
  - record: node_memory_SReclaimable_bytes
    expr: node_memory_SReclaimable
  - record: node_memory_SUnreclaim_bytes
    expr: node_memory_SUnreclaim
  - record: node_memory_SwapCached_bytes
    expr: node_memory_SwapCached
  - record: node_memory_SwapFree_bytes
    expr: node_memory_SwapFree
  - record: node_memory_SwapTotal_bytes
    expr: node_memory_SwapTotal
  - record: node_memory_Unevictable_bytes
    expr: node_memory_Unevictable
  - record: node_memory_VmallocChunk_bytes
    expr: node_memory_VmallocChunk
  - record: node_memory_VmallocTotal_bytes
    expr: node_memory_VmallocTotal
  - record: node_memory_VmallocUsed_bytes
    expr: node_memory_VmallocUsed
  - record: node_memory_Writeback_bytes
    expr: node_memory_Writeback
  - record: node_memory_WritebackTmp_bytes
    expr: node_memory_WritebackTmp
  - record: node_network_receive_bytes_total
    expr: node_network_receive_bytes
  - record: node_network_receive_compressed_total
    expr: node_network_receive_compressed
  - record: node_network_receive_drop_total
    expr: node_network_receive_drop
  - record: node_network_receive_errs_total
    expr: node_network_receive_errs
  - record: node_network_receive_fifo_total
    expr: node_network_receive_fifo
  - record: node_network_receive_frame_total
    expr: node_network_receive_frame
  - record: node_network_receive_multicast_total
    expr: node_network_receive_multicast
  - record: node_network_receive_packets_total
    expr: node_network_receive_packets
  - record: node_network_transmit_bytes_total
    expr: node_network_transmit_bytes
  - record: node_network_transmit_compressed_total
    expr: node_network_transmit_compressed
  - record: node_network_transmit_drop_total
    expr: node_network_transmit_drop
  - record: node_network_transmit_errs_total
    expr: node_network_transmit_errs
  - record: node_network_transmit_fifo_total
    expr: node_network_transmit_fifo
  - record: node_network_transmit_frame_total
    expr: node_network_transmit_frame
  - record: node_network_transmit_multicast_total
    expr: node_network_transmit_multicast
  - record: node_network_transmit_packets_total
    expr: node_network_transmit_packets
  - record: node_nfs_connections_total
    expr: node_nfs_net_connections
  - record: node_nfs_packets_total
    expr: node_nfs_net_reads
  # The new metric uses a "method" label where the old one used "procedure".
  - record: node_nfs_requests_total
    expr: label_replace(node_nfs_procedures, "method", "$1", "procedure", "(.*)")
  - record: node_nfs_rpc_authentication_refreshes_total
    expr: node_nfs_rpc_authentication_refreshes
  - record: node_nfs_rpcs_total
    expr: node_nfs_rpc_operations
  - record: node_nfs_rpc_retransmissions_total
    expr: node_nfs_rpc_retransmissions
I know this will be missing the node_filesystem_* to ..._bytes metrics @discordianfish mentioned, and possibly more. It would be great if we could get a golden copy of these rules from someone who knows what they are doing.
We now have documentation and an example rules file. Improvements are welcome, but I think this is now covered.
Here is another list of metrics that were renamed or disappeared in the new version:
node_memory_Inactive_file -> node_memory_Inactive_file_bytes
node_memory_Inactive_anon -> node_memory_Inactive_anon_bytes
node_memory_Active_file -> node_memory_Active_file_bytes
node_memory_Active_anon -> node_memory_Active_anon_bytes
node_memory_WritebackTmp -> node_memory_WritebackTmp_bytes
node_vmstat_pgdeactivate - lost
node_vmstat_pgfree - lost
node_vmstat_pgactivate - lost
node_vmstat_kswapd_inodesteal - lost
node_vmstat_pginodesteal - lost
node_vmstat_pageoutrun - lost
node_vmstat_allocstall - lost
node_vmstat_zone_reclaim_failed - lost
node_vmstat_drop_pagecache - lost
node_vmstat_drop_slab - lost
node_vmstat_slabs_scanned - lost
node_vmstat_unevictable_pgs_cleared - lost
node_vmstat_unevictable_pgs_culled - lost
node_vmstat_unevictable_pgs_mlocked - lost
node_vmstat_unevictable_pgs_munlocked - lost
node_vmstat_unevictable_pgs_rescued - lost
node_vmstat_unevictable_pgs_scanned - lost
node_vmstat_unevictable_pgs_stranded - lost
node_vmstat_pgalloc_dma - lost
node_vmstat_pgalloc_dma32 - lost
node_vmstat_pgalloc_movable - lost
node_vmstat_pgalloc_normal - lost
node_vmstat_pgrefill_dma - lost
node_vmstat_pgrefill_dma32 - lost
node_vmstat_pgrefill_movable - lost
node_vmstat_pgrefill_normal - lost
node_vmstat_pgsteal_direct_dma - lost
node_vmstat_pgsteal_direct_dma32 - lost
node_vmstat_pgsteal_direct_movable - lost
node_vmstat_pgsteal_direct_normal - lost
node_vmstat_pgsteal_kswapd_dma - lost
node_vmstat_pgsteal_kswapd_dma32 - lost
node_vmstat_pgsteal_kswapd_movable - lost
node_vmstat_pgsteal_kswapd_normal - lost
node_vmstat_pgscan_direct_dma - lost
node_vmstat_pgscan_direct_dma32 - lost
node_vmstat_pgscan_direct_movable - lost
node_vmstat_pgscan_direct_normal - lost
node_vmstat_pgscan_direct_throttle - lost
node_vmstat_pgscan_kswapd_dma - lost
node_vmstat_pgscan_kswapd_dma32 - lost
node_vmstat_pgscan_kswapd_movable - lost
node_vmstat_pgscan_kswapd_normal - lost
node_vmstat_compact_free_scanned - lost
node_vmstat_compact_isolated - lost
node_vmstat_compact_migrate_scanned - lost
node_vmstat_compact_fail - lost
node_vmstat_compact_stall - lost
node_vmstat_compact_success - lost
node_vmstat_kswapd_high_wmark_hit_quickly - lost
node_vmstat_kswapd_low_wmark_hit_quickly - lost
node_vmstat_htlb_buddy_alloc_fail - lost
node_vmstat_htlb_buddy_alloc_success - lost
node_vmstat_numa_foreign - lost
node_vmstat_numa_hit - lost
node_vmstat_numa_interleave - lost
node_vmstat_numa_local - lost
node_vmstat_numa_miss - lost
node_vmstat_numa_other - lost
node_vmstat_numa_pages_migrated - lost
node_vmstat_pgmigrate_fail - lost
node_vmstat_pgmigrate_success - lost
node_vmstat_numa_hint_faults - lost
node_vmstat_numa_hint_faults_local - lost
node_vmstat_numa_pte_updates - lost
node_vmstat_numa_huge_pte_updates - lost
node_vmstat_thp_split - lost
node_vmstat_workingset_activate - lost
node_vmstat_workingset_nodereclaim - lost
node_vmstat_workingset_refault - lost
node_vmstat_thp_collapse_alloc - lost
node_vmstat_thp_collapse_alloc_failed - lost
node_vmstat_thp_zero_page_alloc - lost
node_vmstat_thp_zero_page_alloc_failed - lost
node_vmstat_thp_fault_alloc - lost
node_vmstat_thp_fault_fallback - lost
node_vmstat_nr_active_anon - lost
node_vmstat_nr_active_file - lost
node_vmstat_nr_inactive_anon - lost
node_vmstat_nr_inactive_file - lost
node_vmstat_nr_slab_reclaimable - lost
node_vmstat_nr_slab_unreclaimable - lost
node_vmstat_nr_free_pages - lost
node_vmstat_nr_written - lost
node_vmstat_nr_dirty - lost
node_vmstat_nr_bounce - lost
node_vmstat_nr_unevictable - lost
node_vmstat_nr_mlock - lost
node_vmstat_nr_shmem - lost
node_vmstat_nr_mapped - lost
node_vmstat_nr_kernel_stack - lost
node_vmstat_nr_writeback - lost
node_vmstat_nr_writeback_temp - lost
node_vmstat_nr_file_pages - lost
node_vmstat_nr_dirty_background_threshold - lost
node_vmstat_nr_dirty_threshold - lost
node_vmstat_nr_unstable - lost
node_vmstat_nr_dirtied - lost
node_vmstat_nr_page_table_pages - lost
node_vmstat_nr_alloc_batch - lost
node_vmstat_nr_isolated_anon - lost
node_vmstat_nr_isolated_file - lost
node_vmstat_nr_anon_pages - lost
node_vmstat_nr_anon_transparent_hugepages - lost
node_vmstat_nr_free_cma - lost
node_vmstat_nr_vmscan_write - lost
node_vmstat_nr_vmscan_immediate_reclaim - lost
node_disk_sectors_read - lost
node_disk_sectors_written - lost
node_network_transmit_multicast_total - lost (node_network_receive_multicast_total present)
node_network_transmit_frame_total - lost (node_network_receive_frame_total present)
node_netstat_Ip_InReceives - lost
node_netstat_Ip_DefaultTTL - lost
node_netstat_Ip_InDelivers - lost
node_netstat_Ip_OutRequests - lost
node_netstat_IpExt_InBcastPkts - lost
node_netstat_IpExt_OutBcastPkts - lost
node_netstat_IpExt_InBcastOctets - lost
node_netstat_IpExt_OutBcastOctets - lost
node_netstat_IpExt_InMcastPkts - lost
node_netstat_IpExt_OutMcastPkts - lost
node_netstat_IpExt_InMcastOctets - lost
node_netstat_IpExt_OutMcastOctets - lost
node_netstat_Ip_ForwDatagrams - lost
node_netstat_Ip_FragCreates - lost
node_netstat_Ip_FragFails - lost
node_netstat_Ip_FragOKs - lost
node_netstat_IpExt_InCEPkts - lost
node_netstat_IpExt_InECT0Pkts - lost
node_netstat_IpExt_InECT1Pkts - lost
node_netstat_IpExt_InNoECTPkts - lost
node_netstat_Ip_ReasmFails - lost
node_netstat_Ip_ReasmOKs - lost
node_netstat_Ip_ReasmReqds - lost
node_netstat_Ip_ReasmTimeout - lost
node_netstat_Ip_InDiscards - lost
node_netstat_Ip_InHdrErrors - lost
node_netstat_Ip_InUnknownProtos - lost
node_netstat_Ip_OutDiscards - lost
node_netstat_Ip_OutNoRoutes - lost
node_netstat_IpExt_InNoRoutes - lost
node_netstat_IpExt_InCsumErrors - lost
node_netstat_IpExt_InTruncatedPkts - lost
node_netstat_Ip_InAddrErrors - lost
node_netstat_Tcp_InCsumErrors - lost
node_netstat_Tcp_InSegs - lost
node_netstat_Tcp_OutRsts - lost
node_netstat_Tcp_OutSegs - lost
node_netstat_Tcp_MaxConn - lost
node_netstat_TcpExt_TCPAbortOnClose - lost
node_netstat_TcpExt_TCPAbortOnData - lost
node_netstat_TcpExt_TCPAbortOnLinger - lost
node_netstat_TcpExt_TCPAbortOnMemory - lost
node_netstat_TcpExt_TCPAbortOnTimeout - lost
node_netstat_TcpExt_TCPAbortFailed - lost
node_netstat_TcpExt_TCPTimeouts - lost
node_netstat_TcpExt_DelayedACKLocked - lost
node_netstat_TcpExt_DelayedACKLost - lost
node_netstat_TcpExt_DelayedACKs - lost
node_netstat_TcpExt_TCPSYNChallenge - lost
node_netstat_TcpExt_TCPChallengeACK - lost
node_netstat_TcpExt_TCPLossFailures - lost
node_netstat_TcpExt_TCPLossProbeRecovery - lost
node_netstat_TcpExt_TCPLossProbes - lost
node_netstat_TcpExt_TCPLossUndo - lost
node_netstat_TcpExt_TCPLostRetransmit - lost
node_netstat_TcpExt_LockDroppedIcmps - lost
node_netstat_TcpExt_TCPDeferAcceptDrop - lost
node_netstat_TcpExt_TCPBacklogDrop - lost
node_netstat_TcpExt_OutOfWindowIcmps - lost
node_netstat_TcpExt_TCPMinTTLDrop - lost
node_netstat_TcpExt_TCPForwardRetrans - lost
node_netstat_TcpExt_TCPSlowStartRetrans - lost
node_netstat_TcpExt_TCPSynRetrans - lost
node_netstat_TcpExt_TCPSpuriousRTOs - lost
node_netstat_TcpExt_TCPSpuriousRtxHostQueues - lost
node_netstat_TcpExt_TCPFullUndo - lost
node_netstat_TcpExt_TCPRetransFail - lost
node_netstat_TcpExt_TCPPartialUndo - lost
node_netstat_TcpExt_PruneCalled - lost
node_netstat_TcpExt_RcvPruned - lost
node_netstat_TcpExt_OfoPruned - lost
node_netstat_TcpExt_TCPDirectCopyFromBacklog - lost
node_netstat_TcpExt_TCPDirectCopyFromPrequeue - lost
node_netstat_TcpExt_TW - lost
node_netstat_TcpExt_TWKilled - lost
node_netstat_TcpExt_TWRecycled - lost
node_netstat_TcpExt_TCPTimeWaitOverflow - lost
node_netstat_TcpExt_PAWSActive - lost
node_netstat_TcpExt_PAWSEstab - lost
node_netstat_TcpExt_PAWSPassive - lost
node_netstat_TcpExt_TCPSackRecovery - lost
node_netstat_TcpExt_TCPSackRecoveryFail - lost
node_netstat_TcpExt_TCPSackShiftFallback - lost
node_netstat_TcpExt_TCPSackShifted - lost
node_netstat_TcpExt_TCPSACKDiscard - lost
node_netstat_TcpExt_TCPSackFailures - lost
node_netstat_TcpExt_TCPSackMerged - lost
node_netstat_TcpExt_TCPSACKReneging - lost
node_netstat_TcpExt_TCPSACKReorder - lost
node_netstat_TcpExt_TCPDSACKIgnoredOld - lost
node_netstat_TcpExt_TCPDSACKOfoRecv - lost
node_netstat_TcpExt_TCPDSACKOfoSent - lost
node_netstat_TcpExt_TCPDSACKOldSent - lost
node_netstat_TcpExt_TCPDSACKRecv - lost
node_netstat_TcpExt_TCPDSACKUndo - lost
node_netstat_TcpExt_TCPDSACKIgnoredNoUndo - lost
node_netstat_TcpExt_TCPFastOpenActive - lost
node_netstat_TcpExt_TCPFastOpenActiveFail - lost
node_netstat_TcpExt_TCPFastOpenCookieReqd - lost
node_netstat_TcpExt_TCPFastOpenListenOverflow - lost
node_netstat_TcpExt_TCPFastOpenPassive - lost
node_netstat_TcpExt_TCPFastOpenPassiveFail - lost
node_netstat_TcpExt_TCPFastRetrans - lost
node_netstat_TcpExt_TCPHPAcks - lost
node_netstat_TcpExt_TCPHPHits - lost
node_netstat_TcpExt_TCPHPHitsToUser - lost
node_netstat_TcpExt_TCPToZeroWindowAdv - lost
node_netstat_TcpExt_TCPWantZeroWindowAdv - lost
node_netstat_TcpExt_TCPFromZeroWindowAdv - lost
node_netstat_TcpExt_TCPFACKReorder - lost
node_netstat_TcpExt_TCPTSReorder - lost
node_netstat_TcpExt_TCPRenoFailures - lost
node_netstat_TcpExt_TCPRenoRecovery - lost
node_netstat_TcpExt_TCPRenoRecoveryFail - lost
node_netstat_TcpExt_TCPRenoReorder - lost
node_netstat_TcpExt_TCPReqQFullDoCookies - lost
node_netstat_TcpExt_TCPReqQFullDrop - lost
node_netstat_TcpExt_TCPOFODrop - lost
node_netstat_TcpExt_TCPOFOMerge - lost
node_netstat_TcpExt_TCPOFOQueue - lost
node_netstat_TcpExt_TCPMD5NotFound - lost
node_netstat_TcpExt_TCPMD5Unexpected - lost
node_netstat_TcpExt_TCPPrequeued - lost
node_netstat_TcpExt_TCPPrequeueDropped - lost
node_netstat_TcpExt_TCPRcvCoalesce - lost
node_netstat_TcpExt_TCPRcvCollapsed - lost
node_netstat_TcpExt_TCPOrigDataSent - lost
node_netstat_TcpExt_ArpFilter - lost
node_netstat_TcpExt_IPReversePathFilter - lost
node_netstat_TcpExt_TCPPureAcks - lost
node_netstat_TcpExt_TCPAutoCorking - lost
node_netstat_TcpExt_BusyPollRxPackets - lost
node_netstat_TcpExt_EmbryonicRsts - lost
node_netstat_Udp_InCsumErrors - lost
node_netstat_Udp_RcvbufErrors - lost
node_netstat_Udp_SndbufErrors - lost
node_netstat_UdpLite_InDatagrams - lost
node_netstat_UdpLite_OutDatagrams - lost
node_netstat_UdpLite_InCsumErrors - lost
node_netstat_UdpLite_RcvbufErrors - lost
node_netstat_UdpLite_SndbufErrors - lost
node_netstat_UdpLite_NoPorts - lost
node_netstat_Icmp_OutErrors - lost
node_netstat_Icmp_InDestUnreachs - lost
node_netstat_Icmp_OutDestUnreachs - lost
node_netstat_IcmpMsg_InType3 - lost
node_netstat_IcmpMsg_OutType3 - lost
node_netstat_Icmp_InCsumErrors - lost
node_netstat_Icmp_InTimeExcds - lost
node_netstat_Icmp_OutTimeExcds - lost
node_netstat_Icmp_InParmProbs - lost
node_netstat_Icmp_OutParmProbs - lost
node_netstat_Icmp_InSrcQuenchs - lost
node_netstat_Icmp_OutSrcQuenchs - lost
node_netstat_Icmp_InRedirects - lost
node_netstat_Icmp_OutRedirects - lost
node_netstat_Icmp_InTimestampReps - lost
node_netstat_Icmp_InTimestamps - lost
node_netstat_Icmp_OutTimestampReps - lost
node_netstat_Icmp_OutTimestamps - lost
node_netstat_Icmp_InEchoReps - lost
node_netstat_Icmp_InEchos - lost
node_netstat_Icmp_OutEchoReps - lost
node_netstat_Icmp_OutEchos - lost
node_netstat_Icmp_InAddrMaskReps - lost
node_netstat_Icmp_InAddrMasks - lost
node_netstat_Icmp_OutAddrMaskReps - lost
@rfrail3 We intentionally added whitelists for vmstat and netstat to reduce the number of metrics. If you know of any very useful metrics that we dropped from that list, please let us know.
As for the two network metrics, this was actually a long-standing bug; they have been renamed to be correct.
That makes sense; there were a lot of records. In that case, I think the --help output and README.md need to be updated, because they still show both collectors as enabled. Do I need to open a new issue?
--collector.vmstat Enable the vmstat collector (default: enabled).
--collector.netstat Enable the netstat collector (default: enabled).
On the other hand, I found two metrics that lost their counterpart: both still have the _receive_ variant but lost the _transmit_ one:
node_network_transmit_multicast_total - lost (node_network_receive_multicast_total present)
node_network_transmit_frame_total - lost (node_network_receive_frame_total present)
These other two belong to the disk collector, not to vmstat or netstat, and are useful for monitoring. It would be nice to get them back:
node_disk_sectors_read - lost
node_disk_sectors_written - lost
Thanks,
vmstat and netstat are still enabled, it's just that the list is filtered. See the --help output:
--collector.netstat.fields="^(.*_(InErrors|InErrs)|Ip_Forwarding|Ip(6|Ext)_(InOctets|OutOctets)|Icmp6?_(InMsgs|OutMsgs)|TcpExt_(Listen.*|Syncookies.*)|Tcp_(ActiveOpens|PassiveOpens|RetransSegs|CurrEstab)|Udp6?_(InDatagrams|OutDatagrams|NoPorts))$"
Regexp of fields to return for netstat collector.
--collector.vmstat.fields="^(oom_kill|pgpg|pswp|pg.*fault).*"
Regexp of fields to return for vmstat collector.
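To check whether a particular field survives these default filters, one can test field names against the regexes. A Python sketch, assuming netstat keys are matched as Protocol_Field (the part after node_netstat_, e.g. Tcp_InSegs) and vmstat keys as the raw field name (the part after node_vmstat_, e.g. pgfree); Python's re behaves like Go's RE2 for these particular patterns.

```python
import re

# Default field filters quoted from the --help output above.
NETSTAT_FIELDS = re.compile(
    r"^(.*_(InErrors|InErrs)|Ip_Forwarding|Ip(6|Ext)_(InOctets|OutOctets)"
    r"|Icmp6?_(InMsgs|OutMsgs)|TcpExt_(Listen.*|Syncookies.*)"
    r"|Tcp_(ActiveOpens|PassiveOpens|RetransSegs|CurrEstab)"
    r"|Udp6?_(InDatagrams|OutDatagrams|NoPorts))$"
)
VMSTAT_FIELDS = re.compile(r"^(oom_kill|pgpg|pswp|pg.*fault).*")


def kept(pattern, field):
    """Mimic Go's Regexp.MatchString: search anywhere; anchors come from the pattern."""
    return pattern.search(field) is not None


if __name__ == "__main__":
    for field in ("Tcp_CurrEstab", "Tcp_InSegs", "TcpExt_ListenDrops", "UdpLite_NoPorts"):
        print("netstat", field, kept(NETSTAT_FIELDS, field))
    for field in ("pgmajfault", "pgfree", "oom_kill"):
        print("vmstat", field, kept(VMSTAT_FIELDS, field))
```

Fields that fail the check can be brought back by passing a broader regex to the corresponding --collector.*.fields flag.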
As for the network metrics: as I said above, they are gone because they were a bug and were never named correctly in the first place. Please review https://github.com/prometheus/node_exporter/pull/890
node_disk_sectors_.* were removed by https://github.com/prometheus/node_exporter/pull/787 as they duplicate node_disk_.*_bytes_total. You can simply use the bytes metric instead.
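For dashboards built on the removed sector counters, the old values can be rebuilt from the bytes counters: /proc/diskstats reports sectors in fixed 512-byte units regardless of the device's physical sector size, so bytes / 512 gives the former sector count. A Python sketch emitting such recording rules (sector_rules is a hypothetical helper):

```python
# /proc/diskstats counts sectors in fixed 512-byte units.
SECTOR_SIZE = 512

SECTOR_METRICS = {
    "node_disk_sectors_read": "node_disk_read_bytes_total",
    "node_disk_sectors_written": "node_disk_written_bytes_total",
}


def sector_rules():
    """Render recording rules that rebuild the removed sector counters."""
    lines = []
    for old, new in SECTOR_METRICS.items():
        lines += [f"- record: {old}", f"  expr: {new} / {SECTOR_SIZE}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(sector_rules())
```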
Sorry, I didn't see those new options; it's fine.
Thanks for the clarification, indeed.