We are currently in the process of converting from Ganglia to Telegraf (yeah!). Unfortunately, we have an existing dependency on the Ganglia cpu_speed metric, which Telegraf does not provide.
Add a cpu_speed field or equivalent to the cpu or system measurement. This would be in MHz.
This helps mostly in capacity management, when mapping the CPU MHz of an application group that is targeted for migration to new hosts. We can get the CPU speed in other ways, of course, but having it directly and natively in Telegraf would be optimal.
A quick look suggests that we could use the cpu.Info() function from gopsutil to pull in some additional cpu fields:
type InfoStat struct {
	CPU        int32    `json:"cpu"`
	VendorID   string   `json:"vendorId"`
	Family     string   `json:"family"`
	Model      string   `json:"model"`
	Stepping   int32    `json:"stepping"`
	PhysicalID string   `json:"physicalId"`
	CoreID     string   `json:"coreId"`
	Cores      int32    `json:"cores"`
	ModelName  string   `json:"modelName"`
	Mhz        float64  `json:"mhz"`
	CacheSize  int32    `json:"cacheSize"`
	Flags      []string `json:"flags"`
	Microcode  string   `json:"microcode"`
}
This reads and parses /proc/cpuinfo on Linux.
Well, it depends on what we're really looking for here. Do we want the maximum speed or the current speed? What about the max and min limits?
I was looking for just the "CPU MHz" field from the Linux command 'lscpu', which appears to be the same as the "cpu MHz" field of each CPU core in 'cat /proc/cpuinfo'. For me it doesn't change, and it matches the CPU description, e.g. "Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz". I'm on VMware infrastructure, though.
I was not looking for the instantaneous frequency or even the max boost frequency, just the frequency that corresponds to the CPU description, which, combined with the CPU core count, gives a comparative sense of capacity among VMs and environments.
This field is the instantaneous frequency of the processor, but there are also min and max values. Here they are on my laptop:
CPU MHz: 499.877
CPU max MHz: 3400.0000
CPU min MHz: 400.0000
Even though min/max don't change, I can see the usefulness of collecting them across a fleet of systems. I think the main thing to decide is whether collecting this data should be opt-in, or whether it is light enough that we should just add it. I think we can simply add these 3 fields to the standard fields collected by the cpu plugin, since they should add only a small amount of extra load.
What about the limits? Limits might be useful on embedded (or other) systems that adjust them to conserve power.
Dunno if gopsutil provides them all in one spot, but they can all be obtained from /sys/devices/system/cpu/cpu*/cpufreq/
Basically, all the fields and how they relate to each other (values are in kHz):
cpuinfo_min_freq <= scaling_min_freq <= scaling_cur_freq <= scaling_max_freq <= cpuinfo_max_freq
This feature request should probably also be reconciled with this PR:
https://github.com/influxdata/telegraf/pull/4215
I tried in the past to read lscpu and to input the data into InfluxDB using the exec input plugin.
The values were always higher than what I got by running the same command from the command line, because by the time Telegraf runs the plugin, the CPU or kernel has already increased the frequency.
I would say the plugin makes little sense unless it is shown to provide reliable values.
I'm running an E3-1220 v2 on Ubuntu 18.04.
I see what you mean. In use cases like mine (an HPC machine with XX cores), one could assume the noise introduced by Telegraf itself affects just a few (?) cores (?). Anyway, I'm going to write an exec-based collector for /sys/devices/system/cpu/cpuXXX/cpufreq/cpuinfo_cur_freq and keep it running for a few weeks on a few compute nodes to see the real-life results.
Here is a proof-of-concept-grade collector, meant to be used as an Exec input in Telegraf:
https://github.com/jose-d/telegraf-collectors/blob/master/cpufreq-monitor/give_stats.py
In the end I collect the data from
/sys/devices/system/cpu/cpuNN/cpufreq/scaling_cur_freq, as it is readable by a non-root user (CentOS 7).
Screenshot from Grafana:
(It actually shows why this monitoring is useful to me: detecting suboptimal use of CPU resources by $users.)
