CentOS Linux release 7.6.1810 (Core)
Telegraf 1.9.2 (git: HEAD dda80799)
influxdb-1.7.2-1
The graph should be constant.

2019-01-16T13:41:10Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-16T13:41:20Z D! [outputs.influxdb] wrote batch of 28 metrics in 7.071246ms
2019-01-16T13:41:20Z D! [outputs.influxdb] buffer fullness: 0 / 10000 metrics.
2019-01-16T13:41:20Z W! [agent] input "inputs.vsphere" did not complete within its interval
If I restart Telegraf, it works fine for a few minutes, then the same error starts again and no graph is produced.
Could you check if this still occurs in the nightly builds?
@danielnelson Thank you! I moved to the version below, as per your suggestion.
Telegraf unknown (git: master e95b88e0)
All graphs are now fairly constant, but there is an issue with CPU collection only, which is missing for all ESXi hosts.
The CPU graph has gaps:

It should look like this:

It may be that you need to increase your collection interval, depending on how many values you are collecting and the amount of time it takes. One way to see how much time the plugin is taking is to enable the internal plugin and look at the internal_gather,plugin=vsphere and internal_vsphere metrics.
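To illustrate, enabling the internal plugin could look like the minimal sketch below. This is an assumption about your setup, not your exact config; `collect_memstats` is the plugin's only notable option.

```toml
# Collect Telegraf's own metrics, including per-plugin gather timings.
[[inputs.internal]]
  ## Also collect Telegraf memory stats (optional).
  collect_memstats = true
```

With this enabled, the `gather_time_ns` field on `internal_gather` (tagged with the plugin name) shows how long each vsphere collection cycle actually takes.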
Can you also attach your configuration for this plugin?
Please share your configuration file with us! Shorter collection interval than 60s is generally not recommended for vsphere. You may also want to increase collection concurrency.
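For example, the concurrency settings live in the plugin section itself. This is a hedged sketch with a hypothetical vCenter URL and placeholder credentials; the right values depend on your inventory size.

```toml
[[inputs.vsphere]]
  vcenters = [ "https://vc.example.com/sdk" ]  # hypothetical URL
  username = "user"
  password = "pass"
  ## Goroutines used for metric collection and object discovery;
  ## raising these can help larger inventories finish within the interval.
  collect_concurrency = 3
  discover_concurrency = 3
```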
FYI: 4 vCenters are configured, with 500+ hosts altogether.
I have also observed this error many times: [inputs.vsphere]: Error in plugin: While collecting host: Post https://VC/sdk: context deadline exceeded
FYI: It fails to collect metrics where a larger number of hosts is present. (The 2 vCenters with only a few hosts each show a constant graph.) The issue is only with the larger inventory.
Okay! Here is the config file:
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = true
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
urls = ["http://127.0.0.1:8086"]
database = "telegraf"
timeout = "5s"
username = "user"
password = "pass"
[[inputs.vsphere]]
vcenters = [ "VC1", "VC2", "VC3", "VC4" ]
username = "user"
password = "pass"
vm_metric_exclude = [ "*" ]
host_metric_include = [
"cpu.coreUtilization.average",
"cpu.costop.summation",
"cpu.demand.average",
"cpu.idle.summation",
"cpu.latency.average",
"cpu.readiness.average",
"cpu.ready.summation",
"cpu.swapwait.summation",
"cpu.usage.average",
"cpu.usagemhz.average",
"cpu.used.summation",
"cpu.utilization.average",
"cpu.wait.summation",
"mem.active.average",
"mem.latency.average",
"mem.state.latest",
"mem.swapin.average",
"mem.swapinRate.average",
"mem.swapout.average",
"mem.swapoutRate.average",
"mem.totalCapacity.average",
"mem.usage.average",
"mem.vmmemctl.average",
"net.bytesRx.average",
"net.bytesTx.average",
"net.droppedRx.summation",
"net.droppedTx.summation",
"net.errorsRx.summation",
"net.errorsTx.summation",
"net.usage.average",
"power.power.average",
"storageAdapter.numberReadAveraged.average",
"storageAdapter.numberWriteAveraged.average",
"storageAdapter.read.average",
"storageAdapter.write.average",
"sys.uptime.latest",
]
cluster_metric_include = []
datastore_metric_exclude = [ "*" ]
insecure_skip_verify = true
I have the same issue with versions 1.9.0-1 and 1.9.1-1; there is no problem with version 1.8.3-1.
FYI: 2 vCenters are configured, with fewer than 10 hosts altogether.
Here is the config file:
[global_tags]
[agent]
interval = "15s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
precision = ""
debug = false
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = ""
omit_hostname = false
[[outputs.prometheus_client]]
listen = ":9273"
[[inputs.vsphere]]
vcenters = [ "VC1","VC2" ]
username = "*"
password = ""
insecure_skip_verify = true
Here is the log:
2019-01-21T03:24:17Z I! [agent] Hang on, flushing any cached metrics before shutdown
2019-01-21T03:24:18Z I! Loaded inputs: inputs.mem inputs.system inputs.cpu inputs.diskio inputs.kernel inputs.processes inputs.swap inputs.vsphere inputs.disk
2019-01-21T03:24:18Z I! Loaded aggregators:
2019-01-21T03:24:18Z I! Loaded processors:
2019-01-21T03:24:18Z I! Loaded outputs: prometheus_client
2019-01-21T03:24:18Z I! Tags enabled: host=bocloud
2019-01-21T03:24:18Z I! [agent] Config: Interval:15s, Quiet:false, Hostname:"localhost", Flush Interval:10s
2019-01-21T03:24:18Z W! [input.vsphere] Configured max_query_metrics is 256, but server limits it to 64. Reducing.
2019-01-21T03:24:45Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:25:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:25:15Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:25:30Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:25:45Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:27:30Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:27:45Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:28:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:28:15Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:28:30Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:28:50Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:29:05Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:29:20Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:29:35Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:29:50Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:30:05Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:30:15Z E! [inputs.vsphere]: Error in plugin: While collecting host: ServerFaultCode: A specified parameter was not correct: querySpec.startTime, querySpec.endTime
2019-01-21T03:30:15Z E! [inputs.vsphere]: Error in plugin: While collecting vm: ServerFaultCode: A specified parameter was not correct: querySpec.startTime, querySpec.endTime
2019-01-21T03:32:30Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:32:45Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:33:00Z W! [agent] input "inputs.vsphere" did not complete within its interval
2019-01-21T03:33:15Z W! [agent] input "inputs.vsphere" did not complete within its interval
It looks like your interval is set to 15s. You need to increase that to at least 20s!
Keep in mind that vSphere data is only available every 20s, so you should never specify an interval lower than that. We generally recommend you keep the collection interval at 60s due to the load shorter intervals may put on the vCenter server. However, we have been able to successfully use a 20s interval on a vCenter managing 7000 VMs, so it is possible, albeit not recommended.
If you truly need 20s granularity on your data, I recommend you do two things:
1) Move collection of clusters, datacenters and datastores to a separate instance of the plugin. These metrics are only available at a 300s interval, so it's not useful to collect them more often than that. Also, since they're stored on disk and not in memory, they take considerably longer to fetch. Here's a writeup I did on this that you might find helpful: http://docs-dev.wavefront.com/integrations_vsphere.html (Note to self: Add something similar to the README)
2) Once you've made the changes above, you should increase both discover_concurrency and collect_concurrency to 3. This should give you an extra performance boost!
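Concretely, the two-instance split described in steps 1 and 2 could look like the sketch below. The vCenter URL and credentials are hypothetical; the per-plugin `interval` override is a standard Telegraf feature.

```toml
## Instance 1: realtime resources (VMs and hosts), available every 20s
[[inputs.vsphere]]
  interval = "20s"
  vcenters = [ "https://vc.example.com/sdk" ]  # hypothetical URL
  username = "user"
  password = "pass"
  datastore_metric_exclude = [ "*" ]
  cluster_metric_exclude = [ "*" ]
  datacenter_metric_exclude = [ "*" ]
  collect_concurrency = 3
  discover_concurrency = 3

## Instance 2: non-realtime resources (datastores, clusters, datacenters),
## only available at 300s resolution, so collected less often
[[inputs.vsphere]]
  interval = "300s"
  vcenters = [ "https://vc.example.com/sdk" ]
  username = "user"
  password = "pass"
  vm_metric_exclude = [ "*" ]
  host_metric_exclude = [ "*" ]
```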
It works, thanks a lot
No, it only fails when datastore metrics are included.
If datastore metrics are excluded, no missing values are observed (i.e., the graph is stable).
As stated above, you need to declare a separate instance of the plugin and specify data stores, clusters and data centers in that instance with a 300s interval.
Following your suggestion, the real-time metrics are good, but the non-real-time data is not continuous.
How can I reliably get the non-real-time data? Is this a bug? See #5322.

What's the collection interval on the non-real-time metrics? It should be 300s or higher. Also, can you run with the -debug flag and send us the logs?
And yes, it might be because of a bug that's scheduled to be fixed in 1.10. You may want to try the latest nightly build from master and see if it fixes the issue.
Yes!
object_discovery_interval = "300s"
collect_concurrency = 4
discover_concurrency = 4
## Data-collection interval
interval = "60s"
But the error below is observed in the logs:
2019-01-29T06:36:03Z E! [inputs.vsphere]: Error in plugin: While collecting host: Post https://example.com/sdk: context deadline exceeded
2019-01-29T06:37:03Z E! [inputs.vsphere]: Error in plugin: While collecting host: Post https://example.com/sdk: context deadline exceeded
and during this time, the data shows no values.
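A "context deadline exceeded" error means the vCenter API request timed out before completing. One thing worth trying (an assumption on my part, not a guaranteed fix) is raising the plugin's request timeout, which is a documented option:

```toml
[[inputs.vsphere]]
  # ... existing vcenters, credentials, and metric filters ...
  ## Timeout applied to any API request made to vCenter
  timeout = "180s"
```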
Same problem here!
I've used a release that @prydin made for another issue and it worked like a charm.
Functional release:
Telegraf unknown (git: prydin-scale-improvement 646c5960)
Non-functional release:
telegraf-1.9.4-1.x86_64
I only get "inputs.vsphere did not complete within its interval".
PROBLEM
No graph is generated!
CONFIG
#[global_tags]
[agent]
interval = '300s'
round_interval = true
metric_batch_size = 10000
metric_buffer_limit = 100000
collection_jitter = '0s'
flush_interval = '10s'
flush_jitter = '0s'
precision = ''
debug = true
quiet = false
logfile = '/var/log/telegraf/telegraf.log'
hostname = ''
omit_hostname = false
[[outputs.influxdb]]
urls = ['http://foo.bar.com:8086']
database = 'telegraf'
[[inputs.vsphere]]
vcenters = [ 25 x vcenter ]
username = 'foobar'
password = 'barfoo'
vm_metric_include = [
'sys.osUptime.latest',
'cpu.usage.average',
'disk.read.average',
'cpu.usage.average',
'cpu.demand.average',
'cpu.idle.summation',
'cpu.latency.average',
'cpu.readiness.average',
'cpu.ready.summation',
'cpu.run.summation',
'cpu.usagemhz.average',
'cpu.used.summation',
'cpu.wait.summation',
'mem.active.average',
'mem.granted.average',
'mem.latency.average',
'mem.swapin.average',
'mem.swapinRate.average',
'mem.swapout.average',
'mem.swapoutRate.average',
'mem.usage.average',
'mem.vmmemctl.average',
'net.bytesRx.average',
'net.bytesTx.average',
'net.droppedRx.summation',
'net.droppedTx.summation',
'net.usage.average',
'power.power.average',
'virtualDisk.numberReadAveraged.average',
'virtualDisk.numberWriteAveraged.average',
'virtualDisk.read.average',
'virtualDisk.readOIO.latest',
'virtualDisk.throughput.usage.average',
'virtualDisk.totalReadLatency.average',
'virtualDisk.totalWriteLatency.average',
'virtualDisk.write.average',
'virtualDisk.writeOIO.latest',
'sys.uptime.latest',
]
host_metric_include = [
'cpu.coreUtilization.average',
'cpu.costop.summation',
'cpu.demand.average',
'cpu.idle.summation',
'cpu.latency.average',
'cpu.readiness.average',
'cpu.ready.summation',
'cpu.swapwait.summation',
'cpu.usage.average',
'cpu.usagemhz.average',
'cpu.used.summation',
'cpu.utilization.average',
'cpu.wait.summation',
'disk.deviceReadLatency.average',
'disk.deviceWriteLatency.average',
'disk.kernelReadLatency.average',
'disk.kernelWriteLatency.average',
'disk.numberReadAveraged.average',
'disk.numberWriteAveraged.average',
'disk.read.average',
'disk.totalReadLatency.average',
'disk.totalWriteLatency.average',
'disk.write.average',
'mem.active.average',
'mem.latency.average',
'mem.state.latest',
'mem.swapin.average',
'mem.swapinRate.average',
'mem.swapout.average',
'mem.swapoutRate.average',
'mem.totalCapacity.average',
'mem.usage.average',
'mem.vmmemctl.average',
'net.bytesRx.average',
'net.bytesTx.average',
'net.droppedRx.summation',
'net.droppedTx.summation',
'net.errorsRx.summation',
'net.errorsTx.summation',
'net.usage.average',
'power.power.average',
'storageAdapter.numberReadAveraged.average',
'storageAdapter.numberWriteAveraged.average',
'storageAdapter.read.average',
'storageAdapter.write.average',
'sys.uptime.latest',
]
datastore_metric_include = [
'datastore.numberReadAveraged.average',
'datastore.throughput.contention.average',
'datastore.throughput.usage.average',
'datastore.write.average',
'datastore.read.average',
'datastore.numberWriteAveraged.average',
'disk.used.latest',
'disk.provisioned.latest',
'disk.capacity.latest',
'disk.capacity.contention.average',
'disk.capacity.provisioned.average',
'disk.capacity.usage.average'
]
cluster_metric_include = []
datacenter_metric_exclude = [ '*' ]
collect_concurrency = 2
discover_concurrency = 1
object_discovery_interval = '1200s'
insecure_skip_verify = true
I am having the same issue with this build: 1.9.4-1.x86_64.
Please help me fix this or recommend the right Telegraf build. I have 200+ hosts with 2000+ VMs.
[agent] input "inputs.vsphere" did not complete within its interval
# Read metrics from one or many vCenters
[[inputs.vsphere]]
## List of vCenter URLs to be monitored. These three lines must be uncommented
## and edited for the plugin to work.
vcenters = [ "https://xhd/sdk" ]
username = "sdf"
password = "password"
## VMs
## Typical VM metrics (if omitted or empty, all metrics are collected)
vm_metric_include = [
"mem.usage.average",
"net.usage.average",
]
vm_metric_exclude = [
"power.power.average",
"virtualDisk.numberReadAveraged.average",
"virtualDisk.numberWriteAveraged.average",
"virtualDisk.read.average",
"virtualDisk.readOIO.latest",
"virtualDisk.throughput.usage.average",
"virtualDisk.totalReadLatency.average",
"virtualDisk.totalWriteLatency.average",
"virtualDisk.write.average",
"virtualDisk.writeOIO.latest",
"sys.uptime.latest",
"mem.vmmemctl.average",
"net.bytesRx.average",
"net.bytesTx.average",
"net.droppedRx.summation",
"net.droppedTx.summation",
"cpu.wait.summation",
"mem.active.average",
"mem.granted.average",
"mem.latency.average",
"mem.swapin.average",
"mem.swapinRate.average",
"mem.swapout.average",
"mem.swapoutRate.average",
"cpu.run.summation",
"cpu.demand.average",
"cpu.idle.summation",
"cpu.latency.average",
"cpu.readiness.average",
"cpu.ready.summation",
"cpu.usagemhz.average",
"cpu.used.summation",
]
## Nothing is excluded by default
# vm_instances = true ## true by default
## Hosts
## Typical host metrics (if omitted or empty, all metrics are collected)
host_metric_include = [
"cpu.usage.average",
"cpu.usagemhz.average",
"cpu.used.summation",
"cpu.utilization.average",
"cpu.wait.summation",
"mem.usage.average",
"net.usage.average",
]
host_metric_exclude = [
"power.power.average",
"storageAdapter.numberReadAveraged.average",
"storageAdapter.numberWriteAveraged.average",
"storageAdapter.read.average",
"storageAdapter.write.average",
"sys.uptime.latest",
"mem.vmmemctl.average",
"net.bytesRx.average",
"net.bytesTx.average",
"net.droppedRx.summation",
"net.droppedTx.summation",
"net.errorsRx.summation",
"net.errorsTx.summation",
"disk.deviceReadLatency.average",
"disk.deviceWriteLatency.average",
"disk.kernelReadLatency.average",
"disk.kernelWriteLatency.average",
"disk.numberReadAveraged.average",
"disk.numberWriteAveraged.average",
"disk.read.average",
"disk.totalReadLatency.average",
"disk.totalWriteLatency.average",
"disk.write.average",
"mem.active.average",
"mem.latency.average",
"mem.state.latest",
"mem.swapin.average",
"mem.swapinRate.average",
"mem.swapout.average",
"mem.swapoutRate.average",
"mem.totalCapacity.average",
"cpu.coreUtilization.average",
"cpu.costop.summation",
"cpu.demand.average",
"cpu.idle.summation",
"cpu.latency.average",
"cpu.readiness.average",
"cpu.ready.summation",
"cpu.swapwait.summation",
]
## Nothing excluded by default
# host_instances = true ## true by default
## Clusters
# cluster_metric_include = [] ## if omitted or empty, all metrics are collected
# cluster_metric_exclude = [] ## Nothing excluded by default
# cluster_instances = true ## true by default
## Datastores
# datastore_metric_include = [] ## if omitted or empty, all metrics are collected
# datastore_metric_exclude = [] ## Nothing excluded by default
# datastore_instances = false ## false by default for Datastores only
## Datacenters
datacenter_metric_include = [] ## if omitted or empty, all metrics are collected
datacenter_metric_exclude = [ "*" ] ## Datacenters are not collected by default.
# datacenter_instances = false ## false by default for Datastores only
## Plugin Settings
## separator character to use for measurement and field names (default: "_")
# separator = "_"
## number of objects to retrieve per query for realtime resources (vms and hosts)
## set to 64 for vCenter 5.5 and 6.0 (default: 256)
# max_query_objects = 256
## number of metrics to retrieve per query for non-realtime resources (clusters and datastores)
## set to 64 for vCenter 5.5 and 6.0 (default: 256)
# max_query_metrics = 256
## number of go routines to use for collection and discovery of objects and metrics
collect_concurrency = 5
discover_concurrency = 3
## whether or not to force discovery of new objects on initial gather call before collecting metrics
## when true for large environments this may cause errors for time elapsed while collecting metrics
## when false (default) the first collection cycle may result in no or limited metrics while objects are discovered
# force_discover_on_init = false
## the interval before (re)discovering objects subject to metrics collection (default: 300s)
# object_discovery_interval = "300s"
## timeout applies to any of the api request made to vcenter
timeout = "180s"
## Optional SSL Config
# ssl_ca = "/path/to/cafile"
# ssl_cert = "/path/to/certfile"
# ssl_key = "/path/to/keyfile"
## Use SSL but skip chain & host verification
insecure_skip_verify = true
@sunnybhatnagar Can you try with the latest nightly build?
@danielnelson I have the same issue; just tried the nightly build and that fixes the issue!
But now I got weird failed events in my vcenter, see:


@MartVisser Can you open a new issue for that side effect?
I'm going to close this issue, if anyone is having issues with the plugin not completing by the end of the interval please read the hints above and try again with the nightly builds. If you still have problems after that, please open a new issue.