I am trying to minimize SNMP traffic between the Telegraf agent and the nodes on the network. The environment is heavily constrained in both network speed and capability.
First I thought that moving GETs to bulk requests would solve the problem; however, the current OID structure doesn't seem to be a good fit for it. I then found that SNMP actually supports multiple GETs in a single request; I tested this on the command line and it works as expected.
My real question is: is there a way to configure the Telegraf SNMP plugin to request multiple results in a single request?
Basically, multiple results per request looks like this (command-line request):

Instead of this (Telegraf request):

My example configuration:
[[inputs.snmp]]
  agents = [ "172.21.103.129" ]
  version = 2
  community = "public"
  name = "snmp_foo"

  [[inputs.snmp.field]]
    is_tag = true
    name = "node"
    oid = ".1.3.6.1.4.1.30306.1.1.1.4.1.1.28.0"

  [[inputs.snmp.field]]
    is_tag = true
    name = "network"
    oid = ".1.3.6.1.4.1.30306.1.1.1.4.1.1.1.0"

  [[inputs.snmp.field]]
    name = "voltage_v_est"
    oid = ".1.3.6.1.4.1.30306.1.1.1.4.2.1.2.0"

  [[inputs.snmp.field]]
    name = "temperature"
    oid = ".1.3.6.1.4.1.30306.1.1.1.4.2.1.1.0"

  [[inputs.snmp.field]]
    name = "ru_tx_packet"
    oid = ".1.3.6.1.4.1.30306.1.1.1.4.2.1.7.0"

  [[inputs.snmp.field]]
    name = "ru_rx_packet"
    oid = ".1.3.6.1.4.1.30306.1.1.1.4.2.1.8.0"
Use case: keep network usage as low as possible in a constrained environment.
To answer your question: the plugin can't currently be configured to do this.
It would be a nice feature for my environment too.
Currently some old hardware is sometimes (before midnight in the example below) unable to cope with many subsequent requests, while it shows no problem when it must return several values for a single request.

Before querying, it is not possible to determine whether we should use single GETs or do a walk. However, I think we might be able to solve this with some modifications to the way we gather tables.
Here are a couple examples to consider:
Example 1
[[inputs.snmp]]
  [[inputs.snmp.table]]
    oid = "IF-MIB::ifXTable"
In this config example with a specified table OID, we currently call snmptable to get the table column names and then use snmptranslate to find their OIDs. For each column a walk is done to get all of the values by index.
Could we just walk once starting at the provided table OID and use the snmptranslate info only for potentially identifying the table index (not sure why this is done) and for determining the field name/conversion? If we did this could we remove snmptable (#2276) as I'm not sure what it would do for us.
Example 2
[[inputs.snmp]]
  [[inputs.snmp.table]]
    name = "interface"

    [[inputs.snmp.table.field]]
      oid = "IF-MIB::ifHCInOctets"

    [[inputs.snmp.table.field]]
      oid = "IF-MIB::ifHCOutOctets"
This does two things: it names the measurement, and since there is no table OID it only gathers the specified fields. I don't know if there is a better way to find the parent table OID, but perhaps we could just find the longest common prefix; then we could combine this into a single walk request.
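The longest-common-prefix idea can be sketched roughly as follows. This is just an illustration, not plugin code; note that OIDs must be compared component by component, since a plain string prefix could cut through the middle of a sub-identifier (".1.6" and ".1.60" share the string prefix ".1.6" but only the component ".1").

```go
package main

import (
	"fmt"
	"strings"
)

// longestCommonOIDPrefix returns the longest common prefix of the given
// OIDs, compared component by component rather than character by
// character, so sub-identifiers are never split.
func longestCommonOIDPrefix(oids []string) string {
	if len(oids) == 0 {
		return ""
	}
	prefix := strings.Split(strings.TrimPrefix(oids[0], "."), ".")
	for _, oid := range oids[1:] {
		parts := strings.Split(strings.TrimPrefix(oid, "."), ".")
		i := 0
		for i < len(prefix) && i < len(parts) && prefix[i] == parts[i] {
			i++
		}
		prefix = prefix[:i]
	}
	if len(prefix) == 0 {
		return ""
	}
	return "." + strings.Join(prefix, ".")
}

func main() {
	oids := []string{
		".1.3.6.1.2.1.31.1.1.1.6",  // IF-MIB::ifHCInOctets
		".1.3.6.1.2.1.31.1.1.1.10", // IF-MIB::ifHCOutOctets
	}
	// The common prefix is the ifXEntry subtree, which could then be
	// walked once instead of once per column.
	fmt.Println(longestCommonOIDPrefix(oids))
}
```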
Example 3
[[inputs.snmp]]
  [[inputs.snmp.table]]
    name = "synoSystem"

    [[inputs.snmp.table.field]]
      name = "systemStatus"
      oid = "1.3.6.1.4.1.6574.1.1"

    [[inputs.snmp.table.field]]
      name = "temperature"
      oid = "1.3.6.1.4.1.6574.1.2"
This example (based on #4979) is similar to example 2, but the items are not actually in a table; they can still be retrieved from a single walk entry point. The expected behavior here is that all the items are collected in a single request, acting the same as snmpwalk.
I think this form could be used to solve the original reported case if we used the longest common prefix method to start the query.
TL;DR: I think we could drop the use of snmptable and always walk either the table OID or, if not set, the longest common field prefix. When the table OID is specified all values would be reported; if not set, only the specified fields would be reported. Top-level fields would remain single-value GETs.
@phemmer Sorry to ping you on so many issues lately but of course I would really value your advice before taking any actions here.
Could it be possible to just perform a single "GetBulk" or "Get" with multiple OIDs?
It seems to be allowed by soniah/gosnmp.
In my opinion such a feature could address a significant number of performance issues without the need for the logic behind the decision to perform a "walk" or not.
Currently you can perform a single walk by specifying the OID of the conceptual table (example 1), but yes, in general we can.
Here is what I see as the possibilities:
- table: do a single walk over non-conceptual table OIDs using the longest common prefix.
- field: this doesn't seem very flexible, as you couldn't rename the fields or do conversions.
- walk/subtable: similar to table, but works over non-conceptual tables.

Let me explain what I have in mind; please bear with me :-)
Let us have a config like the following
[[inputs.snmp]]
  [[inputs.snmp.field]]
    name = "InOctets"
    oid = ".1.3.6.1.2.1.31.1.1.1.6.11615"

  [[inputs.snmp.field]]
    name = "OutOctets"
    oid = ".1.3.6.1.2.1.31.1.1.1.10.11615"

  [[inputs.snmp.field]]
    name = "UnicastRx"
    oid = ".1.3.6.1.2.1.2.2.1.11.11615"

  [[inputs.snmp.field]]
    name = "UnicastTx"
    oid = ".1.3.6.1.2.1.2.2.1.17.11615"
The plugin currently generates four separate sessions, each performing a single GET:
snmpget .1.3.6.1.2.1.31.1.1.1.6.11615
snmpget .1.3.6.1.2.1.31.1.1.1.10.11615
snmpget .1.3.6.1.2.1.2.2.1.11.11615
snmpget .1.3.6.1.2.1.2.2.1.17.11615
and in my experience some devices cannot (or will not) respond when the number of sessions increases.
Is it possible for the plugin to translate that kind of configuration into something like
snmpbulkget .1.3.6.1.2.1.31.1.1.1.6.11615 .1.3.6.1.2.1.31.1.1.1.10.11615 .1.3.6.1.2.1.2.2.1.11.11615 .1.3.6.1.2.1.2.2.1.17.11615
with a limit of, for example, one bulkget per 20 OIDs?
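The "one request per 20 OIDs" idea amounts to simple batching. A minimal sketch (chunkOIDs is a hypothetical helper, and 20 is just the example figure from this comment; in the plugin the limit would presumably be configurable):

```go
package main

import "fmt"

// chunkOIDs splits a flat list of OIDs into batches of at most n, so
// each batch can be sent as one multi-OID GET/GETBULK request instead
// of one request per OID.
func chunkOIDs(oids []string, n int) [][]string {
	var batches [][]string
	for len(oids) > n {
		batches = append(batches, oids[:n])
		oids = oids[n:]
	}
	if len(oids) > 0 {
		batches = append(batches, oids)
	}
	return batches
}

func main() {
	// 45 per-interface counter OIDs would become 3 requests, not 45.
	oids := make([]string, 45)
	for i := range oids {
		oids[i] = fmt.Sprintf(".1.3.6.1.2.1.2.2.1.11.%d", i+1)
	}
	for _, b := range chunkOIDs(oids, 20) {
		fmt.Printf("one request carrying %d OIDs\n", len(b))
	}
}
```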
Based on a little testing, I think this is essentially the 2nd option in my last comment. But in this case ideally we would use GetRequest with multiple OIDs, since GetBulkRequest would start a walk and pull back a lot of unrelated data. This should result in a single request and response for all the fields at once.
This might be how we want to handle Example 2 above as well.
@danielnelson I think you are still confusing snmpbulkget with snmpbulkwalk. It makes perfect sense to make a bulk get request for every 20 (or a configurable number of) OIDs. That doesn't mean SNMP will walk them; it only means SNMP will request them in 1 request instead of 20.
Thanks, you're right. I guess the remaining question is whether we should always switch to GETBULK with the max_repetitions option, or whether this needs to be user controlled. It seems like we might have a lot of overruns, so maybe we need a way to specify a group of fields to GETBULK over:
[[inputs.snmp]]
  ## snmpget
  [[inputs.snmp.field]]
    oid = "RFC1213-MIB::sysName.0"

  ## snmptable/snmpbulkwalk
  [[inputs.snmp.table]]
    oid = "IF-MIB::ifTable"

  ## snmpbulkget
  [[inputs.snmp.bulk]]
    [[inputs.snmp.field]]
      name = "InOctets"
      oid = ".1.3.6.1.2.1.31.1.1.1.6.11615"

    [[inputs.snmp.field]]
      name = "OutOctets"
      oid = ".1.3.6.1.2.1.31.1.1.1.10.11615"

    [[inputs.snmp.field]]
      name = "UnicastRx"
      oid = ".1.3.6.1.2.1.2.2.1.11.11615"

    [[inputs.snmp.field]]
      name = "UnicastTx"
      oid = ".1.3.6.1.2.1.2.2.1.17.11615"
I would prefer to have it always on. Maybe disable BULK if the user sets max_repetitions to 0 or 1 (because that would be the same anyway).
How are the fields in a table currently collected? Via snmpbulkwalk or via snmpwalk?
Fields in a table are collected via bulkwalk if available (version >= 2). But the issue under discussion is for non-table fields, which are collected via unary get. Converting to a bulk get shouldn't be too difficult. The relevant section of code is here: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/snmp/snmp.go#L456-L462
^ This needs to be converted so that instead of fetching each individual field inside the range t.Fields loop, it gathers all the OIDs from t.Fields and fetches them all at once. It will require restructuring the logic a bit, but shouldn't be too difficult. (Note that this won't be me; I'm just offering guidance.)
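A rough sketch of that restructuring, with a fetch callback standing in for the real multi-OID gosnmp Get call (Field and gatherFields are hypothetical names for illustration, not the plugin's actual types):

```go
package main

import "fmt"

// Field loosely mirrors the plugin's field config: a name and an OID.
type Field struct {
	Name string
	OID  string
}

// gatherFields collects every field OID up front, issues one batched
// fetch for all of them, and maps the results back by OID. The fetch
// callback is a stand-in for a multi-OID SNMP Get; in the real plugin
// it would be the session's Get([]string) call.
func gatherFields(fields []Field, fetch func([]string) (map[string]interface{}, error)) (map[string]interface{}, error) {
	oids := make([]string, 0, len(fields))
	for _, f := range fields {
		oids = append(oids, f.OID)
	}
	values, err := fetch(oids) // one request instead of len(fields) requests
	if err != nil {
		return nil, err
	}
	out := make(map[string]interface{}, len(fields))
	for _, f := range fields {
		if v, ok := values[f.OID]; ok {
			out[f.Name] = v
		}
	}
	return out, nil
}

func main() {
	fields := []Field{
		{"InOctets", ".1.3.6.1.2.1.31.1.1.1.6.11615"},
		{"UnicastRx", ".1.3.6.1.2.1.2.2.1.11.11615"},
	}
	// Fake fetch standing in for the SNMP session.
	fake := func(oids []string) (map[string]interface{}, error) {
		m := map[string]interface{}{}
		for i, oid := range oids {
			m[oid] = i * 100
		}
		return m, nil
	}
	result, _ := gatherFields(fields, fake)
	fmt.Println(result["InOctets"], result["UnicastRx"])
}
```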
Great! Can someone pick this up? Can the issue be assigned to someone?
To summarize, let's respect the max_repetitions setting of the plugin and use it to "bulkget" the top-level fields.
Would be helpful if someone from the Telegraf community could help on this one. @Hipska any chance you would be able to work on this?
Would love to, but I have no experience with Go.
@danielnelson did this ever get a resolution? I need to pull all port information at the same time.
This has not been addressed yet; top-level fields are grabbed one at a time using a GET PDU. You might be able to use a table with no oid set to get this behavior, like in example 2 of https://github.com/influxdata/telegraf/issues/3784#issuecomment-440855217.
Thanks, I figured this out earlier today.
I only needed a few fields, so I was trying to pull each one individually, which wasted hours.
In the end I pulled the whole table and dropped the fields I didn't need.
To be honest, it's not a very efficient way of doing things. It would be nice to get a resolution on this, as I believe this is CORE functionality for switch monitoring.
When testing I also came across the agent bug where defining all agents like so:
agents = [ ip, ip ]
fails if one agent fails. This is very bad, and I still think it's a bug.
Because of this I believe I'm going to have to duplicate multiple checks and v3 password data across configs.
It would be really nice if you allowed
agents = []
but included an option to parallelize each agent, making each agent run in its own thread without having to duplicate config. It would make using your config so much easier.
Now I'm going to have to create an Ansible template to duplicate 40 checks and security information across 400 hosts, just to work around the issue and the possibility of one host dying.
Not trying to put you down; it works great once you work out the quirks and oddities. Unfortunately there are quite a few that are not documented well.
Personally I would be focusing on these:
1) Documentation: real-world examples, plus issues like multiple-agent configs.
2) Pulling all data at once from single OIDs. Again, I believe this is core functionality for a monitoring solution. Companies want to compare data sets from the exact same timeframe, e.g. port 1 did this, as did port 2.
3) Parallelizing agents without duplicate configs. This would save hundreds of lines of config or Ansible templating, by allowing an option to turn the agent list into multiple agents without a line of code. It would also make your configuration documents easier to write.
@smogsy In my experience, each agent is processed in parallel. If you believe this is not the case, please create a new issue for it.
Implementing SNMP BULK GET should have higher priority.