Amazon CloudWatch monitors AWS resources and the applications that run on AWS. Many services (e.g. EC2, RDS, SQS) periodically send monitoring metrics to CloudWatch so that users can track the health and performance of their resources. We are in the process of adding metricsets for individual services, but since there are so many, it would be good to have a "free-form" cloudwatch metricset that collects monitoring metrics from a user-defined service even if that service is not yet supported by a separate metricset.
The basic idea for the cloudwatch metricset is to read the period, start-time, end-time, and metric-data-queries parameters from the config file aws.yml and pass them to the CloudWatch get-metric-data API, similar to using https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/get-metric-data.html.
period is how frequently this cloudwatch metricset starts a new collection cycle.
start-time is the timestamp of the earliest data to be returned.
end-time is the timestamp of the latest data to be returned.
metric-data-queries is the location of the JSON file that defines the metric queries to run. For example:
[
  {
    "Id": "e1",
    "Expression": "m1 / m2",
    "Label": "ErrorRate"
  },
  {
    "Id": "m1",
    "MetricStat": {
      "Metric": {
        "Namespace": "MyApplication",
        "MetricName": "Errors",
        "Dimensions": [
          {
            "Name": "FunctionName",
            "Value": "MyFunc"
          }
        ]
      },
      "Period": 300,
      "Stat": "Sum",
      "Unit": "Count"
    },
    "ReturnData": false
  },
  {
    "Id": "m2",
    "MetricStat": {
      "Metric": {
        "Namespace": "MyApplication",
        "MetricName": "Invocations",
        "Dimensions": [
          {
            "Name": "FunctionName",
            "Value": "MyFunc"
          }
        ]
      },
      "Period": 300,
      "Stat": "Sum",
      "Unit": "Count"
    },
    "ReturnData": false
  }
]
After this query succeeds, a processing step needs to be added to convert the get-metric-data output into events that are eventually pushed into ES. Some schema/mapping may need to be read from aws.yml for this as well.
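To make the conversion step more concrete, here is a minimal sketch in Python. It assumes the response has the shape that GetMetricData returns (a MetricDataResults list whose entries carry parallel Timestamps and Values lists). The function name results_to_events and the event field layout under aws.cloudwatch are hypothetical and only illustrate one possible mapping, not the schema the metricset will actually use.

```python
from datetime import datetime, timezone


def results_to_events(response, namespace):
    """Flatten a GetMetricData-style response into one event per data point.

    The event layout below is an assumption made for illustration only.
    """
    events = []
    for result in response.get("MetricDataResults", []):
        # Timestamps and Values are parallel lists: the i-th value
        # belongs to the i-th timestamp.
        for ts, value in zip(result["Timestamps"], result["Values"]):
            events.append({
                "@timestamp": ts.isoformat(),
                "aws": {
                    "cloudwatch": {
                        "namespace": namespace,
                        "id": result["Id"],
                        "label": result.get("Label"),
                        "value": value,
                    }
                },
            })
    return events


# Hand-written sample response, not real CloudWatch output.
sample = {
    "MetricDataResults": [
        {
            "Id": "e1",
            "Label": "ErrorRate",
            "Timestamps": [datetime(2019, 4, 1, 12, 0, tzinfo=timezone.utc)],
            "Values": [0.25],
            "StatusCode": "Complete",
        }
    ]
}
events = results_to_events(sample, "MyApplication")
```

Emitting one event per data point keeps the mapping simple. Grouping values that share a timestamp and dimensions into a single event would be another option.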
In the existing metricsets, we use list-metrics to get a list of metrics for a specific namespace. This list of metrics can be used as input to GetMetricData, replacing the metric-data-queries parameter.
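As a rough sketch of that idea, the Python snippet below turns a list-metrics style result into a list of metric data queries. The function name metrics_to_queries, the generated Ids (m0, m1, ...) and the default statistic of Average are assumptions made for the example, not decisions from this issue.

```python
def metrics_to_queries(list_metrics_output, period=300, stat="Average"):
    """Build one MetricStat query per metric returned by list-metrics.

    period and stat are illustrative defaults (assumptions).
    """
    queries = []
    for index, metric in enumerate(list_metrics_output["Metrics"]):
        queries.append({
            # Each query needs a unique Id; "m<index>" is a hypothetical scheme.
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": metric["Namespace"],
                    "MetricName": metric["MetricName"],
                    "Dimensions": metric.get("Dimensions", []),
                },
                "Period": period,
                "Stat": stat,
            },
        })
    return queries


# Hand-written sample in the shape of list-metrics output.
listed = {
    "Metrics": [
        {
            "Namespace": "AWS/SNS",
            "MetricName": "PublishSize",
            "Dimensions": [{"Name": "TopicName", "Value": "NotifyMe"}],
        },
        {
            "Namespace": "AWS/SNS",
            "MetricName": "NumberOfMessagesPublished",
            "Dimensions": [{"Name": "TopicName", "Value": "CFO"}],
        },
    ]
}
queries = metrics_to_queries(listed)
```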
This checklist is intended for developers who create or update a module, to make sure modules are consistent.
For a metricset to go GA, the following criteria should be met:
data.json exists and an automated way to generate it exists (go test -data)
@roncohen and @exekias Please feel free to comment on this 😄 Thanks for the great idea.
I think this metricset would be really useful! I have a few questions:
Perhaps we can handle the time parameters (start-time, end-time and period) for the user? We already have a period in all metricsets, and start-time and end-time could be adjusted to get the last point in each run.
Instead of using a file with the queries, it would be possible to put them in the config as YAML (YAML is a superset of JSON), i.e.:
- module: aws
  metricsets: ["cloudwatch"]
  cloudwatch.metrics:
    - namespace: MyApplication
      metricname: Invocations
      dimensions:
        - name: FunctionName
          value: MyFunc
@exekias Yeah, end-time is probably not necessary at all. If we support start-time, the user can give a start-time in the past, so when the first collection starts, the cloudwatch metricset can first pick up all the metrics from start-time to now. But that might introduce performance issues: if start-time is too far in the past, there might be too much metric data for the first collection.
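The window handling discussed here could look roughly like the Python sketch below: in steady state the window is the last period, and an optional start-time only applies to the first run. The names collection_window, first_run and expected_points are hypothetical. The second helper just illustrates how quickly the amount of data grows when start-time is far in the past.

```python
from datetime import datetime, timedelta, timezone


def collection_window(now, period, start_time=None, first_run=False):
    """Return the (start, end) pair to query for one collection cycle."""
    if first_run and start_time is not None:
        # Backfill: the first cycle covers everything since start_time.
        return start_time, now
    # Steady state: only the most recent period, ending now.
    return now - period, now


def expected_points(start, end, metric_period):
    """Rough upper bound on data points per metric inside the window."""
    return int((end - start) / metric_period)


now = datetime(2019, 4, 1, 12, 0, tzinfo=timezone.utc)
period = timedelta(seconds=300)

regular = collection_window(now, period)
backfill = collection_window(
    now, period, start_time=now - timedelta(days=1), first_run=True
)
```

With a 300s period, a regular cycle covers a single data point per metric, while a backfill over the previous day already covers 288 per metric.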
Good suggestion on the config 👍
If we use list-metrics to get all the metrics that are available for GetMetricData, it will save users a lot of time and effort generating the config, but it may not be as handy if users have specific metrics in mind to query.
list-metrics --namespace "AWS/SNS" output:
{
  "Metrics": [
    {
      "Namespace": "AWS/SNS",
      "Dimensions": [
        {
          "Name": "TopicName",
          "Value": "NotifyMe"
        }
      ],
      "MetricName": "PublishSize"
    },
    {
      "Namespace": "AWS/SNS",
      "Dimensions": [
        {
          "Name": "TopicName",
          "Value": "CFO"
        }
      ],
      "MetricName": "PublishSize"
    },
    {
      "Namespace": "AWS/SNS",
      "Dimensions": [
        {
          "Name": "TopicName",
          "Value": "NotifyMe"
        }
      ],
      "MetricName": "NumberOfNotificationsFailed"
    },
    {
      "Namespace": "AWS/SNS",
      "Dimensions": [
        {
          "Name": "TopicName",
          "Value": "NotifyMe"
        }
      ],
      "MetricName": "NumberOfNotificationsDelivered"
    },
    {
      "Namespace": "AWS/SNS",
      "Dimensions": [
        {
          "Name": "TopicName",
          "Value": "NotifyMe"
        }
      ],
      "MetricName": "NumberOfMessagesPublished"
    },
    {
      "Namespace": "AWS/SNS",
      "Dimensions": [
        {
          "Name": "TopicName",
          "Value": "CFO"
        }
      ],
      "MetricName": "NumberOfMessagesPublished"
    },
    {
      "Namespace": "AWS/SNS",
      "Dimensions": [
        {
          "Name": "TopicName",
          "Value": "CFO"
        }
      ],
      "MetricName": "NumberOfNotificationsDelivered"
    },
    {
      "Namespace": "AWS/SNS",
      "Dimensions": [
        {
          "Name": "TopicName",
          "Value": "CFO"
        }
      ],
      "MetricName": "NumberOfNotificationsFailed"
    }
  ]
}
It sounds to me like some people will be interested in collecting all metrics from a namespace, so we could allow full namespace retrieval when no metricname is defined. I would expect this to be more common than cherry-picking just a few metrics.
- module: aws
  metricsets: ["cloudwatch"]
  cloudwatch.metrics:
    - namespace: AWS/SNS
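One way to read this proposal is as a filter over the list-metrics result: an entry with only a namespace keeps everything, while metricname and dimensions narrow the selection. The Python sketch below illustrates that reading. The function name select_metrics and the exact matching rules are assumptions, not agreed behaviour.

```python
def select_metrics(entry, listed_metrics):
    """Pick the listed metrics that match one cloudwatch.metrics config entry.

    A missing metricname or dimensions key means "no restriction" (assumption).
    """
    wanted_name = entry.get("metricname")
    wanted_dims = {(d["name"], d["value"]) for d in entry.get("dimensions", [])}
    selected = []
    for metric in listed_metrics:
        if metric["Namespace"] != entry["namespace"]:
            continue
        if wanted_name is not None and metric["MetricName"] != wanted_name:
            continue
        have_dims = {(d["Name"], d["Value"]) for d in metric.get("Dimensions", [])}
        # Every configured dimension must be present on the metric.
        if not wanted_dims <= have_dims:
            continue
        selected.append(metric)
    return selected


# Hand-written subset of a list-metrics result for AWS/SNS.
listed = [
    {"Namespace": "AWS/SNS", "MetricName": "PublishSize",
     "Dimensions": [{"Name": "TopicName", "Value": "NotifyMe"}]},
    {"Namespace": "AWS/SNS", "MetricName": "PublishSize",
     "Dimensions": [{"Name": "TopicName", "Value": "CFO"}]},
    {"Namespace": "AWS/SNS", "MetricName": "NumberOfNotificationsFailed",
     "Dimensions": [{"Name": "TopicName", "Value": "NotifyMe"}]},
]

whole_namespace = select_metrics({"namespace": "AWS/SNS"}, listed)
one_metric = select_metrics(
    {"namespace": "AWS/SNS", "metricname": "PublishSize"}, listed
)
one_topic = select_metrics(
    {"namespace": "AWS/SNS", "metricname": "PublishSize",
     "dimensions": [{"name": "TopicName", "value": "CFO"}]},
    listed,
)
```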
@exekias Yep, I agree. It will be a much easier setup for users (when the extra API query/data transfer cost is not a problem).
great to get this effort started @kaiyan-sheng!
Some ideas:
Good point @roncohen! For your second idea, I ran into this problem in https://github.com/elastic/beats/pull/11798 just now. For each namespace, the collection period should be different.
Currently the configuration looks like:
- module: aws
  period: 86400s
  metricsets:
    - "cloudwatch"
  access_key_id: '${AWS_ACCESS_KEY_ID:""}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY:""}'
  session_token: '${AWS_SESSION_TOKEN:""}'
  default_region: '${AWS_REGION:us-west-1}'
  cloudwatch_metrics:
    - namespace: AWS/S3
The first question is whether the cloudwatch metricset should support configuring and collecting from more than one namespace.
Then the configuration would look like:
- module: aws
  period: 300s
  metricsets:
    - "cloudwatch"
  access_key_id: '${AWS_ACCESS_KEY_ID:""}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY:""}'
  session_token: '${AWS_SESSION_TOKEN:""}'
  default_region: '${AWS_REGION:us-west-1}'
  cloudwatch_metrics:
    - namespace: AWS/EC2
    - namespace: AWS/S3
    - namespace: AWS/SQS
The second question: if we support multiple namespaces as in the config above, we would need a default period for each namespace hardcoded in the code. For example, 300s for AWS/EC2, 86400s for AWS/S3, and 300s for AWS/SQS.
We could introduce a separate period config for each namespace, but I feel that would just make everything confusing. cloudwatch as a metricset should have only one period, which is the frequency at which the metricset runs. @exekias What are your thoughts on this?
I think supporting multiple namespaces is useful in many cases, especially when the period is the same. As period is a default setting, I wouldn't introduce a new config key for this. Users can always configure several instances of the metricset when they require different periods per namespace.
By the way, for readability, period should accept other units, so something like 24h should work.
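To illustrate the unit handling, here is a small Python sketch that converts values such as 300s or 24h to seconds. It is deliberately simplified: it only accepts a whole number followed by a single unit (s, m or h), and the function name parse_period is hypothetical. In Go, time.ParseDuration already accepts strings like these, so the metricset itself would not need a hand-written parser.

```python
import re

# Seconds per supported unit; the set of units is an assumption for this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_period(text):
    """Convert a period such as '300s' or '24h' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported period: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


daily = parse_period("24h")
five_minutes = parse_period("300s")
```

Under this sketch, 24h and 86400s describe the same period, so either spelling could be used in the config.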