Beats: [Metricbeat] Adding AWS Cloudwatch Metricset

Created on 10 Apr 2019 · 10 Comments · Source: elastic/beats

Amazon CloudWatch monitors AWS resources and applications that run on AWS. Many services (e.g. EC2, RDS, SQS) periodically send monitoring metrics to CloudWatch so that users can track the health and performance of their resources. We are in the process of adding metricsets for individual services, but since there are so many, it would be good to have a "free-form" cloudwatch metricset that collects monitoring metrics from a user-defined service even if that service is not yet supported by a dedicated metricset.

The basic idea for the cloudwatch metricset is to read the period, start-time, end-time, and metric-data-queries parameters from the aws.yml config file and pass them to the CloudWatch GetMetricData API, similar to the CLI command documented at https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/get-metric-data.html.
period is how frequently the cloudwatch metricset starts a new collection cycle.
start-time is the timestamp of the earliest data to be returned.
end-time is the timestamp of the latest data to be returned.
metric-data-queries is the path to the JSON file that defines the metric queries to run. For example:

[
    {
        "Id": "e1",
        "Expression": "m1 / m2",
        "Label": "ErrorRate"
    },
    {
        "Id": "m1",
        "MetricStat": {
            "Metric": {
                "Namespace": "MyApplication",
                "MetricName": "Errors",
                "Dimensions": [
                    {
                        "Name": "FunctionName",
                        "Value": "MyFunc"
                    }
                ]
            },
            "Period": 300,
            "Stat": "Sum",
            "Unit": "Count"
        },
        "ReturnData": false
    },
    {
        "Id": "m2",
        "MetricStat": {
            "Metric": {
                "Namespace": "MyApplication",
                "MetricName": "Invocations",
                "Dimensions": [
                    {
                        "Name": "FunctionName",
                        "Value": "MyFunc"
                    }
                ]
            },
            "Period": 300,
            "Stat": "Sum",
            "Unit": "Count"
        },
        "ReturnData": false
    }
]
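
To make the parameter flow concrete, here is a minimal sketch, assuming the queries arrive as a JSON string like the one above. The helper name `build_get_metric_data_request` is hypothetical; the returned keys (`MetricDataQueries`, `StartTime`, `EndTime`) are the GetMetricData request parameter names.

```python
import json
from datetime import datetime, timedelta, timezone


def build_get_metric_data_request(queries_json, start_time, end_time):
    """Assemble GetMetricData request parameters (hypothetical helper)."""
    queries = json.loads(queries_json)

    # Expressions such as "m1 / m2" refer to other queries by Id,
    # so duplicate Ids would make the request ambiguous.
    ids = [query["Id"] for query in queries]
    if len(ids) != len(set(ids)):
        raise ValueError("every query needs a unique Id")

    if start_time >= end_time:
        raise ValueError("start-time must be earlier than end-time")

    return {
        "MetricDataQueries": queries,
        "StartTime": start_time,
        "EndTime": end_time,
    }
```

With an AWS SDK available, the resulting dict could be passed straight to the client call, for example `boto3.client("cloudwatch").get_metric_data(**request)`.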

After this query succeeds, a processing step is needed to convert the get-metric-data output into events that are eventually pushed into ES. A schema/mapping for this conversion might need to be read from aws.yml as well.
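
One possible shape for that conversion step, as a rough sketch: GetMetricData returns a `MetricDataResults` list whose entries carry `Id`, `Label`, and parallel `Timestamps` and `Values` arrays. The event layout below (`@timestamp` plus a `cloudwatch.metrics` object) and the function name are assumptions for illustration, not a decided schema.

```python
from datetime import datetime, timezone


def metric_data_to_events(response):
    """Group GetMetricData results into one event per timestamp."""
    by_timestamp = {}
    for result in response.get("MetricDataResults", []):
        # Fall back to the query Id when no Label was provided.
        label = result.get("Label") or result["Id"]
        for timestamp, value in zip(result["Timestamps"], result["Values"]):
            by_timestamp.setdefault(timestamp, {})[label] = value

    # Hypothetical event layout: one document per timestamp, oldest first.
    return [
        {"@timestamp": timestamp.isoformat(), "cloudwatch": {"metrics": metrics}}
        for timestamp, metrics in sorted(by_timestamp.items())
    ]
```

Grouping by timestamp keeps metrics that were sampled together in the same event, which is only one of several reasonable choices.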

In the existing metricsets, we use list-metrics to get the list of metrics for a specific namespace. This list can be used as input to GetMetricData in place of the metric-data-queries parameter.

Metricbeat Module / Dataset release checklist

This checklist is intended for developers who create or update a module, to make sure modules are consistent.

Modules

For a metricset to go GA, the following criteria should be met:

[ ] Supported versions are documented
[ ] Supported operating systems are documented (if applicable)
[ ] Integration tests exist
[ ] System tests exist
[ ] Automated checks that all fields are documented
[ ] Documentation
[ ] Fields follow ECS and naming conventions
[ ] Dashboards exist (if applicable)
[ ] Kibana Home Tutorial (if applicable)
- [ ] Open an issue in the EUI repo to add an icon for the module if one does not already exist.
- [ ] Open PR against Kibana repo with tutorial. Examples can be found here.

Metricbeat module

[ ] Example data.json exists and an automated way to generate it exists (go test -data)
[ ] Test environment in Docker exists for integration tests

kaiyan-sheng

👍2

All 10 comments

@roncohen and @exekias Please feel free to comment on this 😄 Thanks for the great idea.

kaiyan-sheng on 10 Apr 2019

I think this metricset would be really useful! I have a few questions:

Perhaps we can handle the time parameters (start-time, end-time and period) for the user? We already have a period in all metricsets, so start-time and end-time could be adjusted to get the last point in each run.
Instead of using a file with the queries, it would be possible to put them in the config as YAML (YAML is a superset of JSON), e.g.:

- module: aws
  metricsets: ["cloudwatch"]
  cloudwatch.metrics:
    - namespace: MyApplication
      metricname: Invocations
      dimensions:
        - name: FunctionName
          value: MyFunc
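
A minimal sketch of deriving the time parameters from the metricset period, so the user never sets start-time or end-time. The function name is hypothetical, and aligning the window to a period boundary is an assumption made here so that consecutive runs do not overlap.

```python
from datetime import datetime, timedelta, timezone


def collection_window(now, period):
    """Return (start_time, end_time) for the most recent complete period.

    `now` must be timezone-aware; `period` is a timedelta.
    """
    period_seconds = int(period.total_seconds())
    if period_seconds <= 0:
        raise ValueError("period must be positive")

    # Round "now" down to a period boundary (assumption: aligned windows).
    end_epoch = int(now.timestamp()) // period_seconds * period_seconds
    end_time = datetime.fromtimestamp(end_epoch, tz=timezone.utc)
    return end_time - period, end_time
```

For a run at 12:03:20 UTC with a 300s period, this yields the window from 11:55:00 to 12:00:00 UTC.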

exekias on 10 Apr 2019

👍2

@exekias Yeah, end-time is probably not necessary at all. If we support start-time, the user can give a start-time in the past, so when the first collection starts, the cloudwatch metricset can first pick up all the metrics from start-time to now. But that might introduce performance issues: if the start-time is too far back, there might be too many metrics for the first collection.

Good suggestion on the config 👍

kaiyan-sheng on 10 Apr 2019

If we use list-metrics to get all metrics that are available for GetMetricData, it will save users a lot of time and effort generating the config, but it may not be as handy if users have specific metrics in mind.
list-metrics --namespace "AWS/SNS" output:

{
    "Metrics": [
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "PublishSize"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "PublishSize"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "NumberOfNotificationsFailed"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "NumberOfNotificationsDelivered"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "NumberOfMessagesPublished"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "NumberOfMessagesPublished"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "NumberOfNotificationsDelivered"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "NumberOfNotificationsFailed"
        }
    ]
}
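
As a sketch of how this output could replace metric-data-queries, each listed metric can be wrapped in a `MetricStat` query. The function name, the generated `m0`, `m1`, ... Ids and the default `Average` statistic are assumptions chosen for illustration.

```python
def list_metrics_to_queries(list_metrics_output, period, stat="Average"):
    """Turn list-metrics output into GetMetricData queries (hypothetical helper)."""
    queries = []
    for index, metric in enumerate(list_metrics_output["Metrics"]):
        queries.append(
            {
                # Ids only need to be unique within one request.
                "Id": f"m{index}",
                "MetricStat": {
                    "Metric": {
                        "Namespace": metric["Namespace"],
                        "MetricName": metric["MetricName"],
                        "Dimensions": metric.get("Dimensions", []),
                    },
                    "Period": period,  # seconds
                    "Stat": stat,
                },
            }
        )
    return queries
```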

kaiyan-sheng on 10 Apr 2019

It sounds to me like some people will be interested in catching all metrics from a namespace, so we could allow full namespace retrieval when no metricname is defined. I would expect this to be more common than cherry-picking just a few metrics.

- module: aws
  metricsets: ["cloudwatch"]
  cloudwatch.metrics:
    - namespace: AWS/SNS
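
The selection rule could look roughly like this, assuming each config entry is a dict with the `namespace` and optional `metricname` keys shown above, and that `listed_metrics` is the `Metrics` array from list-metrics. The function name is hypothetical.

```python
def select_metrics(config_entry, listed_metrics):
    """Pick the listed metrics that a single config entry asks for."""
    namespace = config_entry["namespace"]
    wanted_name = config_entry.get("metricname")  # None means "whole namespace"

    selected = []
    for metric in listed_metrics:
        if metric["Namespace"] != namespace:
            continue
        if wanted_name is not None and metric["MetricName"] != wanted_name:
            continue
        selected.append(metric)
    return selected
```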

exekias on 11 Apr 2019

@exekias Yep, I agree. It will be a much easier setup for users (when extra api query/data transfer cost is not a problem).

kaiyan-sheng on 11 Apr 2019

Great to get this effort started @kaiyan-sheng!

Some ideas:

Regarding start time: could we query ES for the most recent datapoint of a metric and continue from there? We could default to pulling in the last 1h if there's no existing data for a metric. Metricbeat might get restarted, and it would be great to continue from where it left off.
Could we have a list of default periods for each namespace? E.g. S3 looks like it should default to 1 day. This should be overridable in the config.
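
The resume logic from the first idea, sketched under the assumption that the latest indexed timestamp has already been fetched from ES (that query is not shown here) and is `None` when no data exists. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_LOOKBACK = timedelta(hours=1)  # "last 1h" fallback from the idea above


def resolve_start_time(latest_indexed, now, lookback=DEFAULT_LOOKBACK):
    """Choose where collection should resume after a (re)start."""
    if latest_indexed is None:
        # No existing data for this metric: pull in the default lookback.
        return now - lookback
    # Guard against a stored timestamp that is ahead of the local clock.
    return min(latest_indexed, now)
```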

roncohen on 15 Apr 2019

Good point @roncohen! Regarding your second idea, I ran into this problem in https://github.com/elastic/beats/pull/11798 just now: the collection period should differ per namespace.
Currently the configuration looks like:

- module: aws
  period: 86400s
  metricsets:
    - "cloudwatch"
  access_key_id: '${AWS_ACCESS_KEY_ID:""}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY:""}'
  session_token: '${AWS_SESSION_TOKEN:""}'
  default_region: '${AWS_REGION:us-west-1}'
  cloudwatch_metrics:
    - namespace: AWS/S3

The first question is whether the cloudwatch metricset should support collecting from more than one namespace.
If so, the configuration would look like:

- module: aws
  period: 300s
  metricsets:
    - "cloudwatch"
  access_key_id: '${AWS_ACCESS_KEY_ID:""}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY:""}'
  session_token: '${AWS_SESSION_TOKEN:""}'
  default_region: '${AWS_REGION:us-west-1}'
  cloudwatch_metrics:
    - namespace: AWS/EC2
    - namespace: AWS/S3
    - namespace: AWS/SQS

The second question: if we support multiple namespaces as in the config above, then we need a default period for each namespace hardcoded in the code. For example, AWS/EC2 would be 300s, AWS/S3 would be 86400s and AWS/SQS would be 300s.

We could introduce a separate period config for each namespace, but I feel that just makes everything confusing. cloudwatch as a metricset should only have one period, which is the frequency at which this metricset runs. @exekias What are your thoughts on this?
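
For illustration, hardcoded defaults could be a simple lookup table using the three values mentioned above. The fallback of 300s for unlisted namespaces, the override mechanism and all names here are assumptions.

```python
# Default collection periods in seconds, taken from the examples above.
DEFAULT_PERIODS_S = {"AWS/EC2": 300, "AWS/S3": 86400, "AWS/SQS": 300}
FALLBACK_PERIOD_S = 300  # assumption for namespaces without a default


def period_for(namespace, overrides=None):
    """Return the period for a namespace, preferring user overrides."""
    overrides = overrides or {}
    if namespace in overrides:
        return overrides[namespace]
    return DEFAULT_PERIODS_S.get(namespace, FALLBACK_PERIOD_S)


def group_by_period(namespaces, overrides=None):
    """Group namespaces that can share one collection cycle."""
    groups = {}
    for namespace in namespaces:
        groups.setdefault(period_for(namespace, overrides), []).append(namespace)
    return groups
```

Grouping the three namespaces from the config above puts AWS/EC2 and AWS/SQS in one 300s cycle and AWS/S3 in its own 86400s cycle.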

kaiyan-sheng on 15 Apr 2019

I think supporting multiple namespaces is useful in many cases, especially when the period is the same. As period is already a standard setting, I wouldn't introduce a new config key for this. Users can always configure several instances of the metricset when they need different periods per namespace.

exekias on 16 Apr 2019

👍1

Btw, for readability, period should accept other units, so something like 24h should work.
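
To show the equivalence between `86400s` and `24h`, here is a small parser sketch. It is an illustration only: it handles just the `s`, `m` and `h` units and whole numbers, and the function name is hypothetical. In Go, `time.ParseDuration` already accepts strings in this style.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_period(text):
    """Parse strings such as "300s", "24h" or "1h30m" into seconds."""
    if not re.fullmatch(r"(?:\d+[smh])+", text):
        raise ValueError(f"unsupported period: {text!r}")
    return sum(
        int(amount) * _UNIT_SECONDS[unit]
        for amount, unit in re.findall(r"(\d+)([smh])", text)
    )
```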

exekias on 16 Apr 2019

👍1
