Aws-sdk-java-v2: Metrics

Created on 3 Jul 2017 · 24 Comments · Source: aws/aws-sdk-java-v2

Review the inherited state of V1 metrics and determine which changes are necessary for V2.

(Feel free to comment on this issue with desired changes).

1.11.x Parity feature-request

millems

All 24 comments

I'm curious whether this has gotten any more thought. Is there a current recommendation?

For our use cases, we mostly care about request volume and latency. I found that I can get most of what I care about using the pluggable HTTP client support. I'm creating a wrapper for an SdkHttpClient or SdkAsyncHttpClient that captures basic metrics about the requests. The main gap is that I don't see a good way to get the following information, which was easily accessible from the V1 metrics collector:

Service name (e.g. EC2)
Request type (e.g., DescribeInstances)
AWS error code (we get the HTTP status code, but it doesn't always map to the AWS error as we would expect)

For synchronous clients we have a hacky way to get the first two by looking at the stack trace and seeing what the entry point to the SDK was. For async clients that is not viable. Is there a better way to access that type of information at the HTTP client layer?

More generally, is there a better integration point that could be used?

brharrington on 22 Feb 2019

@brharrington

For metrics, volume and latency the ExecutionInterceptor is probably the best integration point for a custom metrics implementation.

millems on 22 Feb 2019

Thanks for the pointer @millems . Looks like it solves most of my problems.

brharrington on 23 Feb 2019

@brharrington LMK if you have any questions.

millems on 23 Feb 2019

@millems What is the right way to log latencies for a request using the Interceptor API? One way could be to add the start timestamp to the ExecutionAttributes instance in the beforeExecution call and then, in the afterExecution method, read the start time from the attributes, calculate the elapsed time, and log the latency.

Is that the recommended approach?

swaranga on 10 Sep 2019

@swaranga That's the recommended way to log the latency for the entire execution (which includes marshalling and retries), yes.
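The approach confirmed above can be sketched in plain Java. This is only an illustration of the pattern, not SDK code: `Attributes`, `LatencyRecorder`, and the attribute key are hypothetical stand-ins for the per-execution attribute bag and the interceptor hooks discussed in this thread, and the clock is injected so the logic can be checked without sleeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a per-execution attribute bag; not the SDK's ExecutionAttributes class.
final class Attributes {
    private final Map<String, Object> values = new HashMap<>();

    void put(String key, Object value) { values.put(key, value); }

    Object get(String key) { return values.get(key); }
}

// Stores the start time before the execution and computes the elapsed time afterwards.
final class LatencyRecorder {
    private static final String START_NANOS = "latency.startNanos"; // hypothetical key name
    private final LongSupplier clock;

    LatencyRecorder(LongSupplier clock) { this.clock = clock; }

    // Intended to be called from the "before execution" hook.
    void beforeExecution(Attributes attrs) {
        attrs.put(START_NANOS, clock.getAsLong());
    }

    // Intended to be called from the "after execution" hook.
    // Returns elapsed nanoseconds, or -1 if no start time was recorded.
    long afterExecution(Attributes attrs) {
        Object start = attrs.get(START_NANOS);
        if (start == null) {
            return -1L;
        }
        long startNanos = (Long) start;
        return clock.getAsLong() - startNanos;
    }
}
```

In an application the recorder would typically be created as `new LatencyRecorder(System::nanoTime)`. Because the start time lives in the per-execution attributes rather than in a field, concurrent executions do not overwrite each other's timestamps. A real interceptor would also need to cover the failure path, which this sketch leaves out.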

millems on 10 Sep 2019

@millems One problem with this approach is that I cannot run a different interceptor per request, which I need in order to tie the metrics logging to my parent unit of work, because there is no way to add an interceptor to a request object.

This is useful for instance when I am serving a request for my callers and as part of that I need to call an AWS API, I would like to log metrics for that particular call along with other metrics for the incoming request to my service. I could probably achieve this by using hacky ThreadLocals but that makes the whole code brittle to refactoring.

This could be addressed by adding support for per-request interceptors. But assuming both the SDK-level interceptors and the request-level interceptors are executed for a given request, that may not be the best API design for metrics logging, and here is why:

Consider an application that processes messages from Kinesis using the KCL library, transforms them, and writes them to a different Kinesis stream. In such a case, I will want to use the same SDK instance for both the read from and the write to Kinesis, as recommended by the SDK docs. Since I have no way of assigning a request-level interceptor to the get records call from KCL (the call is made directly from the KCL code), I will have to add an SDK-level interceptor. At the same time, I will also want to add a request-level interceptor for the part of my code that does the Kinesis write, so that the Kinesis write call metrics are tied to my unit of work, which is the processing of each message.

In such a setup, if both the interceptors are executed, I will have incorrect metrics logged (call counts, failures, avg latency etc). I thus think the interceptor API is too low level to be useful for flexible metrics logging and I ask that you reconsider the design you had for the V1 SDK that only executed one RequestMetricCollector even if multiple collectors were configured at different levels (Http Client, SDK, Request instance - in the order of preference).

A dedicated API, purpose-built for metrics logging, is definitely needed and should be addressed before the V2 SDK becomes more ubiquitous.
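The single-collector behaviour requested above can be illustrated with a small plain-Java sketch. Everything here is hypothetical: `Collector`, `CollectorResolver`, and `ListCollector` are made-up names, not the V1 RequestMetricCollector API, and the precedence used (request level first, then client level, then the SDK-wide default) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collector interface for illustration only.
interface Collector {
    void record(String operation, long latencyMillis);
}

final class CollectorResolver {
    // Picks exactly one collector: the most specific level that is configured.
    static Collector resolve(Collector requestLevel, Collector clientLevel, Collector sdkLevel) {
        if (requestLevel != null) {
            return requestLevel;
        }
        if (clientLevel != null) {
            return clientLevel;
        }
        return sdkLevel;
    }
}

// Simple collector that keeps what it receives in memory.
final class ListCollector implements Collector {
    final List<String> events = new ArrayList<>();

    @Override
    public void record(String operation, long latencyMillis) {
        events.add(operation + ":" + latencyMillis);
    }
}
```

With this shape, a write that carries its own request-level collector is recorded once, by that collector only, while calls made without one (such as those issued from library code) fall back to the shared collector. That is what avoids the double counting described above.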

swaranga on 10 Sep 2019

We agree that the interceptor API isn't optimal for someone who only cares about metrics. There will be a separate interface for metrics. Here are our latest thoughts: https://github.com/aws/aws-sdk-java-v2/pull/1304 https://github.com/aws/aws-sdk-java-v2/pull/1362

millems on 13 Sep 2019

Thank you, I have added my comments on the design PR.

swaranga on 15 Sep 2019

An incremental implementation that at least adds the JVM Memory, JVM Threads and OS File Descriptors metrics would be beneficial while the interfaces to the AWS-specific metrics are worked out.

jeffsteinmetz on 11 Oct 2019

@jeffsteinmetz Why should the SDK provide metrics on your application's JVM memory usage, file descriptors and threads? Do you mean memory, file descriptors and threads used by the SDK itself? If so, threads might be doable, but I am not sure about memory usage. For file descriptors, maybe it can provide the number of open HTTP connections from the SDK, but beyond that I am not sure this should be in scope.

swaranga on 12 Oct 2019

@swaranga This is for feature parity with the 1.x Java SDK, which provides JVM and machine metrics. We use them in the 1.x Java SDK.

See
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/metrics/package-summary.html

and

https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/generating-sdk-metrics.html

Machine Metrics

Memory

TotalMemory - Total amount of memory currently available to the JVM for current and future objects, measured in bytes. This value may vary over time, depending on the host environment.
FreeMemory - An approximation to the total amount of memory currently available to the JVM for future allocated objects, measured in bytes.
UsedMemory - TotalMemory minus FreeMemory.
SpareMemory - The maximum amount of memory that the JVM will attempt to use, measured in bytes, minus UsedMemory.

Threads

ThreadCount - The current number of live threads including both daemon and non-daemon threads.
DeadLockThreadCount - The number of threads that are deadlocked waiting for object monitors or ownable synchronizers, if any. Threads are deadlocked in a cycle waiting for a lock of these two types if each thread owns one lock while trying to acquire another lock already held by another thread in the cycle. No metrics is generated when the value is zero.
DaemonThreadCount - The current number of live daemon threads. No metrics is generated when the value is zero.
PeakThreadCount - The peak live thread count since the JVM started or since the peak was reset.
TotalStartedThreadCount - The total number of threads created and also started since the JVM started.

File Descriptors

OpenFileDescriptorCount - Number of opened file descriptors of the operating system.
SpareFileDescriptorCount - Maximum number of file descriptors of the operating system minus OpenFileDescriptorCount.
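For readers who need these values before the SDK offers them, most of the list above can be read with standard JDK calls. The following is a rough sketch, not the 1.x implementation: `JvmMetricsSnapshot` is a hypothetical name, the metric names are copied from the list, and the file descriptor counts rely on `com.sun.management.UnixOperatingSystemMXBean`, which is only available on Unix-like JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmMetricsSnapshot {
    // Reads the current values once and returns them keyed by metric name.
    static Map<String, Long> collect() {
        Runtime runtime = Runtime.getRuntime();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Long> metrics = new LinkedHashMap<>();

        // Memory, in bytes.
        long total = runtime.totalMemory();
        long free = runtime.freeMemory();
        long used = total - free;
        metrics.put("TotalMemory", total);
        metrics.put("FreeMemory", free);
        metrics.put("UsedMemory", used);
        metrics.put("SpareMemory", runtime.maxMemory() - used);

        // Threads.
        metrics.put("ThreadCount", (long) threads.getThreadCount());
        metrics.put("DaemonThreadCount", (long) threads.getDaemonThreadCount());
        metrics.put("PeakThreadCount", (long) threads.getPeakThreadCount());
        metrics.put("TotalStartedThreadCount", threads.getTotalStartedThreadCount());
        long[] deadlocked = threads.findDeadlockedThreads(); // null when no deadlock is found
        metrics.put("DeadLockThreadCount", deadlocked == null ? 0L : (long) deadlocked.length);

        // File descriptors: only reported when the platform bean exposes them.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            metrics.put("OpenFileDescriptorCount", open);
            metrics.put("SpareFileDescriptorCount", unix.getMaxFileDescriptorCount() - open);
        }
        return metrics;
    }
}
```

Unlike the behaviour described in the list, this sketch always includes DeadLockThreadCount and DaemonThreadCount, even when they are zero, and it does not publish anything to CloudWatch. It only gathers the numbers.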

and from the 1.x docs:

The AWS SDK for Java can generate metrics for visualization and monitoring with CloudWatch that measure:

your application’s performance when accessing AWS
the performance of your JVMs when used with AWS
runtime environment details such as heap memory, number of threads, and opened file descriptors

jeffsteinmetz on 12 Oct 2019

+1, I could see it being nice to get JVM metrics into CloudWatch. It's more of a "Java on AWS" feature than a "Java SDK" feature, but it was included in the Java SDK 1.11.x, so it makes sense as a feature request.

millems on 14 Oct 2019

Understood, although there seems to be no new development on the related PRs. Is this still a priority?

swaranga on 15 Oct 2019

We recognize this feature to be a blocker for many customers to migrate from 1.x to 2.x. We still believe this feature to be important and intend to implement it.

millems on 15 Oct 2019

👍4

Feature request from v1: Allow a user to add a JVM shutdown hook to drain metrics queue - https://github.com/aws/aws-sdk-java/issues/1296
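The idea behind that request can be sketched with the JDK alone. This is a hypothetical illustration, not the SDK's metrics queue: `MetricsQueue` and its uploader callback are made-up names, and data points are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class MetricsQueue {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Consumer<List<String>> uploader;

    MetricsQueue(Consumer<List<String>> uploader) {
        this.uploader = uploader;
    }

    void enqueue(String datum) {
        pending.add(datum);
    }

    // Moves everything currently queued into one batch and hands it to the uploader.
    // Returns the number of data points flushed.
    int drain() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        if (!batch.isEmpty()) {
            uploader.accept(batch);
        }
        return batch.size();
    }

    // Registers a JVM shutdown hook that flushes whatever is still queued at exit.
    Thread registerShutdownHook() {
        Thread hook = new Thread(this::drain, "metrics-drain-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Returning the hook thread lets the caller remove it again with `Runtime.getRuntime().removeShutdownHook(...)` if the queue is shut down cleanly beforehand. Shutdown hooks run with limited time and are skipped entirely if the JVM is killed forcibly, so this reduces the loss of queued metrics but does not eliminate it.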

debora-ito on 31 Dec 2019

Hello! Do you have any ETA for this? Thanks.

gigiigig on 15 May 2020

@gigiigig Sorry, we unfortunately can't provide ETAs, but we will keep this issue up to date as we progress.

millems on 15 May 2020

Hi all,

We're happy to announce that the preview of the client-side metrics feature is now released! You can try it out starting with version 2.13.52. Please note that if you want to use the cloudwatch-metric-publisher, the version to use is 2.13.52-PREVIEW. For more information, please take a look at our announcement blog post.

dagnir on 8 Jul 2020

From the blog post, it seems like the metrics publisher is configured at the SDK level. Can I also attach one at the request level? The V1 SDK had an option to override it per request, which allowed us to publish the metrics for the request along with the current unit of work.

Is that feasible?

swaranga on 8 Jul 2020

@swaranga Yep, absolutely. You can set publishers at the request level as well, using the RequestOverrideConfiguration:

GetObjectRequest.builder()
                ...
                .overrideConfiguration(o -> o.addMetricPublisher(myPublisher))
                .build();

dagnir on 8 Jul 2020

👍2

Is there any plan to provide a Micrometer based implementation for the metric publisher?

gmariotti on 8 Jul 2020

Hi @gmariotti, we don't currently have any plans for a Micrometer implementation.

dagnir on 9 Jul 2020

👍1

Hi everyone, the Metrics feature is now GA, so I'm going to go ahead and close this issue. Please check it out and feel free to open an issue for any problems or feedback you have. https://aws.amazon.com/blogs/developer/client-side-metrics-for-the-aws-sdk-for-java-v2-is-now-generally-available/

dagnir on 2 Sep 2020
