We have recently replaced azure-keyvault SDK with @azure/keyvault-keys:4.1.0 SDK. We have found when calling the sign function at a rate of 5 requests per second we are seeing a gradual memory increase that we didn't experience with the older SDK. The code that we are calling is as which follows your example:
'use strict';
const { CryptographyClient } = require('@azure/keyvault-keys');
const { DefaultAzureCredential } = require('@azure/identity');
const credentials = new DefaultAzureCredential();
const SIGNING_ALGORITHM = 'RS256';
async function sign(signData, keyId) {
try {
const hash = createHash(signData);
const cryptographyClient = new CryptographyClient(keyId, credentials);
const ciphertext = await cryptographyClient.sign(SIGNING_ALGORITHM, hash);
return {
signedData: ciphertext.result.toString('base64'),
fullyQualifiedKeyIdUri: ciphertext.keyID,
};
} catch (err) {
throw err;
}
}
function createHash(data) {
return crypto.createHash(HASHING_ALGORITHM).update(data).digest();
}
The heap dump suggests that the TCP object is being retained.
Please could you help
Many Thanks
Emer
Thanks for reporting @emcveigh!
We will look into this first thing Monday (30th Nov) morning
@emcveigh Hello! I'll be looking at this today. Thank you for your patience.
I'm working on this. I'll have an answer by tomorrow.
So far I have been able to reproduce this and I'll be working with the team on a solution.
Edit: It was a false positive.
@emcveigh concrete proof of this has been eluding me so far, over how much time do you see this increase? what's the rate of increase? if you mind sharing some more of your numbers, that would be appreciated.
@emcveigh
some other information that might help us:
Hi @sadasant here is some more information that you requested
over how much time do you see this increase? what's the rate of increase?
with a throughput of 5.2/per second the pods (depending on memory) gets shot roughly after 40minutes
Are these requests per second in parallel? or sequential?
parallel when making the keyvault calls
From your location, how much time does one request take?
on average the requests takes 826ms where previously 635ms (previously we used adalnode)
@emcveigh thank you! I'll continue investigating tomorrow.
@emcveigh So far I'm using something very similar to this code to run the study:
import { createHash } from "crypto";
import { CryptographyClient, KeyClient } from "@azure/keyvault-keys";
import { DefaultAzureCredential } from "@azure/identity";
import { delay } from "@azure/core-http";
async function main() {
const signatureValue = "signature-value";
const hash = createHash("sha256");
hash.update(signatureValue);
const digest = hash.digest();
const max = 10000;
const credential = new DefaultAzureCredential();
const keyClient = new KeyClient("vault-url", credential);
const keyVaultKey = await client.createKey(keyName, "RSA");
for (let i = 0; i < max; i++) {
console.log(`${process.memoryUsage().heapUsed}\t${i}`);
const cryptoClient = new CryptographyClient(keyVaultKey.id!, credential);
await cryptoClient.sign("RS256", digest);
await delay(200); // 5 calls/s
}
}
main().catch(e => console.error(e));
Some questions for you, in case you are able to answer them before I come back with further results:
My code is not exactly that, so please let me know if you encounter any issue running this.
It will also be useful to know the specs of the environment you're running this with. CPU, RAM, etc. Including operative system, and if possible the version of the operative system.
@emcveigh Here's one scenario I'm testing:
Taking code similar to the one I shared above, I'm calling up to 5k consecutive sign requests, with a delay of 200 ms between each call. After 5k happen, I'm waiting a minute, then I'm repeating the 5k calls again without stopping the process.
Here's one of the execution results:
Heap before the start: 84095584
Heap after 5k sign calls: 66676880
Heap a minute at rest later: 67648984
After 5k more sign calls: 71296968
Heap a minute at rest later: 72268000
After 5k more sign calls: 69709744
Heap a minute at rest later: 70706464
Heap at the end: 65898440
I have tried a similar exercise on both Windows and Linux. I've both increased and reduced the delay. I have been unable to see the memory leak.
I will keep testing, and I'll talk with my team in case I'm missing something. Please share more information with us as soon as you can. Thank you for your time!
@emcveigh Another question: what's the size of the signData that you're using?
After speaking with my team, we've come up with several more ways to test this in different scenarios. I'll report once I finish with my tests!
@emcveigh We've been able to confirm one issue so far.
Creating a CryptographyClient several times per second does create an excessive number of open sockets that might not be able to be cleaned on time. This problem gets exacerbated depending on the limitations of the environment where the code is executed. We will be studying solutions to this problem.
For the purpose of running sign requests with a large number of Key IDs on a short period of time, we'll be discussing on how to change the public API of the CryptographyClient to allow users to change the Key ID without having to instantiate a new client. I'll be proposing something like the following:
// Concept of a possible change on our API
const cryptoClient = new CryptographyClient(/* params */);
cryptoClient.sign(algorithm, data, { keyId });
While we study solutions for this, and also what other issues might be happening, let's focus on the possible solutions that are available today.
There are some questions that we wanted to ask again, that will help us possibly find other bugs, and also there's one approach that you could do today that could ease your experience with our clients.
Questions for you:
signData? At least the maximum size you're using.One available solution:
Since the sockets problem is created by the instantiation of a CryptographyClient, this problem might be resolved if the CryptographyClients are cached based on the received Key ID. An example in code follows:
// Part of the proposal:
// A representation of a local cache for the CryptographyClients
let cryptoClients = {};
async function sign(signData, keyId) {
try {
const hash = createHash(signData);
// Here is the proposal:
// Create new cryptography clients only if the KeyIDs are new.
let cryptographyClient = cryptoClients[keyId];
if (!cryptographyClient) {
cryptographyClient = new CryptographyClient(keyId, credentials);
cryptoClients[keyId] = cryptographyClient;
}
const ciphertext = await cryptographyClient.sign(SIGNING_ALGORITHM, hash);
return {
signedData: ciphertext.result.toString('base64'),
fullyQualifiedKeyIdUri: ciphertext.keyID,
};
} catch (err) {
throw err;
}
}
Let me know if this helps. I'll be updating this issue as soon as I'm able to gather more information.
I just cleaned up my previous comment. Make sure to refresh or to load the web page if you received it through email. Thank you for your time!
Update:
For a fix that doesn't require code changes on your side, we're testing and studying the following changes in our HTTP pipelines: https://github.com/Azure/azure-sdk-for-js/pull/12966
We'll come up with a decision on January. Thank you for your time again!
Thank you @sadasant. We had already implemented your method of caching the client, but were concerned that this was not the way that it was intended to be used. We still see a slight increase in memory usage using this method but it is a very slow rise.
Answers to you questions:
Thank you @emcveigh , we'll continue investigating.
@emcveigh I'll get in touch with you on January to schedule a call. I would like to test some things together, and to see the issue more directly, if that's ok.
@emcveigh
Hello again! (Happy New Year 2021)!
I have some good news that could help you considerably. Would you mind trying the following and monitoring your system to see if the problem keeps appearing?
Steps:
package-lock.json and your node_modules folder (if they happen to be there where this gets executed).@azure/core-http to your dependencies on your package.json. On the version of the package, enter dev. It should look like: "@azure/core-http": "dev",.Once your dependencies are installed again, if you search for @azure/core-http in the package-lock.json that gets generated, you should see the version 1.2.1-alpha.20210104.2 (or the latest dev version available by that time).
Using that version of core-http, please let me know if you continue to see the issue.
If there's any other question, please let me know.
Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!