Aws-sdk-net: Memory leak in AmazonKinesisClient with .NET Core 3.1.2 on macOS 10.15 Catalina

Created on 2 Jun 2020 · 5 Comments · Source: aws/aws-sdk-net

Expected Behavior



No memory leak when trying to send data using the AmazonKinesisClient.

Current Behavior






When attempting to send record(s) to a Kinesis Data Stream using the AmazonKinesisClient class, we see memory leaks that grow progressively over time. The leaks seem to be caused mainly by an SSL handshake:

STACK OF 1 INSTANCE OF 'ROOT LEAK: <SecTrust>':
42  libsystem_pthread.dylib            0x7fff6b1cbb8b thread_start + 15
41  libsystem_pthread.dylib            0x7fff6b1d0109 _pthread_start + 148
40  libcoreclr.dylib                      0x1010c12a4 CorUnix::CPalThread::ThreadEntry(void*) + 436
39  libcoreclr.dylib                      0x10126db8f ThreadpoolMgr::WorkerThreadStart(void*) + 1311
38  libcoreclr.dylib                      0x101240154 ManagedPerAppDomainTPCount::DispatchWorkItem(bool*, bool*) + 276
37  libcoreclr.dylib                      0x101249b20 ManagedThreadBase::ThreadPool(void (*)(void*), void*) + 32
36  libcoreclr.dylib                      0x101249503 ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 323
35  libcoreclr.dylib                      0x1012a3538 QueueUserWorkItemManagedCallback(void*) + 184
34  libcoreclr.dylib                      0x101288639 MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 1657
33  libcoreclr.dylib                      0x10143c8fb CallDescrWorkerInternal + 124
32  ???                                   0x109384766 0x7fffffffffffffff + 9223372041304426343
31  ???                                   0x10924ed8d 0x7fffffffffffffff + 9223372041303158158
30  ???                                   0x109ee4798 0x7fffffffffffffff + 9223372041316353945
29  ???                                   0x109ee629a 0x7fffffffffffffff + 9223372041316360859
28  ???                                   0x10937930d 0x7fffffffffffffff + 9223372041304380174
27  ???                                   0x10938423e 0x7fffffffffffffff + 9223372041304425023
26  ???                                   0x108f57eed 0x7fffffffffffffff + 9223372041300049646
25  ???                                   0x107ef0400 0x7fffffffffffffff + 9223372041282847745
24  ???                                   0x109397586 0x7fffffffffffffff + 9223372041304503687
23  ???                                   0x10939905c 0x7fffffffffffffff + 9223372041304510557
22  ???                                   0x1093992bb 0x7fffffffffffffff + 9223372041304511164
21  ???                                   0x107ef0400 0x7fffffffffffffff + 9223372041282847745
20  ???                                   0x109397586 0x7fffffffffffffff + 9223372041304503687
19  ???                                   0x10939905c 0x7fffffffffffffff + 9223372041304510557
18  ???                                   0x1093992bb 0x7fffffffffffffff + 9223372041304511164
17  ???                                   0x107ef0400 0x7fffffffffffffff + 9223372041282847745
16  ???                                   0x109397452 0x7fffffffffffffff + 9223372041304503379
15  ???                                   0x109397631 0x7fffffffffffffff + 9223372041304503858
14  ???                                   0x109397af5 0x7fffffffffffffff + 9223372041304505078
13  ???                                   0x10939894e 0x7fffffffffffffff + 9223372041304508751
12  ???                                   0x109398ac2 0x7fffffffffffffff + 9223372041304509123
11  ???                                   0x10934c526 0x7fffffffffffffff + 9223372041304196391
10  System.Security.Cryptography.Native.Apple.dylib        0x101e816ee 0x101e7d000 + 18158
9   com.apple.security                 0x7fff3da8c909 SSLHandshake + 185
8   com.apple.security                 0x7fff3da8ca29 SSLHandshakeProceed + 185
7   libcoretls.dylib                   0x7fff68609999 tls_handshake_process + 85
6   libcoretls.dylib                   0x7fff6860a0eb SSLProcessHandshakeRecordInner + 219
5   com.apple.security                 0x7fff3dcacdac tls_verify_peer_cert + 71
4   com.apple.security                 0x7fff3dcaccff sslCreateSecTrust + 47
3   libcoretls_cfhelpers.dylib         0x7fff6861c23e tls_helper_create_peer_trust + 222
2   com.apple.security                 0x7fff3da5ff43 SecTrustCreateWithCertificates + 918
1   com.apple.CoreFoundation           0x7fff310e9663 _CFRuntimeCreateInstance + 597
0   libsystem_malloc.dylib             0x7fff6b181d9e malloc_zone_malloc + 140 

and a thread start:

STACK OF 10 INSTANCES OF 'ROOT LEAK: malloc<144>':
7   libsystem_pthread.dylib            0x7fff6b1cbb8b thread_start + 15
6   libsystem_pthread.dylib            0x7fff6b1d0109 _pthread_start + 148
5   libcoreclr.dylib                      0x1010c12a4 CorUnix::CPalThread::ThreadEntry(void*) + 436
4   libcoreclr.dylib                      0x10126fcc6 ThreadpoolMgr::GateThreadStart(void*) + 118
3   libcoreclr.dylib                      0x1012e53a6 EETlsSetValue(unsigned int, void*) + 22
2   libcoreclr.dylib                      0x1011938cd CExecutionEngine::CheckThreadState(unsigned int, int) + 61
1   libcoreclr.dylib                      0x10109a698 HeapAlloc + 40
0   libsystem_malloc.dylib             0x7fff6b181d9e malloc_zone_malloc + 140 

Please note that a successful connection to AWS is not necessary for the memory leak to happen. I was able to reproduce the leak using invalid AWS credentials and a nonexistent stream name, indicating the issue is not dependent upon successful sending of data to the endpoint.

Possible Solution



Not sure - we would be open to a temporary mitigation option as well to reduce the immediate customer impact.

Steps to Reproduce (for bugs)





Here is a link to a repo containing a minimal reproducible example for this bug. It creates a new AmazonKinesisClient and attempts to send a record through it every 5 seconds. Detailed reproduction steps are included in the repo's README.md file.
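
For readers without access to the repo, the loop described above might look roughly like the following. This is only a sketch and not the repo's actual code: the credentials, region, stream name, partition key, and payload are all placeholder assumptions, and it assumes a single client created once and reused for every send.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;
using Amazon.Runtime;

internal static class Program
{
    private static async Task Main()
    {
        // Assumption: placeholder credentials and region. The report states the
        // leak reproduces even with invalid credentials and a nonexistent stream.
        var credentials = new BasicAWSCredentials("INVALID_ACCESS_KEY", "INVALID_SECRET_KEY");
        using var client = new AmazonKinesisClient(credentials, RegionEndpoint.USWest2);

        while (true)
        {
            try
            {
                using var data = new MemoryStream(Encoding.UTF8.GetBytes("test record"));
                var request = new PutRecordRequest
                {
                    StreamName = "nonexistent-stream", // hypothetical stream name
                    PartitionKey = "partition-1",      // hypothetical partition key
                    Data = data
                };

                await client.PutRecordAsync(request);
                Console.WriteLine("Record sent.");
            }
            catch (Exception ex)
            {
                // With invalid credentials every call is expected to end up here.
                Console.WriteLine($"Send failed: {ex.GetType().Name}: {ex.Message}");
            }

            // Wait 5 seconds between attempts, as in the description above.
            await Task.Delay(TimeSpan.FromSeconds(5));
        }
    }
}
```

Leaving a program like this running while watching the process's memory over time should be enough to compare against the behavior described in this report.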

Context



My team develops a .NET Core application that streams data to AWS endpoints. This particular use case involves log streaming from devices to a Kinesis Data Stream. The memory leak severely affects some machines, increasing memory usage of the app to >1GB at times. Memory usage does not go down until the app is manually restarted, and the leak then resurfaces within 1-2 days.

Your Environment

  • AWSSDK.Kinesis version used: 3.3.100.115
  • Operating System and version: macOS 10.15.5 Catalina
  • Targeted .NET platform: netcoreapp3.1

.NET Core Info

  • .NET Core version used for development: 3.1.300
  • .NET Core version installed in the environment where application runs: 3.1.300
  • Output of dotnet --info:
.NET Core SDK (reflecting any global.json):
 Version:   3.1.300
 Commit:    b2475c1295

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.15
 OS Platform: Darwin
 RID:         osx.10.15-x64
 Base Path:   /usr/local/share/dotnet/sdk/3.1.300/

Host (useful for support):
  Version: 3.1.4
  Commit:  0c2e69caa6

.NET Core SDKs installed:
  2.1.804 [/usr/local/share/dotnet/sdk]
  3.1.300 [/usr/local/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.16 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.16 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.4 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.1.16 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.18 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.4 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  • Contents of project.json/project.csproj:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <RootNamespace>kinesis_dotnet_macos_memoryleak</RootNamespace>
    <AssemblyName>kinesis_dotnet_macos_memoryleak</AssemblyName>
    <LangVersion>Latest</LangVersion>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="AWSSDK.Kinesis" Version="3.3.100.115" />
  </ItemGroup>

</Project>

All 5 comments

After some more investigation we were able to trace the issue back to the AmazonServiceClient class that all the individual classes (AmazonKinesisClient, etc.) inherit from. The AmazonServiceClient uses the HttpWebRequest class to send data to AWS. However, looking at the docs for the class, Microsoft does not recommend using it for development, instead recommending the HttpClient class.

I was able to reproduce the memory leak with a sample program that just created a new HttpWebRequest every 5 seconds to http://www.contoso.com/ and got the response. However, when making this same request using a single HttpClient, I wasn't able to see the memory leak.
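
A rough sketch of that comparison is below. It is not the exact program I ran: the command-line switch and the method names are made up for illustration, and only the URL and the 5-second interval come from the description above.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

internal static class Program
{
    private const string Url = "http://www.contoso.com/";

    // A single HttpClient shared by every request (the variant that did not leak).
    private static readonly HttpClient SharedClient = new HttpClient();

    // Creates a brand-new HttpWebRequest per call (the variant that leaked).
    private static async Task<HttpStatusCode> GetWithHttpWebRequestAsync()
    {
        HttpWebRequest request = WebRequest.CreateHttp(Url);
        using var response = (HttpWebResponse)await request.GetResponseAsync();
        return response.StatusCode;
    }

    // Reuses the shared HttpClient for each call.
    private static async Task<HttpStatusCode> GetWithHttpClientAsync()
    {
        using HttpResponseMessage response = await SharedClient.GetAsync(Url);
        return response.StatusCode;
    }

    private static async Task Main(string[] args)
    {
        // Hypothetical switch: pass "httpclient" to test the HttpClient variant.
        bool useHttpClient = args.Length > 0 && args[0] == "httpclient";

        while (true)
        {
            try
            {
                HttpStatusCode status = useHttpClient
                    ? await GetWithHttpClientAsync()
                    : await GetWithHttpWebRequestAsync();
                Console.WriteLine($"Response status: {status}");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Request failed: {ex.GetType().Name}: {ex.Message}");
            }

            await Task.Delay(TimeSpan.FromSeconds(5));
        }
    }
}
```

Running each variant separately and comparing memory usage over time keeps the two code paths from influencing each other.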

Would it be possible to update the SDK to use the HttpClient class instead?

Hey @rharpavat,

Thank you for bringing this to our attention! We are looking into the issue now and should be able to provide more concrete information on Monday. It does appear to be an issue with HttpWebRequest and we are looking into the possibility of using HttpClient instead.


@NGL321 It does appear that, for NETSTANDARD targets, the SDK should be using HttpClient, no? (see https://github.com/aws/aws-sdk-net/blob/master/sdk/src/Core/Amazon.Runtime/AmazonServiceClient.cs#L472)
Could you confirm that the HttpRequestMessageFactory in the request pipeline is supposed to use HttpClient?

Hi, I'm just starting to look into this, but to be clear: the AWS .NET SDK uses HttpClient when targeting .NET Core. For .NET Framework, the SDK uses HttpWebRequest, which follows the guidance Microsoft gives for these classes.

@normj Were you able to reproduce the issue/investigate further? I did some more testing and it looks like the memory leak only happens when the request to AWS fails (due to invalid credentials/network error/something else). I have an issue open with Microsoft here where there's more detail on our findings. Would really appreciate any update on further investigation. Thanks!
