I was trying to analyze memory usage of my application when I noticed the following pprof output:
list RawRequestWithContext
Total: 180.79MB
ROUTINE ======================== github.com/hashicorp/vault/api.(*Client).RawRequestWithContext in github.com/hashicorp/vault/api/client.go
1.12MB 42.63MB (flat, cum) 23.58% of Total
. . 795: }
. . 796:
. . 797: var result *Response
. . 798: resp, err := client.Do(req)
. . 799: if resp != nil {
. 4.45MB 800: result = &Response{Response: resp}
. . 801: }
. . 802: if err != nil {
. . 803: if strings.Contains(err.Error(), "tls: oversized") {
. . 804: err = errwrap.Wrapf(
. . 805: "{{err}}\n\n"+
. . 806: "This error usually means that the server is running with TLS disabled\n"+
. . 807: "but the client is configured to use TLS. Please either enable TLS\n"+
. . 808: "on the server or run the client with -address set to an address\n"+
. . 809: "that uses the http protocol:\n\n"+
. . 810: " vault <command> -address http://<address>\n\n"+
. . 811: "You can also set the VAULT_ADDR environment variable:\n\n\n"+
. . 812: " VAULT_ADDR=http://<address> vault <command>\n\n"+
. . 813: "where <address> is replaced by the actual address to the server.",
. . 814: err)
. . 815: }
. . 816: return result, err
. . 817: }
. . 818:
. . 819: // Check for a redirect, only allowing for a single redirect
. . 820: if (resp.StatusCode == 301 || resp.StatusCode == 302 || resp.StatusCode == 307) && redirectCount == 0 {
. 4.32MB 821: // Parse the updated location
. . 822: respLoc, err := resp.Location()
. 3.43MB 823: if err != nil {
. . 824: return result, err
. . 825: }
. . 826:
. . 827: // Ensure a protocol downgrade doesn't happen
. . 828: if req.URL.Scheme == "https" && respLoc.Scheme != "https" {
. . 829: return result, fmt.Errorf("redirect would cause protocol downgrade")
. . 830: }
. . 831:
. . 832: // Update the request
. . 833: r.URL = respLoc
. . 834:
. . 835: // Reset the request body if any
. . 836: if err := r.ResetJSONBody(); err != nil {
. . 837: return result, err
. . 838: }
. . 839:
1.12MB 1.12MB 840: // Retry the request
. . 841: redirectCount++
. . 842: goto START
. . 843: }
. 29.31MB 844:
. . 845: if err := result.Error(); err != nil {
. . 846: return result, err
. . 847: }
. . 848:
. . 849: return result, nil
I am trying to understand why 29.31MB was accumulated at line 844. My hypothesis is that at line 846 https://github.com/hashicorp/vault/blob/master/api/client.go#L846 in the function RawRequestWithContext a response object is created whose body is expected to be closed by the caller of this function.
However, consider the case of a redirect, where at line 890 the statement goto START is executed. Line 846 is then executed again, creating another response object. The previously created response object's body is never closed, causing a connection and memory leak on every redirect.
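To make the pattern concrete, here is a minimal standalone sketch (this is not Vault's actual code; doWithOneRedirect, the single-redirect limit, and the client parameter are just illustrative) showing how a manually followed redirect leaks the first response unless its body is drained and closed before the request is retried:

```go
package redirectleak

import (
	"io"
	"net/http"
)

// doWithOneRedirect issues req and follows at most one redirect manually,
// mirroring the single-redirect logic in the listing above.
func doWithOneRedirect(client *http.Client, req *http.Request) (*http.Response, error) {
	resp, err := client.Do(req)
	if err != nil {
		return resp, err
	}

	if resp.StatusCode == 301 || resp.StatusCode == 302 || resp.StatusCode == 307 {
		loc, err := resp.Location()
		if err != nil {
			return resp, err
		}

		// Without these two lines the first response's connection is never
		// released; this is the leak hypothesized above for the goto START retry.
		_, _ = io.Copy(io.Discard, resp.Body)
		resp.Body.Close()

		req.URL = loc
		return client.Do(req)
	}

	return resp, nil
}
```

If this is indeed what is happening, draining and closing the previous response body before the goto START retry should release the leaked connections.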
Hi! Thanks for the report! Can you provide further steps to reproduce this? Information such as what Vault version you're on, what API calls you're making, and any further details that might help us reproduce this locally would be useful. Also, are you seeing ongoing growth in memory usage, or does memory tend to stabilize?
I am observing the same memory leak on my Vault 1.4.0 server running on Kubernetes.
You can see the memory usage graph below:

It might be useful to note that this replica became the active server sometime on April 22.
I have a bunch of secret engines mounted:
And the following auth mounts:
I also have Prometheus constantly scraping the Vault server for metrics.
Is there anything else I can do to provide more information?
The pod will probably OOM soon and be killed by Kubernetes.
@lawliet89 You can run Vault debug (https://www.vaultproject.io/docs/commands/debug) which will, among other things, capture heap usage stats. We can help analyze these logs (they may be emailed or otherwise shared outside of GitHub).
Thanks @kalafut
May I know who I should email the logs to?
@lawliet89 You can send them to me (email is in Vault's commits).
@lawliet89 Thanks for sending those. There is almost 1GB in use by the write buffer for GCS storage (in a function in the GCS SDK). I'm not familiar enough with the GCS storage details to know why, however. I do wonder what effect (if any) https://www.vaultproject.io/docs/configuration/storage/google-cloud-storage#chunk_size would have.
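For reference, here is a rough sketch (not Vault's storage code; the bucket and object names and the 4 MiB value are placeholders, and I'm not asserting the SDK's exact default) of where that buffer lives in the Go GCS SDK: storage.Writer keeps up to ChunkSize bytes in memory per in-flight upload, which I believe is what the chunk_size option ultimately tunes.

```go
package gcsbuffer

import (
	"context"

	"cloud.google.com/go/storage"
)

// writeObject uploads data with an explicitly reduced chunk size. Each open
// storage.Writer buffers up to ChunkSize bytes in memory before flushing,
// so many concurrent writes with a large chunk size add up quickly.
func writeObject(ctx context.Context, client *storage.Client, data []byte) error {
	w := client.Bucket("my-bucket").Object("my-object").NewWriter(ctx)
	w.ChunkSize = 4 << 20 // 4 MiB instead of the SDK default

	if _, err := w.Write(data); err != nil {
		_ = w.Close()
		return err
	}
	return w.Close()
}
```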

@kalafut
I haven't really looked into it, but maybe this can be fixed with a bump to the Google Cloud SDK.
An update was recently reverted in https://github.com/hashicorp/vault/commit/ea1564b3badd27f4e02cfd870d9759be903b91de due to https://github.com/etcd-io/etcd/issues/11154, for which a fix should be released shortly.
I will try halving the chunk size to see if it makes any difference.
FWIW, I'm in the process of migrating to Raft storage. This might not be a fix for future readers, however.

It may be too early to say, but halving the chunk size seems to have fixed the memory leak. I will report back next week if this is not the case.
EDIT: One whole week of data. Looks good.

Thanks, that is a useful datapoint.
I may also be seeing similar memory usage growth, though we've been running into memory exhaustion issues in general, which makes the graphs less clear. I am on version 1.1.0, however.
@lawliet89 I assume you halved it to 4096?
@sidewinder12s Yes.

https://github.com/hashicorp/vault/blob/master/api/logical.go#L132
I experienced the same issue, and it was caused by write API calls; reads did not trigger it.
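To illustrate, a minimal hypothetical reproduction along these lines (the secret/data/demo path and the loop count are placeholders, and VAULT_ADDR/VAULT_TOKEN are assumed to be set in the environment) would exercise Logical().Write, and through it the RawRequestWithContext path discussed above:

```go
package main

import (
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// DefaultConfig picks up VAULT_ADDR; the token comes from VAULT_TOKEN.
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	for i := 0; i < 1000; i++ {
		// "secret/data/demo" is a placeholder KV v2 path used only for illustration.
		_, err := client.Logical().Write("secret/data/demo", map[string]interface{}{
			"data": map[string]interface{}{"value": "x"},
		})
		if err != nil {
			log.Println(err)
		}
	}
}
```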
For other folks: I suspect this issue may be compounding the performance issues I'm having with GCS: https://github.com/hashicorp/vault/issues/8761
I've got thousands of new nodes doing GCP logins daily, which probably isn't helping my GCS performance.
@kalafut I'm investigating a somewhat related memory leak and noticed the following on Vault 1.5.4:
(pprof) list wrapGenericHandler
Total: 30.41GB
ROUTINE ======================== github.com/hashicorp/vault/http.wrapGenericHandler.func1 in vault/http/handler.go
27MB 9.16GB (flat, cum) 30.12% of Total
. . 267: maxRequestDuration = props.ListenerConfig.MaxRequestDuration
. . 268: maxRequestSize = props.ListenerConfig.MaxRequestSize
. . 269: }
. . 270: if maxRequestDuration == 0 {
. . 271: maxRequestDuration = vault.DefaultMaxRequestDuration
. 396.62MB 272: }
. . 273: if maxRequestSize == 0 {
. . 274: maxRequestSize = DefaultMaxRequestSize
. . 275: }
. . 276: return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
. . 277: // Set the Cache-Control header for all the responses returned
. . 278: // by Vault
. . 279: w.Header().Set("Cache-Control", "no-store")
. . 280:
. 184.52MB 281: // Start with the request context
. . 282: ctx := r.Context()
. . 283: var cancelFunc context.CancelFunc
. . 284: // Add our timeout, but not for the monitor endpoint, as it's streaming
8MB 50MB 285: if strings.HasSuffix(r.URL.Path, "sys/monitor") {
. . 286: ctx, cancelFunc = context.WithCancel(ctx)
19MB 60MB 287: } else {
. 512.12kB 288: ctx, cancelFunc = context.WithTimeout(ctx, maxRequestDuration)
. 407.57MB 289: }
. . 290: // Add a size limiter if desired
. . 291: if maxRequestSize > 0 {
. . 292: ctx = context.WithValue(ctx, "max_request_size", maxRequestSize)
. . 293: }
. . 294: ctx = context.WithValue(ctx, "original_request_path", r.URL.Path)
. . 295: r = r.WithContext(ctx)
. . 296: r = r.WithContext(namespace.ContextWithNamespace(r.Context(), namespace.RootNamespace))
. . 297:
. . 298: switch {
. . 299: case strings.HasPrefix(r.URL.Path, "/v1/"):
. . 300: newR, status := adjustRequest(core, r)
. . 301: if status != 0 {
. . 302: respondError(w, status, nil)
. . 303: cancelFunc()
. . 304: return
. . 305: }
. . 306: r = newR
. . 307:
. 8.09GB 308: case strings.HasPrefix(r.URL.Path, "/ui"), r.URL.Path == "/robots.txt", r.URL.Path == "/":
. . 309: default:
. . 310: respondError(w, http.StatusNotFound, nil)
. . 311: cancelFunc()
. . 312: return
. . 313: }
Seems like the string matching is very expensive here
. 184.52MB 281: // Start with the request context
The line references on memory usage like the one above seem to be off. Can you run pprof with the -source_path flag and point it to your local Vault repository with the commit for 1.5.4 checked out (should be 1a730771ec70149293efe91e1d283b10d255c6d1)?
@calvn my bad, I was on master.
Total: 30.41GB
ROUTINE ======================== github.com/hashicorp/vault/http.wrapGenericHandler.func1 in vault/http/handler.go
27MB 9.16GB (flat, cum) 30.12% of Total
. . 267: maxRequestSize = DefaultMaxRequestSize
. . 268: }
. . 269: return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
. . 270: // Set the Cache-Control header for all the responses returned
. . 271: // by Vault
. 396.62MB 272: w.Header().Set("Cache-Control", "no-store")
. . 273:
. . 274: // Start with the request context
. . 275: ctx := r.Context()
. . 276: var cancelFunc context.CancelFunc
. . 277: // Add our timeout, but not for the monitor endpoint, as it's streaming
. . 278: if strings.HasSuffix(r.URL.Path, "sys/monitor") {
. . 279: ctx, cancelFunc = context.WithCancel(ctx)
. . 280: } else {
. 184.52MB 281: ctx, cancelFunc = context.WithTimeout(ctx, maxRequestDuration)
. . 282: }
. . 283: // Add a size limiter if desired
. . 284: if maxRequestSize > 0 {
8MB 50MB 285: ctx = context.WithValue(ctx, "max_request_size", maxRequestSize)
. . 286: }
19MB 60MB 287: ctx = context.WithValue(ctx, "original_request_path", r.URL.Path)
. 512.12kB 288: r = r.WithContext(ctx)
. 407.57MB 289: r = r.WithContext(namespace.ContextWithNamespace(r.Context(), namespace.RootNamespace))
. . 290:
. . 291: switch {
. . 292: case strings.HasPrefix(r.URL.Path, "/v1/"):
. . 293: newR, status := adjustRequest(core, r)
. . 294: if status != 0 {
. . 295: respondError(w, status, nil)
. . 296: cancelFunc()
. . 297: return
. . 298: }
. . 299: r = newR
. . 300:
. . 301: case strings.HasPrefix(r.URL.Path, "/ui"), r.URL.Path == "/robots.txt", r.URL.Path == "/":
. . 302: default:
. . 303: respondError(w, http.StatusNotFound, nil)
. . 304: cancelFunc()
. . 305: return
. . 306: }
. . 307:
. 8.09GB 308: h.ServeHTTP(w, r)
. . 309:
. . 310: cancelFunc()
. . 311: return
. . 312: })
. . 313:}
I'm happy to provide more debugging context and info on the issue. Can I email you directly?
@elsesiy sure thing! You can send it to calvin{at}hashicorp{dot}com.