Go: Panic in crypto/tls/common.go

Created on 17 Oct 2018 · 18Comments · Source: golang/go

What version of Go are you using (`go version`)?

go version go1.9 linux/arm64

Does this issue reproduce with the latest release?

We believe so but cannot currently confirm as it is very difficult to reproduce under controlled conditions (see below).

What operating system and processor architecture are you using (`go env`)?

linux, arm64

What did you do?

We are running a proxy which performs MITM interception on behalf of the user and typically is servicing hundreds of requests in parallel. We manage the certificate store by inserting certificates in the NameToCertificate map. Maps are not concurrency safe, so if a TLS handshake is occurring while a certificate is being inserted into NameToCertificate, a panic results. We avoid this in our code by protecting NameToCertificate with a mutex, but this is not possible in the native Golang code.

This error occurs randomly and we see it about once a week but have not been able to reproduce yet.

What did you expect to see?

We would like for our program to continue running, even if the handshake crashes. This could potentially be done by adding a recover() to the code in crypto/common.go around line 737:

    if cert, ok := c.NameToCertificate[name]; ok {
            return cert, nil
    }

What did you see instead?

The program crashes. This occurs even with the initial TLS handshake() protected by a recover. Stack trace:

fatal error: concurrent map read and map write

goroutine 11597 [running]:
runtime.throw(0xb64eb7, 0x21)
/usr/local/go/src/runtime/panic.go:605 +0x70 fp=0x44204919c0 sp=0x44204919a0 pc=0x42a600
runtime.mapaccess2_faststr(0xa6a400, 0x44202c0c00, 0x4429418a80, 0x18, 0x0, 0x4)
/usr/local/go/src/runtime/hashmap_fast.go:324 +0x488 fp=0x4420491a20 sp=0x44204919c0 pc=0x40d248
crypto/tls.(Config).getCertificate(0x442023ec00, 0x4427736160, 0x2, 0x0, 0x0)
/usr/local/go/src/crypto/tls/common.go:737 +0xc4 fp=0x4420491aa0 sp=0x4420491a20 pc=0x5ab614
crypto/tls.(serverHandshakeState).readClientHello(0x4420491c38, 0x4420491c28, 0x1138740, 0x577a84)
/usr/local/go/src/crypto/tls/handshake_server.go:216 +0x370 fp=0x4420491bf0 sp=0x4420491aa0 pc=0x5bd2f0
crypto/tls.(Conn).serverHandshake(0x4428a0ce00, 0xb8ce98, 0x4428a0cf20)
/usr/local/go/src/crypto/tls/handshake_server.go:48 +0x68 fp=0x4420491d10 sp=0x4420491bf0 pc=0x5bcc88
crypto/tls.(Conn).Handshake(0x4428a0ce00, 0x0, 0x0)

FrozenDueToAge WaitingForInfo

Source

WinstonPrivacy

👎1

Most helpful comment

If you need crypto/tls to use your mutexes, the answer is to define the GetCertificate func:

https://golang.org/pkg/crypto/tls/#Config.GetCertificate

It is used before NameToCertificate and you can do any locking in there.

bradfitz on 18 Oct 2018

🎉1 👍1

All 18 comments

Are you building with a fork of the Go 1.9 source and a patch applied? Have you tried Go 1.10 or 1.11?

Also note the Config docs: https://golang.org/pkg/crypto/tls/#Config

A Config structure is used to configure a TLS client or server. After one
has been passed to a TLS function it must not be modified. A Config may be
reused; the tls package will also not modify it.

adamdecaf on 18 Oct 2018

👍2

Looks like you'll have to explicitly synchronise the insertion of a cert with the config object itself, rather than just the NameToCertificateMap:

maybe the following would solve it?

replace Config type in your go code with a wrapper

type myConfig struct {
    sync.Mutex
    tls.Config
}

have the consumers of that Lock()/defer Unlock() and have the inserts in the NameToCertificateMap do the same but use a new tls.Config.

var myConfig myConfig
myConfig.Config = *theNewConfig

I'm trying to say: maybe you can just push the mutex one level out in Go?

wsc1 on 18 Oct 2018

Hi @WinstonPrivacy,

I think the error message is clear enough. There are some concurrent map operations going on. Have you tried running your application in -race mode ? That is a very easy way to figure out where the issue is.

We avoid this in our code by protecting NameToCertificate with a mutex, but this is not possible in the native Golang code.

These 2 statements are contradicting each other. If you already protect access with a mutex, then how is it not possible in the native Go code ?

Also, please post the code so that we can reproduce the issue on our side. However, I think this is a simple map access issue which you can identify with the -race mode.

Thanks.

agnivade on 18 Oct 2018

Are you building with a fork of the Go 1.9 source and a patch applied? Have you tried Go 1.10 or 1.11?

I've succeeded in reproducing the problem in Go 1.9. I haven't tried other versions but I suspect it will be the same (need to confirm).

Also note the Config docs: https://golang.org/pkg/crypto/tls/#Config

A Config structure is used to configure a TLS client or server. After one
has been passed to a TLS function it must not be modified. A Config may be
reused; the tls package will also not modify it.

This is the problem right here. We're sending in a reference to the TLS Config. It looks like we should be copying it instead.

WinstonPrivacy on 18 Oct 2018

Looks like you'll have to explicitly synchronise the insertion of a cert with the config object itself, rather than just the NameToCertificateMap:

maybe the following would solve it?

replace Config type in your go code with a wrapper
type myConfig struct {
    sync.Mutex
    tls.Config
}
have the consumers of that Lock()/defer Unlock() and have the inserts in the NameToCertificateMap do the same but use a new tls.Config.
var myConfig myConfig
myConfig.Config = *theNewConfig
I'm trying to say: maybe you can just push the mutex one level out in Go?

This won't work because the original Golang code in tls/common.go does not use a lock. We are already locking in our code to avoid the panic but it crashes inside tls/common.go while doing the TLS handshake.

WinstonPrivacy on 18 Oct 2018

Hi @WinstonPrivacy,

I think the error message is clear enough. There are some concurrent map operations going on. Have you tried running your application in -race mode ? That is a very easy way to figure out where the issue is.

We avoid this in our code by protecting NameToCertificate with a mutex, but this is not possible in the native Golang code.

These 2 statements are contradicting each other. If you already protect access with a mutex, then how is it not possible in the native Go code ?

Also, please post the code so that we can reproduce the issue on our side. However, I think this is a simple map access issue which you can identify with the -race mode.

Thanks.

I suppose we could modify the Golang code in crypto/tls/common.go to expose a public lock. This would be complicated and terribly bad design, but it's not practical in any case because we don't want to maintain our own fork of Golang.

WinstonPrivacy on 18 Oct 2018

I now have working code which consistently reproduces the problem. I've pasted it into Go Playground but because it uses external libraries, you'll have to run it locally.

https://play.golang.org/p/RlKeIbk8H9h

WinstonPrivacy on 18 Oct 2018

The bottom of your code contains a private key. I'll take a read through, but it sounds like you should be copying the Config and not modifying it.

var CA_KEY = []byte(`-----BEGIN RSA PRIVATE KEY-----

adamdecaf on 18 Oct 2018

That's a throwaway key for unit testing only. It's never used for anything else.

WinstonPrivacy on 18 Oct 2018

I'm trying to say: maybe you can just push the mutex one level out in Go?

This won't work because the original Golang code in tls/common.go does not use a lock. We are already locking in our code to avoid the panic but it crashes inside tls/common.go while doing the TLS handshake.

From your example, as @adamdecaf noted, you're locks aren't locking the whole TestCaConfig object. I think it would work if you modified it so that every time a domain is inserted, a whole new tls.Config is created or copied from the old one.

To be more explicit, suppose we add a global variable theTlsConfig to your example, and an associated mutex theTlsMutex.

line 97 would have something like

TLSClientConfig: getTlsConfig()

where getTlsConfig is a function you define which returns the current TlsConfig with respect to all such configs that have been created (one for each insertion).

Then lines 73-75 would have to be replaced by something like

 setTlsConfig(domain, tlsc)

Then

func getTlsConfig() *tls.Config {
   theTlsMutex.Lock()
   defer theTlsMutex.Unlock()
   return &theTlsConfig
}

fun setTlsConfig(domain, tlsc) {  // sorry I didn't get the types for domain, tlsc but I think you get the point...
    theTlsMutex.Lock()
    defertheTlsMutex.Unlock()
    tmp := theTlsConfig
    tmp.NameToCertificateMap[domain] = tlsc
    theTlsConfig = tmp
}

It is awkward, but I would guess that would work for the example

disclaimer: I didn't run it or test a variation, just seems like it would work.

wsc1 on 18 Oct 2018

👍1

If you need crypto/tls to use your mutexes, the answer is to define the GetCertificate func:

https://golang.org/pkg/crypto/tls/#Config.GetCertificate

It is used before NameToCertificate and you can do any locking in there.

bradfitz on 18 Oct 2018

🎉1 👍1

Thanks. I've modified the code to manage a pool of immutable tlsconfigs, rather than having just one mutable tlsconfig with potentially thousands of hosts contained in its map. This is much cleaner.

WinstonPrivacy on 18 Oct 2018

Thanks. I've modified the code to manage a pool of immutable tlsconfigs, rather than having just one mutable tlsconfig with potentially thousands of hosts contained in its map. This is much cleaner.

Yes, my awkward fix is a memory hog. @bradfitz's solution is the correct one and doesn't have this problem, just couple a mutex with a map and use the GetCertificate attribute to coordinate.

wsc1 on 18 Oct 2018

@bradfitz's solution is a good workaround but not ideal from a performance standpoint because you're introducing locks in critical code. Keeping an immutable tls config around is far better because you avoid lock contention.

WinstonPrivacy on 18 Oct 2018

@bradfitz's solution is a good workaround but not ideal from a performance standpoint because you're introducing locks in critical code. Keeping an immutable tls config around is far better because you avoid lock contention.

Glad to hear it works.

wsc1 on 19 Oct 2018

Thanks everyone for your help. We were able to reconfigure things so that instead of using a single TLSConfig, we're now using a cache of immutable TLSConfigs for each website. This has cured the panics!

WinstonPrivacy on 20 Oct 2018

Good to hear! But with my suggestion, I hope you are making a full copy of the map, instead of what I suggested where in setTlsConfig the line tmp := theTlsConfig only copies a reference to the NameToCertificateMap

wsc1 on 5 Nov 2018

👎1

We ended up creating a map of TLSConfigs and we pass out references to those (they do not change once created). Each of those has their own NameToCertificate map.

I'm curious though why you suggest copying the TLSConfigs instead of passing a reference?

WinstonPrivacy on 6 Nov 2018

Was this page helpful?

0 / 5 - 0 ratings

Related issues

cmd/go: test should print the failure cases to stderr

rakyll · 3Comments

cmd/compile: testing/quick misbehaves on Nexus 9 linux/arm64

rsc · 3Comments

cmd/go: progress bar for "go get"

longzhizhi · 3Comments

cmd/go: cross compile failed without useful error message

mingrammer · 3Comments

x/tools/cmd/gorename: gorename fails with on cgo-using files

ashb · 3Comments

Go: Panic in crypto/tls/common.go

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

Most helpful comment

All 18 comments

Related issues

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?