go version)?go version go1.9 linux/arm64
We believe so but cannot currently confirm as it is very difficult to reproduce under controlled conditions (see below).
go env)?linux, arm64
We are running a proxy which performs MITM interception on behalf of the user and typically is servicing hundreds of requests in parallel. We manage the certificate store by inserting certificates in the NameToCertificate map. Maps are not concurrency safe, so if a TLS handshake is occurring while a certificate is being inserted into NameToCertificate, a panic results. We avoid this in our code by protecting NameToCertificate with a mutex, but this is not possible in the native Golang code.
This error occurs randomly and we see it about once a week but have not been able to reproduce yet.
We would like for our program to continue running, even if the handshake crashes. This could potentially be done by adding a recover() to the code in crypto/common.go around line 737:
if cert, ok := c.NameToCertificate[name]; ok {
return cert, nil
}
The program crashes. This occurs even with the initial TLS handshake() protected by a recover. Stack trace:
fatal error: concurrent map read and map write
goroutine 11597 [running]:
runtime.throw(0xb64eb7, 0x21)
/usr/local/go/src/runtime/panic.go:605 +0x70 fp=0x44204919c0 sp=0x44204919a0 pc=0x42a600
runtime.mapaccess2_faststr(0xa6a400, 0x44202c0c00, 0x4429418a80, 0x18, 0x0, 0x4)
/usr/local/go/src/runtime/hashmap_fast.go:324 +0x488 fp=0x4420491a20 sp=0x44204919c0 pc=0x40d248
crypto/tls.(Config).getCertificate(0x442023ec00, 0x4427736160, 0x2, 0x0, 0x0)
/usr/local/go/src/crypto/tls/common.go:737 +0xc4 fp=0x4420491aa0 sp=0x4420491a20 pc=0x5ab614
crypto/tls.(serverHandshakeState).readClientHello(0x4420491c38, 0x4420491c28, 0x1138740, 0x577a84)
/usr/local/go/src/crypto/tls/handshake_server.go:216 +0x370 fp=0x4420491bf0 sp=0x4420491aa0 pc=0x5bd2f0
crypto/tls.(Conn).serverHandshake(0x4428a0ce00, 0xb8ce98, 0x4428a0cf20)
/usr/local/go/src/crypto/tls/handshake_server.go:48 +0x68 fp=0x4420491d10 sp=0x4420491bf0 pc=0x5bcc88
crypto/tls.(Conn).Handshake(0x4428a0ce00, 0x0, 0x0)
Are you building with a fork of the Go 1.9 source and a patch applied? Have you tried Go 1.10 or 1.11?
Also note the Config docs: https://golang.org/pkg/crypto/tls/#Config
A Config structure is used to configure a TLS client or server. After one
has been passed to a TLS function it must not be modified. A Config may be
reused; the tls package will also not modify it.
Looks like you'll have to explicitly synchronise the insertion of a cert with the config object itself, rather than just the NameToCertificateMap:
maybe the following would solve it?
replace Config type in your go code with a wrapper
type myConfig struct {
sync.Mutex
tls.Config
}
have the consumers of that Lock()/defer Unlock() and have the inserts in the NameToCertificateMap do the same but use a new tls.Config.
var myConfig myConfig
myConfig.Config = *theNewConfig
I'm trying to say: maybe you can just push the mutex one level out in Go?
Hi @WinstonPrivacy,
I think the error message is clear enough. There are some concurrent map operations going on. Have you tried running your application in -race mode ? That is a very easy way to figure out where the issue is.
We avoid this in our code by protecting NameToCertificate with a mutex, but this is not possible in the native Golang code.
These 2 statements are contradicting each other. If you already protect access with a mutex, then how is it not possible in the native Go code ?
Also, please post the code so that we can reproduce the issue on our side. However, I think this is a simple map access issue which you can identify with the -race mode.
Thanks.
Are you building with a fork of the Go 1.9 source and a patch applied? Have you tried Go 1.10 or 1.11?
I've succeeded in reproducing the problem in Go 1.9. I haven't tried other versions but I suspect it will be the same (need to confirm).
Also note the
Configdocs: https://golang.org/pkg/crypto/tls/#ConfigA Config structure is used to configure a TLS client or server. After one
has been passed to a TLS function it must not be modified. A Config may be
reused; the tls package will also not modify it.
This is the problem right here. We're sending in a reference to the TLS Config. It looks like we should be copying it instead.
Looks like you'll have to explicitly synchronise the insertion of a cert with the config object itself, rather than just the NameToCertificateMap:
maybe the following would solve it?
replace Config type in your go code with a wrapper
type myConfig struct { sync.Mutex tls.Config }have the consumers of that Lock()/defer Unlock() and have the inserts in the NameToCertificateMap do the same but use a new tls.Config.
var myConfig myConfig myConfig.Config = *theNewConfigI'm trying to say: maybe you can just push the mutex one level out in Go?
This won't work because the original Golang code in tls/common.go does not use a lock. We are already locking in our code to avoid the panic but it crashes inside tls/common.go while doing the TLS handshake.
Hi @WinstonPrivacy,
I think the error message is clear enough. There are some concurrent map operations going on. Have you tried running your application in
-racemode ? That is a very easy way to figure out where the issue is.We avoid this in our code by protecting NameToCertificate with a mutex, but this is not possible in the native Golang code.
These 2 statements are contradicting each other. If you already protect access with a mutex, then how is it not possible in the native Go code ?
Also, please post the code so that we can reproduce the issue on our side. However, I think this is a simple map access issue which you can identify with the
-racemode.Thanks.
I suppose we could modify the Golang code in crypto/tls/common.go to expose a public lock. This would be complicated and terribly bad design, but it's not practical in any case because we don't want to maintain our own fork of Golang.
I now have working code which consistently reproduces the problem. I've pasted it into Go Playground but because it uses external libraries, you'll have to run it locally.
The bottom of your code contains a private key. I'll take a read through, but it sounds like you should be copying the Config and not modifying it.
var CA_KEY = []byte(`-----BEGIN RSA PRIVATE KEY-----
That's a throwaway key for unit testing only. It's never used for anything else.
I'm trying to say: maybe you can just push the mutex one level out in Go?
This won't work because the original Golang code in tls/common.go does not use a lock. We are already locking in our code to avoid the panic but it crashes inside tls/common.go while doing the TLS handshake.
From your example, as @adamdecaf noted, you're locks aren't locking the whole TestCaConfig object. I think it would work if you modified it so that every time a domain is inserted, a whole new tls.Config is created or copied from the old one.
To be more explicit, suppose we add a global variable theTlsConfig to your example, and an associated mutex theTlsMutex.
line 97 would have something like
TLSClientConfig: getTlsConfig()
where getTlsConfig is a function you define which returns the current TlsConfig with respect to all such configs that have been created (one for each insertion).
Then lines 73-75 would have to be replaced by something like
setTlsConfig(domain, tlsc)
Then
func getTlsConfig() *tls.Config {
theTlsMutex.Lock()
defer theTlsMutex.Unlock()
return &theTlsConfig
}
fun setTlsConfig(domain, tlsc) { // sorry I didn't get the types for domain, tlsc but I think you get the point...
theTlsMutex.Lock()
defertheTlsMutex.Unlock()
tmp := theTlsConfig
tmp.NameToCertificateMap[domain] = tlsc
theTlsConfig = tmp
}
It is awkward, but I would guess that would work for the example
disclaimer: I didn't run it or test a variation, just seems like it would work.
If you need crypto/tls to use your mutexes, the answer is to define the GetCertificate func:
https://golang.org/pkg/crypto/tls/#Config.GetCertificate
It is used before NameToCertificate and you can do any locking in there.
Thanks. I've modified the code to manage a pool of immutable tlsconfigs, rather than having just one mutable tlsconfig with potentially thousands of hosts contained in its map. This is much cleaner.
Thanks. I've modified the code to manage a pool of immutable tlsconfigs, rather than having just one mutable tlsconfig with potentially thousands of hosts contained in its map. This is much cleaner.
Yes, my awkward fix is a memory hog. @bradfitz's solution is the correct one and doesn't have this problem, just couple a mutex with a map and use the GetCertificate attribute to coordinate.
@bradfitz's solution is a good workaround but not ideal from a performance standpoint because you're introducing locks in critical code. Keeping an immutable tls config around is far better because you avoid lock contention.
@bradfitz's solution is a good workaround but not ideal from a performance standpoint because you're introducing locks in critical code. Keeping an immutable tls config around is far better because you avoid lock contention.
Glad to hear it works.
Thanks everyone for your help. We were able to reconfigure things so that instead of using a single TLSConfig, we're now using a cache of immutable TLSConfigs for each website. This has cured the panics!
Good to hear! But with my suggestion, I hope you are making a full copy of the map, instead of what I suggested where in setTlsConfig the line tmp := theTlsConfig only copies a reference to the NameToCertificateMap
We ended up creating a map of TLSConfigs and we pass out references to those (they do not change once created). Each of those has their own NameToCertificate map.
I'm curious though why you suggest copying the TLSConfigs instead of passing a reference?
Most helpful comment
If you need crypto/tls to use your mutexes, the answer is to define the GetCertificate func:
https://golang.org/pkg/crypto/tls/#Config.GetCertificate
It is used before NameToCertificate and you can do any locking in there.