Go: fmt: improve or avoid panic when printing map that is changing as it is being printed

Created on 25 Jul 2019 · 29Comments · Source: golang/go

What version of Go are you using (`go version`)?

$ go version
1.12

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output



$ go env

GOARCH="amd64"

GOBIN=""

GOCACHE="/home/yaotianjia/.cache/go-build"

GOEXE=""

GOFLAGS=""

GOHOSTARCH="amd64"

GOHOSTOS="linux"

GOOS="linux"

GOPATH="/home/yaotianjia/goworkspace"

GOPROXY=""

GORACE=""

GOROOT="/usr/local/go"

GOTMPDIR=""

GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"

GCCGO="gccgo"

CC="gcc"

CXX="g++"

CGO_ENABLED="1"

GOMOD=""

CGO_CFLAGS="-g -O2"

CGO_CPPFLAGS=""

CGO_CXXFLAGS="-g -O2"

CGO_FFLAGS="-g -O2"

CGO_LDFLAGS="-g -O2"

PKG_CONFIG="pkg-config"

GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build822647916=/tmp/go-build -gno-record-gcc-switches"

What did you do?

using fmt.Sprintf("%+v",arg) to print a map, and it panics

What did you expect to see?

It should shows the content of the map

What did you see instead?

runtime panic: runtime error: index out of range

runtime panic: runtime error: index out of range
......
panic(0x320a360, 0x6a3f2b0)
    /usr/local/go/src/runtime/panic.go:522 +0x1b5
internal/fmtsort.Sort(0x31d94a0, 0xc01b3a7ab0, 0x195, 0x1)
    /usr/local/go/src/internal/fmtsort/sort.go:60 +0x34f
fmt.(*pp).printValue(0xc03c81dc80, 0x31d94a0, 0xc01b3a7ab0, 0x195, 0x76, 0x2)
    /usr/local/go/src/fmt/print.go:759 +0xd20
fmt.(*pp).printValue(0xc03c81dc80, 0x360b300, 0xc01b3a7680, 0x199, 0xc000000076, 0x1)
    /usr/local/go/src/fmt/print.go:796 +0x1b52
fmt.(*pp).printValue(0xc03c81dc80, 0x30f0ba0, 0xc01b3a7680, 0x16, 0xc000000076, 0x0)
    /usr/local/go/src/fmt/print.go:866 +0x1956
fmt.(*pp).printArg(0xc03c81dc80, 0x30f0ba0, 0xc01b3a7680, 0x76)
    /usr/local/go/src/fmt/print.go:702 +0x2ba
fmt.(*pp).doPrintf(0xc03c81dc80, 0x3757e58, 0x3c, 0xc00fa93cf8, 0x2, 0x2)
    /usr/local/go/src/fmt/print.go:1016 +0x158
fmt.Sprintf(0x3757e58, 0x3c, 0xc00fa93cf8, 0x2, 0x2, 0x1e, 0x363fa6c)
    /usr/local/go/src/fmt/print.go:214 +0x66
......

the reason of the panic

https://github.com/golang/go/blob/release-branch.go1.12/src/internal/fmtsort/sort.go#L59

the built-in function is shown as follow

func Sort(mapValue reflect.Value) *SortedMap {
    if mapValue.Type().Kind() != reflect.Map {
        return nil
    }
    key := make([]reflect.Value, mapValue.Len())
    value := make([]reflect.Value, len(key))
    iter := mapValue.MapRange()
    for i := 0; iter.Next(); i++ {
        key[i] = iter.Key()
        value[i] = iter.Value()
    }
    sorted := &SortedMap{
        Key:   key,
        Value: value,
    }
    sort.Stable(sorted)
    return sorted
}

The variable key and value is pre-calculated and then the code iterates by iter, when another goroutine add a key to map, the length of key is shorter than the iter's range. Therefore, it panics.

My solutions

I think the code should consider the range of key, maybe it's better looks like this:

for i := 0; iter.Next() && i < len(key) ; i++ {
    ...
}

FrozenDueToAge NeedsInvestigation help wanted

Source

J-CIC

Most helpful comment

@av86743 Please avoid saying things like "then you fail". That is unfriendly. The same sentiment could be phrased in a less destructive way, for example by saying "unfortunately we are unlikely to accept such a change without a test". Please see https://golang.org/conduct. Thanks.

ianlancetaylor on 26 Jul 2019

👍5

All 29 comments

When you mutate an object and at the same time access its content without any synchronization, behavior of the program is not specified.

In short: Your program does not exist.

av86743 on 25 Jul 2019

When you mutate an object and at the same time access its content without any synchronization, behavior of the program is not specified.

In short: Your program does not exist.

I know that concurrent behaviors of map result in undefined behavior, but maybe is not suitable to panic in fmt? This can be prevented easily by check whether the index i is in the range of key

J-CIC on 25 Jul 2019

Change https://golang.org/cl/187577 mentions this issue: internal/fmtsort: restrict the for-range by len(key)

gopherbot on 25 Jul 2019

If there is a race: just adding a check without synchronisation might catch some of the hazards but not all. If the map changes while reading from it, it can still panic. The solution is not to have a race which requires a syncronisation like a mutex to not change the map while printing it with fmt.

martisch on 25 Jul 2019

If there is a race: just adding a check without synchronisation might catch some of the hazards but not all. If the map changes just one instruction after i < len(key) it will still panic. The solution is not to have a race which requires a syncronisation like a mutex to not change the map while printing it with fmt.

Actually it panics not because of the race of codes, it panics because the index out of the range.

J-CIC on 25 Jul 2019

Ok. can you run your program with the race detector and report whether it passes the race detector?

The internal map code does not protect against concurrent read and writes and can still panic (it just might be harder to trigger) or give the wrong results. If we need to not concurrently change the map to fix all the issues then fmtsort does not need to prevent these either as it should not happen in well behaving programs.

martisch on 25 Jul 2019

Ok. can you run your program with the race detector and report whether it passes the race detector? The internal map code does not protect against concurrent read and writes and can still panic (it just might be harder to trigger) or give the wrong results. If we need to not concurrently change the map to fix all the issues then fmtsort does not need to prevent these either as it should not happen in well behaving programs.

I'm agree with you that it's hard to trigger, but I'm afraid that i can't run it on PROD environment to reproduce the problems with --race.

J-CIC on 25 Jul 2019

I think the fix here is not to make it harder to trigger by adding more checks to fmtsort but to fix the issue that the program should not concurrently read and write to the map.

martisch on 25 Jul 2019

https://golang.org/cl/187577

Missing is the test which demonstrates that problem existed before the patch is applied, and does not exist after the patch is applied.

av86743 on 25 Jul 2019

I think the fix here is not to make it harder to trigger by adding more checks to fmtsort but to fix the issue that the program should not concurrently read and write to the map.

I agree that we should handle correctly on concurrent behavior on map, that's how I fixed the program. But this panic message("index out of range") may confuse other programmer ?

J-CIC on 25 Jul 2019

https://golang.org/cl/187577

Missing is the test which demonstrates that problem existed before the patch is applied, and does not exist after the patch is applied.

sorry that it may be a little hard for me to generate the test

J-CIC on 25 Jul 2019

https://golang.org/cl/187577
Missing is the test which demonstrates that problem existed before the patch is applied, and does not exist after the patch is applied.

sorry that it may be a little hard for me to generate the test

Then you fail.

av86743 on 25 Jul 2019

https://golang.org/cl/187577
Missing is the test which demonstrates that problem existed before the patch is applied, and does not exist after the patch is applied.

sorry that it may be a little hard for me to generate the test

Then you fail.

It's impossible for me to apply this patch on PROD environment, so that i can't create such concurrent environment to prove it. But it happens serval times on PROD environment before I fixed it.

And the patch is just a suggestion, when you encounter with such a panic and it tells you: index out of range, it may mislead you to another way to solve it. Unless you dive down into the source code, you won't find that it's actually a concurrent read and write on a map.

What I mean is only that the panic error should be clear, we can do nothing or just check in fmtsort and if it doesn't satisfy the situation(length or other), then we panic with correct message.

J-CIC on 25 Jul 2019

... Unless you dive down into the source code, you won't find that it's actually a concurrent read and write on a map.

If you produce code which does not exist, this is an issue for a different repository.

av86743 on 25 Jul 2019

I agree that we should handle correctly on concurrent behavior on map, that's how I fixed the program. But this panic message("index out of range") may confuse other programmer ?

The main way to detect these problems is to run tests and some production workload with the race detector that is meant to find these problems. The proposed change hides this problem from the programmer as no panic would trigger anymore. I do not think hiding the problem of the race that can surface in multiple other ways would be preferred. Racy programs can fail in many ways many are confusing and actually triggering a panic is one of the better observable failures vs silent data corruption.

If you have a suggestion for an improvement that helps in a non racy program or a way to better detect races with the race detector please add a new issue. Otherwise I would suggest to close this issue as I do think It is not clear what the proposed addition would improve and its based on working around a non well formed go program.

martisch on 25 Jul 2019

👍2

I agree that we should handle correctly on concurrent behavior on map, that's how I fixed the program. But this panic message("index out of range") may confuse other programmer ?

The main way to detect these problems is to run tests and some production workload with the race detector that is meant to find these problems. The proposed change hides this problem from the programmer as no panic would trigger anymore. I do not think hiding the problem of the race that can surface in multiple other ways would be preferred. Racy programs can fail in many ways many are confusing and actually triggering a panic is one of the better observable failures vs silent data corruption.

If you have a suggestion for an improvement that helps in a non racy program or a way to better detect races with the race detector please add a new issue. Otherwise I would suggest to close this issue as I do think It is not clear what the proposed addition would improve and its based on working around a non well formed go program.

Agree

J-CIC on 25 Jul 2019

I do think there is something to be fixed here. At the very least, the panic could be improved.
I see the options as

Leave the code as is (the status quo)
Panic, but with a better error message
Don't panic, but add something to the string output to indicate a data race happened
Ignore the error and proceed

Note that we do 4 in reflect.MapKeys. Maybe that's not the right choice, but it is precedent.

I'd argue that fmt is kind of special and warrants considering doing either 3 or 4. Unlike reflect.MapKeys, for example, the results of fmt.Printf probably aren't being used programmatically, but printed or logged. In that case, it's almost certainly better to allow the print to proceed without a panic. The print/log might be happening as part of handling a panic, and triggering a panic while handling a panic is almost always a bad idea.

randall77 on 25 Jul 2019

👍2

For 4: Another option might be to append map keys and values (to a pre-alloced slice) instead of cutting of the length so the resulting slice is neither to small nor to big.

If we generally want to harden fmt against data races/panics I think it would be good to think about a more general approach to recover from panics like the handler functions in fmt do with catchPanic: https://github.com/golang/go/blob/6f51082da77a1d4cafd5b7af0db69293943f4066/src/fmt/print.go#L540

martisch on 25 Jul 2019

👍1

@randall77

... The print/log might be happening as part of handling a panic, and triggering a panic while handling a panic is almost always a bad idea.

And how it would possible to print/log map "as part of handling a panic"? As your intention concerns removal only map-related panic.

av86743 on 25 Jul 2019

And how it would possible to print/log map "as part of handling a panic"? As your intention concerns removing only map-related panic.

func f(m map[int]int) {
    defer func() {
        fmt.Printf("f failed on map %v\n", m)
    }()
    ... do some complex calculation with m...
}

Then if the complex calculation panics for whatever reason, we can print (or log) the map used for that calculation.

randall77 on 25 Jul 2019

👍1

ianlancetaylor on 26 Jul 2019

👍5

I do think there is something to be fixed here. At the very least, the panic could be improved.
I see the options as

Leave the code as is (the status quo)

Panic, but with a better error message

Don't panic, but add something to the string output to indicate a data race happened

Ignore the error and proceed

Note that we do 4 in reflect.MapKeys. Maybe that's not the right choice, but it is precedent.

I'd argue that fmt is kind of special and warrants considering doing either 3 or 4. Unlike reflect.MapKeys, for example, the results of fmt.Printf probably aren't being used programmatically, but printed or logged. In that case, it's almost certainly better to allow the print to proceed without a panic. The print/log might be happening as part of handling a panic, and triggering a panic while handling a panic is almost always a bad idea.

Actually I use fmt in logger and I think it's better not to panic explicitly in a logger. I prefer the option 3/4 for the reason that we have already noted that concurrent read and write on a map caused undefined behavior(in https://blog.golang.org/go-maps-in-action). It's programmer's duty to ensure the program won't encouter a race condition. Also it makes us upset and confused when the program panic with "index out of range" while using fmt to log something.

J-CIC on 26 Jul 2019

... It's programmer's duty to ensure the program won't encouter a race condition. ...

Please explain yourself. My google translate does not work for this.

av86743 on 26 Jul 2019

👎2

... It's programmer's duty to ensure the program won't encouter a race condition. ...

Just simply means that you should take care of the cocurrent condition.

J-CIC on 26 Jul 2019

Just simply means that you should take care of the cocurrent condition.

You are not you. You are yao.

av86743 on 26 Jul 2019

In trying to make a test for this, it's very hard to get this path to trigger. I almost always get an error from the runtime

fatal error: concurrent map iteration and map write

This is a throw, not even a panic, so it is uncatchable.

Not sure what that means for the overall issue. Possibly that any of 2,3, 4 are fine.

@J-CIC do you have a program that can reproduce this somewhat reliably?

randall77 on 27 Jul 2019

In trying to make a test for this, it's very hard to get this path to trigger. I almost always get an error from the runtime

fatal error: concurrent map iteration and map write

This is a throw, not even a panic, so it is uncatchable.

Not sure what that means for the overall issue. Possibly that any of 2,3, 4 are fine.

@J-CIC do you have a program that can reproduce this somewhat reliably?

@randall77
Yes, the code is as follow:

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    Info := map[string]string{}
    startSec := time.Now().UnixNano()
    //generate a large map
    for i := 0; i < 10000000; i++ {
        key := fmt.Sprintf("key%d", i)
        Info[key] = "2"
    }
    wg := sync.WaitGroup{}
    wg.Add(2)
    endSec := time.Now().UnixNano()
    fmt.Println("time used to generate large map: ", (endSec-startSec)/time.Millisecond.Nanoseconds())
    // start simulate
    go modifyMap(&wg, Info)
    go printMap(&wg, Info)
    wg.Wait()
}

func printMap(wg *sync.WaitGroup, item map[string]string) {
    _ = fmt.Sprintf("%+v\n", item)
    wg.Done()
}
func modifyMap(wg *sync.WaitGroup, item map[string]string) {
    // you may have to custom the time.Sleep Millisecond here
    time.Sleep(time.Millisecond * 10)
    item["testPanic"] = "test"
    wg.Done()
}

Expected to see

time used to generate large map:  6784
panic: runtime error: index out of range

goroutine 21 [running]:
internal/fmtsort.Sort(0x10ad440, 0xc00008c000, 0x15, 0x0)
    /usr/local/go/src/internal/fmtsort/sort.go:60 +0x34f
fmt.(*pp).printValue(0xc0047680c0, 0x10ad440, 0xc00008c000, 0x15, 0x76, 0x0)
    /usr/local/go/src/fmt/print.go:759 +0xd20
fmt.(*pp).printArg(0xc0047680c0, 0x10ad440, 0xc00008c000, 0x76)
    /usr/local/go/src/fmt/print.go:702 +0x2ba
fmt.(*pp).doPrintf(0xc0047680c0, 0x10c8d4d, 0x4, 0xc000077fb0, 0x1, 0x1)
    /usr/local/go/src/fmt/print.go:1016 +0x158
fmt.Sprintf(0x10c8d4d, 0x4, 0xc00002efb0, 0x1, 0x1, 0x0, 0x0)
    /usr/local/go/src/fmt/print.go:214 +0x66
main.printMap(0xc054f807a0, 0xc00008c000)
    /Users/yaotianjia/go/src/test/test.go:27 +0x74
created by main.main
    /Users/yaotianjia/go/src/test/test.go:22 +0x2fa
exit status 2

Also my go version is

go version go1.12.7 darwin/amd64

J-CIC on 28 Jul 2019

idea for testing this (just a noobs brilliant idea):
can you use atomic read/writes to more precisely time the read and write to get more accurate results?