Go: encoding/xml: Unmarshalling empty XML attribute to int throws error

Created on 1 Mar 2017 · 18 Comments · Source: golang/go

Go version:
1.8 (regression from 1.7.x --> 1.8.0)

Environment:
Dockerized Alpine (golang:1.8-alpine and golang:1.7-alpine)

Description:
In Go 1.7, an XML element like <Object foo=""></Object> could be unmarshalled into a struct where Foo is an int field without returning an error. After upgrading to Go 1.8.0, the same code returns an error because strconv.ParseInt is called with the empty string as an argument.

This seems to be a behavioral regression between Go 1.7.x and Go 1.8.0.

Example code:
https://play.golang.org/p/GIsONzXQQQ

Expected behavior:
Example code should return without error, printing Foo: 0

Actual behavior:
An error is returned, indicating that ParseInt was called with an empty string.

Compiling this same code with Go 1.7.x results in the expected behavior.
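
The playground snippet is not reproduced in this thread. A minimal reproduction along the lines described might look like the sketch below. The struct name `Object`, the field `Foo`, and the helper `decodeObject` are assumptions made for illustration, not the reporter's original code. Whether the empty attribute produces an error depends on the Go release used to build it.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Object is a hypothetical struct mirroring the element in the report.
type Object struct {
	Foo int `xml:"foo,attr"`
}

// decodeObject unmarshals one document and returns the decoded value
// together with whatever error encoding/xml reports.
func decodeObject(doc string) (Object, error) {
	var o Object
	err := xml.Unmarshal([]byte(doc), &o)
	return o, err
}

func main() {
	o, err := decodeObject(`<Object foo=""></Object>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Foo:", o.Foo)
}
```

Running the same file under different toolchains is the quickest way to compare the behaviors discussed in this issue.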

FrozenDueToAge NeedsDecision

All 18 comments

Recently tried upgrading to 1.8 and ran into this.

Many of our XML-based 3rd parties leave attributes empty for prices, price="". I think unmarshalling to the type default is expected here.

This appears to have been introduced by 2c58cb36f971aed484e880769eb2b0a21654459a which returns an error that was previously ignored.

This is a wider change that may have affected several types; I'm unsure whether it breaks the compatibility promise or fixes a bug. I'm inclined to think that it doesn't make sense for ints to unmarshal to their zero value in XML. Still, this is one of those weird cases where Go types and XML aren't really compatible, and any mapping between them will have to do some funky stuff. If it has always behaved this way, maybe we should special-case it and ignore the err result of strconv.

/cc @ericlagergren (author), @bradfitz (reviewer)

@rsc, do you have opinions here?

Also worth noting: before that commit, unmarshaling values did return a strconv error; only unmarshaling attributes ignored it. Test:

// Issue 19333. Unmarshaling empty attr or element into int must not error.
func TestUnmarshalInt(t *testing.T) {
    t.Run("Attr", func(t *testing.T) {
        v := &struct {
            XMLName Name `xml:"int"`
            Foo     int  `xml:"foo,attr"`
        }{}
        if err := Unmarshal([]byte(`<int foo=""></int>`), v); err != nil {
            t.Errorf("did not expect error, but got `%s`", err)
        }
        if v.Foo != 0 {
            t.Errorf("want 0, have %d", v.Foo)
        }
    })
    t.Run("Value", func(t *testing.T) {
        v := &struct {
            XMLName Name `xml:"int"`
            Foo     int  `xml:"foo"`
        }{}
        if err := Unmarshal([]byte(`<int><foo></foo></int>`), v); err != nil {
            t.Errorf("did not expect error, but got `%s`", err)
        }
        if v.Foo != 0 {
            t.Errorf("want 0, have %d", v.Foo)
        }
    })
}

Like @SamWhited said, it makes the most sense to special case attr="" as the Go type's zero value.

I created the original issue because I tried to unmarshal a single letter into a byte, and instead of returning an error saying "you can't do that", it silently left the field as 0.

The alternative (reverting https://github.com/golang/go/commit/2c58cb36f971aed484e880769eb2b0a21654459a) means if the client sends an invalid XML request (e.g., attr="foobar" into an int64) the server has no way to determine whether it was a bad request or the field was meant to be 0.

I believe that behavior is definitely a bug in the XML package.
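
For callers who need to tell an empty attribute apart from a malformed one regardless of the built-in int handling, one option is a custom attribute type that implements `xml.UnmarshalerAttr`. The sketch below is an illustration only: `LenientInt`, `Item`, and `decodeItem` are hypothetical names, and the policy it encodes (empty means 0, non-numeric is an error) is an assumption about what such a caller would want.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// LenientInt is a hypothetical attribute type: an empty attribute decodes
// to 0, while a non-empty, non-numeric attribute is reported as an error.
type LenientInt int

// UnmarshalXMLAttr implements xml.UnmarshalerAttr.
func (n *LenientInt) UnmarshalXMLAttr(attr xml.Attr) error {
	s := strings.TrimSpace(attr.Value)
	if s == "" {
		*n = 0
		return nil
	}
	v, err := strconv.Atoi(s)
	if err != nil {
		return fmt.Errorf("attribute %q: %w", attr.Name.Local, err)
	}
	*n = LenientInt(v)
	return nil
}

// Item is a hypothetical element carrying a price attribute.
type Item struct {
	Price LenientInt `xml:"price,attr"`
}

// decodeItem unmarshals a single <Item> document.
func decodeItem(doc string) (Item, error) {
	var it Item
	err := xml.Unmarshal([]byte(doc), &it)
	return it, err
}

func main() {
	docs := []string{`<Item price=""/>`, `<Item price="12"/>`, `<Item price="foobar"/>`}
	for _, doc := range docs {
		it, err := decodeItem(doc)
		fmt.Println(doc, it.Price, err)
	}
}
```

Because the decision lives in the type, the server can reject `price="foobar"` as a bad request while still accepting `price=""` as zero.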

Reopening for backport I guess.

I can't figure out what is going on here. The linked commit 0a0186f _changes_ the behavior of the XML package beyond this bug fix. It is not appropriate for Go 1.8.1. Leaving this open because we need to decide whether to live with the bug or prepare a different release-specific fix.

Could we revert 2c58cb3 for 1.8.1 only?

Russ, I think the linked commit only happened to fix this issue and was in response to another bug. See Sarah Adams' comments.

OK but the bug was marked Go 1.8.1 and closed with a link to that commit. That commit is an inappropriate fix for Go 1.8.1 (and not cherry-picked yet anyway), so reopening.

@SamWhited, yes, that seems to be the solution. I will send a CL.

My apologies for closing.

No worries @adams-sarah, it's all very confusing around point release milestones. We hope to have a bot help with this soon.

Here is a test program. It covers the cases in this issue (attributes) and the ones in #13417 (elements):

package main

import (
    "encoding/xml"
    "fmt"
)

func main() {
    type X struct {
        XMLName xml.Name `xml:"X"`
        A       int      `xml:",attr"`
        C       int
    }

    var tests = []struct {
        input string
        ok    bool
    }{
        {`<X></X>`, true},
        {`<X A=""></X>`, true},
        {`<X A="bad"></X>`, false},
        {`<X></X>`, true},
        {`<X><C></C></X>`, true},
        {`<X><C/></X>`, true},
        {`<X><C>bad</C></X>`, false},
    }

    for _, tt := range tests {
        err := xml.Unmarshal([]byte(tt.input), new(X))
        if err != nil {
            fmt.Printf("%-20s ERROR %v\n", tt.input, err)
        } else {
            fmt.Printf("%-20s ok\n", tt.input)
        }
    }
}

Go 1.2 through Go 1.7 were consistent: attributes unchecked, children strictly checked:

$ go1.7 run /tmp/x.go
<X></X>              ok
<X A=""></X>         ok
<X A="bad"></X>      ok
<X></X>              ok
<X><C></C></X>       ERROR strconv.ParseInt: parsing "": invalid syntax
<X><C/></X>          ERROR strconv.ParseInt: parsing "": invalid syntax
<X><C>bad</C></X>    ERROR strconv.ParseInt: parsing "bad": invalid syntax
$

Go 1.8 made attributes strictly checked, matching children:

$ go1.8 run /tmp/x.go
<X></X>              ok
<X A=""></X>         ERROR strconv.ParseInt: parsing "": invalid syntax
<X A="bad"></X>      ERROR strconv.ParseInt: parsing "bad": invalid syntax
<X></X>              ok
<X><C></C></X>       ERROR strconv.ParseInt: parsing "": invalid syntax
<X><C/></X>          ERROR strconv.ParseInt: parsing "": invalid syntax
<X><C>bad</C></X>    ERROR strconv.ParseInt: parsing "bad": invalid syntax
$ 

Go 1.9 will (at least right now) relax things so that only non-empty bad inputs are checked:

$ go run /tmp/x.go  # Go 1.9 development
<X></X>              ok
<X A=""></X>         ok
<X A="bad"></X>      ERROR strconv.ParseInt: parsing "bad": invalid syntax
<X></X>              ok
<X><C></C></X>       ok
<X><C/></X>          ok
<X><C>bad</C></X>    ERROR strconv.ParseInt: parsing "bad": invalid syntax
$ 

For Go 1.8.1, I will revert the checking behavior back to Go 1.7 for now. We can try another attempt at attribute checking, with @adams-sarah's strictness relaxation applied both to attributes and children, in Go 1.9.

$ go run /tmp/x.go  # Go 1.8.1 development
<X></X>              ok
<X A=""></X>         ok
<X A="bad"></X>      ok
<X></X>              ok
<X><C></C></X>       ERROR strconv.ParseInt: parsing "": invalid syntax
<X><C/></X>          ERROR strconv.ParseInt: parsing "": invalid syntax
<X><C>bad</C></X>    ERROR strconv.ParseInt: parsing "bad": invalid syntax
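
The relaxed rule described above for Go 1.9 (empty input accepted as zero, non-empty bad input rejected) can be sketched as a standalone helper. This is an illustration of the rule only, not the code in encoding/xml; `parseRelaxedInt` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelaxedInt is a hypothetical helper illustrating the relaxed rule:
// empty input yields the zero value, anything else must parse as an integer.
func parseRelaxedInt(src string) (int64, error) {
	if len(src) == 0 {
		return 0, nil
	}
	return strconv.ParseInt(strings.TrimSpace(src), 10, 64)
}

func main() {
	for _, in := range []string{"", "42", "bad"} {
		v, err := parseRelaxedInt(in)
		fmt.Printf("%q -> %d, err=%v\n", in, v, err)
	}
}
```

Applying the same helper to both attribute values and element content is what makes the two cases consistent under this rule.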

CL https://golang.org/cl/39607 mentions this issue.

Cherry-picked.
