Google-cloud-go: datastore: embedded anonymous struct regression?

Created on 8 Mar 2017  路  18Comments  路  Source: googleapis/google-cloud-go

This issue copied from stackoverflow.
Original issue here: http://stackoverflow.com/questions/42659697/google-datastore-breaking-change-re-anonymous-struct-fields
Original reporter: @ianrose14

Report body:

We have a struct definition like this:

type TimeMs struct {
    time.Time
}

And various of our datastore entity definitions include fields of this type, e.g.:

type Whatever struct {
    Created TimeMs
}

In the past, this always worked fine without any special hackery on our part. If you examine these entities in the cloud console, the field's name was "Created." (note the trailing dot) which was a little odd but didn't seem to cause any problems. However, the latest versions of the cloud datastore package don't seem to play nicely with this. When we attempt to do a client.Put, we get an error of:

unexpected error: rpc error: code = 3 desc = The flattened property.name is the empty string.

We also saw a recent commit of "if nested struct implements PLS, use it" which seemed potentially useful. So we tried making the TimeMs struct implement PropertyLoadSaver (saving its anonymously-embedded time.Time as a property with Name=""), but this results in the same error.

It sure seems like this is a regression. Am I missing something? Is there some kind of workaround that we haven't thought of?

Please note that "just make it implement PropertyLoadSaver and save the Time with some non-empty Name" isn't really viable since we already have (literally) millions of entities stored using "" so it doesn't seem viable to migrate all of them.

datastore p1 bug

All 18 comments

Hey Ian!
Is your datastore pkg up to date? I cannot repro.

Here is the example app I'm using to test:

type TimeMs struct {
    time.Time
}

type Whatever struct {
    Created TimeMs
}

func main() {
    ctx := context.Background()
    dc, err := datastore.NewClient(ctx, "my-project-id")
    if err != nil {
        panic(err)
    }

    k := datastore.NameKey("Entity", "stringID", nil)
    e := &Whatever{Created: TimeMs{time.Now().UTC()}}

    if _, err := dc.Put(ctx, k, e); err != nil {
        panic(err)
    }

    e = new(Whatever)
    if err := dc.Get(ctx, k, e); err != nil {
        panic(err)
    }

    fmt.Println(e)
}

Out:

$ go run main.go
&{2017-03-07 15:36:51.503854 -0800 PST}

Does your program differ from this ^ in any way?

@adams-sarah Sorry! I omitted an important piece of information. The latest datastore pkg produces properties that omit the trailing "." which makes it backward incompatible with our existing entities (hence our attempts at implementing PropertyLoadSaver to paper over these differences).

Here is a slight modification of your test program where I print the properties in Load and Save:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "cloud.google.com/go/datastore"
)

type TimeMs struct {
    time.Time
}

type Whatever struct {
    Created TimeMs
}

func (t *Whatever) Load(props []datastore.Property) error {
    for _, prop := range props {
        log.Printf("in Load, found prop %s", prop.Name)
    }
    return datastore.LoadStruct(t, props)
}

func (t *Whatever) Save() ([]datastore.Property, error) {
    props, err := datastore.SaveStruct(t)
    for _, prop := range props {
        log.Printf("in Save, found prop %s", prop.Name)
    }
    return props, err
}

func main() {
    ctx := context.Background()
    dc, err := datastore.NewClient(ctx, "dev")
    if err != nil {
        panic(err)
    }

    k := datastore.NameKey("Entity", "stringID", nil)
    e := &Whatever{Created: TimeMs{time.Now().UTC()}}

    if _, err := dc.Put(ctx, k, e); err != nil {
        panic(err)
    }

    e = new(Whatever)
    if err := dc.Get(ctx, k, e); err != nil {
        panic(err)
    }

    fmt.Println(e)
}

When I run this with the latest datastore pkg, I get:

2017/03/07 21:28:14 in Save, found prop Created
2017/03/07 21:28:14 in Load, found prop Created
&{2017-03-07 21:28:14.293082 -0500 EST}

Whereas when I run this with old datastore code, I get:

2017/03/07 21:30:06 in Save, found prop Created.
2017/03/07 21:30:06 in Load, found prop Created.
&{2017-03-07 21:30:06.415073 -0500 EST}

To sanity check that this is _really_ a problem with our existing data, here is another version that pulls a real entity from our datastore:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "cloud.google.com/go/datastore"
)

type TimeMs struct {
    time.Time
}

type Session struct {
    Created           TimeMs
    AnonymizePlayback bool
    Version           int
    DeletionStatus    int
    NumPages          int
    LastPage          string
    LastPageTime      TimeMs
    City              string
    NumEvents         int
}

func (s *Session) Load(props []datastore.Property) error {
    for _, prop := range props {
        log.Printf("in Load, found prop %s", prop.Name)
    }
    return datastore.LoadStruct(s, props)
}

func (s *Session) Save() ([]datastore.Property, error) {
    props, err := datastore.SaveStruct(s)
    for _, prop := range props {
        log.Printf("in Save, found prop %s", prop.Name)
    }
    return props, err
}

func main() {
    ctx := context.Background()
    dc, err := datastore.NewClient(ctx, "fs-staging")
    if err != nil {
        panic(err)
    }

    userKey := datastore.IDKey("User", 4831228070985728, nil)
    userKey.Namespace = "ianthomasrose.com"

    sessionKey := datastore.IDKey("Session", 5629499534213120, userKey)
    sessionKey.Namespace = "ianthomasrose.com"

    var session Session
    if err := dc.Get(ctx, sessionKey, &session); err != nil {
        panic(err)
    }

    fmt.Println(session)
}

When I run this with the latest datastore pkg, I get:

2017/03/07 21:32:24 in Load, found prop DeletionStatus
2017/03/07 21:32:24 in Load, found prop NumPages
2017/03/07 21:32:24 in Load, found prop LastPage
2017/03/07 21:32:24 in Load, found prop Version
2017/03/07 21:32:24 in Load, found prop AnonymizePlayback
2017/03/07 21:32:24 in Load, found prop Created.
2017/03/07 21:32:24 in Load, found prop LastPageTime.
2017/03/07 21:32:24 in Load, found prop NumEvents
2017/03/07 21:32:24 in Load, found prop City
panic: datastore: cannot load field "LastPageTime." into a "main.Session": no such struct field

goroutine 1 [running]:
panic(0x405980, 0xc420373b30)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
main.main()
    /Users/ianrose/Code/fullstorydev/mn/projects/fullstory/datastore-test-gcloud.go:58 +0x37a
exit status 2

Whereas when I run this with old datastore code, I get:

2017/03/07 21:31:36 in Load, found prop NumEvents
2017/03/07 21:31:36 in Load, found prop NumPages
2017/03/07 21:31:36 in Load, found prop LastPage
2017/03/07 21:31:36 in Load, found prop Version
2017/03/07 21:31:36 in Load, found prop Created.
2017/03/07 21:31:36 in Load, found prop LastPageTime.
2017/03/07 21:31:36 in Load, found prop City
2017/03/07 21:31:36 in Load, found prop AnonymizePlayback
2017/03/07 21:31:36 in Load, found prop DeletionStatus
{2016-10-06 03:40:34.162001 -0400 EDT false 0 0 0  0000-12-31 19:00:00 -0500 EST  0}

Ah! Yes.
So we made this change consciously, before the beta release. The dot at the end of field names was a mistake carried over from the old appengine datastore code.

I'm so sorry for the inconvenience.

I'd suggest writing a script, that loops through your entities and fixes the names to eliminate the "." suffix, something like this (below). Alternatively, you could bake this behavior into your code.

package main

import (
    "context"
    "fmt"
    "strings"
    "time"

    "cloud.google.com/go/datastore"
)

type TimeMs struct {
    time.Time
}

type Whatever struct {
    Created TimeMs
}

func (t *Whatever) Load(props []datastore.Property) error {
    for i, p := range props {
        if strings.HasSuffix(p.Name, ".") {
            props[i].Name = strings.TrimSuffix(p.Name, ".")
        }
    }

    return datastore.LoadStruct(t, props)
}

func (t *Whatever) Save() ([]datastore.Property, error) {
    return datastore.SaveStruct(t)
}

func main() {
    ctx := context.Background()
    dc, err := datastore.NewClient(ctx, "shadams-dev-testing")
    if err != nil {
        panic(err)
    }

    k := datastore.NameKey("Entity", "stringKey", nil)

    e := new(Whatever)
    if err := dc.Get(ctx, k, e); err != nil {
        panic(err)
    }

    if _, err := dc.Put(ctx, k, e); err != nil {
        panic(err)
    }
}

LMK if this doesn't help, or if there's anything else I can do.

So sorry again for the inconvenience. Now that the package is in beta, I don't anticipate that we'll make any breaking changes like this again. Once it's GA (soon), we guarantee that we won't.

Alternatively, you could bake this behavior into your code.

We considered that as well, but unfortunately won't that break our queries? In other words if we have some entities using Created. and some using Created, there is no way to query across both types of entities (e.g. with "Created > some value").

Also I should note that we still have appengine code that is reading and writing these entities too. I don't see a path forward for these 2 packages to coexist since one will read/write Created. and the other will read/write Created.

:(( yea.
This is something we talked about and didn't really have a good path forward for. I think we decided to just break interop, because we thought this was such a special case that it wouldn't really come up.

@jba @zombiezen @adamtanner any thoughts on how we should proceed?

For reference, this change took effect at 1e032f388e6fff73e1ce068a900955bfa2cca1d2 , when internal/fields usage went into datastore.

Relevant issues:
https://github.com/GoogleCloudPlatform/google-cloud-go/issues/441
https://github.com/golang/appengine/issues/41

Yes, we've got local mods to both cloud/datastore and now internal/fields to work around this issue so that it works like AppEngine, but we have to re-port the hacks every time we update.

If we could put a PLS _on TimeMs itself_ (and have it respected in both the AppEngine and Cloud frameworks) we could do a slow migration. But putting a PLS on all of our entities that contain a TimeMs would be a huge pain.

Hmm, it looks like if you port https://github.com/golang/appengine/pull/52 to cloud/datastore SDK that would be super helpful

:( unfortunately, https://github.com/golang/appengine/pull/52 is not compatible with cloud/datastore. We're using a different mechanism to act as a codec for field-loading to and from datastore, go structs. This mechanism (in internal/fields) is new and shared among all the cloud packages, ensuring their behavior is the same.

Most folks over here are out of the office this week for Google Cloud NEXT. I'll try to rally folks early next week to talk about paths forward for this.

You may be able to solve this already by making TimeMs implement a PLS. The cloud code should handle this well. The appengine datastore package will not. So your appengine code could load up "Created."s with the default behavior, and your cloud code could do so with the PLS.
Would that work for you guys?

Yes, that would actually work if we could write the embedded struct PLS on the cloud side is allowed to use an empty property name (since so that the containing struct can prepend e.g. Created. to the empty string that comes from PLS.

@adams-sarah by the way, I ran into a bug in the latest version of saveStructProperty:

--- a/projects/fullstory/go/src/cloud.google.com/go/datastore/save.go
+++ b/projects/fullstory/go/src/cloud.google.com/go/datastore/save.go
@@ -66,9 +66,6 @@ func saveStructProperty(props *[]Property, name string, opts saveOpts, v reflect
                return nil
        }

-       // Check if v implements PropertyLoadSaver.
-       pls, isPLS := v.Interface().(PropertyLoadSaver)
-
        switch x := v.Interface().(type) {
        case *Key, time.Time, GeoPoint:
                p.Value = x
@@ -101,18 +98,30 @@ func saveStructProperty(props *[]Property, name string, opts saveOpts, v reflect
                        v = v.Elem()
                        fallthrough
                case reflect.Struct:
+                       if !v.CanAddr() {
+                               return fmt.Errorf("datastore: unsupported struct field: value is unaddressable")
+                       }
+
+                       // Check if v implements PropertyLoadSaver.
+                       pls, isPLS := v.Addr().Interface().(PropertyLoadSaver)
                        if isPLS {
                                subProps, err := pls.Save()
                                if err != nil {
                                        return err
                                }
-                               p.Value = &Entity{Properties: subProps}
-                               break
+                               if opts.flatten {
+                                       // Add subProps to our props
+                                       for _, subProp := range subProps {
+                                               subProp.Name = name + "." + subProp.Name
+                                               *props = append(*props, subProp)
+                                       }
+                                       return nil
+                               } else {
+                                       p.Value = &Entity{Properties: subProps}
+                                       break
+                               }
                        }

-                       if !v.CanAddr() {
-                               return fmt.Errorf("datastore: unsupported struct field: value is unaddressable")
-                       }
                        sub, err := newStructPLS(v.Addr().Interface())
                        if err != nil {
                                return fmt.Errorf("datastore: unsupported struct field: %v", err)

Otherwise, the PLS only triggers on embedded struct _pointers_ and won't trigger on embedded structs. (I also added support for opts.flatten)

Okay, I got this correctly round-tripping in a backwards compatible way, but I also had to modify propertyLoader.loadOneElement to actually delegate to the PLS for flattened properties...

--- a/projects/fullstory/go/src/cloud.google.com/go/datastore/load.go
+++ b/projects/fullstory/go/src/cloud.google.com/go/datastore/load.go
@@ -135,6 +135,15 @@ func (l *propertyLoader) loadOneElement(codec fields.List, structValue reflect.V

                var err error
                if field.Type.Kind() == reflect.Struct {
+                       // If the matched field implements PLS, delegate to the PLS and stop processing.
+                       if pls, ok := v.Addr().Interface().(PropertyLoadSaver); ok {
+                               p.Name = strings.TrimPrefix(p.Name, field.Name+".")
+                               if err := pls.Load([]Property{p}); err != nil {
+                                       return err.Error()
+                               }
+                               return ""
+                       }
+
                        codec, err = structCache.Fields(field.Type)
                        if err != nil {
                                return err.Error()

This may require some tweaking to work on embedded pointer-to-struct; I only tested it on embedded structs.

Hey @dragonsinth! In the future, it'd be great to track separate problems in a different gh issue, so folks can find this down the road if they're having a similar problem. And if your issue ends up being a real bug, we could close the one issue without closing off the original.

So the opts.flatten support you added won't make it into master b/c that should be implemented in the PLS Save() impl itself.

But I think your point about embedded struct values whose pointer type implements PLS not being respected is a bug. I've filed here: https://github.com/GoogleCloudPlatform/google-cloud-go/issues/551
Feel free to add more details. I'll work on fixing it today.

@adams-sarah the missing opts.flatten matters on both load and save. Both are currently broken with respect to embedded PLS structs. On the save side, if you skip checking opts.flatten you'll turn the returned props into an Entity, which changes the storage format to not respect the flatten, which won't be compatible with AppEngine. On the load side, what's missing is explicit support for routing "Foo.X" to the PLS for Foo... without the change I suggested you end up with a "missing field" error. Instead of checking the PLS for Foo, you end up loading the StructCodec for Foo and trying to resolve "X" which might not even be a real struct property.

Again, the underlying issue here is supporting the existing persisted data formats as well as interop with AppEngine code.

ah, i see what you're saying.
will make a separate issue.

closing, as we're resolving these issues in #552, #551
please reopen if this is incorrect.

@adams-sarah I patched in your changes and things look great! Thanks so much!

Awesome!

Was this page helpful?
0 / 5 - 0 ratings