Google-cloud-node: Spanner table.read() returns a list of column name-value pairs. Why not an object?

Created on 17 Mar 2017 · 14Comments · Source: googleapis/google-cloud-node

The response of calling table.read(query) is a list of rows, but the row isn't a native object, instead it's a key-value pair-style object:

[ [ { name: 'id', value: { [Number: 1] value: '1' } } ],
 [ { name: 'id', value: { [Number: 2] value: '2' } } ],
 [ { name: 'id', value: { [Number: 3] value: '3' } } ]

Why wouldn't we match these up as a proper object? Since we don't customers end up having to write a helper-snippet just so they can pass around a useful object... That is, it'd be nice if the rows returned were...

[ {id: '1'}, {'id': '2'}, {'id': '3'} ]

spanner p2

Source

jgeewax

Most helpful comment

The guidelines of this library have always been: make the common case easy and the advanced case possible.

In this situation, it seems that we're making the common case annoying to make the advanced case possible, which doesn't make sense to me.

I believe there are a few reasons where we may have either missing names or ambiguous results, such as expressions (COUNT(*)) or duplicate names (due to interleaved tables). I think the right thing to do here is:

No matter what, the results should be addressable by their position in the query. That is, the results of SELECT a, b, c ... should always have the value of a as row[0] and b as row[1], etc.
If there is one and only one unambiguous value for a column with a name, that should be addressable by that name on the result object. That is, SELECT SingerName FROM Singers WHERE SingerId = 1 should return a row where I can get the result by addressing row.SingerName.
If there is ambiguity anywhere in the results (such as duplicate names), we should throw a helpful error saying:

"Sorry, you asked for $ColumnNameHere in the result, but there were multiple fields with that name. You should either use an alias in your query or address the result you want by its position."

How's that sound?

/cc @vkedia

jgeewax on 17 Mar 2017

❤1 👍1

All 14 comments

Another clash of idiomatic vs API "what-ifs". Call .toJSON() on a row to get the most JS-like thing we could do.

I lost access to the repos we developed in, so I can't link to the history of the decision.

stephenplusplus on 17 Mar 2017

@vkedia This seems... silly. What were the what-ifs so we can sort this out?

jgeewax on 17 Mar 2017

👍1

I believe it was mostly because columns aren't always named.

Solution: don't include them in the default response, make them available in the apiResponse argument
Problem: The default response shouldn't exclude data

stephenplusplus on 17 Mar 2017

Yes the issue is that columns might not have names or there might be
duplicate names. So it's not guaranteed that each spanner row can be
converted to a json object.

On Mar 17, 2017 9:38 AM, "Stephen Sawchuk" notifications@github.com wrote:

I believe it was mostly because columns aren't always named.

Solution: don't include them in the default response, make them available
in the apiResponse argument
Problem: The default response shouldn't exclude data

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2103#issuecomment-287406133,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ATdef_xv90p1bfpgznGfIObanj7jOIcDks5rmrcZgaJpZM4MgxAX
.

vkedia on 17 Mar 2017

The guidelines of this library have always been: make the common case easy and the advanced case possible.

In this situation, it seems that we're making the common case annoying to make the advanced case possible, which doesn't make sense to me.

No matter what, the results should be addressable by their position in the query. That is, the results of SELECT a, b, c ... should always have the value of a as row[0] and b as row[1], etc.
If there is one and only one unambiguous value for a column with a name, that should be addressable by that name on the result object. That is, SELECT SingerName FROM Singers WHERE SingerId = 1 should return a row where I can get the result by addressing row.SingerName.
If there is ambiguity anywhere in the results (such as duplicate names), we should throw a helpful error saying:

"Sorry, you asked for $ColumnNameHere in the result, but there were multiple fields with that name. You should either use an alias in your query or address the result you want by its position."

How's that sound?

/cc @vkedia

jgeewax on 17 Mar 2017

❤1 👍1

/cc @stephenplusplus : I think the question then becomes about the best way to have a thing that acts partly like an array, but with a few addressable properties for things falling into item (2) above.

jgeewax on 17 Mar 2017

Yeah, this was my original solution as well. Wish I could link to the original discussion, hoping to get access back soon.

stephenplusplus on 17 Mar 2017

But basically:

row = {
  0: 'value-1',
  1: 'value-2',
  2: 'value-3', // does not have a col name

  namedProp1: 'value-1',
  namedProp2: 'value-2'
}

for (var prop in row) {
  // only iterates named properties
}

The hard part is how do we communicate the ambiguity error. If we throw or return it, that takes users down the non-success execution path of their code. Could we just use the docs to state "Here's how we return the response and here's why we do it" and leave it to the user to consider when building their query?

stephenplusplus on 17 Mar 2017

We also want:

row.forEach(function (column, i) {
  // i = 0, column = { name: ... , value: ...}
  // but we DON'T want i = namedProp1, column = value
});

Regarding the ambiguity, I see a couple different cases:

You're specifying the columns in a query and it results in having duplicate names.

Throwing an actual error here doesn't seem like a bad idea. You know your query, you'll see it and go "oh got it", and fix the error during development.
You're selecting * and it results in duplicates.

Again, throwing an error here seems reasonable. You can find this during development. It's not that scary to see "you are asking for something that is not clear... you should be more clear". Again, an actionable error message here is very important.
You're selecting * and right now there are no duplicates, but when you add a new column later to a table it results in duplicates.

Throwing an error here is a smidge scary because it means that production code may stop working due to the schema change. Then again, you're altering the schema on your live production instance while not being specific about what you want. I think the better thing to do here is to recommend that you be specific in how you address fields (via an index) or don't duplicate names (in the global docs we do this, by suggesting you avoid the vague name of Id and instead use specific names like SingerId).

Being totally honest here, I suspect that most people won't even think about this and will blindly take the results we have now and turn them into an object. In that case, the last value found (for a duplicate) would win, and people would go looking for why the value for their id column is wrong, not realizing that their little helper function blasted out the first one and swapped it with the second. If anything, throwing an error here is a very good idea unless Spanner gives us enough information to tell which table a field came from (but I don't believe it does).

So my vote still is for the "if you ask for a named property that is ambiguous, we throw an actionable error message saying so". We also document in the run() method that if you're using SELECT *, you should be careful because you may grab stuff that conflicts and that may cause an error if you ask for something by it's name and we don't know which one you mean.

jgeewax on 17 Mar 2017

So I went through the earlier discussion and @stephenplusplus had suggested the exact solution there as well. But my objection at that point was around duplicate column names. I am fine if column with duplicate names cannot be retrieved by name and we just document that fact. But I would also like to have some way for user to know those column names. So lets says columns 0 and 1 both have the name 'a'. I am ok if both of them are just addressable by index. But there should be some way for user to get the names for both these columns. It can be some sort of extra metadata property on the row object which just contains the names and types for each column.

vkedia on 17 Mar 2017

But there should be some way for user to get the names for both these columns.

I'm confused what you mean by this. Can you explain?

jgeewax on 17 Mar 2017

So user should be able to know what the name of a column whose index is i is. It can be as some sort of metadata on the row object.

vkedia on 17 Mar 2017

could the metadata only exist if there is a conflict? adding a metadata object to every row sounds really expensive...

chadbr on 20 Mar 2017

This issue was moved to googleapis/nodejs-spanner#27

stephenplusplus on 2 Nov 2017

Was this page helpful?

0 / 5 - 0 ratings

Related issues

safeSearchDetection response is of incorrect type.

nicolasgarnier · 4Comments

Circle: OS X failures

stephenplusplus · 3Comments

@google-cloud/speech Operation not returning.

jackzampolin · 3Comments

GCP Node.js How-To Guide contains conflicting information regarding app.yaml handlers

bamapookie · 5Comments

Should our promises be documented as first class?

stephenplusplus · 4Comments