The response of calling table.read(query) is a list of rows, but the row isn't a native object, instead it's a key-value pair-style object:
[ [ { name: 'id', value: { [Number: 1] value: '1' } } ],
[ { name: 'id', value: { [Number: 2] value: '2' } } ],
[ { name: 'id', value: { [Number: 3] value: '3' } } ]
Why wouldn't we match these up as a proper object? Since we don't customers end up having to write a helper-snippet just so they can pass around a useful object... That is, it'd be nice if the rows returned were...
[ {id: '1'}, {'id': '2'}, {'id': '3'} ]
Another clash of idiomatic vs API "what-ifs". Call .toJSON() on a row to get the most JS-like thing we could do.
I lost access to the repos we developed in, so I can't link to the history of the decision.
@vkedia This seems... silly. What were the what-ifs so we can sort this out?
I believe it was mostly because columns aren't always named.
Solution: don't include them in the default response, make them available in the apiResponse argument
Problem: The default response shouldn't exclude data
Yes the issue is that columns might not have names or there might be
duplicate names. So it's not guaranteed that each spanner row can be
converted to a json object.
On Mar 17, 2017 9:38 AM, "Stephen Sawchuk" notifications@github.com wrote:
I believe it was mostly because columns aren't always named.
Solution: don't include them in the default response, make them available
in the apiResponse argument
Problem: The default response shouldn't exclude data—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2103#issuecomment-287406133,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ATdef_xv90p1bfpgznGfIObanj7jOIcDks5rmrcZgaJpZM4MgxAX
.
The guidelines of this library have always been: make the common case easy and the advanced case possible.
In this situation, it seems that we're making the common case annoying to make the advanced case possible, which doesn't make sense to me.
I believe there are a few reasons where we may have either missing names or ambiguous results, such as expressions (COUNT(*)) or duplicate names (due to interleaved tables). I think the right thing to do here is:
No matter what, the results should be addressable by their position in the query. That is, the results of SELECT a, b, c ... should always have the value of a as row[0] and b as row[1], etc.
If there is one and only one unambiguous value for a column with a name, that should be addressable by that name on the result object. That is, SELECT SingerName FROM Singers WHERE SingerId = 1 should return a row where I can get the result by addressing row.SingerName.
If there is ambiguity anywhere in the results (such as duplicate names), we should throw a helpful error saying:
"Sorry, you asked for $ColumnNameHere in the result, but there were multiple fields with that name. You should either use an alias in your query or address the result you want by its position."
How's that sound?
/cc @vkedia
/cc @stephenplusplus : I think the question then becomes about the best way to have a thing that acts partly like an array, but with a few addressable properties for things falling into item (2) above.
Yeah, this was my original solution as well. Wish I could link to the original discussion, hoping to get access back soon.
But basically:
row = {
0: 'value-1',
1: 'value-2',
2: 'value-3', // does not have a col name
namedProp1: 'value-1',
namedProp2: 'value-2'
}
for (var prop in row) {
// only iterates named properties
}
The hard part is how do we communicate the ambiguity error. If we throw or return it, that takes users down the non-success execution path of their code. Could we just use the docs to state "Here's how we return the response and here's why we do it" and leave it to the user to consider when building their query?
We also want:
row.forEach(function (column, i) {
// i = 0, column = { name: ... , value: ...}
// but we DON'T want i = namedProp1, column = value
});
Regarding the ambiguity, I see a couple different cases:
You're specifying the columns in a query and it results in having duplicate names.
Throwing an actual error here doesn't seem like a bad idea. You know your query, you'll see it and go "oh got it", and fix the error during development.
You're selecting * and it results in duplicates.
Again, throwing an error here seems reasonable. You can find this during development. It's not that scary to see "you are asking for something that is not clear... you should be more clear". Again, an actionable error message here is very important.
You're selecting * and right now there are no duplicates, but when you add a new column later to a table it results in duplicates.
Throwing an error here is a smidge scary because it means that production code may stop working due to the schema change. Then again, you're altering the schema on your live production instance while not being specific about what you want. I think the better thing to do here is to recommend that you be specific in how you address fields (via an index) or don't duplicate names (in the global docs we do this, by suggesting you avoid the vague name of Id and instead use specific names like SingerId).
Being totally honest here, I suspect that most people won't even think about this and will blindly take the results we have now and turn them into an object. In that case, the last value found (for a duplicate) would win, and people would go looking for why the value for their id column is wrong, not realizing that their little helper function blasted out the first one and swapped it with the second. If anything, throwing an error here is a very good idea unless Spanner gives us enough information to tell which table a field came from (but I don't believe it does).
So my vote still is for the "if you ask for a named property that is ambiguous, we throw an actionable error message saying so". We also document in the run() method that if you're using SELECT *, you should be careful because you may grab stuff that conflicts and that may cause an error if you ask for something by it's name and we don't know which one you mean.
So I went through the earlier discussion and @stephenplusplus had suggested the exact solution there as well. But my objection at that point was around duplicate column names. I am fine if column with duplicate names cannot be retrieved by name and we just document that fact. But I would also like to have some way for user to know those column names. So lets says columns 0 and 1 both have the name 'a'. I am ok if both of them are just addressable by index. But there should be some way for user to get the names for both these columns. It can be some sort of extra metadata property on the row object which just contains the names and types for each column.
But there should be some way for user to get the names for both these columns.
I'm confused what you mean by this. Can you explain?
So user should be able to know what the name of a column whose index is i is. It can be as some sort of metadata on the row object.
could the metadata only exist if there is a conflict? adding a metadata object to every row sounds really expensive...
This issue was moved to googleapis/nodejs-spanner#27
Most helpful comment
The guidelines of this library have always been: make the common case easy and the advanced case possible.
In this situation, it seems that we're making the common case annoying to make the advanced case possible, which doesn't make sense to me.
I believe there are a few reasons where we may have either missing names or ambiguous results, such as expressions (
COUNT(*)) or duplicate names (due to interleaved tables). I think the right thing to do here is:No matter what, the results should be addressable by their position in the query. That is, the results of
SELECT a, b, c ...should always have the value ofaasrow[0]andbasrow[1], etc.If there is one and only one unambiguous value for a column with a name, that should be addressable by that name on the result object. That is,
SELECT SingerName FROM Singers WHERE SingerId = 1should return a row where I can get the result by addressingrow.SingerName.If there is ambiguity anywhere in the results (such as duplicate names), we should throw a helpful error saying:
How's that sound?
/cc @vkedia