Ecto: Consider adding u.settings["field"] for reading JSON fields

Created on 25 Jan 2020  路  7Comments  路  Source: elixir-ecto/ecto

The biggest questions are:

  1. Can we make it work for multiple databases?
  2. What happens when cascading and the field does not exist? For example, u.settings["unknown"]["omg"]
  3. What happens when cascading and the field is not a submap? For example, u.settings["field"]["omg"]
  4. Can the field be a dynamic value?

Most helpful comment

An additional question is if the syntax should also be supported for fields that are embedded schemas. Should u.settings["field"] work if :settings is an embeds_one? Or should the syntax be u.settings.field? Should embeds_many be supported and what would that syntax be?

EDIT: Solving it for embeds_one may be an easier first step than JSON fields. For embedded schemas we know the types so question 2, 3, and 4 of @josevalim's original questions are non-issues.

All 7 comments

Here are some results for pg 12.1 & mysql 8.0.18, as you can see below we could transform
u.settings["field"] into json _object_ path lookup, u.settings[0] into json _array_ path
lookup, and we can mix them up too:

pg:

select json_extract_path('{"a":{"b":"c"}}', 'a', 'b');
"c"

-- select non-existent field
select json_extract_path('{"a":{"b":"c"}}', 'a', 'bad');
NULL

select json_extract_path('{"a":{"b":"c"}}', 'a', 'bad', 'bad');
NULL

-- operation on non-json value
wojtek=# select json_extract_path(42, 'a', 'b');
ERROR:  function json_extract_path(integer, unknown, unknown) does not exist
LINE 1: select json_extract_path(42, 'a', 'b');
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

-- arrays
select json_extract_path('[[1, 2], [3, 4]]', '1', '1');
4

select json_extract_path('[[1, 2], [3, 4]]', '9', '9');
NULL

-- array & object
select json_extract_path('[{"a": [42]}]', '0', 'a', '0');
42

-- dynamic fields
select json_extract_path('[[1, 2], [3, 4]]', (select 2-1)::text, (select 2-1)::text);
4

mysql:

select json_extract('{"a":{"b":"c"}}', '$.a.b');
"c"

-- select non-existent field
select json_extract('{"a":{"b":"c"}}', '$.a.bad');
NULL

select json_extract('{"a":{"b":"c"}}', '$.a.bad.bad');
NULL

-- operation on non-json value
select json_extract(42, '$.a.b');
ERROR 3146 (22032): Invalid data type for JSON data in argument 1 to function json_extract; a JSON string or JSON type is required.

-- arrays
json_extract('[[1, 2], [3, 4]]', '$[1][1]');
4

select json_extract('[[1, 2], [3, 4]]', '$[9][9]');
json_extract('[[1, 2], [3, 4]]', '$[9][9]');
NULL

-- array & object
select json_extract('[{"a": [42]}]', '$[0].a[0]');
42

-- dynamic fields
select json_extract('[[1, 2], [3, 4]]', concat('$[', (select 2-1), ']', '[', (select 2-1), ']'));
4

At least for pg & mysql, we have an answer to all 4 questions!

Yeah, as far as I remember when we first talked about this when embeds_many was introduced, the issue was arrays and integers. I think we finally settled on the view that foo[0] could be confusing, because it would only work in ecto syntax, while it doesn't work with regular lists.

Good point.

To play the devil's advocate, in from(p in "packages", select: count(p), where: p.inserted_at >= ^~N[2020-01-01 00:00:00]), comparisons do the right thing in the DB but not in Elixir. I personally don't know where to draw the line.

I think if we support just json objects, but not json arrays, the feature would feel incomplete. And hey perhaps that was exactly the consensus not to add it back then!

Maybe we should collect more data how other non-sql data stores handles this, help with that would be appreciated.

An additional question is if the syntax should also be supported for fields that are embedded schemas. Should u.settings["field"] work if :settings is an embeds_one? Or should the syntax be u.settings.field? Should embeds_many be supported and what would that syntax be?

EDIT: Solving it for embeds_one may be an easier first step than JSON fields. For embedded schemas we know the types so question 2, 3, and 4 of @josevalim's original questions are non-issues.

Some more data points:

sqlite3:

sqlite> select json_extract('{"a":{"b":"c"}}', '$.a');
{"b":"c"}

sqlite> select json_extract('{"a":{"b":"c"}}', '$.a.b');
c

sqlite> select json_extract('{"a":{"b":"c"}}', '$.a.bad.bad');
  -- (empty)

sqlite> select json_extract('[[1, 2], [3, 4]]', '$[1][1]');
4

mssql:

mssql> select json_value('{"a":{"b":"c"}}', '$.a');
NULL

mssql> select json_query('{"a":{"b":"c"}}', '$.a');
{"b":"c"}

mssql> select json_value('{"a":{"b":"c"}}', '$.a.b');
c

mssql> select json_value('{"a":{"b":"c"}}', '$.a.bad.bad');
NULL

mssql> select json_value('[[1, 2], [3, 4]]', '$[1][1]');
4

notice, json_value vs json_query, json_value is for retrieving _scalar_ values (numbers, strings) and json_query is for arrays/objects. This presents a problem because ideally we'd use the same function to retrieve all possible values in a json column, same as in other DB engines. These semantics are also part of SQL:2016 (zip pdf).

Worth mentioning that SQL:2016 also specifies _modes_ for json path, the default is "lax" which returns nulls on errors (e.g. we try to access a field that doesn't exist, or we try to use json_value on an object). There's also a "strict" mode which works like this:

mssql> select json_value('{"a":{"b":"c"}}', 'strict $.a');
Msg 13623, Level 16, State 2, Server 797ca3270cbd, Line 1
Scalar value cannot be found in the specified JSON path.

but to be consistent with other DB engines we'd probably use the lax mode anyway.

For embeds we'd know what type a given field is so we'd use the proper function but for the f["a"]["b"] syntax we don't and we'd return NULL. That would have to be a limitation of a mssql adapter.

For embeds we'd know what type a given field is so we'd use the proper function but for the f["a"]["b"] syntax we don't and we'd return NULL. That would have to be a limitation of a mssql adapter.

This limitation for mssql (and other sql standard-compliant json implementations), is it a deal-breaker? On one hand we won't be able to provide good user experience, people will be getting NULLs when they'd expect maps and there's nothing we can do about it. This is only a problem for deeply nested json columns, all the other use cases should work well on mssql.

If we drop f["a"]["b"] syntax and just do f.a.b then that's already very valuable for the users but people usually shouldn't use that in the migrations (they usually should use tables and not schemas as sources for queries) so we limit usefulness of this feature.

Thoughts?

We have been ok with having inconsistencies between databases, because otherwise we would always limit ourselves to the lowest common denominator. Maybe for MSSQL it is best to not support this feature at all than support it slightly different than the other databases. Then in the future we can add json_value and json_query as API functions (instead of syntax sugar) if all databases adopt it according to the standard.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jbence picture jbence  路  3Comments

shahryarjb picture shahryarjb  路  3Comments

ZhengQingchen picture ZhengQingchen  路  4Comments

nathanjohnson320 picture nathanjohnson320  路  4Comments

atsheehan picture atsheehan  路  4Comments