Jooq: Add support for JSON / JSONB types

Created on 1 Mar 2018  路  21Comments  路  Source: jOOQ/jOOQ

A lot of people are using vendor-specific JSON (or in PostgreSQL: JSONB) data types. We should support those types natively and bind them to a string representation in Java.

If users prefer to bind the types to something like Jackson or gson, they can do so easily by using a Converter. The same can be done for JSR 367 bindings.

Some useful information about JSON standards compliance:
https://obartunov.livejournal.com/200076.html

Support is being added for:

  • [x] ~DB2~ (in a later release: #8948)
  • [x] MySQL
  • [x] Oracle
  • [x] PostgreSQL
  • [x] SQL Server (no code generation support yet)
  • [x] Others: In a later release: #8948

See also: #7235

Functionality All Editions High Fixed Enhancement Incompatible change

Most helpful comment

Implemented for the most popular dialects in jOOQ 3.12 (see list in issue description). More dialects will have JSON support in jOOQ 3.13: #8948

All 21 comments

Would it be possible to at least provide an example implementation of Binding that simply translates any Postgres jsonb field to String and vice versa?

I read the docs (https://www.jooq.org/doc/3.11/manual/code-generation/custom-data-type-bindings/) and while most of it looks relatively straight-forward, from the example the sql() method looks relatively advanced and (more importantly) prone to errors and bugs when the implementor does not know anything about Jooq internals (but maybe it's not - hard to judge from my perspective).

In general I am a little disappointed that basic "jsonb as String" support is still not currently supported in Jooq. In plain JDBC it's actually relatively simple to deal with jsonb by manually dealing with PGobject, which I also wouldn't mind Jooq providing. Jooq implicitly does this, by just returning the JDBC object as Object by default, which happens to be a PGobject in the case of jsonb. But unfortunately it's a little ugly to deal with (with @Deprecated annotations and manual type casting).

I understand you have a lot on your plate, so perhaps this reference implementation of Binding that people can copy-paste without much worry could be a good first step to at least help people on their way.

Thanks for the ping, @t9t

Indeed, it has already been decided that we should offer a Binding<"JSON", String> data type mapping out of the box, and let users override this with a more sophisticated binding if needed. I'm not sure anymore, why this got postponed again, but I think we should finally deliver on this, given the number of upvotes on the issue.

Exposing the PGobject is not an option, as this is vendor specific. A jOOQ provided solution must also work with MySQL, Oracle, SQL Server, etc.

I'll look into what this means in detail right away.

An additional option: We could roll our own JSON type that wraps a string and might offer additional functionality in the future, e.g. once the JDK offers a standard JSON binding API... jOOQ already uses https://github.com/fangyidong/json-simple internally, to import/export JSON data. Reusing that library is another option.

Having a custom JSON type seems imperative to type safety, otherwise, users would not be able to distinguish between ordinary string types and actual JSON types, or even compare them, or pass JSON expressions to VARCHAR expecting functions.

One PostgreSQL specific caveat will be the distinction between json and jsonb types. It is not unlikely for jOOQ to need to know the exact type, not just knowing that it is a "json-ish" type, as jOOQ often has to generate explicit type casts...

This will be a breaking change in PostgreSQL, as users will have implemented their own Binding<Object, ?> implementations. Whatever we do now, the database type will no longer be Object

A simple benchmark shows that the distinction between json and jsonb types in PostgreSQL is relevant:

create table if not exists t_json (j json);
create table if not exists t_jsonb (j jsonb);

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 200000;
  v_json text := '{}';
  rec RECORD;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      insert into t_json values (v_json::json);
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts); 
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      insert into t_json values (v_json::jsonb);
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts); 
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      insert into t_jsonb values (v_json::json);
    END LOOP;

    RAISE INFO 'Run %, Statement 3: %', r, (clock_timestamp() - v_ts); 
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      insert into t_jsonb values (v_json::jsonb);
    END LOOP;

    RAISE INFO 'Run %, Statement 4: %', r, (clock_timestamp() - v_ts); 
    RAISE INFO '';
  END LOOP;
END$$;

drop table if exists t_json;
drop table if exists t_jsonb;

The output is consistently slower when a cast is needed. It doesn't matter in which direction we're casting, and it doesn't matter if we're using json or jsonb in this simple case:

00000: Run 1, Statement 1: 00:00:00.524751
00000: Run 1, Statement 2: 00:00:00.659905
00000: Run 1, Statement 3: 00:00:00.628354
00000: Run 1, Statement 4: 00:00:00.51718
00000: 
00000: Run 2, Statement 1: 00:00:00.516217
00000: Run 2, Statement 2: 00:00:00.64406
00000: Run 2, Statement 3: 00:00:00.642708
00000: Run 2, Statement 4: 00:00:00.526859
00000: 
00000: Run 3, Statement 1: 00:00:00.502883
00000: Run 3, Statement 2: 00:00:00.636907
00000: Run 3, Statement 3: 00:00:00.636343
00000: Run 3, Statement 4: 00:00:00.525677
00000: 
00000: Run 4, Statement 1: 00:00:00.499228
00000: Run 4, Statement 2: 00:00:00.650017
00000: Run 4, Statement 3: 00:00:00.638345
00000: Run 4, Statement 4: 00:00:00.524202
00000: 
00000: Run 5, Statement 1: 00:00:00.504289
00000: Run 5, Statement 2: 00:00:00.649801
00000: Run 5, Statement 3: 00:00:00.635239
00000: Run 5, Statement 4: 00:00:00.535

The results are "obvious" of course. We'd hope that using the correct data type should be faster than needing to convert it, but I was curious about how significant this difference is. So we need to get this right. I guess native support for jsonb for all dialects would be an option here. After all, varchar and varbinary are standard SQL types as well, even if on a very technical level, they mean the same thing.

The SQL standard doesn't distinguish between the two physical JSON representations, and I'm not aware of any other dialects than PostgreSQL doing this.

In the mean time I had created this little one-class library: https://github.com/t9t/jooq-postgresql-json

At least to have a reusable away to bind json and jsonb fields to String in my projects. My plan was to also add the various jsonb functions and operators (such as the ?, |?, ->>, etc) to the jooq-postgresql-json library in the coming weeks.

I naively and without any performance test just used ?::json there, but it looks like you have pretty conclusively proven that that is not a good idea. :)

In the mean time I had created this little one-class library: https://github.com/t9t/jooq-postgresql-json

Yes, I've seen it :)

My plan was to also add the various jsonb functions and operators (such as the ?, |?, ->>, etc) to the jooq-postgresql-json library in the coming weeks.

Sure, that's something that would be useful in PostgresDSL, for example.

I naively and without any performance test just used ?::json there, but it looks like you have pretty conclusively proven that that is not a good idea. :)

That was a very naive benchmark indeed, casting only empty objects {}. I'm assuming this gets worse for larger JSON data points. Easy to reproduce:

create table if not exists t_json (j json);
create table if not exists t_jsonb (j jsonb);

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 200000;
  v_json text := '{"a":1,"b":2,"c":[3, 4, 5, 6]}';
  rec RECORD;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      insert into t_json values (v_json::json);
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts); 
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      insert into t_json values (v_json::jsonb);
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts); 
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      insert into t_jsonb values (v_json::json);
    END LOOP;

    RAISE INFO 'Run %, Statement 3: %', r, (clock_timestamp() - v_ts); 
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      insert into t_jsonb values (v_json::jsonb);
    END LOOP;

    RAISE INFO 'Run %, Statement 4: %', r, (clock_timestamp() - v_ts); 
    RAISE INFO '';
  END LOOP;
END$$;

drop table if exists t_json;
drop table if exists t_jsonb;

Now getting:

00000: Run 1, Statement 1: 00:00:00.569626
00000: Run 1, Statement 2: 00:00:01.169486
00000: Run 1, Statement 3: 00:00:01.034296
00000: Run 1, Statement 4: 00:00:00.855842
00000: 
00000: Run 2, Statement 1: 00:00:00.567955
00000: Run 2, Statement 2: 00:00:01.154019
00000: Run 2, Statement 3: 00:00:01.033361
00000: Run 2, Statement 4: 00:00:00.897972
00000: 
00000: Run 3, Statement 1: 00:00:00.57026
00000: Run 3, Statement 2: 00:00:01.159787
00000: Run 3, Statement 3: 00:00:01.019552
00000: Run 3, Statement 4: 00:00:00.853984
00000: 
00000: Run 4, Statement 1: 00:00:00.576009
00000: Run 4, Statement 2: 00:00:01.193102
00000: Run 4, Statement 3: 00:00:01.027302
00000: Run 4, Statement 4: 00:00:00.916919
00000: 
00000: Run 5, Statement 1: 00:00:00.570805
00000: Run 5, Statement 2: 00:00:01.154218
00000: Run 5, Statement 3: 00:00:01.02407
00000: Run 5, Statement 4: 00:00:00.861866

Again, the conversion between types is the worst case. json -> json performs best, because no data processing is being done, which is a well known property of the json type. However, when reading and querying json or jsonb, jsonb will heavily outperform json.

This is still for very small json documents.

An interesting article can be seen here:
https://www.2ndquadrant.com/en/blog/processing-json

Apart from performance, if you're always casting to ::json, you will not be able to correctly use a bunch of operators / functions, which may work with json and/or jsonb, so we really have to maintain this type information in jOOQ's runtime, in order to apply the correct cast.

This was surprisingly easy to do... I've added support for PostgreSQL's json and jsonb data types. Will look into supporting other dialects later this week

Once we have support for these types, we can also support some operators / functions out of the box. It is unlikely we'll support PostgreSQL's ASCII syntax directly, but other databases (like MySQL) have useful functions that should be "obvious" to emulate on PostgreSQL as well: #8944

I guess we'll have to do the same thing for XML, add an org.jooq.XML type, and support it in generated code... I've added another feature request for this: #8945

In Oracle, we have to check the all_json_columns dictionary view to see if a column is a JSON column, as Oracle doesn't have an actual JSON data type for DDL - it's a constraint:

CREATE TABLE t (j VARCHAR2 (4000)
   CONSTRAINT ensure_json CHECK (po_document IS JSON)
);

We could think of Oracle's BLOB backed JSON columns as JSONB:

CREATE TABLE j (
  s clob CHECK (s IS JSON),
  b blob CHECK (b IS JSON)
);

select *
from user_json_columns;

Yielding

TABLE_NAME  OBJECT_TYPE   COLUMN_NAME   FORMAT   DATA_TYPE
----------------------------------------------------------
J           TABLE         B             TEXT     BLOB
J           TABLE         S             TEXT     CLOB

Oracle's "JSONB" type has to be bound / inlined like ordinary blobs. ~The same is true for SQL Server~

SQL Server doesn't seem to offer a way of recognising JSON columns like Oracle. We'll think about this in a future version of jOOQ (check constraints could be reverse engineered).

Implemented for the most popular dialects in jOOQ 3.12 (see list in issue description). More dialects will have JSON support in jOOQ 3.13: #8948

It was breaking my expectations that the only way to retrieve the value from JSON and JSONB instances is using toString() and not an accessor (eg. getData()). And to me it is a little bit awkward that in the case of an SQL NULL, toString() returns null (at least in the case of PostgreSQL).

I would definitely be nice if there was some documentation what happens in the cases of SQL NULL and the JSON null value (which translate to null and "null" respectively, for PostgreSQL).

Thanks for your comment, @t9t. You're right, we'll fix these two issues ASAP: https://github.com/jOOQ/jOOQ/issues/9265

Much appreciated! I'll track that one and related issues.

Hi Lukas,

I'm having some troubles with the new JSONB type. Maybe I misunderstood something:

Prior to the JSONB type in jooq, I had this (simplified) query:

jooq.insertInto(MY_TABLE)
    .set(MY_TABLE.MY_COLUMN, DSL.val(aString))
    .execute();

If aString was null, I would have a null postgres column ;
If aString was equal to "null", I would have a null jsonb object in my column ;

With the new version of jooq, MY_COLUMN is typed as a JSONB on the java side, the query becames:

jooq.insertInto(MY_TABLE)
    .set(MY_TABLE.MY_COLUMN, DSL.val(JSONB.valueOf(aString)))
    .execute();

If aString is null, I will have a null jsonb object in my column due to the wrapping of the JSONB class, instead of null value in my column.

Since the JSONB class seems to be a convenient wrapper around the jsonb data, I was expecting the same behaviour as before:

  • JSONB.valueOf(null) would represent a null column
  • JSONB.valueOf("null") would represent a null jsonb JSONB object in my column

Currently, JSONB.valueOf(null) and JSONB.valueOf("null") are not equals, but will have both insert a null jsonb object in database, which I find a little confusing.

Of course, I could simply work around that by cheking the nullity of the String value at every insert.
I was just wondering if I misunderstood something, and if there is a way that would be easier to keep the JSONB type and the previous behaviour ?

@paulomatias: Thanks for your message.

If aString is null, I will have a null jsonb object in my column due to the wrapping of the JSONB class, instead of null value in my column.

To get a null value, please use DSL.val(null, SQLDataType.JSONB) instead. You don't even have to wrap things in DSL.val(), you can just write this:

    .set(MY_TABLE.MY_COLUMN, aString == null ? null : JSONB.valueOf(aString))

Currently, JSONB.valueOf(null) and JSONB.valueOf("null") are not equals, but will have both insert a null jsonb object in database, which I find a little confusing.

Yes, see: https://github.com/jOOQ/jOOQ/issues/10343

I hope this helps

Was this page helpful?
0 / 5 - 0 ratings