Jooq: Add support for SQLite's fts3 and fts4 virtual tables

Created on 1 Jan 2021  Â·  19Comments  Â·  Source: jOOQ/jOOQ

Use case:

I'm currently trying to insert data into fts4 4table, created according to https://www.sqlite.org/fts3.html, but jOOQ can't detect data types in this table, whilst they are quite simple, for example here:

create virtual table pages using fts4(
content TEXT,
book_id INTEGER REFERENCES title(id)
);

Possible solution you'd like to see:

I believe generator could detect data types in such kind of tables

Versions:

Code Generation C SQLite Parser All Editions Medium Fixed Enhancement

All 19 comments

Thanks for your suggestion. We can't easily detect the types with queries on SQLITE_MASTER and PRAGMA_TABLE_INFO, as you can verify here:

select * 
from sqlite_master t, pragma_table_info(t.tbl_name) c
where t.tbl_name = 'pages';

Resulting in:

|type|name|tbl_name|rootpage|sql|cid|name|type|notnull|dflt_value|pk|
|----|----|--------|--------|---|---|----|----|-------|----------|--|
|table|pages|pages|0|CREATE VIRTUAL TABLE pages using fts4(content TEXT,book_id INTEGER REFERENCES title(id))|0|content||0||0|
|table|pages|pages|0|CREATE VIRTUAL TABLE pages using fts4(content TEXT,book_id INTEGER REFERENCES title(id))|1|book_id||0||0|

The type column being empty.

We could attempt to parse the SQL text and reverse engineer data types from there, though. That's what we're already doing to detect check constraints. I guess the parser would have to be able to parse and ignore the virtual and using clauses.

Do you have any other ideas on how to reverse engineer these columns, whose type information / affinity is missing?

Interesting, this syntax isn't documented in CREATE TABLE: https://sqlite.org/lang_createtable.html

It's described in separate place: https://sqlite.org/lang_createvtab.html

Not only for check constraints, but also for views, we've already parsed the SQL sources to find out missing data types: #8278

Regarding types, it actually looks like a bug in SQLite:

create virtual table pages using fts4(
  content TEXT,
  book_id INTEGER REFERENCES title(id)
);

insert into pages values (1, 'a');
-- Query OK, 1 row affected
select * from pages;
-- +---------+---------+
-- | content | book_id |
-- +---------+---------+
-- | 1       | a       |
-- +---------+---------+

If it's meant that SQL should be typesafe — it isn't here.

Also according to docs if one creates a virtual table without arguments (using fts3();) resulting table will contain a single non-hidden content column.

Everything derived from such virtual tables won't have types also:

create temp table tmp2 as select * from pages limit 1;
PRAGMA table_xinfo(tmp2);

This PRAGMA statement won't output any types.

If it's meant that SQL should be typesafe — it isn't here.

"Feature", not a bug: https://www.sqlite.org/datatype3.html. It's called "type affinity".

Oh, shoot. But then there is a bug in the documentation:

Each column in an SQLite 3 database is assigned one of the following type affinities

And we've just found a way to create columns without affinities :)

Anyway. While we're at it, I wonder if we should implement some logic to recognise all the implementation tables and ignore them? Creating this table:

CREATE VIRTUAL TABLE t_11172 USING fts4 (
  content text,
  id integer not null primary key
)

I'm getting these implementation tables for the virtual table in sqlite_master:

|type|name|tbl_name|rootpage|sql|
|----|----|--------|--------|---|
|table|t_11172|t_11172|0|CREATE VIRTUAL TABLE t_11172 USING fts4 (content text, id integer not null primary key)|
|table|t_11172_content|t_11172_content|2|CREATE TABLE 't_11172_content'(docid INTEGER PRIMARY KEY, 'c0content', 'c1id')|
|table|t_11172_segments|t_11172_segments|4|CREATE TABLE 't_11172_segments'(blockid INTEGER PRIMARY KEY, block BLOB)|
|table|t_11172_segdir|t_11172_segdir|5|CREATE TABLE 't_11172_segdir'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx))|
|table|t_11172_docsize|t_11172_docsize|9|CREATE TABLE 't_11172_docsize'(docid INTEGER PRIMARY KEY, size BLOB)|
|table|t_11172_stat|t_11172_stat|10|CREATE TABLE 't_11172_stat'(id INTEGER PRIMARY KEY, value BLOB)|

I don't think users would like to see any of them in generated code, apart from the one they created? But how to recognise them formally?

I think this is implementation-specific detail. Ideally, we should know that fts3(4) will create this set of implementation tables, as well as that fts3(4) without arguments will create a table with a single content column.

Thinking about this again...

I think this is implementation-specific detail. Ideally, we should know that fts3(4) will create this set of implementation tables

Users can just exclude all .*?_(content|segments|segdir|docsize|stat) tables from being generated in SQLite. Or they don't if they don't care about the extra classes.

I'll also add a flag <includeSystemTables/> (similar to other <includeSystemXYZ/> flags) that allows for doing this automatically, defaulting to true: https://github.com/jOOQ/jOOQ/issues/11176

as well as that fts3(4) without arguments will create a table with a single content column.

Thanks for the reminder. This could be done entirely in the parser

Thank you very much, I believe this is perfect decision!

But the latest fix looks not completely correct, for example

CREATE VIRTUAL TABLE stat2 USING dbstat();
describe stat2;

outputs

| cid | name       | type    | notnull | dflt_value | pk |
+-----+------------+---------+---------+------------+----+
| 0   | name       | TEXT    | 0       | <null>     | 0  |
| 1   | path       | TEXT    | 0       | <null>     | 0  |
| 2   | pageno     | INTEGER | 0       | <null>     | 0  |
| 3   | pagetype   | TEXT    | 0       | <null>     | 0  |
| 4   | ncell      | INTEGER | 0       | <null>     | 0  |
| 5   | payload    | INTEGER | 0       | <null>     | 0  |
| 6   | unused     | INTEGER | 0       | <null>     | 0  |
| 7   | mx_payload | INTEGER | 0       | <null>     | 0  |
| 8   | pgoffset   | INTEGER | 0       | <null>     | 0  |
| 9   | pgsize     | INTEGER | 0       | <null>     | 0  |
+-----+------------+---------+---------+------------+----+

Well, how can we reliably handle this for all possible values of USING <xyz> without re-implementing 80% of SQLite in jOOQ? After all, this is just a somewhat wonky workaround for something that is really a flaw in SQLite...

... I mean, one solution is to just ignore these tables again, if they don't have an explicit column list.

Another solution would be to perform sequential checks: 1. Rad table descriptions. If types are there — we're fine and don't need to parse CREATE TABLE. 2. Parse DDL and try to find types there. 3. If nothing found — fallback to CLOB or Any. CLOB should work for anything because of type affinity. And a user will be able to define custom type if CLOB won't work for them.

If types are there — we're fine

That's already being done this way

Parse DDL and try to find types there

That too

If nothing found — fallback to CLOB or Any

I prefer to stick with OTHER (which also produces a @Deprecated annotation) than pretend we know what it is

Agreed, thank you for the clarification!

There, I reverted that part of the change.

Perhaps, a reasonable solution for implicit column definitions appears in the future. The current solution will definitely work for what you've presented in your original issue description, which I believe will be a more common case. After all, you probably want to have a foreign key in these tables, and possibly, some additional meta data.

Was this page helpful?
0 / 5 - 0 ratings