Testcontainers-java: ScriptUtils parsing failed when using field name with 'end' keyword

Created on 13 Jun 2019  路  8Comments  路  Source: testcontainers/testcontainers-java

I'm using runInitScript and it seems to falsely parse the sql:

create database if not exists ttt;

use ttt;

create table aaa
(
    id                  bigint auto_increment   primary key,
    end_time            datetime     null       COMMENT 'end_time',
    data_status         varchar(16)  not null
) comment 'aaa';

create table bbb
(
    id                  bigint auto_increment   primary key
) comment 'bbb';

I've stepped into the code and found that splitSqlScript in ScriptUtils thinks there are 3 statements: create database, use and combined 2 create table. The last statement, of course, throws sql syntax error.

modulejdbc resolutioacknowledged typbug

Most helpful comment

Glad to see it shipped 馃 You guys are awesome!

All 8 comments

I've digged a little deeper and figured that the key is naming. There's field named end_time. This field name is recognized as END in PROCEDURE, which ruins the parsing process.

If we modify the splittable.sql used in ScriptUtilsTest, adding this name:

CREATE TABLE bar (
  end_time VARCHAR(255)
);

The test is broken.

Seems to me the ScriptUtils is only roughly tested. I wonder why isn't a more robust solution is used, such as a SQL syntax parser.

Hey @skyline75489, thanks a lot for digging into this. There are indeed different quirks like this in ScriptUtils.

Can you please point use to a more robust solution we could use if you know one that supports different vendor specific SQL?

The ScriptUtils class we use is taken from the Spring-JDBC project and is used thereby implicitly in thousands of projects. I would therefore refrain from calling it roughly tested, but of course we would be happy if you could point us to a better solution.

I agree ScriptUtils is widely used. But I doubt if it is as robust as dedicated parsers used in druid or something similar.

However, for testcontainer maybe ScriptUtils is enough, though. We are not trying to build some connection pool or fully-function JDBC library, after all. We just need to run some sql scripts. This commit
https://github.com/testcontainers/testcontainers-java/commit/1d03cbacf398f82758bd4c8d71d609c155b40fa5#diff-5023355fa84dc1f1c805f42aafe61a8e introduced compound statements support and also this parsing bug. I think we can figure out a way to fix this without replacing ScriptUtils entirely.

I analysed the problem and the following two ifs are the problem:

if (!inLiteral && !inComment && containsSubstringAtOffset(lowerCaseScriptContent, "BEGIN", i)) {
    compoundStatementDepth++;
}
if (!inLiteral && !inComment && containsSubstringAtOffset(lowerCaseScriptContent, "END", i) && compoundStatementDepth > 0) {
    compoundStatementDepth--;
}
final boolean inCompoundStatement = compoundStatementDepth != 0;

Inside the method containsSubstringAtOffset it just checks the substring with startsWith. So also fields where the name contains the word "begin" have problems.

I will create a pull request with the fix when it is ready. I have to find a better way to check if its the beginning or end of a procedure.

However, for testcontainer maybe ScriptUtils is enough, though. We are not trying to build some connection pool or fully-function JDBC library, after all. We just need to run some sql scripts. This commit
1d03cba#diff-5023355fa84dc1f1c805f42aafe61a8e introduced compound statements support and also this parsing bug. I think we can figure out a way to fix this without replacing ScriptUtils entirely.

I agree. This is, I think, the way I'd prefer to go - I spent some time looking for a splitter/parser that we could use, but keep coming back to:

  • we just need to split the script
  • we don't actually care about dialect or even malformed SQL; it's OK to let the database do the syntax validation

All the parsers I've seen would push us towards a parser/configuration-per-dialect, which I fear would be just be a lot of work given the number of DBs we have to support. Additionally, any cases where the parser has bugs or is overly strict would cause an unnecessary rejection. We just want to split the script, not validate it.

I think that #1627 fixes this particular issue, and seems to be solid at splitting very messy or malformed scripts correctly.

Fix was released in 1.12.0.

Glad to see it shipped 馃 You guys are awesome!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

andredasilvapinto picture andredasilvapinto  路  3Comments

lovepoem picture lovepoem  路  3Comments

dabraham02124 picture dabraham02124  路  3Comments

McKratt picture McKratt  路  4Comments

vmassol picture vmassol  路  3Comments