Nim: db_sqlite needs sqlPrepared

Created on 2 Mar 2020 · 9 comments · Source: nim-lang/Nim

Summary

db_sqlite needs sqlPrepared for Prepared Statements, just like db_postgres.

The API design and docs can be based on db_postgres (see the sketch below).
This cannot break existing APIs because it is a new feature.

Performance improvements may be possible with this, and SQLite is used heavily on mobile and embedded platforms.

Other ORMs, frameworks, and libraries could then build on top of it; SQLite itself supports prepared statements natively.
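
For reference, here is roughly how the existing db_postgres prepared-statement API is used; a db_sqlite version could mirror this shape (a from-memory sketch, so check the db_postgres docs for the exact signatures):

import db_postgres

let db = open("localhost", "user", "password", "dbname")
db.exec(sql"create table if not exists t (k integer primary key, v integer)")
# prepare() registers a named statement on the server; $1, $2 are the parameters
let ins = db.prepare("ins_t", sql"insert into t values ($1, $2)", 2)
for i in 0 ..< 100:
  db.exec(ins, $i, $i)   # re-executes the already-parsed statement
db.close()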

Labels: Feature, Stdlib

All 9 comments

I ran a small benchmark to compare Nim and Python with SQLite. Nim is about 30% slower, I'm guessing because Python's sqlite3 module caches prepared queries automatically.

Here's the Python test code and run:

import os
import sqlite3

dbname = 'bttest2.py.db'
if os.path.exists(dbname):
  os.remove(dbname)
db = sqlite3.connect(dbname)
db.execute("pragma journal_mode = off")
db.execute("pragma synchronous = off")
db.execute("pragma legacy_file_format = off")
db.execute("pragma page_size = %d" % 8192)
db.executescript('''
    create table t (
        k    integer primary key,
        v    integer);
''')

nrows = 1000000
for i in xrange(nrows):
  db.execute('insert into t values (?, ?)', (i, i))
db.commit()

for i in xrange(3):
  for j in xrange(nrows):
    v = db.execute('select v from t where k = ?', (j,)).fetchone()[0]
    db.execute('update t set v = ? where k = ?', (v + 1, j))
db.commit()

v = db.execute('select v from t where k = ?', (13,)).fetchone()[0]
print v
ms:nim jim$ /usr/bin/time -l python bttest.py
16
       43.92 real        43.76 user         0.10 sys
   8962048  maximum resident set size
      2483  page reclaims
       191  page faults
        96  block input operations
        33  block output operations
       175  voluntary context switches
        19  involuntary context switches

Here's the Nim test code and run:

import os
import db_sqlite
import strutils

const
  dbname = "bttest2.nim.db"

var
  v: int;

#let nrows = 10_000_000
#let nrows = 100
let nrows = 1_000_000
os.removeFile(dbname)
let db = db_sqlite.open(dbname, "", "", "")

db.exec(sql("""create table t (
    k    integer primary key,
    v    integer)"""))

db.exec(sql"begin")
for i in 0..<nrows:
  db.exec(sql"insert into t values (?,?)", i, i)
db.exec(sql"commit")

db.exec(sql"begin")
for i in 0..<3:
  for j in 0..<nrows:
    v = parseInt(db.getValue(sql"select v from t where k = ?", j))
    db.exec(sql"update t set v = ? where k = ?", v+1, j)
db.exec(sql"commit")

echo db.getValue(sql"select v from t where k = ?", 13)
db.close()
ms:nim jim$ /usr/bin/time -l ./bttest2
16
       58.03 real        57.80 user         0.13 sys
   3182592  maximum resident set size
       788  page reclaims
        11  page faults
        11  block output operations
        41  voluntary context switches
        31  involuntary context switches

I started to look into hacking on Nim's db_sqlite as a way to get my feet wet with Nim. I didn't realize there was already an interface to prepared queries for the Postgres version.

db_postgres exposes names for prepared queries because apparently Postgres caches them internally by that name. SQLite doesn't give names to prepared queries and doesn't automatically cache them, so caching would have to be done in db_sqlite. The logical thing would be to attach a table to DbConn for the prepared statement cache. But this would break programs that pass a db handle to both the db_sqlite and sqlite3 modules.

Personally, I don't like the idea of naming prepared statements. The Python sqlite3 module caches queries automatically using a hash key. I have around 380 select statements in HashBackup and would not want to give them all names, so I'd have to hash them before calling db_sqlite. This would prevent using string literals in db_sqlite calls because I'd need to pass the hash code as a prepared statement name, so I'd have to put each query in a temp variable first and pass that along with its hash code. It's doable, but it seems easier for users to not expose prepared statement names and leave caching as an implementation detail inside the module.
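
To make this concrete, here is a minimal sketch of the kind of internal cache I mean, keyed by the query text itself and built directly on the low-level sqlite3 wrapper (StmtCache and prepareCached are hypothetical names, not anything in db_sqlite):

import std/tables
import sqlite3

type
  StmtCache = Table[string, PStmt]

proc prepareCached(db: PSqlite3, cache: var StmtCache, q: string): PStmt =
  ## Return a prepared statement for `q`, compiling it only on first use.
  if q in cache:
    result = cache[q]
    discard sqlite3.reset(result)   # make the cached statement runnable again
  else:
    if prepare_v2(db, q.cstring, q.len.cint, result, nil) != SQLITE_OK:
      raise newException(CatchableError, "prepare failed: " & q)
    cache[q] = result

var cache = initTable[string, PStmt]()
# usage: let stmt = prepareCached(db, cache, "select v from t where k = ?")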

My other problem with the db_xxx modules is that all parameters are treated as strings. I don't want (and don't use) a fancy ORM, but I do think it's necessary to have direct access to some basic db types that would work across all databases, for example: text, null, integer, float, blob. SQLite has all of these and I'm sure the other databases do too. Once you get into timestamps, booleans, different-sized integers, etc., it becomes more difficult.
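
Something like the following is what I have in mind (a hypothetical sketch; the names are made up, and a typed binding layer would map each branch onto the matching sqlite3_bind_* call):

type
  DbValueKind = enum dvNull, dvInt, dvFloat, dvText, dvBlob
  DbValue = object
    case kind: DbValueKind
    of dvNull: discard
    of dvInt: intVal: int64
    of dvFloat: floatVal: float64
    of dvText: textVal: string
    of dvBlob: blobVal: seq[byte]

# e.g. exec(db, query, [DbValue(kind: dvInt, intVal: 7), DbValue(kind: dvNull)])
# would bind an INTEGER and a NULL instead of two quoted strings.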

But there are problems with the way it is coded now. For example, you can't set a column to null; I tried with nil, and it bombs at compile time like this:

/Users/jim/nim/sqtest.nim(17, 3) Error: type mismatch: got <DbConn, SqlQuery, int literal(7), typeof(nil)>
but expected one of: 
proc exec(db: DbConn; query: SqlQuery; args: varargs[string, `$`])
  first type mismatch at position: 4
  required type for args: varargs[string]
  but expression 'nil' is of type: typeof(nil)

expression: exec(db,
  SqlQuery(r"insert into t values (?,?)"), 7, nil)

I tried to use "null", thinking it wouldn't work, and it doesn't:

import os
import db_sqlite
import strutils

const
  dbname = "sqtest.db"

var
  v: int;

let db = db_sqlite.open(dbname, "", "", "")

db.exec(sql"drop table if exists t")
db.exec(sql("""create table t (
    k    integer primary key,
    v    integer)"""))
db.exec(sql"insert into t values (?,?)", 7, "null")
db.close()

ms:nim jim$ sqlite3 sqtest.db
-- Loading resources from /Users/jim/.sqliterc

SQLite version 3.7.13 2012-07-17 17:46:21
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from t;
k           v         
----------  ----------
7           null      
sqlite> select * from t where v is null;
sqlite> 

So the only way to set a column to NULL is to make it part of the query string text.
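
For now the workaround looks like this (a sketch assuming the db handle and table t from the test above); a real fix could call the low-level bind_null when it sees a nil-like argument:

# Workaround: embed the NULL in the SQL text; only the non-NULL column is a parameter.
db.exec(sql"insert into t values (?, NULL)", 7)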

Blobs are handled as strings, which doesn't work either. To insert a blob as a quoted literal, X'xxyyzz' is used, where xx, yy, and zz are the hex values of each byte. This is inefficient, especially for large blobs; a blob bind is much better, but Nim treats everything as strings. Here's a test program that tries to insert a 3-byte blob with Nim:

# try inserting a blob containing zeroes

import os
import db_sqlite
import strutils

const
  dbname = "sqtest.db"

var
  v: int
#  s = "abc"            # this works with db_sqlite
  s = "\x00\x01\x02"   # this fails at runtime

echo repr(s)

let db = db_sqlite.open(dbname, "", "", "")

db.exec(sql"drop table if exists t")
db.exec(sql("""create table t (
    k    integer primary key,
    v    clob)"""))

db.exec(sql"insert into t values (?,?)", 7, s)
db.close()
Hint: /Users/jim/nim/sqtest2  [Exec]
0x10f3aa058"\0\1\2"
/Users/jim/nim/sqtest2.nim(24) sqtest2
/Users/jim/nim-1.2.0/lib/impure/db_sqlite.nim(207) exec
/Users/jim/nim-1.2.0/lib/impure/db_sqlite.nim(142) dbError
Error: unhandled exception: unrecognized token: "'" [DbError]
Error: execution of an external program failed: '/Users/jim/nim/sqtest2 '

I think (but am not sure) the leading \x00 terminates the query string, causing an SQLite syntax error.
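
For comparison, going through the low-level sqlite3 wrapper with a real blob bind avoids the string formatting entirely. This is only a sketch (insertBlob is a made-up helper, and the cast mirrors SQLite's SQLITE_TRANSIENT convention of copying the buffer):

import sqlite3

proc insertBlob(db: PSqlite3, key: int, data: string) =
  ## Insert (key, data) into table t, binding data as a blob (assumes data is non-empty).
  var stmt: PStmt
  let q = "insert into t values (?, ?)"
  if prepare_v2(db, q.cstring, q.len.cint, stmt, nil) != SQLITE_OK:
    raise newException(CatchableError, "prepare failed")
  discard bind_int64(stmt, 1, key.int64)
  # the cast(-1) is SQLITE_TRANSIENT: SQLite copies the bytes before returning
  discard bind_blob(stmt, 2, cast[pointer](unsafeAddr data[0]), data.len.cint,
                    cast[Tbind_destructor_func](-1))
  if step(stmt) != SQLITE_DONE:
    raise newException(CatchableError, "step failed")
  discard finalize(stmt)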

Another thing I kinda don't get is the sql"select ..." syntax. I can see that sql"..." might be used to catch SQL syntax errors at compile time, but wouldn't that require having a complete grammar for each database inside the corresponding module? SQL isn't standard enough across databases to have just one grammar.
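
For what it's worth, sql"..." does not parse anything today; in db_common it boils down to roughly this (paraphrased from the stdlib source, whose doc comment only says that later versions may check the string for valid syntax):

type SqlQuery = distinct string   # the real type lives in db_common

proc sql(query: string): SqlQuery =
  ## Wraps the string; no grammar or syntax checking is performed.
  result = SqlQuery(query)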

It would be nice if the syntax of exec stayed the same, only with automatic caching of prepared statements and type-specific column binding for both actual arguments and result columns.

@hashbackup Also check out my "Ormin" project here: https://github.com/Araq/ormin

It takes care of prepared statements and provides a typed interface, not based on strings.

I just checked this issue now; I will unify the API to match the other db_* modules.

@bung87 thank you for working on this. I'm new to Nim and don't have the skills yet to make these changes.

I revised the test program:

import os
import db_sqlite
import strutils

const
  dbname = "bttest3.nim.db"

var
  v: int

#let nrows = 10_000_000
#let nrows = 1_000_000
let nrows = 100
os.removeFile(dbname)
let db = db_sqlite.open(dbname, "", "", "")

db.exec(sql("""create table t (
    k    integer primary key,
    v    integer)"""))

db.exec(sql"begin")
let insstmt = db.prepare("insert into t values (?,?)")
for i in 0..<nrows:
  db.exec(insstmt, i, i)
db.exec(sql"commit")

db.exec(sql"begin")
let getstmt = db.prepare("select v from t where k = ?")
let updstmt = db.prepare("update t set v = ? where k = ?")
for i in 0..<3:
  for j in 0..<nrows:
    v = parseInt(db.getValue(getstmt, j))
    db.exec(updstmt, v+1, j)
db.exec(sql"commit")

echo db.getValue(sql"select v from t where k = ?", 13)
db.close()

Q: shouldn't prepare() take an SqlQuery type instead of string, to be consistent with the rest of the API? The Postgres version does.

First I got a compile error, because the new exec() didn't stringify its args:

/Users/jim/nim/bttest3.nim(26, 5) Error: type mismatch: got <DbConn, SqlPrepared, int, int>
but expected one of: 
proc exec(db: DbConn; query: SqlQuery; args: varargs[string, `$`])
  first type mismatch at position: 2
  required type for query: SqlQuery
  but expression 'stmt' is of type: SqlPrepared
proc exec(db: DbConn; stmtName: SqlPrepared; args: varargs[string])
  first type mismatch at position: 3
  required type for args: varargs[string]
  but expression 'i' is of type: int
proc exec(para1: PSqlite3; sql: cstring; para3: Callback; para4: pointer;
         errmsg: var cstring): int32
  first type mismatch at position: 2
  required type for sql: cstring
  but expression 'stmt' is of type: SqlPrepared

expression: exec(db, stmt, i, i)

I changed prepare() to add stringify like the other exec(). Then it compiled, but fails at runtime:

Hint: 37554 LOC; 0.957 sec; 46.598MiB peakmem; Debug build; proj: /Users/jim/nim/bttest3; out: /Users/jim/nim/bttest3 [SuccessX]
ms:nim jim$ ./bttest3
/Users/jim/nim/bttest3.nim(24) bttest3
/Users/jim/nim/db_sqlite.nim(227) exec
/Users/jim/nim/db_sqlite.nim(143) dbError
Error: unhandled exception: not an error [DbError]

Also, this doesn't unify the db_* API because the Postgres version of prepare() takes 4 args while the db_sqlite version takes 2 args.

Hi, this PR fixes db_sqlite not having a prepare API. The acceptable parameter is a full SQL query string without placeholders, since the bindParam API is not implemented yet.

It just gets the benefit, at the db layer, of not parsing the same query twice.

By the way, wait for the next PR; there's also an issue related to bindParam, so wait for that to be closed.

Thanks, this is useful.

Here's sort of what I was looking for, though it doesn't cache prepared statements either:
https://github.com/GULPF/tiny_sqlite

Here is a version of the test program using tiny_sqlite with its recently added prepared-statement support. It has a nice interface to SQLite, runs 3x faster than the Python test, and 4x faster than db_sqlite. This is more like what I was expecting from Nim's standard library. I'm surprised it's 3x faster than Python, because there doesn't seem to be much Python code executed in the test. Python's sqlite3 interface is all written in C, but I guess there is still a lot of overhead in converting values to and from Python objects.

import os
import tiny_sqlite

const
  dbname = "bttest4.nim.db"

let nrows = 1_000_000
#let nrows = 100
os.removeFile(dbname)

let db = openDatabase(dbname)

db.exec("""create table t (
  k    integer primary key,
  v    integer)""")

db.transaction:
  for i in 0..<nrows:
    db.exec("insert into t values (?,?)", i, i)

db.transaction:
  for i in 0..<3:
    for j in 0..<nrows:
      for row in db.rows("select v from t where k = ?", j):
        let (v,) = row.unpack((int,))
        db.exec("update t set v = ? where k = ?", v+1, j)

for row in db.rows("select v from t where k = 13"):
  let (v,) = row.unpack((int,))
  echo v

db.close()

Hint: 40998 LOC; 1.348 sec; 58.27MiB peakmem; Dangerous Release build; proj: /Users/jim/nim/bttest4; out: /Users/jim/nim/bttest4 [SuccessX]
ms:nim jim$ /usr/bin/time -l ./bttest4
16
       14.42 real        14.21 user         0.11 sys
   3211264  maximum resident set size
       791  page reclaims
        15  page faults
         2  block output operations
        47  voluntary context switches
        32  involuntary context switches