Boostnote: Using SQL based db as primary data storage (along regular files)

Created on 2 Aug 2018 · 9Comments · Source: BoostIO/Boostnote

Concepts

This issue is more like a discussion of an architectural change.

Summary: what is using sql based storages a primary/operational storage for the application.

Detail: Currently it is using the filesystem with CSON file format (CSON is not so widely supported). All operation on data is relying on file manipulation and every change what changes data structure and algorithm on data structure requires "handmade" code. As an example: if we would need a new relationship to implement between data it has to change storage format (what is obvious) and need to implement the corresponding algorithm (+ UI change, obviously, otherwise none of the improvement makes sense :) )

At first, I would consider using SQLite as embeddable, self-containing SQL storage engine.

The concept would be that from live, operational point of view using SQL based storage have several advantages (not necessarily in importance/relevancy):

single file only. Easier to manage
proven good performance on larger datasets. In my Evernote notebook there are more then 9k notes. I have converted and noticed some unconfortable lag for anything I am doing with it.
almost endless flexibility in searching. Currently, only those search capabilities are implemented what are (well) manually done. SQL is taking care of the search algorithm. (keep in mind I am always talking about ease of development for the "backend" because corresponding UI change is always needed anyway)
extension of datastructure is much more straightforward/easy
open the possibility to use "enterprise" databases (aka running on a remote server) what could be a great collaboration platform for teams.

(Remark: On the other hand, we should not ignore filesystem as secondary storage if the file system would be more human readable (neither file names nor CSON structure helps too much in it). I think a simple SQL file is much more easy to process with existing tooling.)

This concept could be found in ZimWiki. When starting up it is building an SQLite database and then it is used internally for efficiency.

Common use case examples

list of directories

select * from folder order by name

list nodes in directories

select note_id, key, title from note where folder = :folder order by upper(title) asc
-- with pagination
select note_id, key, title from note where folder = 'temp' 
order by note_id asc
limit 200, 100

list node with (any?) sort criteria

select note_id,... from note where... order by {title|is_starred|is_pinned|created_at, updated_at}

list all node with directoreis

select *
from note
join folder on (note.folder = folder.key)
order by folder.name, note.title -- etc any other sort criteria

counting nodes in folders

select folder.name, count(1)
from note
join folder on (note.folder = folder.key)
group by folder.name
order by folder.name

search by tags

select title, tag 
from note
join note_tag using (note_id) 
where tag = 'toolbox'

as you could see it has endless flexibility in search and the heavy lifting is done by the sql engine.

snippets having java fragment

select note.*
from snippet
join note using (note_id)
where snippet.mode = 'SQL'

Let me demonstrate some potential improvements

storeing pre-compiled html

Although I did not notice any performance issue when rendering html from markdown I already noticed some delay on rendering diagrams (e.g. plantuml).

If the rendered html is "cached" in the system (including diagram images, etc) there would be no delay just show the html part.

Actually, it worth testing is there any issue of rendering large documents (as for a moment there is not such a thing as note hierarchy so there is no "natural" way of splitting larger writings - e.g into sections).

note cross reference

Once the note is prepared it could include the graph of cross-reference pages for easy navigation (e.g. "Related pages") based on links, tags, parent elements (wither tag, node or folder parent)

folder/note/tag hierarchy

This is the simplest implementation (for tags) to add but probably not so efficient when querying. There are other models (like closure table) with better query characteristics (and a little more complicated insert/update logic)

alter table tag add parent varchar;
insert into tag (tag, parent) values 
('tag1',null),
('tag11','tag1'),
('tag12','tag1'),
('tag121','tag12'),
('tag122','tag12'),
('tag13','tag1'),
('tag2',null),
('tag21','tag2');

select * from tag;

with RECURSIVE
under_tag (root, tag, level) as (
    select tag, tag, 0 
    from tag
    where parent is null
 union all
    select under_tag.root, tag.tag, under_tag.level+1
    from tag
    join under_tag on tag.parent = under_tag.tag
    order by 3 desc
)
select root, tag, level
from under_tag;

root |tag |level
-----|-------|-----
tag1 |tag1 |0
tag1 |tag11 |1
tag1 |tag12 |1
tag1 |tag121 |2
tag1 |tag122 |2
tag1 |tag13 |1
tag2 |tag2 |0
tag2 |tag21 |1

root: where they are belonging to
tag: the current one
level: could be used "indentation" indicator

searching for a child of specific tag

with RECURSIVE
under_tag (root, tag, level) as (
    select tag, tag, 0 
    from tag
    where tag = 'tag12' 
 union all
    select under_tag.root, tag.tag, under_tag.level+1
    from tag
    join under_tag on tag.parent = under_tag.tag
    order by 3 desc
)
select root, tag, level
from under_tag

root |tag |level
------|-------|------
tag12 |tag12 |0
tag12 |tag121 |1
tag12 |tag122 |1

ex: parent path (as nexted directory structure breadcrums)

with RECURSIVE
under_tag (root, tag, ppath) as (
    select tag, tag, tag 
    from tag
    where parent is null
 union all
    select under_tag.root, tag.tag, under_tag.ppath||'/'||tag.tag
    from tag
    join under_tag on tag.parent = under_tag.tag
)
select root, tag, ppath from under_tag

root |tag |ppath
-----|-------|------------------
tag1 |tag1 |tag1
tag2 |tag2 |tag2
tag1 |tag11 |tag1/tag11
tag1 |tag12 |tag1/tag12
tag1 |tag13 |tag1/tag13
tag2 |tag21 |tag2/tag21
tag1 |tag121 |tag1/tag12/tag121
tag1 |tag122 |tag1/tag12/tag122

ex: count by parent aggregate tag. aka sum of child counts

with RECURSIVE
under_tag (ancestor, tag, parent) as (
    select tag, tag, parent 
    from tag
 union all
    select under_tag.ancestor, tag.tag, tag.parent
    from tag
    join under_tag on tag.parent = under_tag.tag
)
select ancestor, count(1)-1 from under_tag group by ancestor

ancestor |count(1)-1
---------|-----------
tag1 |5
tag11 |0
tag12 |2
tag121 |0
tag122 |0
tag13 |0

In general, the same logic could be applied to folders and notes too.

According to my investigation, the following issues could gain advantages from this implementation:

Why use a two-layer structure instead of a tree structure? #2201
Files and directory mode for note management #2126
Hierarchy in tags (like #tag_h1#tag_h2#tag_h3) #2099
[feature request] muti-level folder #2078
Feature Request: Allow Notes (and Folders?) to be arranged as parent/children in a tree/hierarchy #586
Filter tags / folder list #2027 : see flexibility of search
[FeatureRequest][Question] Show thirdparty folders and notes through proxy service #2001 : imagine as the proxy is a remote database server
Recover .md files frome .cson (encoding conversion) #1901 : as the content is stored as it plain text it is trivial
[Feature Request] Create Subfolders #1789
Error handling for corrupted files #1732 : having file corruption in sqlite is much more difficult than in plain text files.
Multi-level directory support #1541
[Feature Request] add Nested sub folder support #1518
[Feature Request] Filter Notes with todos/tasks #1508 : with the combination of preparing (and creating explicit todo "index" the search is very easy)
[Feature Request] Save all changes / history function #1431: in the database world, having such a history table for a note implemented transparently by triggers is quite common and easy.
hope to add tree mode #1322
Performance issues #1195
[Feature Request] Add per-folder sort #933 : this definitely needs extension in data model. it would be not so complicated using some folder level attribute what is used after in the order by is an easy win.
Sub-note Request #819

Used SQL data structure:

        create table folder(
            key varchar(256),
            name varchar(256),
            color varchar(256)
        )

        create table note(
            note_id integer primary key,
            key varchar,
            type varchar(256),
            folder varchar(256),
            title varchar,
            description varchar,
            content text,
            is_starred bool,
            is_trashed bool,
            is_pinned bool,
            created_at timestamp,
            updated_at timestamp
        )

        create table tag(
            tag varchar primary key
        )

        create table note_tag(
            note_id integer, 
            tag varchar
        )

        create table snippet(
            note_id integer,
            name varchar,
            mode varchar,
            content text
        )

        create table attachment (
            key varchar,
            name varchar,
            file blob
        )

For experimental purposes, I could provide a script what is converting boosnote files into this database structure.

discussion

Source

takacsot

👍5

Most helpful comment

Hi. I started using boostnote because I like the fact I have "physical files" I can move about, and store in different devices. If I wanted a database I would just create one directly...

On the other hand, early Evernote used a db for metadata IIRC while keeping the content in files, so perhaps that is a the best of both worlds?

gotofritz on 15 Nov 2018

👍5

All 9 comments

for many features a database is absolutely required. for example, auto completion is strongly tied to syntax. whether human readable format or database each are good solutions with advantages and disadvantages. what matter s is acknowledging their limitations and taking advantage of either one.

ghost on 3 Aug 2018

if you want edit markdown with external tools focus on this part, don't bloat it with features. if you want feature rich software use a database.

ghost on 3 Aug 2018

To be honest, i don't think it will be implemented in boostnote. But at the moment, there is an ongoing discussion to use a database for notes in the new version of boostnote that is developed from scratch at the moment. Please feel free to a) help to implement it and b) join the discussion

a) : https://github.com/BoostIO/Boost
b) slack group please join the channel "renewal"

ehhc on 3 Aug 2018

if you want edit markdown with external tools focus on this part, don't bloat it with features. if you want feature rich software use a database.

I think we could keep both. If any external process changes a note the app just picks it up and integrate into the database. By this file could be editable but we still gain from the use of database. (the usability of current filesystem storage by any external tooling is another topic to discuss - there open issues about it already)

takacsot on 3 Aug 2018

Hi. I started using boostnote because I like the fact I have "physical files" I can move about, and store in different devices. If I wanted a database I would just create one directly...

On the other hand, early Evernote used a db for metadata IIRC while keeping the content in files, so perhaps that is a the best of both worlds?

gotofritz on 15 Nov 2018

👍5

maybe not a good idea:

database is not a plaintext to read. cson file can be read without boostnote.
incremental synchronization is become impossible. now we can use dropbox to incremental synchronization. but if use sqlite , how to synchronize?

xvusrmqj on 20 Dec 2018

👍2

Considering the current "plaintext" cson format as something to read without Boostnote is far from reality. Neither the naming convention, nor the format is suitable for a human to work with.

With real plain text with front-matter to store metadata AND and a practical file naming convention could make it useful for non-boostnote use.

Incremental synchronization should not be an issue. As I mentioned it could be imagined as "primary" or "operational" storage. Where boostnote would be working. It would be a perfectly viable solution to "synchronize" between file system periodically and frequently. By this you could get the best of both solution.

Plus imagine if you intend to make a "corporate" level document synchronization? just replace sqlite to some remote database solution and you got shared documents automatically. (keep in mind that locking a file while modifying is an issue today because the files could be modified in parallel - so having a reliable file level or db level locking is needed to implement anyway)

takacsot on 7 Jan 2019

Here are issues regarding the readability of the cson files: #1447 and #490

Flexo013 on 15 Aug 2019

CSON isn't ideal but I can 100% open them in Atom, VS Code, SED, AWK, Python or any other text based tool and manipulate them (global search and replace across all notes for example). The format is more suitable for humans to work with than some of the other file formats I work with at work every day. Remember this is a tool for programmers.

I was using One Note on Mac at work and it got screwed up such that there are notes on my local drive in some database format that Microsoft will not disclose and can't fix that never got sync'd to the server (that is somehow broken now) so I'm having to copy out years of notes manually into Boost Note because One Note is holding them hostage in some database. I don't want to see a repeat of that dumpster fire ever again. I guess I could just use md files and a directory structure with Atom / VS Code if Boost Note is going to start requiring putting notes into a DB. More complexity, more points of failure, less reliability.