This issue is more of a discussion about an architectural change.
Summary: use an SQL-based storage as the primary/operational storage for the application.
Detail: currently the application uses the filesystem with the CSON file format (CSON is not widely supported). All operations on the data rely on file manipulation, and every change to the data structure, or to an algorithm over it, requires hand-written code. As an example: if we needed a new relationship between data, we would have to change the storage format (which is obvious) and implement the corresponding algorithm (plus the UI change, obviously, otherwise none of the improvement makes sense :) )
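To illustrate the contrast: in an SQL store a new relationship is typically just a new table plus a query, with no bespoke file-handling code. A minimal sketch, assuming the note table proposed later in this issue and a hypothetical note_link table that is not part of the current format:
-- hypothetical table expressing a note-to-note relationship
create table note_link(
  from_note integer,
  to_note integer
);
-- all notes that a given note links to
select note.title
from note_link
join note on note.note_id = note_link.to_note
where note_link.from_note = :note_id;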
At first, I would consider using SQLite as an embeddable, self-contained SQL storage engine.
The concept is that, from a live, operational point of view, an SQL-based storage has several advantages (listed in no particular order of importance/relevance):
(Remark: on the other hand, we should not ignore the filesystem as a secondary storage, provided it were more human readable; currently neither the file names nor the CSON structure helps much with that. I think a simple SQL file is much easier to process with existing tooling.)
This concept can be found in ZimWiki: on startup it builds an SQLite database, which is then used internally for efficiency.
list of directories
select * from folder order by name
list notes in directories
select note_id, key, title from note where folder = :folder order by upper(title) asc
-- with pagination
select note_id, key, title from note where folder = 'temp'
order by note_id asc
limit 100 offset 200 -- skip the first 200 rows, return the next 100
list notes with (any?) sort criteria
select note_id, ... from note where ... order by {title | is_starred | is_pinned | created_at | updated_at}
list all notes with directories
select *
from note
join folder on (note.folder = folder.key)
order by folder.name, note.title -- etc any other sort criteria
counting notes in folders
select folder.name, count(1)
from note
join folder on (note.folder = folder.key)
group by folder.name
order by folder.name
search by tags
select title, tag
from note
join note_tag using (note_id)
where tag = 'toolbox'
As you can see, this gives endless flexibility in searching, while the heavy lifting is done by the SQL engine.
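For instance, a sketch combining several criteria at once (starred notes in a given folder carrying a given tag), using only the tables already shown:
select note.note_id, note.title
from note
join note_tag using (note_id)
where note_tag.tag = 'toolbox'
  and note.folder = 'temp'
  and note.is_starred
order by note.updated_at desc;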
notes having snippets of a given mode (e.g. SQL)
select note.*
from snippet
join note using (note_id)
where snippet.mode = 'SQL'
Although I have not noticed any performance issue when rendering HTML from Markdown, I have already noticed some delay when rendering diagrams (e.g. PlantUML).
If the rendered HTML were cached in the system (including diagram images, etc.), there would be no delay; the cached HTML could simply be shown.
Actually, it is worth testing whether there is any issue rendering large documents (for the moment there is no such thing as a note hierarchy, so there is no natural way of splitting larger writings, e.g. into sections).
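A minimal sketch of such a cache, assuming a hypothetical render_cache table (not part of the proposal) that is invalidated by comparing timestamps:
-- hypothetical cache table keyed by note
create table render_cache(
  note_id integer primary key,
  rendered_at timestamp,
  html text
);
-- serve the cached html only if the note has not changed since rendering
select render_cache.html
from render_cache
join note using (note_id)
where note.note_id = :note_id
  and render_cache.rendered_at >= note.updated_at;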
Once the note is prepared, it could include a graph of cross-referenced pages for easy navigation (e.g. "Related pages") based on links, tags and parent elements (whether the parent is a tag, note or folder).
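A sketch of a "Related pages" query based on shared tags alone (links and folder relations would need similar joins); names follow the schema proposed below:
-- notes sharing at least one tag with the current note, ranked by overlap
select other.note_id, other.title, count(1) as shared_tags
from note_tag mine
join note_tag others
  on others.tag = mine.tag and others.note_id <> mine.note_id
join note other on other.note_id = others.note_id
where mine.note_id = :note_id
group by other.note_id, other.title
order by shared_tags desc
limit 10;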
The adjacency list below is the simplest implementation (for tags) to add, but probably not the most efficient to query. There are other models (like the closure table) with better query characteristics and slightly more complicated insert/update logic; a closure-table sketch follows the examples below.
alter table tag add parent varchar;
insert into tag (tag, parent) values
('tag1',null),
('tag11','tag1'),
('tag12','tag1'),
('tag121','tag12'),
('tag122','tag12'),
('tag13','tag1'),
('tag2',null),
('tag21','tag2');
select * from tag;
with RECURSIVE
under_tag (root, tag, level) as (
select tag, tag, 0
from tag
where parent is null
union all
select under_tag.root, tag.tag, under_tag.level+1
from tag
join under_tag on tag.parent = under_tag.tag
order by 3 desc
)
select root, tag, level
from under_tag;
root |tag |level
-----|-------|-----
tag1 |tag1 |0
tag1 |tag11 |1
tag1 |tag12 |1
tag1 |tag121 |2
tag1 |tag122 |2
tag1 |tag13 |1
tag2 |tag2 |0
tag2 |tag21 |1
searching for the children of a specific tag
with RECURSIVE
under_tag (root, tag, level) as (
select tag, tag, 0
from tag
where tag = 'tag12'
union all
select under_tag.root, tag.tag, under_tag.level+1
from tag
join under_tag on tag.parent = under_tag.tag
order by 3 desc
)
select root, tag, level
from under_tag
root |tag |level
------|-------|------
tag12 |tag12 |0
tag12 |tag121 |1
tag12 |tag122 |1
ex: parent path (as nested directory structure breadcrumbs)
with RECURSIVE
under_tag (root, tag, ppath) as (
select tag, tag, tag
from tag
where parent is null
union all
select under_tag.root, tag.tag, under_tag.ppath||'/'||tag.tag
from tag
join under_tag on tag.parent = under_tag.tag
)
select root, tag, ppath from under_tag
root |tag |ppath
-----|-------|------------------
tag1 |tag1 |tag1
tag2 |tag2 |tag2
tag1 |tag11 |tag1/tag11
tag1 |tag12 |tag1/tag12
tag1 |tag13 |tag1/tag13
tag2 |tag21 |tag2/tag21
tag1 |tag121 |tag1/tag12/tag121
tag1 |tag122 |tag1/tag12/tag122
ex: count of descendants per tag, aka the sum of child counts
with RECURSIVE
under_tag (ancestor, tag, parent) as (
select tag, tag, parent
from tag
union all
select under_tag.ancestor, tag.tag, tag.parent
from tag
join under_tag on tag.parent = under_tag.tag
)
select ancestor, count(1)-1 from under_tag group by ancestor
ancestor |count(1)-1
---------|-----------
tag1 |5
tag11 |0
tag12 |2
tag121 |0
tag122 |0
tag13 |0
tag2 |1
tag21 |0
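For comparison, a minimal closure-table sketch (the alternative model mentioned above): every ancestor/descendant pair is stored explicitly, so subtree queries become plain lookups at the price of extra bookkeeping on insert:
create table tag_closure(
  ancestor varchar,
  descendant varchar,
  depth integer
);
-- inserting 'tag121' under 'tag12': copy the parent's ancestor rows one level deeper...
insert into tag_closure (ancestor, descendant, depth)
select ancestor, 'tag121', depth + 1
from tag_closure
where descendant = 'tag12';
-- ...plus the self-referencing row
insert into tag_closure values ('tag121', 'tag121', 0);
-- the whole subtree of tag1, no recursion needed
select descendant from tag_closure where ancestor = 'tag1';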
In general, the same logic could be applied to folders and notes too.
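A sketch of the same recursion over folders, assuming a hypothetical parent column that the schema below does not yet have:
-- assumes: alter table folder add parent varchar;
with recursive under_folder (root, key, level) as (
  select key, key, 0 from folder where parent is null
  union all
  select under_folder.root, folder.key, under_folder.level + 1
  from folder
  join under_folder on folder.parent = under_folder.key
)
select root, key, level from under_folder;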
According to my investigation, the following issues could gain advantages from this implementation:
hierarchical tags (e.g. #tag_h1#tag_h2#tag_h3) and #2099 (order by), which is an easy win.
The table structure could look like this:
create table folder(
  key varchar(256),
  name varchar(256),
  color varchar(256)
);
create table note(
  note_id integer primary key,
  key varchar,
  type varchar(256),
  folder varchar(256),
  title varchar,
  description varchar,
  content text,
  is_starred bool,
  is_trashed bool,
  is_pinned bool,
  created_at timestamp,
  updated_at timestamp
);
create table tag(
  tag varchar primary key
);
create table note_tag(
  note_id integer,
  tag varchar
);
create table snippet(
  note_id integer,
  name varchar,
  mode varchar,
  content text
);
create table attachment (
  key varchar,
  name varchar,
  file blob
);
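The sketch above declares no keys or indexes beyond note.note_id; a guess at a minimal set that the queries in this issue would benefit from:
-- each tag assigned to a note at most once, plus fast lookup in both directions
create unique index note_tag_uq on note_tag(note_id, tag);
create index note_tag_by_tag on note_tag(tag);
-- folder listings and per-folder counts
create index note_by_folder on note(folder);
-- snippets and attachments are fetched per owning note/key
create index snippet_by_note on snippet(note_id);
create index attachment_by_key on attachment(key);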
For experimental purposes, I could provide a script that converts Boostnote files into this database structure.
for many features a database is absolutely required. for example, auto completion is strongly tied to syntax. whether a human-readable format or a database, each is a good solution with advantages and disadvantages. what matters is acknowledging their limitations and taking advantage of either one.
if you want to edit markdown with external tools, focus on that part and don't bloat it with features. if you want feature-rich software, use a database.
To be honest, I don't think it will be implemented in boostnote. But there is an ongoing discussion about using a database for notes in the new version of boostnote, which is currently being developed from scratch. Please feel free to a) help implement it and b) join the discussion
a) https://github.com/BoostIO/Boost
b) Slack group: please join the channel "renewal"
if you want to edit markdown with external tools, focus on that part and don't bloat it with features. if you want feature-rich software, use a database.
I think we could keep both. If any external process changes a note, the app just picks it up and integrates it into the database. This way files remain editable but we still gain from using a database. (The usability of the current filesystem storage by external tooling is another topic to discuss; there are open issues about it already.)
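A sketch of that pick-up step using SQLite's upsert (available since 3.24), assuming note.key identifies the file and is made unique:
-- requires: create unique index note_key on note(key);
insert into note (key, title, content, updated_at)
values (:key, :title, :content, :file_mtime)
on conflict (key) do update set
  title = excluded.title,
  content = excluded.content,
  updated_at = excluded.updated_at
where excluded.updated_at > note.updated_at; -- apply only if the file is newer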
Hi. I started using boostnote because I like the fact I have "physical files" I can move about, and store in different devices. If I wanted a database I would just create one directly...
On the other hand, early Evernote used a db for metadata IIRC while keeping the content in files, so perhaps that is the best of both worlds?
maybe not a good idea:
Considering the current "plaintext" cson format as something to read without Boostnote is far from reality. Neither the naming convention nor the format is suitable for a human to work with.
Real plain text with front matter to store the metadata AND a practical file naming convention could make it useful for non-Boostnote use.
Incremental synchronization should not be an issue. As I mentioned, the database could be imagined as the "primary" or "operational" storage where Boostnote does its work. It would be a perfectly viable solution to synchronize with the file system periodically and frequently. This way you could get the best of both solutions.
Plus, imagine if you intend to build "corporate" level document synchronization: just replace SQLite with some remote database solution and you get shared documents automatically. (Keep in mind that locking a file while it is being modified is already an issue today, because files can be modified in parallel, so reliable file-level or db-level locking needs to be implemented anyway.)
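A sketch of db-level optimistic locking for that case: each writer re-checks the timestamp it originally read, so parallel modifications fail cleanly instead of silently overwriting each other:
-- zero rows updated means someone else changed the note first:
-- reload, merge, and retry instead of overwriting
update note
set content = :new_content,
    updated_at = :now
where note_id = :note_id
  and updated_at = :read_at;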
Here are issues regarding the readability of the cson files: #1447 and #490
CSON isn't ideal but I can 100% open them in Atom, VS Code, SED, AWK, Python or any other text based tool and manipulate them (global search and replace across all notes for example). The format is more suitable for humans to work with than some of the other file formats I work with at work every day. Remember this is a tool for programmers.
I was using OneNote on Mac at work and it got screwed up: there are notes on my local drive in some database format that Microsoft will not disclose and cannot fix, which never got synced to the server (which is somehow broken now), so I'm having to copy years of notes manually into Boost Note because OneNote is holding them hostage in some database. I don't want to see a repeat of that dumpster fire ever again. I guess I could just use md files and a directory structure with Atom / VS Code if Boost Note is going to start requiring putting notes into a DB. More complexity, more points of failure, less reliability.