Zeronet: Optimize the function for building database

Created on 15 Sep 2018 · 13 comments · Source: HelloZeroNet/ZeroNet

rev3594
It takes about 5 minutes to build the database for Horizon, and during the build I can't do anything in ZeroNet; it just says loading.


All 13 comments

For me: INFO Site:1CjMsv..uxBn Imported 10 data file in 11.4669420719s on a 20 USD/yr VPS, which I think is pretty OK since it reads and inserts around 70 MB of data.

I will look into the possibility of moving db operations to a separate thread as part of the move to Python 3.

70 MB? It should be 216 MB. I'm running ZeroNet on an HDD, not an SSD, so it will be slower. Maybe some cache in RAM is needed. Also, the data is growing, as I said.
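One way to get the "cache in RAM" suggested above is to build the whole database in an in-memory SQLite connection and then flush it to disk in a single sequential pass, which avoids the many small random writes that HDDs handle poorly. A minimal sketch, assuming a made-up `keyword` table (this is not ZeroNet's actual schema, and requires Python 3.7+ for `Connection.backup`):

```python
import sqlite3

def build_db_in_memory(rows, db_path):
    """Build the db entirely in RAM, then copy it to disk in one shot.

    `rows` is an iterable of (hash, word) tuples; the schema is
    illustrative only.
    """
    mem = sqlite3.connect(":memory:")
    mem.execute("CREATE TABLE keyword (hash TEXT, word TEXT)")
    mem.executemany("INSERT INTO keyword VALUES (?, ?)", rows)
    mem.commit()

    # One sequential copy to disk instead of thousands of small writes.
    disk = sqlite3.connect(db_path)
    mem.backup(disk)  # sqlite3.Connection.backup, Python 3.7+
    disk.close()
    mem.close()
```

The trade-off is peak memory usage roughly equal to the final db size, which would not help the 500 MB VPS case mentioned later in this thread.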

$ gzip -l *.json.gz
         compressed        uncompressed  ratio uncompressed_name
            3303011            27374599  87.9% data_keywords.json
            2570148            21068213  87.8% data_main.json
            1006795             4339919  76.8% data_phrases.json
            1460239            15571250  90.6% data_relationship.json
            8340193            68353981  87.8% (totals)

SSD highly recommended for ZeroNet

Not everyone uses an SSD, and small file reads/writes are exactly what HDDs are bad at. So a solution is needed to bypass small file reads/writes, and also to cache the db reads/writes to the file system.

File reads are cached and writes are buffered by the operating system. The db cache is handled by the sqlite module.
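Beyond the OS and sqlite defaults mentioned above, SQLite exposes pragmas that trade durability for bulk-import speed. A sketch of a connection tuned for a one-off, rebuildable import (these pragma values are illustrative assumptions, not ZeroNet's actual settings):

```python
import sqlite3

def open_for_bulk_import(db_path):
    """Open a connection tuned for a one-off bulk import.

    Safe only for a db that can be rebuilt from the .json source files
    if the process crashes mid-import.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = MEMORY")  # keep rollback journal in RAM
    conn.execute("PRAGMA synchronous = OFF")      # skip fsync on every commit
    conn.execute("PRAGMA cache_size = -64000")    # ~64 MB page cache
    return conn
```

Since ZeroNet's db is derived entirely from the site's data files, losing it on a crash only costs a re-import, which makes this trade-off plausible here.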

@HelloZeroNet I also have this issue, but it is 4 times slower in my case to load the .db (24 minutes), plus CPU overload. As @blurHY says, not everyone will use an SSD. And think about smartphone users. I found this thread because I wanted to submit an issue about the same thing. On the mentioned Horizon site, it took my older Pentium computer 15 minutes of full CPU load (HDD activity was not exhausted the whole time) to finish rechecking the Horizon site.

This is what I did on my Linux Ubuntu 16.04 computer with the latest ZeroNet:
cd ~/Apps/ZeroBundle/ZeroNet/data/
find ./1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn -delete
mkdir ./1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn
git clone https://github.com/blurHY/Horizon.git
mv Horizon/* ./1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn/
../zeronet.py sitePublish 1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn

Then I went to ZeroHello and clicked "Check files" next to the Horizon site. The result was about 15 minutes of CPU overload on the computer; debug.log did not go crazy, but I saw in the Horizon site (0) menu that the site has a 300 MB .db.
And this size is nothing special; ZeroNet should be able to cope with dynamic sites having, let's say, 10 GB databases.

cd ~/Apps/ZeroBundle/ZeroNet/data/1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn/data
$ gzip -l *.json.gz

         compressed        uncompressed  ratio uncompressed_name
            2743133            23285428  88.2% data_keywords1.json
            2208428            23940669  90.8% data_keywords2.json
            2602574            23455548  88.9% data_keywords3.json
            2656768            23706139  88.8% data_keywords4.json
             902385             8252620  89.1% data_keywords5.json
            2789194            24995191  88.8% data_main1.json
            2155699            15099977  85.7% data_main2.json
            3347288            19561521  82.9% data_phrases1.json
            2197791            23694010  90.7% data_relationship1.json
            2425075            24690472  90.2% data_relationship2.json
            1197466            14682409  91.8% data_relationship3.json
             303130             1297035  76.6% data_zites1.json
           25528931           226661019  88.7% (totals)

I think this has happened to me several times on this site, because in debug-last.log I see:
Site:1CjMsv..uxBn Imported 12 data file in 1443.11739898s

It may be related to an unsolved issue where the ZeroMe db took days to rebuild: ZeroTalk topic, also described in this unsolved issue: https://github.com/HelloZeroNet/ZeroMe/issues/121

The problem is that it's limited by IO/SQLite, so we can't do much about it. You can experiment by removing some indexes, as that's one of the factors in insert performance.

removing some indexes

Then will queries be slower?

It's not necessarily going to be slower. Worth experimenting with it.

1 minute and 30 seconds to build after removing all of the indexes. And queries seem quicker. Maybe the reason is that the CPU is now idle.
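The drop-the-indexes experiment above can be automated: drop the secondary indexes, bulk-insert inside a single transaction, then rebuild each index in one pass at the end, which is usually much cheaper than maintaining them row by row. A sketch, assuming the same hypothetical `keyword(hash, word)` table:

```python
import sqlite3

def bulk_insert_without_indexes(conn, rows):
    """Drop indexes, bulk-insert in one transaction, rebuild indexes.

    `rows` is an iterable of (hash, word) tuples; the schema is
    illustrative only.
    """
    cur = conn.cursor()
    # Remember index definitions so they can be recreated afterwards
    # (sql IS NOT NULL skips SQLite's internal auto-indexes).
    indexes = cur.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'index' AND sql IS NOT NULL"
    ).fetchall()
    for name, _ in indexes:
        cur.execute("DROP INDEX %s" % name)

    with conn:  # one transaction: one journal flush instead of thousands
        cur.executemany("INSERT INTO keyword VALUES (?, ?)", rows)

    with conn:
        for _, sql in indexes:
            cur.execute(sql)  # rebuild each index in a single pass
```

This keeps query speed intact after the import, unlike permanently removing the indexes.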

What about making db writes async? That way it at least won't lock up the whole client.
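One common way to make the writes async is a single worker thread draining a queue, so the caller never blocks on the database (SQLite connections must stay on the thread that created them). A minimal sketch of the idea, not ZeroNet's actual code:

```python
import queue
import sqlite3
import threading

class AsyncDbWriter:
    """Queue writes to a dedicated db thread; callers return immediately."""

    def __init__(self, db_path):
        self.tasks = queue.Queue()
        self.thread = threading.Thread(target=self._worker, args=(db_path,))
        self.thread.start()

    def _worker(self, db_path):
        conn = sqlite3.connect(db_path)  # created on the worker thread
        while True:
            item = self.tasks.get()
            if item is None:  # shutdown sentinel
                break
            sql, params = item
            conn.execute(sql, params)
            conn.commit()
        conn.close()

    def execute(self, sql, params=()):
        self.tasks.put((sql, params))  # non-blocking for the caller

    def close(self):
        self.tasks.put(None)
        self.thread.join()
```

Statements still run in submission order, so the UI thread stays responsive while a long import drains in the background. (Since ZeroNet uses gevent, the real fix would more likely be a greenlet or thread pool integrated with its event loop.)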

Each time I add this zite (Horizon) to my poor VPS (only 500 MB of memory), the zeronet.py program gets killed by the system due to running out of memory.

Is there any way to know that the database has been built? Also, show a progress bar while the db is building.
It currently shows no progress when the db is big and not yet filled with user data, so users don't know what ZeroNet is doing.
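Since the import is a loop over a known list of data files, a progress callback is cheap to add. A sketch with hypothetical names (`import_file` stands in for ZeroNet's per-file import step, `report` for whatever pushes the number to the UI):

```python
def import_with_progress(files, import_file, report):
    """Import data files one by one, reporting (done, total, name) after each.

    `import_file` and `report` are caller-supplied callables; both names
    are hypothetical, not real ZeroNet APIs.
    """
    total = len(files)
    for done, name in enumerate(files, 1):
        import_file(name)
        report(done, total, name)
```

Even a coarse files-done-out-of-total counter would replace the bare "Loading" message with something users can watch.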
