We should use VARCHAR data type for storing hexadecimal strings. Reason for that is ease of use, get rid of ENCODE/DECODE, and it's faster for reads (referenece: http://engineering.pivotal.io/post/bytea_versus_text_in_postgresql/).
We're using BYTEA data type for various fields in database:
publicKeysecondPublicKeysenderPublicKeyrecipientPublicKeygeneratorPublicKeyrequesterPublicKeysignaturesignSignatureblockSignaturepayloadHashbroadhashdata field of transaction type 0N/A
all
Here is a summary db physical size for each data type;
SELECT
pg_column_size(decode('b65aa5950acf1ade522bcf520f2b2491dcde2f312b4933f56443faff80ad8ebc', 'hex')::bytea) AS size_bytea,
pg_column_size('b65aa5950acf1ade522bcf520f2b2491dcde2f312b4933f56443faff80ad8ebc'::text) AS size_var,
pg_column_size(10001) as size_int,
pg_column_size(10001::bigint) as size_int
;
36, 68, 4, 8
On hold till https://github.com/LiskHQ/lisk/issues/1780
To see how this issue will impact the performance, I did some tests as below:
1) I imported mainnet snapshot to db.
dropdb lisk_main
createdb lisk_main
rm latest.gz
wget https://snapshot.lisknode.io/latest.gz
gunzip -fcq latest.gz | psql -U lisk -d lisk_main
2) On development branch, I started application with: node app.js -n mainnet.
3) I performed following ab operations:
GET /api/accounts:
~ # now && ab -n200000 -c1 -k "http://127.0.0.1:8000/api/accounts" > acc_bytea_n200000_c1.txt
13:52:08
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8000
Document Path: /api/accounts
Document Length: 3223 bytes
Concurrency Level: 1
Time taken for tests: 853.613 seconds
Complete requests: 200000
Failed requests: 198419
(Connect: 0, Receive: 0, Length: 198419, Exceptions: 0)
Keep-Alive requests: 0
Total transferred: 675609988 bytes
HTML transferred: 614609988 bytes
Requests per second: 234.30 [#/sec] (mean)
Time per request: 4.268 [ms] (mean)
Time per request: 4.268 [ms] (mean, across all concurrent requests)
Transfer rate: 772.92 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 66
Processing: 2 4 5.2 3 256
Waiting: 2 4 5.2 3 256
Total: 2 4 5.3 3 256
Percentage of the requests served within a certain time (ms)
50% 3
66% 4
75% 4
80% 4
90% 5
95% 7
98% 10
99% 12
100% 256 (longest request)
New Relic Results:

GET /api/accounts with publicKey filter:
~ # now && ab -n200000 -c1 -k "http://127.0.0.1:8000/api/accounts?publicKey=4e8896e20375b16e5f1a6e980a4ed0cdcb3356e99e965e923804593669c87ad2" > acc_bytea_n200000_c1_publicKey.txt
14:11:07
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8000
Document Path: /api/accounts?publicKey=4e8896e20375b16e5f1a6e980a4ed0cdcb3356e99e965e923804593669c87ad2
Document Length: 375 bytes
Concurrency Level: 1
Time taken for tests: 1516.012 seconds
Complete requests: 55979
Failed requests: 0
Keep-Alive requests: 0
Total transferred: 38065720 bytes
HTML transferred: 20992125 bytes
Requests per second: 36.93 [#/sec] (mean)
Time per request: 27.082 [ms] (mean)
Time per request: 27.082 [ms] (mean, across all concurrent requests)
Transfer rate: 24.52 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 2
Processing: 22 27 6.1 26 268
Waiting: 22 27 6.1 26 252
Total: 22 27 6.1 26 268
Percentage of the requests served within a certain time (ms)
50% 26
66% 27
75% 27
80% 27
90% 29
95% 30
98% 34
99% 42
100% 268 (longest request)
New Relic Results:

DROP TRIGGER protect_mem_accounts ON public.mem_accounts;
DROP FUNCTION public.revert_mem_account();
DROP VIEW public.full_blocks_list;
DROP VIEW public.trs_list;
DROP INDEX mem_accounts_get_delegates;
ALTER TABLE public.signatures
ALTER COLUMN "publicKey" TYPE varchar(64) USING ENCODE("publicKey", 'hex');
ALTER TABLE public.mem_accounts
ALTER COLUMN "publicKey" TYPE varchar(64) USING ENCODE("publicKey", 'hex');
ALTER TABLE public.rounds_rewards
ALTER COLUMN "publicKey" TYPE varchar(64) USING ENCODE("publicKey", 'hex');
CREATE VIEW public.full_blocks_list AS
SELECT b.id AS b_id,
b.version AS b_version,
b."timestamp" AS b_timestamp,
b.height AS b_height,
b."previousBlock" AS "b_previousBlock",
b."numberOfTransactions" AS "b_numberOfTransactions",
b."totalAmount" AS "b_totalAmount",
b."totalFee" AS "b_totalFee",
b.reward AS b_reward,
b."payloadLength" AS "b_payloadLength",
encode(b."payloadHash", 'hex'::text) AS "b_payloadHash",
encode(b."generatorPublicKey", 'hex'::text) AS "b_generatorPublicKey",
encode(b."blockSignature", 'hex'::text) AS "b_blockSignature",
t.id AS t_id,
t."rowId" AS "t_rowId",
t.type AS t_type,
t."timestamp" AS t_timestamp,
encode(t."senderPublicKey", 'hex'::text) AS "t_senderPublicKey",
t."senderId" AS "t_senderId",
t."recipientId" AS "t_recipientId",
t.amount AS t_amount,
t.fee AS t_fee,
encode(t.signature, 'hex'::text) AS t_signature,
encode(t."signSignature", 'hex'::text) AS "t_signSignature",
s."publicKey" AS "s_publicKey",
d.username AS d_username,
v.votes AS v_votes,
m.min AS m_min,
m.lifetime AS m_lifetime,
m.keysgroup AS m_keysgroup,
dapp.name AS dapp_name,
dapp.description AS dapp_description,
dapp.tags AS dapp_tags,
dapp.type AS dapp_type,
dapp.link AS dapp_link,
dapp.category AS dapp_category,
dapp.icon AS dapp_icon,
it."dappId" AS "in_dappId",
ot."dappId" AS "ot_dappId",
ot."outTransactionId" AS "ot_outTransactionId",
encode(t."requesterPublicKey", 'hex'::text) AS "t_requesterPublicKey",
tf.data AS tf_data,
t.signatures AS t_signatures
FROM (((((((((public.blocks b
LEFT JOIN public.trs t ON (((t."blockId")::text = (b.id)::text)))
LEFT JOIN public.delegates d ON (((d."transactionId")::text = (t.id)::text)))
LEFT JOIN public.votes v ON (((v."transactionId")::text = (t.id)::text)))
LEFT JOIN public.signatures s ON (((s."transactionId")::text = (t.id)::text)))
LEFT JOIN public.multisignatures m ON (((m."transactionId")::text = (t.id)::text)))
LEFT JOIN public.dapps dapp ON (((dapp."transactionId")::text = (t.id)::text)))
LEFT JOIN public.intransfer it ON (((it."transactionId")::text = (t.id)::text)))
LEFT JOIN public.outtransfer ot ON (((ot."transactionId")::text = (t.id)::text)))
LEFT JOIN public.transfer tf ON (((tf."transactionId")::text = (t.id)::text)));
ALTER TABLE public.full_blocks_list OWNER TO lisk;
CREATE VIEW public.trs_list AS
SELECT t.id AS t_id,
b.height AS b_height,
t."blockId" AS "t_blockId",
t.type AS t_type,
t."timestamp" AS t_timestamp,
t."senderPublicKey" AS "t_senderPublicKey",
m."publicKey" AS "m_recipientPublicKey",
upper((t."senderId")::text) AS "t_senderId",
upper((t."recipientId")::text) AS "t_recipientId",
t.amount AS t_amount,
t.fee AS t_fee,
encode(t.signature, 'hex'::text) AS t_signature,
encode(t."signSignature", 'hex'::text) AS "t_SignSignature",
t.signatures AS t_signatures,
(( SELECT (blocks.height + 1)
FROM public.blocks
ORDER BY blocks.height DESC
LIMIT 1) - b.height) AS confirmations,
t."rowId" AS "t_rowId"
FROM ((public.trs t
LEFT JOIN public.blocks b ON (((t."blockId")::text = (b.id)::text)))
LEFT JOIN public.mem_accounts m ON (((t."recipientId")::text = (m.address)::text)));
ALTER TABLE public.trs_list OWNER TO lisk;
CREATE FUNCTION public.revert_mem_account() RETURNS trigger
LANGUAGE plpgsql
AS $$ BEGIN IF NEW."address" <> OLD."address" THEN
RAISE WARNING 'Reverting change of address from % to %', OLD."address", NEW."address";
NEW."address" = OLD."address";
END IF;
IF NEW."u_username" <> OLD."u_username" AND NEW."u_username" IS NOT NULL AND OLD."u_username" IS NOT NULL THEN
RAISE WARNING 'Reverting change of u_username from % to %', OLD."u_username", NEW."u_username";
NEW."u_username" = OLD."u_username";
END IF;
IF NEW."username" <> OLD."username" AND NEW."username" IS NOT NULL AND OLD."username" IS NOT NULL THEN
RAISE WARNING 'Reverting change of username from % to %', OLD."username", NEW."username";
NEW."username" = OLD."username";
END IF;
IF NEW."publicKey" <> OLD."publicKey" AND OLD."publicKey" IS NOT NULL THEN
RAISE WARNING 'Reverting change of publicKey from % to %', OLD."publicKey", NEW."publicKey";
NEW."publicKey" = OLD."publicKey";
END IF;
IF NEW."secondPublicKey" <> OLD."secondPublicKey" AND OLD."secondPublicKey" IS NOT NULL THEN
RAISE WARNING 'Reverting change of secondPublicKey from % to %', ENCODE(OLD."secondPublicKey", 'hex'), ENCODE(NEW."secondPublicKey", 'hex');
NEW."secondPublicKey" = OLD."secondPublicKey";
END IF;
RETURN NEW;
END $$;
ALTER FUNCTION public.revert_mem_account() OWNER TO lisk;
CREATE TRIGGER protect_mem_accounts BEFORE UPDATE ON public.mem_accounts FOR EACH ROW EXECUTE PROCEDURE public.revert_mem_account();
CREATE INDEX mem_accounts_get_delegates ON public.mem_accounts USING btree (vote DESC, "publicKey") WHERE ("isDelegate" = 1);
CREATE UNIQUE INDEX mem_accounts_publicKey ON mem_accounts ("publicKey");
CREATE INDEX signatures_publicKey ON signatures ("publicKey");
CREATE INDEX rounds_rewards_publicKey ON rounds_rewards ("publicKey");
I refactored the source code as in 2314-convert-bytea-to-varchar branch.
I started application with node app.js -n mainnet and performed following ab operations.
~ # now && ab -n200000 -c1 -k "http://127.0.0.1:8000/api/accounts" > acc_var_n200000_c1_unique_index.txt
17:24:50
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8000
Document Path: /api/accounts
Document Length: 3170 bytes
Concurrency Level: 1
Time taken for tests: 980.870 seconds
Complete requests: 200000
Failed requests: 123345
(Connect: 0, Receive: 0, Length: 123345, Exceptions: 0)
Keep-Alive requests: 0
Total transferred: 707581190 bytes
HTML transferred: 646581190 bytes
Requests per second: 203.90 [#/sec] (mean)
Time per request: 4.904 [ms] (mean)
Time per request: 4.904 [ms] (mean, across all concurrent requests)
Transfer rate: 704.47 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.7 0 75
Processing: 2 5 8.7 3 1126
Waiting: 2 4 8.4 3 1125
Total: 3 5 8.7 4 1126
Percentage of the requests served within a certain time (ms)
50% 4
66% 4
75% 4
80% 5
90% 7
95% 9
98% 13
99% 24
100% 1126 (longest request)
New Relic Results:

~ # now && ab -n200000 -c1 -k "http://127.0.0.1:8000/api/accounts?publicKey=4e8896e20375b16e5f1a6e980a4ed0cdcb3356e99e965e923804593669c87ad2" > acc_varchar_n200000_c1_publicKey_index2.txt
12:31:24
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8000
Document Path: /api/accounts?publicKey=4e8896e20375b16e5f1a6e980a4ed0cdcb3356e99e965e923804593669c87ad2
Document Length: 375 bytes
Concurrency Level: 1
Time taken for tests: 1234.949 seconds
Complete requests: 91335
Failed requests: 0
Keep-Alive requests: 0
Total transferred: 62107800 bytes
HTML transferred: 34250625 bytes
Requests per second: 73.96 [#/sec] (mean)
Time per request: 13.521 [ms] (mean)
Time per request: 13.521 [ms] (mean, across all concurrent requests)
Transfer rate: 49.11 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 9
Processing: 2 13 25.8 4 1002
Waiting: 2 13 25.4 4 908
Total: 2 13 25.8 4 1002
Percentage of the requests served within a certain time (ms)
50% 4
66% 5
75% 6
80% 7
90% 54
95% 60
98% 69
99% 78
100% 1002 (longest request)
New Relic Results:

I run a small test directly on the PostgreSQL server.
create table mem_accounts_binary as table mem_accounts;
create table mem_accounts_varchar as table mem_accounts;
alter table mem_accounts_varchar alter column "publicKey" type CHAR(64) USING ENCODE("publicKey", 'hex');
CREATE INDEX ON mem_accounts_varchar ("publicKey”);
SELECT ENCODE("publicKey", 'hex') from mem_accounts_binary;
select "publicKey" from mem_accounts_varchar;
pgbench -c 30 -T 120 -n -f ./query_with_binary.sql -f ./query_with_varchar.sql -P 5 -S lisk_main_net
SQL script 1: ./query_with_binary.sql
- weight: 1 (targets 33.3% of total)
- 236 transactions (34.6% of total, tps = 1.876072)
- latency average = 8896.888 ms
- latency stddev = 2548.701 ms
SQL script 2: ./query_with_varchar.sql
- weight: 1 (targets 33.3% of total)
- 225 transactions (33.0% of total, tps = 1.788628)
- latency average = 7164.604 ms
- latency stddev = 2209.866 ms
So its clear that ENCODE actually don't have a noticeable overhead on the query.
During a meeting between @nazarhussain @yatki and @vitaly-t we discussed and decided to keep all hexadecimal strings as BYTEA data type in database and use ENCODE/DECODE to convert those to strings. For following reasons:
CHAR(64) and making index provide faster performance on queries with filter on public key, but it then decrease overall performance of data read from PostgreSQL to NodeJS. (not sure but possibly because of sting based IO)CHAR/VARCHAR require an index on column which then turns insertion to slow compared to BYTEA typeSo in general we will follow these guidelines.
BYTEA in databaseENCODEDECODE or convert hex string to Buffer from NodeJS before passing to query. @yatki Can go through the code once to make sure we are following the above guidelines everywhere.
@nazarhussain I found out we are storing hexadecimal values as varchar in the following columns in tables. I think it's better to create a separate issue to convert this fields to bytea. What do you think ?
Table => column:
mem_round_snapshot => delegate
mem_round => delegate
mem_accounts2u_multisignatures => dependentId
mem_accounts2u_delegates => dependentId
mem_accounts2multisignatures => dependentId
mem_accounts2delegates => dependentId
@yatki Yes I agree we should create seperate issue for above fields, and close this issue.
Closing this issue, the remaning tasks are being discussed here
Most helpful comment
During a meeting between @nazarhussain @yatki and @vitaly-t we discussed and decided to keep all hexadecimal strings as
BYTEAdata type in database and useENCODE/DECODEto convert those to strings. For following reasons:CHAR(64)and making index provide faster performance on queries with filter on public key, but it then decrease overall performance of data read from PostgreSQL to NodeJS. (not sure but possibly because of sting based IO)CHAR/VARCHARrequire an index on column which then turns insertion to slow compared toBYTEAtypeSo in general we will follow these guidelines.
BYTEAin databaseENCODEDECODEor convert hex string toBufferfrom NodeJS before passing to query.@yatki Can go through the code once to make sure we are following the above guidelines everywhere.