OrientDB: OrientDB 3.0.0RC1 has a big performance drop in contrast to 2.2.13

Created on 6 Mar 2018 · 27 Comments · Source: orientechnologies/orientdb

OS: Windows Server 2012 R2
Java SDK: 1.8
OrientDB version: 2.2.13 and 3.0.0RC1

When I upgraded the DB server from 2.2.13 to OrientDB 3.0.0RC1, the results below show:

Update time consumption increased by 20%.
Read and insert rates decreased 10~20 times, which is a big regression.

heapsize:2.2 GB; heapmaxsize:7.1 GB; heapFreesize:1.4 GB; numCPUs:8;cpuFrequencyInHz:2400000000;physical Memory:total: 32767Mb free: 26550Mb;swap Memory:total: 37631Mb free: 8349Mb;

MAM-TEST 8 CPU/32GB RAM 450 GB HD with Win2012R2

Source

hp1975

All 27 comments

One device has 60 properties; inserting 1 device means inserting 63 vertices and the related edges.
The same number of operations applies to updating.

hp1975 on 6 Mar 2018

Hi, could you try on RC2 first and send us feedback?

laa on 6 Mar 2018

Hi @hp1975

We need some additional information on how you are doing the operations. Is it via SQL or via API? Remote or plocal?

Having some sample code would help a lot

Thanks

Luigi

luigidellaquila on 6 Mar 2018

@luigidellaquila anyway, we need to try on RC2 first before we proceed.

laa on 6 Mar 2018

I've seen a similar drop in performance during transactions with the remote protocol because, with v3.0, the entire TX is flushed back to the server before a query is executed. Could this be the case? @hp1975 could you please share some code or a test case for it?

lvca on 6 Mar 2018

We ran the test via SQL and created the graph factory with:
graphFactory = new OrientGraphFactory("remote:orientdb-primary:3424/isc", userName, userPassword).setupPool(initCapacity, poolsize * 2);

You can find the example code for our test in the attachment:
PropertyPerformanceTest.txt

hp1975 on 7 Mar 2018

My suspicions were correct. With v3.0, every time you execute a query, the open transaction is flushed to the server (to have the right consistency level).

@tglman does it make any sense to introduce something like OTransaction.flushOnCommand(boolean) so the user can disable this behavior? Or, even better, to play with the consistency level already in OTransaction:

  enum ISOLATION_LEVEL {
    READ_COMMITTED, REPEATABLE_READ
  }
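
The cost of flushing the open transaction before every query can be illustrated with a toy round-trip count. This is a minimal sketch, not OrientDB code: the class `TxRoundTripModel`, its method names, and the assumption that each flush, query, and commit costs exactly one network round trip are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not OrientDB internals): counts simulated
// client/server round trips for a single remote transaction.
final class TxRoundTripModel {
    enum Op { WRITE, QUERY }

    // Writes are buffered on the client and sent once at commit.
    // Assumption: every query is one trip, the commit is one trip.
    static int batchedAtCommit(List<Op> ops) {
        int trips = 0;
        for (Op op : ops) {
            if (op == Op.QUERY) {
                trips++;
            }
        }
        return trips + 1; // final commit
    }

    // Pending writes are flushed to the server before each query.
    // Assumption: the flush is a separate trip from the query itself.
    static int flushBeforeQuery(List<Op> ops) {
        int trips = 0;
        boolean pendingWrites = false;
        for (Op op : ops) {
            if (op == Op.WRITE) {
                pendingWrites = true;
            } else {
                if (pendingWrites) {
                    trips++; // flush the open transaction first
                    pendingWrites = false;
                }
                trips++; // the query itself
            }
        }
        return trips + 1; // final commit
    }

    // Builds WRITE, QUERY, WRITE, QUERY, ... with the given number of pairs.
    static List<Op> alternating(int pairs) {
        List<Op> ops = new ArrayList<>();
        for (int i = 0; i < pairs; i++) {
            ops.add(Op.WRITE);
            ops.add(Op.QUERY);
        }
        return ops;
    }

    public static void main(String[] args) {
        // 63 pairs echoes the 63 vertices per device mentioned earlier;
        // the strict write/query interleaving is an assumption.
        List<Op> ops = alternating(63);
        System.out.println("batched at commit:  " + batchedAtCommit(ops));
        System.out.println("flush before query: " + flushBeforeQuery(ops));
    }
}
```

Under these assumptions the interleaved workload costs 64 trips when batched and 127 when flushed before each query. The model only counts messages; it says nothing about payload size or server-side work.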

lvca on 8 Mar 2018

@lvca is there any solution for my case? And what mistake did I make in my usage?

hp1975 on 9 Mar 2018

@laa please wait a moment while we run another round of tests under RC2.

hp1975 on 9 Mar 2018

3.0.0RC2 is even worse.

jia57196 on 16 Mar 2018

@hp1975 @jia57196 is it possible to compare embedded to embedded on your server? That way we can check which components have a problem. Also, please use the latest 2.2.x rather than 2.2.13. 2.2.13 has a bug where data is only partially flushed, so by using 2.2.13 you actually risk losing your data.

laa on 16 Mar 2018

@hp1975 @jia57196 you could also provide us with a full benchmark, but if that is not possible, I am bound to ask you to perform several different runs.

laa on 18 Mar 2018

Hi @hp1975 @jia57196 this issue is really important for us but without additional data, we can not help :-(

laa on 19 Mar 2018

I provided the test results and the source code used for the testing. What additional data do you want?
What you can do is help yourself and your team.

We will actually move to another kind of database if such bad performance cannot be solved.

hp1975 on 20 Mar 2018

@hp1975 the problem seems to lie in using SQL + TRANSACTIONS + REMOTE protocol. This combination could result in worse performance because OrientDB now has a higher level of consistency and requires the client to flush the transaction to the server for each command.

The solution is easy, in my view, but I'm waiting for @tglman to validate it. If the user chooses a lower consistency level, the TX is not flushed. That's it. And you should see better performance than 2.2.x.

lvca on 20 Mar 2018

Hi all,

In the academic literature there is no isolation level that permits not reading your own writes (and it would make no sense anyway). What OrientDB did in 2.x remote transactions was only batch the changes and push them all together to the server, while the same API in embedded mode worked as a proper transaction. Now embedded and remote have the same behavior. What we can do is introduce an API for batch write operations that does not interact with the server and behaves the same way in embedded mode. This can be done simply by introducing two new methods, beginBatch() and flushBatch(), that can be used as a "replacement" for begin() and commit().

As a temporary workaround, we can provide a setting that disables the transactions and makes them work as in 2.x, so migration will be smoother.
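
The batch idea could look roughly like the following. This is only a sketch under stated assumptions: the method names beginBatch() and flushBatch() come from the proposal above, while the class `BatchWriterSketch`, the `write` method, and the `Consumer` standing in for the network send are hypothetical and do not correspond to any existing OrientDB API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a client-side write batch: nothing is sent
// to the "server" until flushBatch() is called.
final class BatchWriterSketch {
    private final Consumer<List<String>> server; // stand-in for the network send
    private List<String> pending;                // null while no batch is open

    BatchWriterSketch(Consumer<List<String>> server) {
        this.server = server;
    }

    // Opens a new batch; plays the role begin() has for transactions.
    void beginBatch() {
        if (pending != null) {
            throw new IllegalStateException("a batch is already open");
        }
        pending = new ArrayList<>();
    }

    // Buffers one write locally without contacting the server.
    void write(String command) {
        if (pending == null) {
            throw new IllegalStateException("no open batch");
        }
        pending.add(command);
    }

    // Sends all buffered writes in one message and closes the batch.
    // Returns the number of writes sent.
    int flushBatch() {
        if (pending == null) {
            throw new IllegalStateException("no open batch");
        }
        List<String> toSend = pending;
        pending = null;
        server.accept(toSend);
        return toSend.size();
    }

    public static void main(String[] args) {
        List<List<String>> received = new ArrayList<>();
        BatchWriterSketch writer = new BatchWriterSketch(received::add);
        writer.beginBatch();
        writer.write("insert device 1");
        writer.write("insert property 1");
        int sent = writer.flushBatch();
        System.out.println("writes sent: " + sent + ", messages: " + received.size());
    }
}
```

Unlike a transaction, this sketch gives no read-your-own-writes guarantee: a query issued between beginBatch() and flushBatch() would not see the buffered changes, which is the trade-off the proposal describes.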

Regards

tglman on 20 Mar 2018

👍1

I like both ideas. Can you do the 2nd quickly, so @hp1975 can measure if that was the problem?

lvca on 20 Mar 2018

Hi @hp1975,

After double-checking your code, I saw that the transaction flush is not the cause of the problem. The transaction flush happens only if you use the new query API and the new query executor, but you are still using the 2.x code style, so none of this is triggered.

In the past we had some issues when the client and the server were of different versions: data was inserted correctly, but there were some network issues that caused the client to reconnect at every operation, slowing the operations down a lot. Could you double-check that the server and the client are using exactly the same OrientDB version?

Regards

tglman on 20 Mar 2018

For each test round, we used the corresponding OrientDB library.
We did not mix versions between server and client; we are very careful about this.
All tests were done on a single server, so there is no network delay or bandwidth problem.

hp1975 on 21 Mar 2018

I see. @hp1975, I will try to reproduce your benchmark on Windows in a couple of days to check the cause of the issue.

laa on 21 Mar 2018

Hi @hp1975,
I got your code and adapted it to remove your custom parts. I ran it on 3.0.0 and the latest 2.2.x; here are the results:
3.0.0:

Devices:1000;insert Time(sec/1000device):     0.497;read Time(sec/1000device):     0.005;update Time(sec/1000device):     0.507
Devices:10000;insert Time(sec/1000device):     0.234;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.376
Devices:50000;insert Time(sec/1000device):     0.203;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.304
Devices:100000;insert Time(sec/1000device):     0.197;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.312
Devices:500000;insert Time(sec/1000device):     0.202;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.324
Devices:1000000;insert Time(sec/1000device):     0.199;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.341

2.2.x:

Devices:1000;insert Time(sec/1000device):     0.381;read Time(sec/1000device):     0.007;update Time(sec/1000device):     1.095
Devices:10000;insert Time(sec/1000device):     0.226;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.633
Devices:50000;insert Time(sec/1000device):     0.189;read Time(sec/1000device):     0.001;update Time(sec/1000device):     0.565
Devices:100000;insert Time(sec/1000device):     0.193;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.556
Devices:500000;insert Time(sec/1000device):     0.199;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.464
Devices:1000000;insert Time(sec/1000device):     0.204;read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.554

I do not see huge differences between the two versions. My environment is Linux (this should not make any difference) and I ran the test on localhost (this may make a difference). Did you run your test on localhost?
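
To put a number on the comparison, the result lines can be parsed and turned into ratios. The following is a minimal sketch: the class `BenchLine` and its field names are made up for illustration, and the parser assumes exactly the four `;`-separated fields shown in the output above.

```java
// Hypothetical helper: parses one benchmark output line of the form
// "Devices:N;insert Time(...): x;read Time(...): y;update Time(...): z".
final class BenchLine {
    final long devices;
    final double insert; // seconds per 1000 devices
    final double read;   // seconds per 1000 devices
    final double update; // seconds per 1000 devices

    private BenchLine(long devices, double insert, double read, double update) {
        this.devices = devices;
        this.insert = insert;
        this.read = read;
        this.update = update;
    }

    static BenchLine parse(String line) {
        String[] fields = line.trim().split(";");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        return new BenchLine(
                Long.parseLong(value(fields[0])),
                Double.parseDouble(value(fields[1])),
                Double.parseDouble(value(fields[2])),
                Double.parseDouble(value(fields[3])));
    }

    // Returns the text after the last ':' in a field, trimmed.
    private static String value(String field) {
        return field.substring(field.lastIndexOf(':') + 1).trim();
    }

    public static void main(String[] args) {
        // The 1,000,000-device lines from the two runs above.
        BenchLine v3 = parse("Devices:1000000;insert Time(sec/1000device):     0.199;"
                + "read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.341");
        BenchLine v2 = parse("Devices:1000000;insert Time(sec/1000device):     0.204;"
                + "read Time(sec/1000device):     0.002;update Time(sec/1000device):     0.554");
        System.out.printf("insert ratio (3.0.0 / 2.2.x): %.3f%n", v3.insert / v2.insert);
        System.out.printf("update ratio (3.0.0 / 2.2.x): %.3f%n", v3.update / v2.update);
    }
}
```

For the 1,000,000-device lines this gives an insert ratio of roughly 0.98 and an update ratio of roughly 0.62, so in this run 3.0.0 is on par for inserts and faster for updates. The read times are too close to the printed precision to compare meaningfully.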

It would be good if we could go through the environment setup with you. Do you have a support contract?
I also noticed some misuse of the OrientDB API here and there that would be better to correct in your existing application as well.

Regards

tglman on 21 Mar 2018

👍1

As I said at the beginning of this testing, one device has more than 50 properties, which are linked by edges. An insertion inserts the device plus those 50 properties.
I agree that if you insert just one device, the difference looks minor. But note that those properties are also inserted and updated.

I ran those tests on a local machine, Windows MAM-TEST 8 CPU/32GB RAM 450 GB HD with Win2012R2;
the client and server were on the same machine.

hp1975 on 22 Mar 2018

@tglman you can reach me at haipeng.[email protected] on Skype.

hp1975 on 22 Mar 2018

@tglman please add me to the chat too.

laa on 22 Mar 2018

@tglman hi, I roughly ran the same test against 3.0.0RC2 on my local CentOS 7 and noticed that the performance is better than on Windows. How come? Let me finish those test cases and upload the results.

hp1975 on 22 Mar 2018

👍2

@hp1975 could you provide a log of the server start on Windows server?

laa on 6 Apr 2018

Hi @hp1975,

I double-checked the code again, and it was actually not correct for a couple of reasons. First, the command was missing the execute call, so in a few cases the query was not running at all. Also, in 2.2.x SQL over remote does not support transactions, so begin/commit has no effect (in 2.2.x).
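
The first pitfall, building a command without executing it, can be shown with a small stand-in. This is a sketch only: `FakeDb`, `Command`, and the `Device` class name in the SQL strings are hypothetical and are not OrientDB classes; they just mimic an API where `command(...)` returns an object that does nothing until `execute()` is called.

```java
// Hypothetical stand-ins (not OrientDB classes) for a lazy command API.
final class LazyCommandDemo {

    // Stand-in for a database handle; counts statements that really ran.
    static final class FakeDb {
        int executedStatements = 0;

        // Only builds the command object; nothing runs yet.
        Command command(String sql) {
            return new Command(this, sql);
        }
    }

    static final class Command {
        private final FakeDb db;
        private final String sql;

        Command(FakeDb db, String sql) {
            this.db = db;
            this.sql = sql;
        }

        String sql() {
            return sql;
        }

        // The statement only "runs" when execute() is called.
        void execute() {
            db.executedStatements++;
        }
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb();
        db.command("UPDATE Device SET name = 'a'");           // built, never run
        db.command("UPDATE Device SET name = 'b'").execute(); // actually run
        System.out.println("statements executed: " + db.executedStatements);
    }
}
```

A benchmark that forgets the execute call measures only object construction for those statements, which can make one version look much faster or slower than it really is.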

Regards

tglman on 16 Apr 2018
